Software to capture content
Hi everyone, I was wondering if any of you know about software to capture website content – specifically, to capture online news outlets (CNN, The Washington Post, The New York Times…) as well as blog-type news. We are about to engage in research involving content coding of these sites and were wondering if anybody has information on costs (any free ones out there?), ease of use, effectiveness in capturing content, time needed to capture content at a point in time, time needed to capture 24-hour content, and any other pertinent information that you may want to share. Thanks in advance to ya all!

Eulàlia Puig Abril
Hi Eulàlia,

You could try SocSciBot3 (http://socscibot.wlv.ac.uk/), which includes an application called Cyclist that is used for text analysis. It is free, but it's a few years old and a bit buggy. It's still well worth the investment and has good online tutorials, though not all functions are covered.

Also, you can try one of the many website sucker apps. I've been impressed by Web Pipe: http://www.crystalsoftware.com.au/webpipe.html. It has a modest price but comes with an expensive add-on, a data mining application, though I'm not sure what that has to offer over other regular expression apps. There are many on the market, but you'll probably find the extra costs associated with the more expensive ones are worth it.

I know SPSS has a bunch of web analytics apps, and I have a fuzzy memory of a recent version of NUDIST being useful for web coding. I haven't had a chance to test them out, so these are just some possible leads.

I'd be very interested if you could let me know what you find in the end.

Brian

--------------------------------------------------------------------
Brian Cugelman, Webmaster, Information Services
United Nations Climate Change Secretariat
bcugelman@unfccc.int
+49 228 815 1521
http://www.unfccc.int

In reply to Eulalia Puig Abril <epabril@wisc.edu>, message to air-l@listserv.aoir.org of 01/03/2006 16:00.

_______________________________________________
The air-l@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
For capturing site content, look into RSS, which requires some programming skills, and site ripper applications such as HTTrack (http://www.httrack.com, free). Most news sites have RSS feeds. I don't know how you plan to code your information, but programs like Atlas.ti do have free trial versions.

Charlie
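To give a sense of the programming the RSS route involves, here is a minimal sketch in Python using only the standard library. It assumes a plain RSS 2.0 feed; the feed URL in the usage note, the function names, and the sample feed are hypothetical and only there for illustration. Real feeds vary (Atom, namespaced elements), so treat this as a starting point rather than a finished tool.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def fetch_feed(url):
    """Download a feed and return its text (assumes UTF-8; real feeds may differ)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_rss_items(xml_text):
    """Return one dict per <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            # pubDate is optional in RSS 2.0, so it may come back empty.
            "published": (item.findtext("pubDate") or "").strip(),
        })
    return items


# Hypothetical sample feed, used here so the sketch runs without a network.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example News</title>
<item><title>First story</title><link>http://example.com/1</link>
<pubDate>Wed, 01 Mar 2006 09:00:00 GMT</pubDate></item>
<item><title>Second story</title><link>http://example.com/2</link></item>
</channel></rss>"""

if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_FEED):
        print(entry["published"], entry["title"], entry["link"])
```

Against a live site you would call something like parse_rss_items(fetch_feed("http://example.com/rss.xml")) on a schedule and store the results. Note that a feed usually carries headlines, links, and summaries, so capturing full article text would still mean fetching each link.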
In the past I've used Teleport Pro. It's a good archiving tool. You can set it up to grab unique pages, or to grab the top page and subsequent links off of the "top" page (so you can specify, for example, NYTimes.com/index.html and all links off that page, plus all the links off those pages, i.e. three deep). It grabs a single page or one level of a site very quickly. However, it struggles with really big websites. It also doesn't always grab images (it depends on how the website's file system is set up), but it usually grabs the HTML formatting fairly well. Here's a link: http://www.tenmax.com/teleport/pro/home.htm. The best part is that Teleport Pro is very cheap: $40.00 or so.

Good luck,
~Jenny
--
Assistant Professor
Department of Communication, SS 340
University at Albany, SUNY
1400 Washington Ave.
Albany, NY 12222
518-442-4873
jstromer@albany.edu
http://www.albany.edu/~jstromer
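The "top page plus links, N levels deep" idea described above can be sketched as a depth-limited, breadth-first crawl. This is a generic illustration in Python, not how Teleport Pro itself works; the function names, the same-host restriction, and the pluggable fetch argument are all assumptions made for the sketch. Here max_depth=0 means the top page only, so whether "three deep" corresponds to 2 or 3 depends on whether you count the top page as a level.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Download a page as text (assumes UTF-8; real sites may differ)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, max_depth, fetch=fetch_html):
    """Return {url: html} for start_url and pages up to max_depth links away.

    Only links on the same host as start_url are followed (an assumption,
    to keep the crawl from wandering off-site).
    """
    start_host = urlparse(start_url).netloc
    pages = {}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if url in pages:
            continue
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        pages[url] = html
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urldefrag(urljoin(url, href)).url
            if urlparse(absolute).netloc != start_host:
                continue
            if absolute not in pages:
                queue.append((absolute, depth + 1))
    return pages
```

A polite crawler would also add a delay between requests and respect robots.txt, which this sketch leaves out. It also does not download images or stylesheets, so it captures page HTML only.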
I would like to recommend a piece of software called WebSite-Watcher from www.aignes.com. I've used it to monitor and archive changes from online newspapers. The archiving is done with Local Website Archive from the same company. There is a free trial, and the price is reasonable. I've found it easy to use and adjust to my needs, and the developer is really helpful and active on the site forum. When checking and archiving content, it took just under a minute to get through the list of 1000+ bookmarks. HTTrack, mentioned earlier, is also effective, especially when archiving complete websites.

Good luck,
Vidar Falkenberg
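For anyone curious what monitoring pages for changes looks like in principle, here is a minimal sketch in Python that compares a hash of each page against the hash from the previous check. This is a generic technique and not a description of how WebSite-Watcher works internally; the function names and the whitespace normalization are assumptions made for illustration.

```python
import hashlib


def content_fingerprint(html):
    """Hash a page after collapsing whitespace, so reformatting alone is ignored."""
    normalized = " ".join(html.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_changes(previous, current_pages):
    """Compare freshly fetched pages against stored fingerprints.

    previous:      {url: fingerprint} from the last check (empty on first run)
    current_pages: {url: html} fetched just now
    Returns (changed_urls, updated_fingerprints).
    """
    changed = []
    updated = dict(previous)
    for url, html in current_pages.items():
        digest = content_fingerprint(html)
        if previous.get(url) != digest:
            changed.append(url)  # new page, or content differs from last check
            updated[url] = digest
    return changed, updated
```

On the first run every page counts as changed; after that, only pages whose content differs are reported and would need re-archiving. News pages often embed rotating ads or timestamps, so in practice you would probably hash only the article body rather than the whole page.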
Participants (5): Brian Cugelman, Charlie Balch, Eulalia Puig Abril, Jennifer Stromer-Galley, Vidar Falkenberg