Hi Kathleen,

Lucene is good, but there are also some simple options. I like the command line; there you can use wget:
http://gnuwin32.sourceforge.net/packages/wget.htm

Usage is detailed here:
http://how-to.wikia.com/wiki/How_to_mirror,_spider,_or_archive_a_website
http://blog.moldoveanu.net/2010/11/downloading-an-entire-website-using-wget/

Or you can use a 'spider' extension for the Firefox web browser. Install Firefox,
www.mozilla.org/en-US/firefox/new/
and then, in Firefox, install a spider add-on, either
https://addons.mozilla.org/en-US/firefox/addon/spiderzilla/
or
https://addons.mozilla.org/en-US/firefox/addon/foxyspider/

Write back if you have any problems.

Cheers,
Dennis

On 02/13/2012 04:48 PM, Wojciech Gryc wrote:
Hi Kathleen,
Apache Lucene is the best resource for something like this, in my opinion. Available here: http://lucene.apache.org/
Requires some programming knowledge though.
Thanks, Wojciech
On Mon, Feb 13, 2012 at 12:33 AM, Kathleen Stansberry <kpontius@uoregon.edu> wrote:
I'm working on a project that involves conducting a cluster analysis (type of textual analysis based on Kenneth Burke's work) on the content of five different websites. I want to download the full content of these five sites so I have hard copies to work from during the rather arduous process of going through and categorizing the text.
Can anyone recommend a good program to download full websites (to a page depth of at least 3)? I've been using SiteSucker but am finding it a bit buggy.
Thank you! Katie
Kathleen Stansberry
Ph.D. Candidate
University of Oregon School of Journalism and Communication
http://katiestansberry.com
kpontius@uoregon.edu
(541) 228-5576
_______________________________________________
The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
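
For the wget route suggested in the first reply, a minimal Python sketch of mirroring several sites to a link depth of 3 might look like the following. The helper names `build_wget_command` and `mirror_sites`, the `mirror` output folder, and the one-second wait between requests are assumptions made for illustration, not something stated in the thread. It also assumes wget is installed and on the PATH.

```python
import subprocess


def build_wget_command(url, depth=3, dest="mirror"):
    """Return a wget argument list that mirrors `url` down to `depth` link levels."""
    return [
        "wget",
        "--recursive",                 # follow links starting from the given page
        f"--level={depth}",            # stop after this many link hops
        "--page-requisites",           # also fetch images and stylesheets the pages need
        "--convert-links",             # rewrite links so the copy can be browsed offline
        "--no-parent",                 # never climb above the starting directory
        "--wait=1",                    # assumed politeness delay of one second per request
        f"--directory-prefix={dest}",  # assumed output folder for the downloaded files
        url,
    ]


def mirror_sites(urls, depth=3, dest="mirror", runner=subprocess.run):
    """Run wget once per site; `runner` can be swapped out for a dry run."""
    for url in urls:
        runner(build_wget_command(url, depth, dest), check=False)
```

Calling `mirror_sites` with a list of the five site URLs would run wget once per site. Passing `runner=print` instead shows the commands without downloading anything, which may be worth doing first to check the options.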