New subject: [Air-l] archiving Google's cache

20 Nov 2002

I have a new take on the old problem of archiving a web site. The problem is
that the site I need to archive has already been taken offline. (An object
lesson in why it's important to archive web sites you're depending on in
your research....) Fortunately, the site is still available through Google's
cache, but this is a difficult way to access the content. (The site in
question is an e-mail list archive, and so each message is a separate page,
which means I need to download more than 2000 pages.)

I've tried various archiving programs, inputting the URL that Google
generates for its search result page as the root page for the archive. But
so far no luck. I think that's because Google creates a separate URL for each
page of the search results, and because it's difficult to figure out the right
"depth" for the archive. (I need it to go 200 pages deep to get to the last
page of search results, but I only want it to go 2 pages deep to get each
message.)

Does anyone have any experience trying to archive a Google cache?  Or any
suggestions?

Thanks,

Alex
-- 
Alexandra Samuel
samuel@fas.harvard.edu
http://www.alexandrasamuel.com

archiving Google's cache

Alexandra Samuel

Karim R. Lakhani
