Re: [Air-l] Archiving web sites
Frank Schaap <architext@fragment.nl> wrote... [WGET]
need to dig into it a bit further, but it seems almost perfect...
And for everyone else who doesn't like command lines, check this out: http://www.jensroesner.de/wgetgui/ Yes, it's ugly. But it lets you click boxes instead of memorizing letters.
it however doesn't appear to be able to follow and archive pages hidden behind javascript pop-up code. any inside hints or tips about that?
Not in particular, I'm afraid. This is a Known Problem, and from a social historian's point of view it tells a story about the adoption of the technology. Originally, wget was meant for system administrators to back up their websites. And so it offered a lot of commands of the form "copy all documents from this site." Then the search engine spiders got at it, so it got commands like "figure out how deep we've traversed, and copy it too." But it hasn't yet really Made It as an end-user tool.

The problem with Javascript is that it's a programming language. It can create URLs on the fly, based on your user name, my astrological sign, and the date at that very moment. Which is why the wget authors decided not to deal with it.

Now, most (but not all) javascript links use something like javascript:command( ...., URL, ....) and they hardcode the command, URL and all, into the text of the page. In which case, IF YOU ARE UP TO COMPILING C CODE (!), see this: http://www.geocrawler.com/archives/3/409/2000/4/100/3543604/
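If compiling C is not your thing, the hardcoded case can also be approximated outside wget. What follows is only a rough sketch in Python, not what the linked post does and not anything wget itself offers. It scans a saved page for href="javascript:command(...)" links and pulls out any quoted argument that looks like a URL. The function name extract_popup_urls, the "looks like a URL" rule, and the example.com addresses are all my own assumptions for illustration. It will miss any URL that the script builds on the fly.

```python
import re
from urllib.parse import urljoin

# href="javascript:someFunc(args)" with either quote style around the attribute.
JS_HREF = re.compile(
    r'''href\s*=\s*(["'])javascript:\s*[\w.]+\((.*?)\)\s*;?\s*\1''',
    re.IGNORECASE | re.DOTALL,
)
# Any quoted string literal inside the argument list.
STRING_LITERAL = re.compile(r'''(["'])(.*?)\1''')
# Assumed heuristic: absolute or path-like strings, or common page extensions.
LOOKS_LIKE_URL = re.compile(
    r'^(https?://|/|\.{1,2}/)|\.(html?|php|asp|cgi)(\?|#|$)',
    re.IGNORECASE,
)


def extract_popup_urls(html, base_url):
    """Return absolute URLs hardcoded as string arguments in javascript: links."""
    urls = []
    for match in JS_HREF.finditer(html):
        args = match.group(2)
        for _, literal in STRING_LITERAL.findall(args):
            if LOOKS_LIKE_URL.search(literal):
                absolute = urljoin(base_url, literal)
                if absolute not in urls:  # keep first-seen order, no duplicates
                    urls.append(absolute)
    return urls


if __name__ == "__main__":
    page = (
        "<a href=\"javascript:openWin('popup/details.html', 'details', 400, 300)\">Details</a>\n"
        "<a href=\"index.html\">Home</a>\n"
    )
    for url in extract_popup_urls(page, "http://example.com/site/"):
        print(url)
```

The resulting list could be saved to a text file, one URL per line, and handed back to wget with its -i option so the pop-up pages get fetched along with the rest.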
Danyel Fisher