Dear *,

thank you for the answers.

I think there should be an open implementation of a standalone viewer for WARC files, which would also make it possible to use a different archiving system to store these files. It would also make it possible to view and browse single WARC files (i.e., the pages stored in them). I also see a need to "export" a single page with all its components, e.g., to prove how a web page looked at a certain point in time (for legal reasons, historical research, etc.).

Speaking of Heritrix: I was reading the manual and I am having a little trouble understanding how to set up a crawl job. My task is to archive only certain pages in a crawl job, i.e., I want to give Heritrix a list of URLs, each referring to one page, and have those pages collected including all of their components (e.g., PDF files, images, ...). Could anybody here give me a hint or a sample job definition?

Thank you and kind regards
sws

On Thu, Feb 18, 2010 at 1:06 AM, Baden Hughes <baden.hughes@gmail.com> wrote:
WARC is a standard web archiving file format (http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml); it's an open standard.
Usually you would view these files with a web archiving tool such as the Wayback Machine or its underlying open source software (the Heritrix web crawler to collect web content, the NutchWAX indexing engine to provide search services, and Wayback to provide the user interface), or with a service from Archive-It (a subscription-based custom web archiving service, www.archive-it.org).
I don't know of a specific viewer for WARCs.
Baden
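Since no dedicated viewer is named in this thread, here is a minimal sketch of how the records in a WARC file could be listed with plain Python. It assumes an uncompressed file laid out as the WARC standard describes: a "WARC/x.y" version line, "Name: value" header lines, a blank line, and then exactly Content-Length bytes of content. The names iter_warc_records and make_record and the example.org URLs are made up for illustration, and the sketch ignores folded (multi-line) headers. It is not a replacement for Wayback or any other real tool.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # skip the blank lines that separate records
        version = line.strip().decode("utf-8")
        if not version.startswith("WARC/"):
            raise ValueError(f"expected a WARC version line, got {version!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # The content block is read by length, so blank lines inside it are safe.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


def make_record(warc_type, uri, body):
    """Build one sample record (hypothetical helper, used only for the demo)."""
    head = (
        "WARC/1.0\r\n"
        f"WARC-Type: {warc_type}\r\n"
        f"WARC-Target-URI: {uri}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("utf-8")
    return head + body + b"\r\n\r\n"


if __name__ == "__main__":
    sample = (
        make_record("response", "http://example.org/",
                    b"HTTP/1.1 200 OK\r\n\r\n<html>hi</html>")
        + make_record("response", "http://example.org/logo.png",
                      b"HTTP/1.1 200 OK\r\n\r\nPNGDATA")
    )
    for headers, payload in iter_warc_records(io.BytesIO(sample)):
        print(headers["WARC-Type"], headers["WARC-Target-URI"], len(payload))
```

To try this on a real file, pass open(path, "rb") instead of the in-memory sample. For compressed .warc.gz files, wrapping the file with gzip.open(path, "rb") should work in most cases, since WARC files are normally compressed record by record and Python's gzip module reads such concatenated members as one stream, but that has not been checked here against real crawl output.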
On Thu, Feb 18, 2010 at 10:06 AM, Steffen Schilke <steffen.schilke@gmail.com> wrote:
Dear *,
could you kindly recommend a viewer for WARC files (web page archiving)?
Kind regards
_______________________________________________
The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/