[Air-L] Heritrix - crawl job for single pages referenced by URL

19 Feb 2010

Dear All,

I was reading the Heritrix manual and I have a little problem
understanding how to set up a crawl job. My task is to archive only
certain pages in a crawl job, i.e., I want to give Heritrix a list of
URLs, each referring to a single page, and have those pages collected,
including all components of each page (e.g., PDF files, images, ...).
Could anybody here give me a hint or a sample job definition?

Thank you very much in advance

Steffen Schilke