19 Feb
2010
1:20 p.m.
Dear all, I was reading the Heritrix manual and I am having a little trouble understanding how to set up a crawl job. My task is to archive only certain pages in a crawl job, i.e., I want to give Heritrix a list of URLs, each referring to a single page, and have those pages collected, including all of their components (e.g., PDF files, images, ...). Could anybody here give me a hint or a sample job definition? Thank you very much in advance.