Re: [Air-L] on the Wayback Machine (was public/private [part 1 of 2])

14 Aug 2007

      Say I place content on a "publicly accessible" webpage without
...
creating any incoming links or notifying anyone. Web crawlers won't
find it. A search engine won't index it. While on the open and public
Internet, unless a random URL-generator happens to guess the precise
address of the page, no one will ever read it. Is this content "fair
game for researchers"?
I have not read the full thread so please forgive me if I am repeating the
same information.

A web crawler will find you; that's the point. There is a finite number of
IPv4 addresses, 4,294,967,296 (2^32), and these are what the hostname in a
URL resolves to.
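
For anyone who wants to check that figure, here is a minimal sketch in Python. It assumes we are talking about IPv4 only (32-bit addresses) and uses the standard-library ipaddress module; the variable names are just for illustration.

```python
import ipaddress

# The entire IPv4 address space written as a single network (assumption: IPv4 only).
all_ipv4 = ipaddress.ip_network("0.0.0.0/0")
total_addresses = all_ipv4.num_addresses

print(total_addresses)             # 4294967296
print(total_addresses == 2 ** 32)  # True
```

The count is large but finite, which is why enumerating addresses is at least conceivable for a crawler.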

If you don't want to be crawled, create a robots.txt file at the root of your
web server and well-behaved search engine crawlers will skip you.

http://www.robotstxt.org/wc/norobots.html
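
As a rough illustration of how a crawler that honours the exclusion standard would read such a file, here is a small Python sketch using the standard-library urllib.robotparser module. The file contents, the crawler name "ExampleBot" and the example.org URL are all made up for the example; a real crawler would fetch the file from the root of your server.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to stay out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" and the URL below are placeholders, not real names.
allowed = parser.can_fetch("ExampleBot", "http://www.example.org/private/page.html")
print(allowed)  # False
```

Note that this only works for crawlers that choose to check the file; it is a request, not an access control.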

Martin.

Martin Garthwaite