Re: [Air-L] on the Wayback Machine (was public/private [part 1 of 2])

13 Aug 2007


On Aug 13, 2007, at 11:06 AM, Michael Zimmer wrote:
...
This has been an interesting discussion, and mention of IA's Wayback
Machine prompts interesting questions which I'm sure others on the
list can help answer:
(a) Are there other media forms (current or historical) where
publishing content means that it is automatically scanned and
archived by external aggregators (search spiders, Internet Archive,
etc)? [If I posted a note on "The Wall" at Yale Law School, no one
routinely takes a snapshot of the wall to keep a permanent record of
it, right?]
Journals, newspapers, and magazines come to mind: they are archived
both externally and internally.
...
(b) If examples for (a) exist, are typical publishers of said content
aware that their works are being aggregated and archived in such a
way?
Yes, and I think they try to get as much profit out of the
arrangement as they can, but alas... it isn't always such an arrangement.
...
Would a new user know this? Are they notified?
In the case of newspapers, there were a few court cases a few years
ago dealing with the NYT archiving and distributing its own content
online, but I don't recall anyone complaining about third-party
distribution through FirstSearch or similar tools. I think there is
now a standard contract in place for much of this in the publishing
industry.
...
[My concern here
is that while many realize that search engines might crawl their
content, few realize they keep a cached copy, and even fewer realize
that even deleted content is archived by Wayback Machine]
...
(c) Also, if examples of (a) exist, what means are provided to
prevent such automatic archiving? Is it opt-in or opt-out? How
technically proficient must one be? [Concern here is that even if you
know about Internet Archive, you have to be proficient with
robots.txt standards in order to keep them out]
Dunno. Most organizations seem to want to participate, but only
under the best terms they can get.
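To make the robots.txt opt-out mentioned in (c) and (d) concrete, here is a minimal sketch. It assumes the Internet Archive's crawler identifies itself with the user-agent token "ia_archiver" (the token its exclusion instructions used at the time; treat that as an assumption and check current guidance). It uses Python's standard urllib.robotparser to show that such a file blocks that one crawler while leaving others alone. The example URL and the allowed() helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt that asks the Internet Archive's crawler to stay
# out of the whole site. "ia_archiver" is an ASSUMED user-agent token;
# all other crawlers are left unrestricted (an empty Disallow allows all).
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Hypothetical helper: may `agent` fetch `url` under `robots_txt`?"""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://www.example.com/1999/old-post.html"
    print(allowed("ia_archiver", url))  # False: archive crawler is blocked
    print(allowed("Googlebot", url))    # True: other crawlers still allowed
```

Note that robots.txt only expresses a request: whether anything is kept out (or, per (d), removed retroactively) depends entirely on the crawler's operator choosing to honor it.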
...
(d) Given (a), how can someone remove past items from such archives?
[Wayback Machine will remove all domain-specific content already in
its archive if you place a robots.txt file to block it going forward]
I guess what I'm wondering is why there seems to be a presumption
that just because I posted something on a website in 1999 I want it
to always be accessible. Just because bits don't degrade like paper
doesn't mean they -must- persist, does it?
No, but shouldn't we preserve as much as we can? I appreciate the
will to destroy; that's fine. But for the people who do not care, the
content they have contributed constitutes evidence of many things.
...
Keep up the good discussion,
michael

Jeremy Hunsinger