On Aug 13, 2007, at 11:06 AM, Michael Zimmer wrote:
This has been an interesting discussion, and mention of IA's Wayback Machine prompts interesting questions which I'm sure others on the list can help answer:
(a) Are there other media forms (current or historical) where publishing content means that it is automatically scanned and archived by external aggregators (search spiders, Internet Archive, etc)? [If I posted a note on "The Wall" at Yale Law School, no one routinely takes a snapshot of the wall to keep a permanent record of it, right?]
Journals, newspapers, and magazines come to mind as archived both externally and internally.
(b) If examples for (a) exist, are typical publishers of said content aware that their works are being aggregated and archived in such a way?
Yes, and I think they try to get as much profit out of the arrangement as they can, though alas, it isn't always that kind of arrangement.
Would a new user know this? Are they notified?
In the case of newspapers, there were a few court cases a few years ago dealing with the NYT archiving and distributing itself online, but I don't recall anyone complaining about third-party distribution such as through FirstSearch or similar tools. I think there is now a standard contract in place for much of this in the publishing industry.
[My concern here is that while many realize that search engines might crawl their content, few realize they keep a cached copy, and even fewer realize that even deleted content is archived by Wayback Machine]
(c) Also, if examples of (a) exist, what means are provided to prevent such automatic archiving? Is it opt-in or opt-out? How technically proficient must one be? [Concern here is that even if you know about Internet Archive, you have to be proficient with robots.txt standards in order to keep them out]
Dunno. Most organizations seem to want to participate, but only under the best terms they can get.
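For what it's worth, the opt-out mentioned in (c) is a short plain-text file at the root of the site. Here is a rough sketch of what it looks like and how one might check it, assuming the Internet Archive's crawler identifies itself as ia_archiver (the name it has historically used; worth confirming against their current documentation). The example.com URL is a placeholder, and is_blocked is a hypothetical helper of mine, not anything the Archive provides.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content that asks one crawler to stay out of the whole site.
# "ia_archiver" is assumed here to be the Internet Archive's user agent.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt asks user_agent not to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/1999/old-post.html"  # placeholder URL
    print("ia_archiver blocked:", is_blocked(ROBOTS_TXT, "ia_archiver", page))
    print("other crawler blocked:", is_blocked(ROBOTS_TXT, "SomeOtherBot", page))
```

The file only states a request. Whether a given crawler honors it, and whether it also hides material already collected, is up to whoever runs the crawler.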
(d) Given (a), how can someone remove past items from such archives? [Wayback Machine will remove all domain-specific content already in its archive if you place a robots.txt file to block it going forward]
I guess what I'm wondering is why there seems to be a presumption that just because I posted something on a website in 1999 I want it to always be accessible. Just because bits don't degrade like paper doesn't mean they -must- persist, does it?
No, but shouldn't we preserve as much as we can? I appreciate the will to destroy; that's fine. But for the people who do not care, the content they have contributed constitutes evidence of many things.
Keep up the good discussion, michael