Analysis of how much of the web the Wayback Machine is really archiving
Apologies for cross-posting. I thought many of you would be interested in some of the statistics from my new analysis, out this morning, of what is really in the Internet Archive's Wayback Machine and of the oddities and skew in how its crawlers ingest the web: http://www.forbes.com/sites/kalevleetaru/2015/11/16/how-much-of-the-internet...

One of the biggest themes to emerge is the need for greater transparency about the algorithms and collection processes of large web archives, and for dialog with the scholarly research community about what they collect and how those decisions affect the ways the archives can be used for research on the evolution of the web.

~Kalev
http://kalevleetaru.com/
http://blog.gdeltproject.org/