Analysis of how much of the web the Wayback Machine is really archiving

16 Nov 2015

      Apologies for cross-posting. I thought many of you would be
interested in some of the statistics from my new analysis, out this
morning, of what's really in the Internet Archive's Wayback Machine and the
oddities and skew in how its crawlers ingest the web:

http://www.forbes.com/sites/kalevleetaru/2015/11/16/how-much-of-the-internet...

One of the biggest themes that emerges is the need for greater transparency
about the algorithms and collection processes of large web archives, and for
dialog with the scholarly research community about what these archives
collect and how those decisions affect the ways the archives can be used for
research on the evolution of the web.

~Kalev
http://kalevleetaru.com/
http://blog.gdeltproject.org/

kalev leetaru
