Some statistical work was done for UNESCO here last year that indicates that there is vastly less language diversity on the internet than is routinely claimed.
And I'm talking about whole orders of magnitude of difference, here, not just a few percentage points.
Elijah, that sounds quite interesting -- can you describe the methodology?
My vague recollection is that John and I started with dumps of data from Global Reach (which included Jupiter Media Metrix, Nielsen//NetRatings, et cetera) and compared them to language population data from SIL's Ethnologue, UNESCO's own data, and a few other sources, including internet host counts from the Netcraft people.

Population and language diversity data is a real mess: there is no single authoritative source for the numbers. And you run into problems with places like Taiwan, which clearly exists as a separate entity from mainland China but does not appear in any of the UN's data because of the political difficulty of giving it any kind of recognition as a separate political body.

Sorry that I don't remember more. I didn't write the paper, I just cleaned a bunch of the data, and this was almost 18 months ago now. It is an interesting lump of work.

elijah