Using Wikipedia as a case to further the discussion

(1) The history of the Wikipedia logo: from English-only to an international identity (and some mistakes along the way)
http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_logos
http://meta.wikimedia.org/wiki/Wikipedia/Logo

(2) An unsung hero (in my personal view, open to debate): Autrijus Tang's effort in Perl internationalization. Tang is a Taiwanese hacker.
http://www.perl.com/pub/a/2005/09/08/autrijus-tang.html

(3) Unicode support in Wikipedia
I have had trouble locating the version-control history that would show when Unicode support began and when it became complete. The Wikipedia timeline does not mention Unicode:
http://meta.wikimedia.org/wiki/Wikipedia_timeline
However, the "Chinese Wikipedia" entry in the English Wikipedia contains the following paragraphs:

==========================
The Chinese Wikipedia was established along with 12 other Wikipedias in May 2001. At the beginning, however, the Chinese Wikipedia did not support Chinese characters, and had no encyclopedic content. It was in October 2002 that the first Chinese-language page was written, the Main Page <http://zh.wikipedia.org/wiki/>. The first registered user of the Chinese Wikipedia was Mountain. A software update on October 27, 2002 allowed Chinese language input.
.....
In order to accommodate the orthographic differences between simplified Chinese and traditional Chinese (or Orthodox Chinese), from 2002 to 2003, Chinese Wikipedia community gradually decided to combine the two originally separate versions of Chinese Wikipedia. The first running automatic conversion between the two orthographic representation starts from December 23, 2004, with MediaWiki 1.4 release.
The needs from Hong Kong and Singapore were taken into accounts in MediaWiki 1.4.2 release, which made conversion table for zh-sg default to zh-cn, and zh-hk default to zh-tw.[2]
==========================

Overall, from the above evidence, it could be argued that Wikipedia's internationalization has been a clear effort to adopt the Unicode standard, driven mostly by the people who needed Unicode. It is worth pointing out that around 2001 and 2002, the major operating systems that most ordinary PC users ran at the time (Microsoft's and Apple's) do not yet seem to have been Unicode-ready, which makes this development in Wikipedia all the more interesting.

Again, coming back to the original question: why does Wikipedia want to be Unicode-based? Or, why did Wikipedia not choose other solutions to deliver interoperability?

--
Han-Teng Liao
PhD Candidate
Oxford Internet Institute
http://www.oii.ox.ac.uk/people/students.cfm?id=123

Han-Teng Liao (OII) wrote:
Running the risk of taking your comments out of context, I have listed the following responses.

Mike Stanger wrote:
> ......The use of Unicode believing that it solves the interoperability issues and/or is a communication about the intent of the programmer is much the same sin, in my view.

I am not sure that "not using" Unicode can solve the interoperability issues. If the use of Unicode is one of the more attractive ways to deliver some interoperability (as Google, Wikipedia, YouTube, etc. try to do), then I do not know whether the two beliefs amount to "much the same sin".

> ...... However, just using unicode isn't going to resolve all of the interoperability issues (eg. reading direction, and other unique features of the written form of a particular language, etc.).

Agreed, using Unicode by itself cannot save the world. Still, would you mind showing me how not using Unicode, or some alternative, would solve these issues better? If such a solution or vision does exist, why have Google, Wikipedia, Microsoft, Linux, Mac, etc. adopted Unicode? I am not citing these examples to refute your argument. I am genuinely intrigued to find out why they arrived at this solution and not others (including maintaining the status quo by not deploying Unicode, to some extent).

> Ultimately though, what data storage in Unicode does provide almost automatically is the preservation of the appropriate data (unless it gets transformed of course), and its use /could potentially/ signal the intent by the author to enable the coexistence of mixed language content as a politically friendly gesture. I would agree that character encodings could potentially send a signal about the /intent/ to be good internet citizens, or that the /intentional/ use of something other than unicode could be seen as a statement of political position (eg. mainland China's use of jianti character sets in a particular code page vs. a codepage that supported fanti).

Agreed, good will matters. Still, the efforts to deliver that good will matter as well.
I will present some evidence in another email that, within Perl (the language of UseModWiki, the software Wikipedia ran on before the PHP-based MediaWiki) and in the logos of Wikipedia and the Chinese Wikipedia, most of the work was requested and done by those who needed Unicode support. The picture, then, is not only one of good will but also one of push and pull.
> However, I think often programmer intent is lost in the end-product. It would be encouraging to see a movement where programmers stated that their /active decision/ to use Unicode is a deliberate recognition of the multitude of languages as a 'politically friendly' gesture.

"Politically friendly" or "politically correct" could be a bit patronizing. I would argue that Wikipedia benefits more from its other language versions (ranking higher in search results, a better webometric position, etc.).
> I also assume that there are many coders who are using unicode, but doing so less than deliberately, perhaps even as a side-effect of the development environment that they use (eg. Java's native character/string support), /mirroring the use of ASCII in earlier environments/. These applications may well support Unicode at the character level, but because the programmer's use of Unicode is a sort of side-effect, the end product may not actually interoperate with other languages properly or completely. So while I agree that the use of Unicode is a step forward in interoperability, I'd argue that the work to be done is not so much about the use of Unicode, but the /publicly stated intent to be interoperable/. Unicode may be one tool that can assist in that goal if used properly, but the use of Unicode alone says little about intent.

I slightly disagree on the meaning of interoperability. If interoperability means that a given linguistic space can keep using a non-Unicode standard, then it may create a linguistic hierarchy. For example, if Chinese user-generated websites use GB2312 throughout, then Tibetan text and traditional Chinese characters cannot have a voice there. Again, imagine that YouTube could not automatically accept content contributed by Arabic or Persian users, but only offered some kind of "interfaces" promising interoperability. To me, the issue at this moment is not full support of Unicode; it is the lack of awareness that Unicode is arguably the most open linguistic infrastructure. That fact receives little attention.
Then the sharp question is: can Beijing, Washington, London, and Tokyo deliver their government services and communicative spaces while sticking to their linguistic ghettos, without using Unicode or another open linguistic architecture?
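The GB2312 point in the message above can be checked directly. The following Python sketch is my own illustration, not taken from any of the systems discussed. It assumes CPython's bundled "gb2312" codec and tries to encode a simplified Chinese word and the Tibetan word བོད ("Tibet") with both GB2312 and UTF-8. GB2312 covers simplified Chinese but contains no Tibetan characters, so only UTF-8 can hold both scripts on the same page.

```python
# A minimal check, assuming CPython's bundled "gb2312" codec (one of the
# standard CJK codecs) and the built-in "utf-8" codec.
chinese = "中文"                 # simplified Chinese: "Chinese language"
tibetan = "\u0f56\u0f7c\u0f51"  # Tibetan བོད ("bod", i.e. Tibet)


def encodable(text: str, codec: str) -> bool:
    """Return True if every character in text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


for codec in ("gb2312", "utf-8"):
    print(f"{codec}: Chinese={encodable(chinese, codec)}, "
          f"Tibetan={encodable(tibetan, codec)}")
# gb2312: Chinese=True, Tibetan=False
# utf-8: Chinese=True, Tibetan=True

# A page mixing both scripts survives a UTF-8 round trip unchanged.
mixed = f"{chinese} {tibetan}"
assert mixed.encode("utf-8").decode("utf-8") == mixed
```

The check only tests whether characters can be stored; it says nothing about rendering, reading direction, or the other issues Mike raises, which is consistent with his point that Unicode alone does not solve everything.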