First of all, I have to reframe the question. Are we talking about a problem of ASCII or a problem of Unicode? At one extreme, one can argue that it would be nice if all domain names, hyperlinks and URLs stayed in the English alphabet (which leads into the ICANN multilingual issue that I aim to avoid in this discussion). At the other extreme, one can argue that there would be no problem if everyone used Unicode now (which implies some coercive force imposing it without the usual process of technology diffusion).

I cannot speak for all the open source contributors out there. I have not even tried to find the regional and linguistic demographics of the open source community. Though I am a big fan of "good will" in Reagle's thesis, I cannot overlook the potential for competition and creative conflict among the branches of open source projects.

Then the question is: who should make this effort? I would argue that the burden falls overwhelmingly on the people who have to use Unicode. In practice, it easily becomes a favor to be asked by those who need Unicode, and extra work to be done by IT support. Then Unicode, the solution, becomes a problem.

I am not saying there is no problem in Unicode implementation. The reason I raise the problem here on the AOIR mailing list, and not on the Unicode mailing list, is not to reaffirm the perception that adopting Unicode can be difficult, but rather to raise the relevant research issues around it. Imagine that the Wikipedia project had not managed to implement Unicode when it was hard. Imagine that Chinese Wikipedia had not managed to negotiate simplified and traditional Chinese entry titles and URLs. Wikipedia would never be the same.

It is not a favor that we (who need Unicode support) ask. We (internet researchers) need empirical research to see why and how Unicode support is implemented in various projects. It is not merely an issue of providing better support for programmers.
Again, I am not arguing that the transition from non-Unicode to Unicode is easy or could be done overnight, and I have no intention of implying that it all comes down to programmers' unwillingness or laziness to finish mundane jobs. It is the opposite. If we lay out why, how much and how Wikipedia, YouTube, Google etc. invest in Unicode deployment (exploiting the open nature of the Internet), we can better understand the richer dimensions of techno-linguistic policies.

It is not my intention to play a blame game (the West versus the East, or programmers versus users). It is the opposite. Why does Baidu support simplified Chinese versions of its services, excluding Tibetans, Hong Kongers and even the Taiwanese whom Beijing tries to represent, while Google and YouTube do a much better job of creating a space where East Asians can fight with each other on the same page? I hope this case shows my intention to make this an interesting issue for multi-disciplinary research rather than to blame any particular group of people.

I hope we are debating "Information wants to be ASCII or Unicode" versus "Information wants to be digital", not "Information moving from ASCII to Unicode is difficult". Then the issue would be clearer. Who decides which digital standards should be selected and deployed? What is the negotiation process? And why? Operating systems, global websites, regional websites, e-government services, citation databases etc. are all domains about which we should ask these questions.

Mike Stanger wrote:
Just as a bit of evidence of how difficult it can be to grok character issues: Unicode is not "an encoding" itself, but a repertoire of characters, their names, and (abstract) code points (i.e., UCS), plus a set of encodings (e.g., UTF-8, UTF-16), extra properties, and algorithms. And I'm sure a Unicode geek could pick some holes in what I've said!
True enough :-) Part of the problem in discussing Unicode (and other things) is that one can speak about it at a 'standards' level or at an 'in practice' level, meaning whatever level of practice at which the person encounters Unicode. By "encoding" I wasn't intending to imply that it was like dealing with a codepage equivalent, but that there are assumptions built into using Unicode that may not be visible to the people using it.
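To make the repertoire-versus-encoding distinction a little more concrete, here is a minimal Python sketch. It is not from the original thread; the sample character is an arbitrary choice for illustration, and the variable names are hypothetical. It only uses the standard library to show one character's code point and name, and two different byte encodings of that same code point.

```python
import unicodedata

# Arbitrary sample character, chosen only for illustration (an assumption,
# not something taken from the thread).
ch = "中"

# Standards level: the character has an abstract code point and a name.
code_point = ord(ch)
name = unicodedata.name(ch)

# In-practice level: the same code point can be serialized as different
# byte sequences depending on the encoding chosen.
utf8 = ch.encode("utf-8")
utf16 = ch.encode("utf-16-be")  # big-endian, without a byte order mark

print(f"U+{code_point:04X} {name}")
print("UTF-8 bytes: ", utf8.hex(" "))
print("UTF-16 bytes:", utf16.hex(" "))

# Both byte sequences decode back to the same character.
assert utf8.decode("utf-8") == utf16.decode("utf-16-be") == ch
```

The point of the sketch is only that the code point stays fixed while the bytes differ, which is roughly where the hidden assumptions tend to live: software that receives the bytes has to know (or guess) which encoding produced them.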
I'm thinking that a stated intent by a programmer, say in an open source project, that the project uses Unicode in order to be 'politically friendly' and interoperable would have the effect not only of making the statement, but also of encouraging people to help guide the programmer(s) in actually achieving that goal: those who have a deeper understanding of the issues would inform those who are looking for the practical goal of interoperability.
Mike
_______________________________________________
The Air-L@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers: http://www.aoir.org/
--
Han-Teng Liao
PhD Candidate
Oxford Internet Institute
http://www.oii.ox.ac.uk/people/students.cfm?id=123