Re: [Air-L] Information wants to be ASCII or Unicode? Tibetan-written information cannot be ASCII anyway.

16 Jul 2009

      I agree that "Information wants to be digital", and that is why we 
should start an honest conversation among programmers, IT support staff, 
academics and policy makers. 

I disagree with the notion that technical support for Unicode sources 
is confusing for programmers.  Please refer to the following blog post:

The Absolute Minimum Every Software Developer Absolutely, Positively 
Must Know About Unicode and Character Sets (No Excuses!)  by Joel Spolsky
http://www.joelonsoftware.com/articles/Unicode.html

We can debate the technical implementation on and on (though I hope 
the above link has settled the technical debate).  However, it would be 
better to ask first whether we need, for example, Korean, Japanese and 
Chinese to *coexist* on the same page, or alternatively, Hebrew and 
Arabic to *coexist* on the same page.  If the social and communicative 
need across languages is among our priorities in supporting a better 
Internet environment, then the answer is obvious.  Again, Unicode is 
supported and maintained by industry and experts, and Google, YouTube, 
Facebook and other websites support it probably for a simple reason: 
they want to reach other local markets. 
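
As a rough illustration of the coexistence question, here is a minimal 
Python sketch.  The sample strings and the helper name `can_encode` are 
illustrative assumptions, not taken from any of the sites mentioned.  It 
checks which encodings can hold one "page" that mixes Korean, Japanese, 
Chinese, Hebrew and Arabic.

```python
# One "page" that mixes five scripts (sample text chosen for illustration).
page = "한국어 日本語 中文 עברית العربية"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Legacy, language-specific encodings versus UTF-8 for the whole page.
candidates = ("ascii", "gb2312", "euc_kr", "shift_jis", "iso8859_8", "utf-8")
results = {enc: can_encode(page, enc) for enc in candidates}

# Each fragment on its own, paired with a legacy encoding built for it.
own_encoding = {
    "한국어": "euc_kr",
    "日本語": "shift_jis",
    "中文": "gb2312",
    "עברית": "iso8859_8",
    "العربية": "iso8859_6",
}

for enc, ok in results.items():
    print(f"{enc:10s} whole page encodable: {ok}")
```

Under these assumptions each fragment fits its own legacy encoding, but 
only UTF-8 can carry the whole page at once.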

My short-cut, simplified understanding of the whole software industry 
movement in i18n (internationalization) and l10n (localization) is as 
follows.  The industry (along with the open source community, which 
actually excels in i18n and l10n) proposed Unicode by first 
imagining a practically limitless space of codepoints to which 
alphabets/scripts/strokes/characters can be assigned.  Then the 
industry can compete to implement them and satisfy any potential market. 
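
To make "codepoints" concrete, here is a small Python sketch using only 
the standard library.  The helper name `describe` and the sample 
characters are illustrative assumptions.  Strictly speaking the space is 
large rather than limitless: code points run from U+0000 to U+10FFFF, 
which is 1,114,112 positions.

```python
import sys
import unicodedata


def describe(ch: str) -> str:
    """Format one character as 'U+XXXX NAME' from the Unicode database."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# A Latin letter, a Chinese character, a Tibetan letter and the snowman.
for ch in ("A", "中", "ཀ", "☃"):
    print(describe(ch))

# Size of the code space as seen by Python 3.3 or later.
code_space_size = sys.maxunicode + 1
print(code_space_size)
```

Every script gets its own range in the same numbering, which is what lets 
different vendors implement fonts and input methods against one shared 
assignment.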

So I am of the opinion that Unicode is actually market-friendly and 
potentially programmer-friendly.  It takes more effort to make it 
politically friendly rather than merely politically correct.  My 
starting point, I hope, is not multicultural or multilingual correctness 
but the open nature of the Internet....

All languages want to be digital.  We have enough space for them.

Mike Stanger wrote:
...
At the risk of sounding like an apologist for a particular 
linguistic-centricism, English or otherwise, from a programmer's 
standpoint there are challenges beyond simply the choice to use 
Unicode or some language specific codepage.
Just using Unicode doesn't guarantee that the application viewing the 
content will have the appropriate fonts, for one, even if the proper 
Unicode character sequences are sent (much as marking your pages as 
GB2312 doesn't give the end-user's machine the automatic ability to 
display the content), so it's questionable that the end-user usability 
will actually improve just by using Unicode and I would expect that at 
some levels it makes it more difficult to guarantee interoperability 
when the incoming stream is arbitrary content of an arbitrary language 
or set of languages.
When I'm coding, I'm actually much more comfortable knowing that I 
have a specific codepage to address rather than just knowing that I'll 
have a Unicode stream, for example, because I'll know exactly what my 
application should support. Unicode really tells me nothing other than 
the content could be any known character, including the famous 
"snowman" symbol  :-)  If I'm trying to mash-up a site and my code 
sees that it's in GB2312 I can take appropriate steps to support it, 
or report back that the feed is incompatible.  If I get a Unicode 
source, I have to be constantly aware that the feed might at some time 
have some requirements that I haven't yet addressed.
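
The two situations described above can be sketched in Python roughly as 
follows.  This is only an illustration under stated assumptions: the names 
`SUPPORTED_CHARSETS`, `SUPPORTED_NAME_PREFIXES`, `decode_feed` and 
`unsupported_characters` are hypothetical, and matching on Unicode 
character names is a crude stand-in for a real script check.

```python
import unicodedata

# Hypothetical: the charsets this mash-up has been written to handle.
SUPPORTED_CHARSETS = {"gb2312", "utf-8"}

# Hypothetical: character-name prefixes the application can render.
SUPPORTED_NAME_PREFIXES = ("LATIN", "CJK", "DIGIT")


def decode_feed(raw: bytes, declared_charset: str) -> str:
    """Decode a feed, or report it as incompatible up front."""
    charset = declared_charset.lower()
    if charset not in SUPPORTED_CHARSETS:
        raise ValueError(f"incompatible feed charset: {declared_charset}")
    return raw.decode(charset)


def unsupported_characters(text: str) -> list:
    """List non-space characters outside the supported name prefixes."""
    found = []
    for ch in text:
        if ch.isspace():
            continue
        # Unnamed characters get "" and are therefore reported too.
        name = unicodedata.name(ch, "")
        if not name.startswith(SUPPORTED_NAME_PREFIXES):
            found.append(ch)
    return found


# Codepage case: the declared charset says everything the code must handle.
text = decode_feed("中文 feed".encode("gb2312"), "GB2312")

# Unicode case: decoding always succeeds, so coverage is checked afterwards.
surprises = unsupported_characters("中文 feed ☃ ཀ")
```

With a declared codepage the incompatibility shows up before decoding; 
with a Unicode stream it only shows up when an unexpected character 
actually arrives, so the check has to run on the content itself.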
I might suggest that rather than restricting the phrase to linguistic 
elements and suggesting that "Unicode" is a superior term to "ASCII" 
in this case, I'd broaden it out and say "Information wants to be 
Digital" -- I think that's more the heart of the matter, but the term 
ASCII conveys more meaning about language/etc. and likely helps make 
the implication of the argument more direct.
YMMV - There are of course libraries of routines to address such 
issues in code, but I think that actually points to the fact that 
sometimes Unicode is not a simple, direct answer to a problem as 
people might expect it to be.
Mike
On 14-Jul-09, at 12:21 AM, Han-Teng Liao (OII) wrote:
...
Dear all,
Running the risk of trolling and misrepresenting the famous motto 
"Information wants to be ASCII", I want to raise the question of the 
difference between "Information wants to be ASCII" versus 
"Information wants to be Unicode" from a multilingual perspective.
It should be pointed out that when Lev Manovich declared "Information 
wants to be ASCII" while talking about remix and the remixability of 
information, it was 2005, when Unicode was still in its early adoption 
period globally.  So I do not intend to raise the question as a lazy 
criticism of the America-centric implication inside ASCII, but rather 
to raise the question of remix and remixability across linguistic 
boundaries.
  Why is Unicode not universally deployed yet?  How can we measure 
what remixability across linguistic boundaries is lost simply because 
the information is not encoded in Unicode?  Why are so many 
user-generated content websites in China using only the 
simplified-Chinese-only "national standard" (GB2312) even 
when Hong Kong (which uses traditional Chinese, not included in GB2312) 
is part of China and Beijing claims Taiwan is part of China?  What about 
Tibetan-written information: does it want to be Unicode or GB 
18030-2000?  Tibetan-written information cannot be ASCII anyway.
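
A minimal Python sketch of the encoding question follows.  The sample 
words and the helper names `encodable_in` and `transcode` are illustrative 
assumptions.  It relies on Python's built-in `gb18030` codec, which maps 
the full Unicode range, so text can move between GB 18030 and UTF-8 
without loss.

```python
def encodable_in(text, encodings=("ascii", "gb2312", "gb18030", "utf-8")):
    """Return the subset of `encodings` able to represent `text`."""
    usable = []
    for enc in encodings:
        try:
            text.encode(enc)
        except UnicodeEncodeError:
            continue
        usable.append(enc)
    return usable


def transcode(raw: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one charset to another via Unicode."""
    return raw.decode(source).encode(target)


tibetan = "བོད་ཡིག"      # Tibetan script sample
traditional = "臺灣"      # traditional Chinese sample
simplified = "台湾"       # simplified Chinese sample

for sample in (tibetan, traditional, simplified):
    print(sample, encodable_in(sample))

# GB 18030 bytes for the Tibetan sample convert to UTF-8 and back intact.
gb_bytes = tibetan.encode("gb18030")
utf8_bytes = transcode(gb_bytes, "gb18030", "utf-8")
round_trip = transcode(utf8_bytes, "utf-8", "gb18030")
```

In this sketch only the simplified sample fits GB2312, while both GB 18030 
and UTF-8 hold all three, so the choice between those two is not a 
question of coverage.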
I would really like to hear from you.
Best regards,
-- 
Han-Teng Liao
PhD Candidate
Oxford Internet Institute
http://www.oii.ox.ac.uk/people/students.cfm?id=123
_______________________________________________
The Air-L@listserv.aoir.org mailing list
is provided by the Association of Internet Researchers http://aoir.org
Subscribe, change options or unsubscribe at: 
http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
Join the Association of Internet Researchers:
http://www.aoir.org/