Innovation: Computers Learn New ABCs

By Michael Erard
September 2003

For tens of millions of people around the world—from West Africa to Southeast Asia to the Middle East—the Internet’s not such a friendly place. That’s because many of the world’s writing systems still aren’t encoded in software, which means millions of people can’t write e-mail, build Web sites, or search databases in their native scripts. A group of linguists at the University of California, Berkeley, is trying to change that, by making sure that nearly 100 additional scripts have a place in a crucial international standard that lets computers render, process, and send text data.

That standard, called Unicode, assigns a unique ID number to every character, symbol, and punctuation mark in a writing system. Because of those ID numbers, characters won't get misinterpreted as data move between software programs or across the Internet. Misinterpretation sometimes shows up as a string of question marks on your screen, and it can cripple the ability of whole populations to communicate via the Internet. For example, Unicode is enabling radical economic transformations in Vietnam. Before this year, computer and software manufacturers had come up with 43 different ways to encode Vietnamese text, which meant computers couldn't reliably swap data. Then, early this year, the Vietnamese government adopted Unicode as its national standard.
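To make the "unique ID number" idea concrete, here is a minimal Python sketch using only the standard library. It is an illustration under assumptions, not anything from the Berkeley project. The phrase "Tiếng Việt" ("Vietnamese") is written with escapes so that its code points are unambiguous. Latin-1 stands in for a legacy encoding that has no slot for these letters; it is not one of the 43 Vietnamese encodings the article mentions.

```python
import unicodedata

# "Tiếng Việt" ("Vietnamese"), written with escapes so the exact code points are unambiguous.
text = "Ti\u1ebfng Vi\u1ec7t"

# Each character has one unique Unicode ID (code point), conventionally written U+XXXX.
code_points = [f"U+{ord(ch):04X}" for ch in text]
print(code_points)

# Unicode also attaches a standard name to each ID.
print(unicodedata.name(text[2]))  # LATIN SMALL LETTER E WITH CIRCUMFLEX AND ACUTE

# Sending the text as UTF-8 (one Unicode encoding) and decoding it the same way
# on the other end recovers exactly the original characters.
utf8_bytes = text.encode("utf-8")
received = utf8_bytes.decode("utf-8")
print(received == text)  # True

# Assumption: Latin-1 stands in for a legacy 8-bit code page with no slot for
# these letters. With errors="replace", each unmappable character becomes "?".
lossy = text.encode("latin-1", errors="replace").decode("latin-1")
print(lossy)  # Ti?ng Vi?t
```

The last line reproduces the "string of question marks" effect. The text survives only when sender and receiver agree on an encoding that can represent every character, and that shared agreement is what a single standard provides.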