Re: Unicode Tibetan (Was: syllable level encoding in unicode)

On Jun 2, 2004, at 8:09 PM, suzmccarth wrote:

> This seems to me to be the only logical answer. I can't think why
> else Tamil couldn't have precomposed characters for aksharas since
> Canadian Syllabics works for 3 languages with fairly different
> syllable structure, Western Cree, Naskapi and Inuit.
>

Well, it's a bit more complex historically.

All of the Indic scripts in Unicode are derived from ISCII, which
represents a native Indian approach to encoding them. The main
criticism against ISCII is that it encodes all the various Indic
scripts in parallel, something which works better for some scripts than
for others.

In any event, in its earliest days Unicode was far more inclined to
encode syllabaries analytically, and let rendering technology put the
pieces together. The first Ethiopic proposal for Unicode did this, for
example. Certainly Unicode's original approach to Korean was analytic;
the presence of the less-than-preferred monster set of precomposed
hangul syllables is the result of enormous political pressure from
Korea as well as the technical needs of Unicode implementers, most
notably Microsoft.

As a rule, the preferred approach to use when encoding a syllabic
writing system depends on a lot of factors, most notably the overall
size of the repertoire and whether or not it can be broken down
structurally. In the case of Korean, the hangul have a
straightforward, natural breakdown into smaller atomic units. Ethiopic
was borderline in this regard (as was Yi), because there are some
regularities between different syllabic glyphs, but a number of
irregularities as well. Other syllabic scripts such as Japanese kana,
Cherokee, UCAS, and Linear B have no significant regular glyphic
features to use as the basis for a sub-syllabic encoding.
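
To make the hangul case concrete, here is a minimal sketch of how regular
that breakdown is: every precomposed syllable can be split into its jamo
with plain arithmetic. The constants are the ones used by the Hangul
syllable decomposition algorithm in the Unicode Standard; the function
name is my own, chosen for illustration.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul decomposition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 vowel/trailing combinations per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables in total


def decompose_hangul(syllable: str) -> str:
    """Split one precomposed hangul syllable into conjoining jamo.

    The function name is hypothetical; only the arithmetic comes from
    the standard.
    """
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed hangul syllable")
    leading = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    trailing = T_BASE + s_index % T_COUNT
    jamo = chr(leading) + chr(vowel)
    if trailing != T_BASE:    # T_BASE itself means "no trailing consonant"
        jamo += chr(trailing)
    return jamo


if __name__ == "__main__":
    for syllable in "한글":
        jamo = decompose_hangul(syllable)
        # Canonical decomposition (NFD) should yield the same jamo sequence.
        assert jamo == unicodedata.normalize("NFD", syllable)
        print(syllable, [f"U+{ord(c):04X}" for c in jamo])
```

Nothing comparable can be written for kana or Cherokee, which is the
point: there is no regular structure there for the arithmetic to exploit.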

Inasmuch as Unicode requires a shaping engine which separates the
graphic layer from the data layer, and encourages the use of input
methods or other sophisticated input techniques which separate the
input layer from the data layer, there is absolutely no reason why any
end user *must* be aware of how the text is actually encoded. On the Mac,
at least, there would be nothing to prevent someone from creating a
syllabic-oriented keyboard, and I assume the same is true on Windows.
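
As a rough sketch of what such a keyboard amounts to, the input layer
only has to map one "syllable key" to the sequence of code points that
Unicode actually stores. The key names, the tiny Tamil repertoire, and
the functions below are all hypothetical and chosen purely for
illustration; a real keyboard layout would be built with the platform's
own tools rather than in Python.

```python
# Hypothetical syllable-level key names mapped to Tamil code points.
CONSONANTS = {"k": "\u0b95", "m": "\u0bae"}   # TAMIL LETTER KA, MA
VOWEL_SIGNS = {
    "a": "",          # inherent vowel: the bare consonant letter
    "aa": "\u0bbe",   # TAMIL VOWEL SIGN AA
    "i": "\u0bbf",    # TAMIL VOWEL SIGN I
    "u": "\u0bc1",    # TAMIL VOWEL SIGN U
    "": "\u0bcd",     # no vowel: TAMIL SIGN VIRAMA (pulli)
}

# One key per syllable; each key emits a consonant plus an optional sign.
KEYMAP = {
    c_name + v_name: c_char + v_char
    for c_name, c_char in CONSONANTS.items()
    for v_name, v_char in VOWEL_SIGNS.items()
}


def type_keys(keys):
    """Return the encoded text produced by pressing the given syllable keys."""
    return "".join(KEYMAP[key] for key in keys)


if __name__ == "__main__":
    text = type_keys(["kaa", "m"])
    # The user pressed two keys; the stored text is four code points.
    print(text, [f"U+{ord(c):04X}" for c in text])
```

The user sees and types syllables, while the stored text stays in the
analytic encoding.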

========
John H. Jenkins
jenkins@...
jhjenkins@...
http://homepage.mac.com/jhjenkins/