Suzanne McCarthy <suzmccarth at yahoo dot com> wrote:

> Therefore, a large part of the Tamil population are still
> keyboarding in the legacy fonts and using a conversion utility to
> convert to Unicode. Is it possible for Unicode to encode the
> Tamil CV elements also in order of visual sequence and provide
> an equivalency table between the two?

Nope. I mean, yes, it is theoretically possible for Unicode to
establish a second encoding model for Tamil to co-exist with the
existing model, but it's not ever going to happen because of all the
problems and confusion that arise.

Latin and Greek have both (a) separate letters and diacritics and (b)
precomposed letters-with-diacritic, which have to be matched via
normalization. Hangul has (a) separate leading-consonant, vowel, and
trailing-consonant letters, (b) precomposed syllable blocks, and (c) a
separate set of compatibility letters like (a), except that the leading
and trailing consonants are unified.

These are warts in Unicode, not features. Ten or so years of experience
has shown what a mistake it is to introduce two or more encoding models
for a single script. The Latin, Greek, and Hangul situations exist
because there were standardized character sets in wide use, with which
it was felt that Unicode had to maintain 1-to-1 convertibility, not
because it was felt they would aid keyboard input in some way.
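To make the cost concrete, here is a minimal Python sketch using the standard unicodedata module. The helper name same_text is mine, for illustration only. Each pair below looks identical on screen but is stored differently, so a plain comparison fails until both sides are normalized, and the Hangul compatibility letters only match under the compatibility form.

```python
import unicodedata

def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalizing both to the given form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# Latin: precomposed e-acute (U+00E9) vs. e + COMBINING ACUTE ACCENT (U+0301)
latin_pre, latin_dec = "\u00e9", "e\u0301"

# Greek: alpha with tonos (U+03AC) vs. alpha (U+03B1) + U+0301
greek_pre, greek_dec = "\u03ac", "\u03b1\u0301"

# Hangul: syllable GA (U+AC00) vs. conjoining jamo CHOSEONG KIYEOK (U+1100)
# + JUNGSEONG A (U+1161), vs. compatibility letters KIYEOK (U+3131) + A (U+314F)
hangul_syl, hangul_jamo, hangul_compat = "\uac00", "\u1100\u1161", "\u3131\u314f"

for a, b in [(latin_pre, latin_dec), (greek_pre, greek_dec), (hangul_syl, hangul_jamo)]:
    print(a == b, same_text(a, b))  # False True for each pair

# The compatibility letters are not canonically equivalent to the syllable;
# they only match under the compatibility form NFKC.
print(same_text(hangul_syl, hangul_compat), same_text(hangul_syl, hangul_compat, "NFKC"))
# False True
```

Every piece of software that compares, searches, or sorts text has to know about this, which is exactly the burden a second Tamil model would add.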

> Tamil script needs to be recognized as a syllabic script with
> provision for glyph-based input as exists in Japanese, Chinese
> and Cree. How this is done is not important to me. That this
> is done is vital.

How the Tamil script is described in scholarly works, and modeled and
encoded in Unicode, is not relevant to how Tamil keyboard input methods
are designed. Please understand this.
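For example, a Tamil input method can accept the pre-base vowel signs (ெ, ே, ை) in the order they are seen, before the consonant, and emit them in the order Unicode stores them, after the consonant, folding two-part vowels such as ொ into the single encoded sign. What follows is a hypothetical sketch; visual_to_logical and its tables are names I made up, not part of any existing Tamil keyboard, and the consonant test is a rough range check.

```python
# Hypothetical sketch of a visual-order Tamil input layer.
PREBASE_SIGNS = {"\u0bc6", "\u0bc7", "\u0bc8"}  # ெ ே ை: drawn left of the consonant

# Two-part vowel signs: (left part, right part) -> the single encoded sign.
TWO_PART = {
    ("\u0bc6", "\u0bbe"): "\u0bca",  # ெ + ா -> ொ
    ("\u0bc7", "\u0bbe"): "\u0bcb",  # ே + ா -> ோ
    ("\u0bc6", "\u0bd7"): "\u0bcc",  # ெ + ௗ -> ௌ
}

def is_tamil_consonant(ch: str) -> bool:
    # Approximation: KA (U+0B95) through HA (U+0BB9); the unassigned
    # code points inside this range are simply never typed.
    return "\u0b95" <= ch <= "\u0bb9"

def visual_to_logical(keys: str) -> str:
    """Reorder keystrokes typed in visual order into Unicode logical order."""
    out = []
    pending = None  # a pre-base sign waiting for its consonant
    i = 0
    while i < len(keys):
        ch = keys[i]
        if ch in PREBASE_SIGNS:
            if pending is not None:
                out.append(pending)  # no consonant arrived; pass it through
            pending = ch
        elif pending is not None and is_tamil_consonant(ch):
            out.append(ch)  # the consonant comes first in logical order
            nxt = keys[i + 1] if i + 1 < len(keys) else None
            combined = TWO_PART.get((pending, nxt))
            if combined is not None:
                out.append(combined)
                i += 1  # the right-hand part has been consumed
            else:
                out.append(pending)
            pending = None
        else:
            if pending is not None:
                out.append(pending)
                pending = None
            out.append(ch)
        i += 1
    if pending is not None:
        out.append(pending)
    return "".join(out)
```

Typing ெ, க, ா in that visual order gives "\u0b95\u0bca" (கொ), the same encoded text anyone else would produce. The encoding model is untouched; only the keyboard changed.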

If I wanted to design a keyboard for Latin this way, I should be able to
do so regardless of the encoding. Suppose I want to have keys for
lowercase L (l), lowercase C (c), and lowercase reversed-C (or open-O)
(ɔ), and I want to use these components to create lowercase d and b
"visually" as follows:

c + l = d
l + ɔ = b

Whether I can do this or not is totally unrelated to the encoding model,
and totally unrelated to the way scholars of writing systems view Latin
(and totally unrelated to whether d and b really consist of those
components, historically).
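Here is that hypothetical keyboard as a few lines of Python (VISUAL_CHORDS and compose_visual are invented names for illustration). Note that what comes out is plain d (U+0064) and b (U+0062), exactly what an ordinary keyboard produces; the encoding never sees the components.

```python
# Hypothetical chord table for the "visual" Latin keyboard: each pair of
# component keystrokes yields one ordinary, already-encoded letter.
VISUAL_CHORDS = {
    ("c", "l"): "d",
    ("l", "\u0254"): "b",  # U+0254 LATIN SMALL LETTER OPEN O (ɔ)
}

def compose_visual(keys: str) -> str:
    """Replace each component pair with its letter; pass other keys through."""
    out = []
    i = 0
    while i < len(keys):
        pair = (keys[i], keys[i + 1]) if i + 1 < len(keys) else None
        if pair in VISUAL_CHORDS:
            out.append(VISUAL_CHORDS[pair])
            i += 2  # both component keystrokes consumed
        else:
            out.append(keys[i])
            i += 1
    return "".join(out)

print(compose_visual("l\u0254acl"))  # bad
```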

> Why are Tamil researchers working on
> handwriting recognition? They should have some simple visual
> sequence input that is appropriate for their script.

Everyone is working on handwriting recognition.

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/