--- In qalam@yahoogroups.com, Andrew Dunbar <hippietrail@...> wrote:
> --- i18n <i18n@...> wrote:
> > suzmccarth wrote:
> >
> > > --- In qalam@yahoogroups.com, "Peter T. Daniels" <grammatim@...> wrote:
> > > > suzmccarth wrote:
> > > >
> > > > > I think ... that Tamil should be thought of as
> > > > > a syllabary for some
> >
> > I have been following this thread and learning. I
> > don't know specifically about Tamil, but I wonder
> > this:
> >
> > If Suz wants to teach less literate people to type
> > Tamil in a specific way, then rather than debate
> > terminology, why shouldn't she implement (or have
> > implemented) an IME that works the way she
> > envisions it?
> >
> > Certainly it can be done in Windows by almost any
> > developer armed with a set of specifications, and it
> > can be done for other OSs perhaps even more easily,
> > because whole chunks of code can be lifted from other
> > projects.
>
> Not so simple. The programmer also needs knowledge of
> the language or 100% specifications backed up with
> linguistic data and of course totally unambiguous
> terminology throughout.
>
> I can envisage two types of IMEs for Tamil based on
> what I've been reading here.
>
> 1) Thai-style, which maps each character to a key
> and just rearranges character sequences when the user
> types vowels in the wrong order, and prevents invalid
> sequences, but does not offer a conversion list.
...
> Number 1 exists for Thai but doesn't ship with Windows
> because most people don't seem to need it. The
> programmer needs to know what rearranging needs to
> happen in *all* cases, and what combinations are and
> are not legal combinations.
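A minimal Python sketch of what the "Thai-style" IME described above might do with each keystroke. Everything here is hypothetical: the names `accept_key` and `type_keys` are illustrative, it is not any real Windows or Linux IME API, it ignores tone marks, and it assumes the user types phonetically (consonant first), so a leading vowel typed after a consonant is moved in front of it, as Thai storage order requires.

```python
# Hypothetical keystroke filter for a "Thai-style" IME: one key = one character.
# Assumption: the user types phonetically (consonant first), so a leading vowel
# typed after a consonant is moved in front of it, as Thai storage order requires.

THAI_CONSONANTS = {chr(c) for c in range(0x0E01, 0x0E2F)}  # ko kai .. ho nokhuk
LEADING_VOWELS = {chr(c) for c in range(0x0E40, 0x0E45)}  # sara e .. sara ai maimalai
# Vowels written above or below a consonant (mai han-akat, sara i .. phinthu).
STACKED_VOWELS = {"\u0E31"} | {chr(c) for c in range(0x0E34, 0x0E3B)}


def accept_key(buffer: str, ch: str) -> str:
    """Return the buffer after typing ch; an invalid keystroke leaves it unchanged."""
    if ch in STACKED_VOWELS or ch in LEADING_VOWELS:
        # Both kinds of vowel need a consonant to attach to.
        if not buffer or buffer[-1] not in THAI_CONSONANTS:
            return buffer  # reject the keystroke
        if ch in LEADING_VOWELS:
            # Rearrange: store the vowel before the consonant it was typed after.
            return buffer[:-1] + ch + buffer[-1]
    return buffer + ch


def type_keys(keys: str) -> str:
    """Feed a string of keystrokes through accept_key, one at a time."""
    buffer = ""
    for ch in keys:
        buffer = accept_key(buffer, ch)
    return buffer


# ko kai typed before sara e comes out in storage order: sara e, ko kai.
print(type_keys("\u0E01\u0E40") == "\u0E40\u0E01")
```

The hard part, as noted above, is the rule table itself: a real IME would need the complete list of legal and illegal combinations, not just the two rules sketched here.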

Thai comes with Windows XP Home Edition as standard. Its use does,
however, have to be explicitly enabled from an 'administrator'
account. It also came with the version of Linux I have, at least in
some packages.

The Thai script is a South Indian script in origin, and presents the
same issues as Tamil. Pre-Unicode practice (TIS-620) has been
adopted for Unicode: Thai is stored in the order it is typed on a
manual typewriter, so a vowel written to the left of a consonant is
entered before it. (Thai code points are TIS-620 codes + 0x0D60; e.g.
ko kai is A1 in TIS-620 and U+0E01 in Unicode.) Lao followed suit,
but phonetic order seems to have been imposed on the Unicode
representation of other Tai scripts.
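The fixed offset can be checked against Python's standard `tis_620` codec. A minimal sketch: the helper name `tis620_to_unicode` is hypothetical, and it assumes the assigned TIS-620 Thai bytes are A1-DA and DF-FB (87 characters in all).

```python
# Hypothetical helper: TIS-620 byte -> Unicode character via the fixed offset.
OFFSET = 0x0D60

# Assumed assigned Thai bytes in TIS-620: A1-DA and DF-FB (DB-DE are unassigned).
ASSIGNED = [b for b in range(0xA1, 0xFC) if not 0xDB <= b <= 0xDE]


def tis620_to_unicode(byte: int) -> str:
    """Map one TIS-620 Thai byte to its Unicode character by adding 0x0D60."""
    if byte not in ASSIGNED:
        raise ValueError(f"not an assigned TIS-620 Thai byte: {byte:#04x}")
    return chr(byte + OFFSET)


# Cross-check the offset against Python's built-in tis_620 codec.
for b in ASSIGNED:
    assert bytes([b]).decode("tis_620") == tis620_to_unicode(b)

print(len(ASSIGNED), "bytes checked; A1 ->", f"U+{ord(tis620_to_unicode(0xA1)):04X}")
```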

One possible solution to the problem might indeed be to enter Tamil
as Thai! The drawback is that one would have to use a special font
to render Thai characters as Tamil glyphs, and some of the
correspondences would be unnatural if we wish to preserve the
sorting order. As far as possible I have chosen the corresponding
Thai symbol.

The independent vowels and symbols that are neither letters nor
vowels do not need to be modified! They may be entered as Tamil.

For the letters and vowels, I suggest replacing the encodings as
follows (best viewed with a non-proportional font):

Unicode  Name                  New code  Same code as
0B95     Tamil letter ka       0E01      Thai character ko kai
0B99     Tamil letter nga      0E07      Thai character ngo ngu
0B9A     Tamil letter ca       0E08      Thai character cho chan
0B9C     Tamil letter ja       0E0A      Thai character cho chang
0B9E     Tamil letter nya      0E0D      Thai character yo ying
0B9F     Tamil letter tta      0E0F      Thai character to patak
0BA3     Tamil letter nna      0E13      Thai character no nen
0BA4     Tamil letter ta       0E15      Thai character to tao
0BA8     Tamil letter na       0E19      Thai character no nu

# Replacement chosen to preserve alphabetic order!
0BA9     Tamil letter nnna     0E1A      Thai character bo baimai

0BAA     Tamil letter pa       0E1B      Thai character po pla

# The following codes may need to be shifted - see limitations below.
0BAE     Tamil letter ma       0E21      Thai character mo ma
0BAF     Tamil letter ya       0E22      Thai character yo yak
0BB0     Tamil letter ra       0E23      Thai character ro rua
0BB1     Tamil letter rra      0E24      Thai character ru  # To preserve alphabetic order!
0BB2     Tamil letter la       0E25      Thai character lo ling
0BB3     Tamil letter lla      0E26      Thai character lu  # To preserve alphabetic order
# If shifted, we would have:
0BAE     Tamil letter ma       0E1F      Thai character fo fan
0BAF     Tamil letter ya       0E20      Thai character pho samphao
0BB0     Tamil letter ra       0E21      Thai character mo ma
0BB1     Tamil letter rra      0E22      Thai character yo yak
0BB2     Tamil letter la       0E23      Thai character ro rua
0BB3     Tamil letter lla      0E25      Thai character lo ling

# The following two assignments are unfortunate, but offer the only way of preserving order.
0BB4     Tamil letter llla     0E27      Thai character wo waen
0BB5     Tamil letter va       0E28      Thai character so sala

0BB7     Tamil letter ssa      0E29      Thai character so rusi
0BB8     Tamil letter sa       0E2A      Thai character so sua
0BB9     Tamil letter ha       0E2B      Thai character ho hip
0BBE     Tamil vowel sign aa   0E32      Thai character sara aa
0BBF     Tamil vowel sign i    0E34      Thai character sara i
0BC0     Tamil vowel sign ii   0E35      Thai character sara ii
0BC1     Tamil vowel sign u    0E38      Thai character sara u
0BC2     Tamil vowel sign uu   0E39      Thai character sara uu
0BC6     Tamil vowel sign e    0E40      Thai character sara e
0BC7     Tamil vowel sign ee   0E41      Thai character sara ae  # The non-Sanskrit vowel
0BC8     Tamil vowel sign ai   0E44      Thai character sara ai mai malai

# 0BCA Tamil vowel sign o  - write as Thai vowel sara e, consonant, Thai vowel sara aa
# 0BCB Tamil vowel sign oo - write as Thai vowel sara ae, consonant, Thai vowel sara aa
# 0BCC Tamil vowel sign au - write as Thai vowel sara e, consonant, Thai vowel sara am

# There are two semantically matching candidates for virama, and one visually matching candidate.
# 0BCD   Tamil sign virama     0E3A      Thai character phinthu
0BCD     Tamil sign virama     0E4E      Thai character yamakkan  # (Not on Thai keyboard!)
# 0BCD   Tamil sign virama     0E4D      Thai character nikhahit  # Actually anusvara!
# Collation sequences need to be checked. Yamakkan might be treated specially.

# To preserve alphabetic order, the following must follow sara aa:
0BD7     Tamil au length mark  0E33      Thai character sara am
# The superscript vowel 0E37 Thai character sara uee might be worth considering.
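A minimal Python sketch of the conversion the table implies, assuming the unshifted consonant codes and yamakkan as the virama. The function name `tamil_to_thai_codes` is hypothetical. It splits o, oo and au as in the notes above (these splits coincide with the Unicode canonical decompositions of U+0BCA to U+0BCC), maps each character through the table, and moves e, ee and ai in front of their consonant so the result is in Thai typewriter order. Anything not in the table passes through unchanged, as suggested for the independent vowels and other symbols.

```python
import unicodedata

# Assumption: unshifted consonant codes from the table, yamakkan (0E4E) as virama.
CONSONANTS = {
    0x0B95: 0x0E01, 0x0B99: 0x0E07, 0x0B9A: 0x0E08, 0x0B9C: 0x0E0A,
    0x0B9E: 0x0E0D, 0x0B9F: 0x0E0F, 0x0BA3: 0x0E13, 0x0BA4: 0x0E15,
    0x0BA8: 0x0E19, 0x0BA9: 0x0E1A, 0x0BAA: 0x0E1B, 0x0BAE: 0x0E21,
    0x0BAF: 0x0E22, 0x0BB0: 0x0E23, 0x0BB1: 0x0E24, 0x0BB2: 0x0E25,
    0x0BB3: 0x0E26, 0x0BB4: 0x0E27, 0x0BB5: 0x0E28, 0x0BB7: 0x0E29,
    0x0BB8: 0x0E2A, 0x0BB9: 0x0E2B,
}
SIGNS = {
    0x0BBE: 0x0E32, 0x0BBF: 0x0E34, 0x0BC0: 0x0E35, 0x0BC1: 0x0E38,
    0x0BC2: 0x0E39, 0x0BC6: 0x0E40, 0x0BC7: 0x0E41, 0x0BC8: 0x0E44,
    0x0BCD: 0x0E4E, 0x0BD7: 0x0E33,
}
# Two-part vowel signs o, oo, au, split as in the notes above.
TWO_PART = {"\u0BCA": "\u0BC6\u0BBE", "\u0BCB": "\u0BC7\u0BBE", "\u0BCC": "\u0BC6\u0BD7"}
# Tamil e, ee, ai are written left of the consonant, like Thai sara e, ae, ai maimalai.
PREFIX = {0x0BC6, 0x0BC7, 0x0BC8}
THAI_CONS = set(CONSONANTS.values())

# The splits agree with Unicode canonical decomposition.
assert all(unicodedata.normalize("NFD", k) == v for k, v in TWO_PART.items())


def tamil_to_thai_codes(text: str) -> str:
    """Re-encode Tamil (logical order) as Thai code points (typewriter order)."""
    expanded = "".join(TWO_PART.get(ch, ch) for ch in text)
    out: list[str] = []
    for ch in expanded:
        cp = ord(ch)
        if cp in CONSONANTS:
            out.append(chr(CONSONANTS[cp]))
        elif cp in SIGNS:
            thai = chr(SIGNS[cp])
            if cp in PREFIX and out and ord(out[-1]) in THAI_CONS:
                out.insert(len(out) - 1, thai)  # vowel goes before its consonant
            else:
                out.append(thai)
        else:
            out.append(ch)  # independent vowels, digits, punctuation: unchanged
    return "".join(out)


print(tamil_to_thai_codes("\u0B95\u0BCA") == "\u0E40\u0E01\u0E32")  # ko
```

Only the three two-part signs are split: full NFD normalization would also split the independent vowel au (U+0B94) into U+0B92 U+0BD7, which should stay Tamil.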

Limitation:
Thai characters ro ru and lo lu are actually 'independent vowels',
but may be used as though dependent. Notepad will allow vowels to
be applied to them, but the Thai version of Word 2002 does not allow
vowels to be added to them. If other software also prevents these
combinations, they cannot be used as consonants, and more arbitrary
encodings will be required.
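Whether the shifted coding of ma to lla is a safe fallback can be checked mechanically. A minimal Python sketch, with hypothetical names; it includes the neighbours pa and llla so that the fit between the blocks is tested as well as the order within them.

```python
# Hypothetical check of the two proposed codings for ma..lla (Tamil 0BAE-0BB3),
# including the neighbours pa (0BAA) and llla (0BB4) so that the fit is checked too.
RU, LU = 0x0E24, 0x0E26  # Thai characters ru and lu (independent vowels)

NEIGHBOURS = {0x0BAA: 0x0E1B, 0x0BB4: 0x0E27}
UNSHIFTED = {**NEIGHBOURS, 0x0BAE: 0x0E21, 0x0BAF: 0x0E22, 0x0BB0: 0x0E23,
             0x0BB1: 0x0E24, 0x0BB2: 0x0E25, 0x0BB3: 0x0E26}
SHIFTED = {**NEIGHBOURS, 0x0BAE: 0x0E1F, 0x0BAF: 0x0E20, 0x0BB0: 0x0E21,
           0x0BB1: 0x0E22, 0x0BB2: 0x0E23, 0x0BB3: 0x0E25}


def preserves_order(mapping: dict[int, int]) -> bool:
    """True if the Thai codes rise strictly as the Tamil codes rise."""
    codes = [mapping[tamil] for tamil in sorted(mapping)]
    return all(a < b for a, b in zip(codes, codes[1:]))


def uses_ru_or_lu(mapping: dict[int, int]) -> bool:
    """True if any Tamil letter is coded as Thai ru or lu."""
    return bool({RU, LU} & set(mapping.values()))


for name, mapping in (("unshifted", UNSHIFTED), ("shifted", SHIFTED)):
    print(name, "order kept:", preserves_order(mapping),
          "uses ru/lu:", uses_ru_or_lu(mapping))
```

Both codings keep the order; only the shifted one avoids ru and lu, at the cost of giving fo fan and pho samphao to ma and ya.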

Now, fire away. Why won't such a scheme work?

Richard.