Re: How about a typology for input methods

suzmccarth wrote:

> [someone else wrote]:
> > >> Everyone, especially the computer guys, keeps telling you that
> > >> script typology has nothing whatsoever to do with input methods.
> > >
> > > Actually the 'computer guys' kept telling me that the unicode
> > > layer and the editing layer are different.
> > > The codepoints and the input and display are different.
>
> Particularly Marco, on June 3, (thank you, and to many others on and
> off the list) explained that Indic input has "a sort of 'syllabic
> editing' functionally very similar to a Chinese input method."

Did I say this? I guess I have oversimplified a lot...

Actually, typing Tamil is (and should be) much more similar to typing
English than to typing Chinese.

The only big difference is that English vowels are (and are treated as)
independent letters; this means that, once typed, they can be manipulated
(e.g., deleted) separately. Tamil vowels are (and are treated as) diacritic
marks; this means that, once typed, they can only be manipulated in
conjunction with the letter on which they are applied.

The Italian diminutive of your name, "Susi", is spelled with the sequence of
letters (in left-to-right order): <capital S>, <u>, <s>, <i>. To type it,
there is no doubt that the sequence of keys should be: <Shift+S>, <U>, <S>,
<I>. If you move the cursor between "u" and "s" and hit <Delete>, you are
left with "Sui": the fact that the vowel "i" is no longer preceded by a
consonant is in no way a problem or an error.

In Tamil, the same name is spelled with this sequence of signs (in
left-to-right order): <letter S with vowel mark U below>, <vowel mark I>,
<letter S>.

As the second syllable is spelled <I+S> but is pronounced <S+I>, there are
two conceivable typing sequences:

1) <letter S>, <vowel sign U>, <vowel sign I>, <letter S>;

2) <letter S>, <vowel sign U>, <letter S>, <vowel sign I>.

Sequence 1 (called "visual input") matches the way the word is written with
a pen or with a typewriter; sequence 2 (called "phonetic input") matches the
way the word is pronounced and encoded in Unicode.
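
To make the difference concrete, here is a minimal sketch of what an editor
would have to do to accept sequence 1 and still store the text in the order of
sequence 2. The key names are just labels, and the set PRE_BASE_SIGNS (the
signs written before their consonant) is an assumption taken from the "Susi"
example above, not a complete table for Tamil or any other script.

```python
# Hypothetical key labels; PRE_BASE_SIGNS is an assumption based on the
# "Susi" example above, not a real table of Tamil vowel signs.
PRE_BASE_SIGNS = {"vowel sign I"}


def visual_to_logical(keys, pre_base=PRE_BASE_SIGNS):
    """Reorder keys typed in visual order into phonetic (storage) order.

    A sign written before its consonant is held back until the consonant
    it belongs to has been typed, then emitted right after that consonant.
    """
    out = []
    pending = []  # pre-base signs still waiting for their consonant
    for key in keys:
        if key in pre_base:
            pending.append(key)
        elif key.startswith("letter") and pending:
            out.append(key)
            out.extend(pending)
            pending = []
        else:
            out.append(key)
    out.extend(pending)  # a sign typed last has no consonant to attach to yet
    return out


sequence_1 = ["letter S", "vowel sign U", "vowel sign I", "letter S"]
print(visual_to_logical(sequence_1))
```

With phonetic input no such buffering is needed: the keys are stored in the
order they are typed, and the rendering engine takes care of drawing the sign
in its visual position.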

I always thought that sequence 1 should be more natural for Tamils. However,
it seems that for some native speakers sequence 2 is more "logical" on the
basis that <i> is not a letter in its own right but just a vowel mark, a
modifier, which is "attached" to letter <S>.

(Of course, Microsoft and other software vendors are very happy with this
preference for sequence 2, because it is much simpler to implement,
although laymen often have the opposite impression...)

A consequence of the "i" in "Susi" being a diacritic sign is the fact
that hitting the <Delete> key with the cursor placed between syllable "su"
and syllable "si" deletes the whole second syllable. That is similar to
putting the cursor before "ñ" and hitting <Delete>: you expect the whole "ñ"
to disappear, not just the "~" or just the "n".
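
The <Delete> behaviour described above can be sketched in a few lines. This
is a deliberate simplification, not the full Unicode text segmentation rules:
it treats a character plus any following characters of general category
"Mark" as one unit. The function names are made up for the illustration, and
the code points I use for the Tamil spelling (U+0BB8 for letter S, U+0BC1 and
U+0BBF for the two vowel signs) are my own choice for the example.

```python
import unicodedata


def cluster_end(text, start):
    """Return the index just past the base character at `start` and any
    marks (general category M*) that follow it."""
    end = start + 1
    while end < len(text) and unicodedata.category(text[end]).startswith("M"):
        end += 1
    return end


def delete_forward(text, cursor):
    """Simulate <Delete>: remove the whole unit that starts at `cursor`."""
    if cursor >= len(text):
        return text
    return text[:cursor] + text[cluster_end(text, cursor):]


# English: vowels are independent letters, so only one letter goes away.
print(delete_forward("Susi", 2))  # cursor between "u" and "s"

# Tamil, phonetic order: letter S, vowel sign U, letter S, vowel sign I.
tamil_susi = "\u0BB8\u0BC1\u0BB8\u0BBF"
print(delete_forward(tamil_susi, 2))  # the whole second syllable goes away

# "n" followed by a combining tilde behaves the same way.
print(delete_forward("an\u0303o", 1))
```

The same function handles all three cases; the only thing that differs is
whether the vowel is a letter or a mark.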

> It is too painfully obvious to me that the typology of the script
> has nothing to do with the input method.

(??? I guess you forgot a "not" somewhere here.)

> [...]
> The Tamil have chosen to ignore all this and use code conversion or
> transliteration.

You have claimed this several times but have never given the tiniest proof
of it.

You have merely shown a link to a site where you can type Tamil
transliterated in Latin script and have it in local script: this is just an
amusing toy (perhaps useful too, if one doesn't have a Tamil-enabled
computer at hand), but in *NO* way a demonstration of how Tamil speakers
wish to type their language on computers.

Please have a look at this other toy:

http://oss.software.ibm.com/cgi-bin/icu/tr

You can type something in virtually anything and have it transliterated in
virtually anything else. E.g., I just typed in something in Arabic script
and had it transliterated in Cyrillic script: the existence of this cute
demo does *NOT* demonstrate that typing in Arabic is the input method
preferred by Russians.

> Tranliteration is a good compromise if there is a
> disjunct between code and input. One reason why is because a
> representative of Microsoft originally said that Indic scripts did
> not require an IME.

I am sorry to admit that the Micro$oft representative was *TOTALLY* right.

(And, BTW, I am starting to think that terms such as "abjad" or "abugida" are
causing a bigger mess than the one they were supposed to clear up. I am
wondering whether it wouldn't be simpler and *safer* to go back and call
"alphabets" all those things now called abjads and abugidas. You can always
add later on that, however, alphabet "Tom" handles its vowels in such and
such a way, which is different from what happens with both alphabet "Dick"
and alphabet "Harry".)

_ Marco