Re: How about a typology for input methods

--- In qalam@yahoogroups.com, Marco Cimarosti <marco.cimarosti@...>
wrote:

> suzmccarth wrote:
> > [someone else wrote]:
> > > >> Everyone, especially the computer guys, keeps telling you that
> > > >> script typology has nothing whatsoever to do with input methods.
> > > >
> > > > Actually the 'computer guys' kept telling me that the unicode
> > > > layer and the editing layer are different.
> > > > The codepoints and the input and display are different.
> >
> > Particularly Marco, on June 3, (thank you, and to many others on and
> > off the list) explained that Indic input has "a sort of 'syllabic
> > editing' functionally very similar to a Chinese input method."
>
> Did I say this? I guess I have oversimplified a lot...
>
> Actually, typing Tamil is (and should be) much more similar to typing
> English than to typing Chinese.

Well, on a typewriter, it used to be like typing English: one letter
after another in order of visual sequence.

>
> The only big difference is that English vowels are (and are treated as)
> independent letters; this means that, once typed, they can be manipulated
> (e.g., deleted) separately. Tamil vowels are (and are treated as) diacritic
> marks; this means that, once typed, they can only be manipulated in
> conjunction with the letter on which they are applied.
>
> The Italian diminutive of your name, "Susi", is spelled with the sequence of
> letters (in left-to-right order): <capital S>, <u>, <s>, <i>. To type it,
> there is no doubt that the sequence of keys should be: <Shift+S>, <U>, <S>,
> <I>. If you move between "u" and "s" and hit <Delete>, you are left with "Sui":
> the fact that vowel "i" is no longer preceded by a consonant is in no way a
> problem or an error.
>
> In Tamil, the same name is spelled with this sequence of signs (in
> left-to-right order): <letter S with vowel mark U below>, <vowel mark I>,
> <letter S>.

Thank you for personalizing this, Marco. You must know by now that
I can type in Tamil, on several different keyboards, actually, real
Tamil words. I can also keyboard and google in many other
languages, from Greek and Russian to Hebrew, Punjabi and Chinese
(Pinyin input). No, I don't speak these languages, but I can use the
writing systems in Unicode.

> As the second syllable is spelled <I+S> but is pronounced <S+I>, there are
> two conceivable typing sequences:
>
> 1) <letter S>, <vowel sign U>, <vowel sign I>, <letter S>;
>
> 2) <letter S>, <vowel sign U>, <letter S>, <vowel sign I>.
>
> Sequence 1 (called "visual input") matches the way the word is written with
> a pen or with a typewriter; sequence 2 (called "phonetic input") matches the
> way the word is pronounced and encoded in Unicode.
>
> I always thought that sequence 1 should be more natural for Tamils. However,
> it seems that for some native speakers sequence 2 is more "logical" on the
> basis that <vowel sign I> is not a letter in its own right but just a vowel
> mark, a modifier, which is "attached" to letter <S>.
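
To make the two typing orders concrete, here is a minimal Python sketch of how an editor could turn visual-order keystrokes into the phonetic (Unicode) order. It is only an illustration under stated assumptions: the name `visual_to_logical` is made up, and it handles only the three one-part Tamil vowel signs that are written to the left of their consonant (E, EE and AI), not the two-part signs (O, OO, AU).

```python
# Tamil vowel signs that are drawn to the LEFT of the consonant they follow
# in Unicode order: U+0BC6 (E), U+0BC7 (EE), U+0BC8 (AI).
PREBASE_SIGNS = {"\u0bc6", "\u0bc7", "\u0bc8"}


def is_tamil_consonant(ch):
    """Rough check: the Tamil consonants live in U+0B95..U+0BB9."""
    return "\u0b95" <= ch <= "\u0bb9"


def visual_to_logical(typed):
    """Reorder visually typed text (sign first, then consonant) into
    Unicode logical order (consonant first, then sign)."""
    out = []
    pending = None  # a left-side vowel sign waiting for its consonant
    for ch in typed:
        if ch in PREBASE_SIGNS:
            if pending is not None:
                out.append(pending)  # two signs in a row: keep the first as typed
            pending = ch
        elif pending is not None and is_tamil_consonant(ch):
            out.append(ch)        # consonant goes first in logical order
            out.append(pending)   # then the vowel sign
            pending = None
        else:
            if pending is not None:
                out.append(pending)
                pending = None
            out.append(ch)
    if pending is not None:
        out.append(pending)  # sign typed with no consonant after it
    return "".join(out)


# Typed visually: vowel sign E, then the letter CA.
logical = visual_to_logical("\u0bc6\u0b9a")
print([hex(ord(c)) for c in logical])
```

Text that is already in logical order, such as a consonant followed by a right-side or below-base sign, passes through unchanged.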

I have read many Tamil lists, and from what I have read there they have
not accepted typing in order of phonetic sequence. I am not going to cut
and paste from these lists for now because some of the posters sound as if
they think they are in private (little do they know). Maybe later I
will reassess this. I think that phonetic sequence as 'logical
order' was introduced by the ISCII standards committee so that the
Inscript keyboard could be used all over India. But the Tamil never
accepted ISCII; they used TSCII. And still today they input in
TSCII or earlier legacy fonts (TAB and Bamini) and use code
conversion. These are not toys. The Tamil have not switched from
visual sequence to phonetic sequence input yet. The Tamil Anjal
keyboard (very popular) is Romanized, so the alphabet mediates. Tamil
users of the Anjal keyboard already know how to keyboard English. Most
other popular methods are transliterations.
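
As a rough illustration of what "the alphabet mediates" means, here is a minimal Python sketch of romanized input. The mapping table is a tiny made-up one for demonstration and is not the actual Anjal layout; the function name `romanized_to_tamil` is likewise hypothetical. It only covers consonant plus vowel syllables, and leaves anything else as typed.

```python
# Illustrative mapping only (NOT the real Anjal table).
CONSONANTS = {"k": "\u0b95", "s": "\u0b9a"}            # letters KA and CA
VOWEL_SIGNS = {"a": "", "i": "\u0bbf", "u": "\u0bc1"}  # inherent a, signs I and U
PULLI = "\u0bcd"  # Tamil sign virama: marks a consonant with no vowel


def romanized_to_tamil(text):
    """Convert Latin keystrokes to Tamil code points, one syllable at a time."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt in VOWEL_SIGNS:
                out.append(VOWEL_SIGNS[nxt])  # consonant + vowel sign
                i += 2
                continue
            out.append(PULLI)  # bare consonant: add the pulli
        else:
            out.append(ch)  # anything unmapped passes through unchanged
        i += 1
    return "".join(out)


word = romanized_to_tamil("susi")
print([hex(ord(c)) for c in word])
```

The typist works entirely in Latin letters and in phonetic order; the Tamil code points only appear on output.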

>
> (Of course, Microsoft and other software vendors are very happy for this
> preference for sequence 2, because that is much simpler to implement,
> although laymen often have the opposite impression...)

No, Microsoft was asked explicitly to provide an IME in order to
enable input of each visually distinct akshara, and Microsoft
refused. But now there is one, just not a very good one yet. This has
created a big fuss in India: some research centres are now moving
to handwriting input and speech input because they are so
dissatisfied with trying to input in order of phonetic sequence.

>
> A consequence of the "i" in "Susi" being a diacritic sign is the fact
> that hitting the <Delete> key with the cursor placed between syllable "su"
> and syllable "si" deletes the whole second syllable. That is similar to putting
> the cursor before "ñ" and hitting <Delete>: you expect the whole "ñ" to
> disappear, not just the "~" or just the "n".
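
The deletion behaviour described here can be sketched in a few lines of Python. This is a simplified approximation, assuming that a "unit" is one base character plus any combining marks that follow it; a real editor would more likely follow the grapheme cluster rules of Unicode Standard Annex #29. The function names are made up for the example.

```python
import unicodedata


def cluster_end(text, start):
    """Index just past the base character at `start` and the marks after it."""
    end = start + 1
    # General categories Mn, Mc and Me are the combining marks; Tamil vowel
    # signs and the combining tilde all fall in this group.
    while end < len(text) and unicodedata.category(text[end]).startswith("M"):
        end += 1
    return end


def delete_forward(text, cursor):
    """Simulate <Delete>: remove the whole unit that starts at the cursor."""
    if cursor >= len(text):
        return text
    return text[:cursor] + text[cluster_end(text, cursor):]


print(delete_forward("Susi", 2))        # Latin: only the "s" goes
print(delete_forward("n\u0303o", 0))    # "n" + combining tilde go together
tamil = "\u0b9a\u0bc1\u0b9a\u0bbf"      # CA + sign U, CA + sign I
print(len(delete_forward(tamil, 2)))    # the second syllable goes as a unit
```

The same function treats the Latin and the Tamil text identically; the difference in behaviour comes only from whether the vowel is encoded as a letter or as a mark.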
>
> > It is too painfully obvious to me that the typology of the script
> > has nothing to do with the input method.
>
> (??? I guess you forgot a "not" somewhere here.)

FOR ME, typology and input method ARE related. The way I think
about it, I see Tamil as having syllabic characteristics, and then I
can look for the syllabic IME. And Bhashaindia has one, but it needs
work. Maybe it isn't the best input method, but there is no
consensus right now for Tamil input. So a framework, for me, should
predict something, and it did, but I would like to discuss it with
like-minded people.

It is painfully obvious to me that FOR EVERYONE ELSE, typology and
input are NOT related. So we disagree. Does this mean that I am
wrong? My way of thinking correctly predicted the existence of the
syllabic IME.

> > [...]
> > The Tamil have chosen to ignore all this and use code conversion
> > transliteration.
>
> You have claimed this several times but never gave the tiniest proof of it.

Different people have told me privately; it is all very recent.

>
> You have merely shown a link to a site where you can type Tamil
> transliterated in Latin script and have it in local script: this is just an
> amusing toy (perhaps useful too, if one doesn't have a Tamil-enabled
> computer at hand), but in *NO* way a demonstration of how Tamil speakers
> wish to type their language on computers.
>
> Please have a look at this other toy:
>
> http://oss.software.ibm.com/cgi-bin/icu/tr
>
> You can type something in virtually anything and have it transliterated in
> virtually anything else. E.g., I just typed in something in Arabic script
> and had it transliterated in Cyrillic script: the existence of this cute
> demo does *NOT* demonstrate that typing in Arabic is the input method
> preferred by Russians.
>
> > Transliteration is a good compromise if there is a
> > disjunct between code and input. One reason is that a
> > representative of Microsoft originally said that Indic scripts did
> > not require an IME.
>
> I am sorry to admit that the Micro$oft representative was *TOTALLY* right.

What about the consumers of the product? What about native
speakers? I just can't believe nobody cares if a system can be
used.

> (And, BTW, I am starting to think that terms such as "abjad" or "abugida" are
> causing a bigger mess than they were supposed to clarify. I am wondering
> whether it wouldn't be simpler and *safer* to go back and call "alphabets"
> all those things now called abjads and abugidas. You can always add later on
> that, however, alphabet "Tom" handles its vowels in such and such way, which
> is different from what happens with both alphabet "Dick" and alphabet
> "Harry".)

Abjads, like alphabets, segment and sequence in a linear manner.
Abugidas have visually distinct units on the level of the syllable.

So, an abjad can be input like an alphabet, and an abugida cannot
without training in using an alphabet first.

Microsoft representative: "an IME is not necessary for Indic input.
It has been suggested that an IME could contain all the necessary
code point combinations (and their respective glyphs), thus
circumventing the shaping engine altogether. This is generally heard
from customers working with complex script languages who feel that
they need to have all visual variants of a code point on an input
method." Who are these customers, and why were they, probably native
speakers of the language, not worthy of consideration? Why not have
all the visual variants of a code point on an input method? Chinese
does. Mind you, this IME has been produced, but it doesn't really
work yet.
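
To show what "all the necessary code point combinations" could look like in practice, here is a minimal Python sketch that builds a lookup table from romanized syllable names to the code point sequence of each syllable. It is a hypothetical illustration only, not how the Microsoft or Bhashaindia IME is built, and it uses just two consonants and four vowels to keep the table small.

```python
# Small illustrative inventories (real Tamil has many more of each).
CONSONANTS = {"k": "\u0b95", "c": "\u0b9a"}   # letters KA and CA
VOWEL_SIGNS = {
    "a": "",         # inherent vowel: no sign needed
    "i": "\u0bbf",   # vowel sign I
    "u": "\u0bc1",   # vowel sign U
    "e": "\u0bc6",   # vowel sign E (drawn to the left, stored after)
}


def build_syllable_table(consonants, vowel_signs):
    """Map every consonant+vowel name to its Unicode code point sequence."""
    table = {}
    for cname, letter in consonants.items():
        for vname, sign in vowel_signs.items():
            # Stored in logical order: consonant first, then the vowel sign,
            # regardless of where the sign is drawn.
            table[cname + vname] = letter + sign
    return table


table = build_syllable_table(CONSONANTS, VOWEL_SIGNS)
print(len(table))
print([hex(ord(c)) for c in table["cu"]])
```

With such a table, the user picks a visually distinct syllable as one unit, and the input method emits the underlying code points; the shaping engine is still what draws them.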

Suzanne

> _ Marco