Re: How about a typology for input methods

--- In qalam@yahoogroups.com, Andrew Dunbar <hippietrail@...> wrote:

> I can envisage two types of IMEs for Tamil based on
> what I've been reading here.
>
> 1) Thai-style which maps a character to a key,
> and just rearranges character sequences when the user
> types vowels in the wrong order, and prevents invalid
> sequences. But does not offer a conversion list.

If this scheme is to be used for editing text, there may be a
problem. The Thai method ensures that the text recorded as having
been input is valid at each stage of the input. For example, suppose
the text before input reads <kaka> (encoded ka,ka) and one wants to
insert <te> between the aksharas by typing, in visual order,
<e>,<ta>. On inserting <e>, if one stored ka,e,ka, that would
display as <keka>. Inserting <ta> between the two aksharas would
then yield ka,e,ta,ka, which would display as <ketaka> rather than
the intended <kateka>.
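
To make the failure concrete, here is a minimal Python sketch. It uses the romanised tokens from the example above rather than real Tamil code points, and the function name `reading` is purely illustrative; it only models how a logically ordered sequence would be read, not any real rendering engine.

```python
# Assumption: tokens are the romanised units used in the post, not code points.
PREPOSED = {"e"}  # vowel signs that are drawn to the left of their consonant

def reading(stored):
    """Romanise a logically ordered sequence: 'ka' followed by 'e' reads 'ke'."""
    out = []
    for tok in stored:
        if tok in PREPOSED and out:
            # The vowel sign replaces the inherent 'a' of the preceding consonant.
            out[-1] = out[-1][:-1] + tok
        else:
            out.append(tok)
    return "".join(out)

stored = ["ka", "ka"]
stored.insert(1, "e")      # user types <e> at the cursor between the aksharas
after_vowel = reading(stored)        # the vowel attaches to the first ka
stored.insert(2, "ta")     # user then types <ta> at the cursor
after_consonant = reading(stored)    # the vowel is still on the first ka
print(after_vowel, after_consonant)
```

The vowel attaches to whatever consonant happens to precede the cursor, so typing in visual order never produces the intended <kateka>.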

If we used a special holding character, say e', then we could allow
the intermediate storage to be ka,e',ka, say displaying as <kake>
(i.e. <ka><e><ka>), but allowing the cursor to be positioned between
<e> and <ka>. On inserting <ta>, we could have a rule that a previous
<e'> plus consonant is 'normalised', so that instead of the sequence
ka,e',ta,ka being stored, the stored sequence becomes ka,ta,e,ka, which
displays as <ka><e><ta><ka>, i.e. <kateka>. However, there would be a
problem of sequences such as ka,e',ka being left. Possibly the glyph
for <e'> would have to be different to the regular <e>, incorporating
a visual prompt for a consonant. Maybe one would need a 'got
consonant already' key that cleaned up such glyphs.
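
The holding-character rule might be sketched as follows in Python. Everything here is an assumption made for illustration: the romanised tokens, the names `insert` and `dangling`, and the tiny consonant set. The helper `dangling` only reports left-over holding characters; what a 'got consonant already' key should actually do with them is left open.

```python
# Assumption: romanised tokens as in the post; "e'" is the holding form of "e".
TO_HOLD = {"e": "e'"}
FROM_HOLD = {"e'": "e"}
CONSONANTS = {"ka", "ta"}   # just enough for the example

def insert(stored, pos, key):
    """Insert `key` at index `pos` in place; return the new cursor index."""
    if key in TO_HOLD:
        # Preposed vowel typed first: park it as a holding character.
        stored.insert(pos, TO_HOLD[key])
        return pos + 1
    if key in CONSONANTS and pos > 0 and stored[pos - 1] in FROM_HOLD:
        # Normalise: holding vowel + consonant -> consonant + regular vowel.
        vowel = FROM_HOLD[stored[pos - 1]]
        stored[pos - 1:pos] = [key, vowel]
        return pos + 1
    stored.insert(pos, key)
    return pos + 1

def dangling(stored):
    """Indices of holding characters that never received a consonant."""
    return [i for i, tok in enumerate(stored) if tok in FROM_HOLD]

text = ["ka", "ka"]
cursor = insert(text, 1, "e")        # text is now ka,e',ka
left_over = dangling(text)           # the holding vowel is still waiting
cursor = insert(text, cursor, "ta")  # text is now ka,ta,e,ka
print(text, cursor, left_over)
```

If the user moves the cursor away after typing <e>, the holding character stays in the text, which is exactly the left-over problem described above.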

<e'> could be stored using code points taken from the Private Use
Area; the naughty option would be to stick them at the unused high
end of the Tamil block.

I'm not sure how to generalise this for use with scripts that have
conjuncts, such as Devanagari, where there is a similar issue with
preposed <i>. Are there any issues with how Sanskrit is written in
Tamil? (It seems crazy that it is, but it is!) As an example to fix
the mind, how is the name 'Draupadi' written in the Tamil script?

It might conceivably be easier if Unicode added visually ordered
preposed vowels for which the stored sequence {visually ordered
preposed vowel} + consonant would normalise to consonant + vowel. I
don't know whether one would need different extra vowels for each
script, or whether one could use a common set. One could even bring
Thai and Lao into the standard scheme! To do this, one would add
phonetically ordered vowels corresponding to the current preposed
vowels, which also precede in logical order. (A new, logically
ordered THAI CHARACTER SARA AU would not work, because Thai <e><ph
phan><l><aa> (เพลา) can be /phlau/ 'axis' or /phe:la:/ 'time, meal'.
The two words are sorted identically in Thai.) One problem is that
the normalisation required to eliminate visually ordered preposed
vowels is not of the form of the current normalisations. A second
problem is that the normalisation would have to pass through all the
consonants of an akshara.
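
As a rough illustration of that second problem, here is a Python sketch for Devanagari. No visually ordered vowel exists in Unicode, so the private-use code point U+E000 stands in for a hypothetical 'visual <i>'; the function `normalise` and the simplified cluster pattern (consonant, then any number of virama + consonant pairs) are likewise assumptions, not anything Unicode defines.

```python
VIRAMA = "\u094d"                          # DEVANAGARI SIGN VIRAMA
# Hypothetical: U+E000 (private use) stands in for a visually ordered <i>.
VISUAL_TO_LOGICAL = {"\ue000": "\u093f"}   # -> DEVANAGARI VOWEL SIGN I

def is_consonant(ch):
    # Rough check: DEVANAGARI LETTER KA (U+0915) .. HA (U+0939).
    return "\u0915" <= ch <= "\u0939"

def normalise(text):
    """Move each visual preposed vowel to after the whole consonant cluster."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in VISUAL_TO_LOGICAL and i + 1 < len(text) and is_consonant(text[i + 1]):
            j = i + 2
            # Pass through every further virama + consonant pair of the akshara.
            while j + 1 < len(text) and text[j] == VIRAMA and is_consonant(text[j + 1]):
                j += 2
            out.append(text[i + 1:j] + VISUAL_TO_LOGICAL[ch])
            i = j
        else:
            out.append(ch)   # ordinary character, or a visual vowel with no consonant
            i += 1
    return "".join(out)

# visual <i> + ka + virama + ssa + ta  ->  ka + virama + ssa + <i> + ta
result = normalise("\ue000\u0915\u094d\u0937\u0924")
print([hex(ord(c)) for c in result])
```

The vowel has to travel past an unbounded run of consonants, which is what makes this unlike the existing normalisations.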

> 2) CJK-style with an input window and a conversion
> list. Characters do not need to map 1:1 with keys.
> This is like the IME which suzmccarth has found.

Of course, if Tamil is to be seen as having become a syllabary, there
should be no objection to method 3:

3) 'Dead keys', as for entering accents. If one types a preposed
vowel, nothing should happen until the following consonant is
entered; then consonant and vowel can be sent in that order. (Is
this feasible?)
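
A dead-key scheme of this kind might look like the following Python sketch. The function name `dead_key_filter` is made up for illustration, and the policy for a preposed vowel that is not followed by a consonant (it is silently dropped) is my assumption; a real IME might beep or emit it anyway.

```python
# TAMIL VOWEL SIGN E, EE and AI: the signs drawn to the left of the consonant.
PREPOSED = {"\u0bc6", "\u0bc7", "\u0bc8"}

def is_consonant(ch):
    # Rough check: TAMIL LETTER KA (U+0B95) .. HA (U+0BB9).
    return "\u0b95" <= ch <= "\u0bb9"

def dead_key_filter(keys):
    """Turn visually ordered keystrokes into logically ordered text."""
    pending = None   # preposed vowel typed but not yet sent
    out = []
    for ch in keys:
        if ch in PREPOSED:
            pending = ch              # hold it back; nothing is sent yet
        elif pending is not None and is_consonant(ch):
            out.append(ch + pending)  # send consonant, then vowel
            pending = None
        else:
            pending = None            # assumption: an unusable vowel is dropped
            out.append(ch)
    return "".join(out)

# Keys typed in visual order: ka, <e>, ta, ka  ->  stored ka, ta, <e>, ka
typed = "\u0b95\u0bc6\u0ba4\u0b95"
stored = dead_key_filter(typed)
print(stored)
```

Because nothing is sent until the consonant arrives, the stored text is valid at every stage, at the cost of the user seeing no feedback for the vowel keystroke.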

As to my own preferences, I liked the legacy input method where you
had overstriking glyphs. When correcting Thai, it annoys me that I
lose the superscript or subscript vowel and tone mark when I try to
change a consonant. At least in Thai (I'm not sure about other
systems), you can delete the vowel or tone mark without losing the
consonant. All this is very different from an alphabet, which is what
Lao appears to be by some of the definitions floating around here,
even though Lao is almost the same script as Thai (certainly as much
as English and Old English use the same script).

Richard.