suzmccarth wrote:
> [...]
> The Tamil 99 keyboard only has the independent vowels displayed on
> the keyboard and these can be used to input both the independent or
> dependent vowels, depending on context.
Well, the US keyboard also displays only capital letters, but you can use it
to enter both capital and small letters...
> So now I am not sure why the independent and dependent vowels
> have been encoded separately in Unicode.
For the same reason that Latin capital and small letters have been encoded
separately: they have been considered different (although related)
characters.
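
To make the analogy concrete, here is a small sketch using Python's standard
unicodedata module (Python is my choice here, nothing in this thread depends
on it). It only looks up the names and general categories of one Latin pair
and one Tamil pair; the helper name describe() is made up for illustration.

```python
import unicodedata

# One Latin capital/small pair and one Tamil independent/dependent vowel pair.
pairs = {
    "Latin A/a": ("\u0041", "\u0061"),
    "Tamil AA (independent/dependent)": ("\u0B86", "\u0BBE"),
}

def describe(ch):
    """Return (code point, character name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

for label, (first, second) in pairs.items():
    print(label)
    print("  ", describe(first))
    print("  ", describe(second))
```

In both pairs the two members get their own code point and their own name,
which is all that "encoded separately" means here.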
> I suppose there must be a reason. If anyone knows i would
> be interested in hearing how it came about. What about the final
> forms of consonants in Hebrew of sigma in Greek.
That's inherited from preexisting practice.
The Greek, Hebrew and Arabic blocks in Unicode derive from the corresponding
sub-sections of the ISO 8859 8-bit encoding standard. In the Arabic section
of ISO 8859, contextual forms are not encoded separately, whereas in the
Greek and Hebrew sections they are.
So the question is, why did ISO 8859 handle Arabic differently from Hebrew
and Greek?
I guess that a possible answer might be in the numbers: *all* the 28 letters
of the Arabic script have at least two contextual forms, and 22 of them have
up to four forms (which, if entered manually, would require four different
shift combinations on the keyboard!). So, it made sense, even on old and
slow computers, to put in place a software layer to automatically handle
contextual forms.
On the other hand, the Greek and Hebrew scripts only have one and,
respectively, five letters with contextual forms, and each one of these
letters only has two contextual forms. So, in this case it is
relatively easy to choose contextual forms manually while typing.
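
The numbers for Greek and Hebrew are easy to check against the character
names. The following is only a rough sketch: it assumes that a separately
encoded final form can be recognised by the word "FINAL" in its Unicode name,
and the function name and the code point ranges are my own choices.

```python
import unicodedata

def letters_with_final_form(first, last):
    """List characters in [first, last] whose Unicode name contains 'FINAL'."""
    found = []
    for cp in range(first, last + 1):
        # Unassigned code points have no name; treat them as an empty string.
        name = unicodedata.name(chr(cp), "")
        if "FINAL" in name:
            found.append((f"U+{cp:04X}", name))
    return found

greek = letters_with_final_form(0x03B1, 0x03C9)   # small alpha .. small omega
hebrew = letters_with_final_form(0x05D0, 0x05EA)  # alef .. tav
arabic = letters_with_final_form(0x0621, 0x064A)  # hamza .. yeh

for script, found in (("Greek", greek), ("Hebrew", hebrew), ("Arabic", arabic)):
    print(script, len(found), found)
```

Under that assumption the scan finds one letter in the Greek range (final
sigma), five in the Hebrew range (final kaf, mem, nun, pe and tsadi), and none
in the basic Arabic range.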
BTW, notice that in all three scripts the selection of contextual forms
cannot be 100% automatic. E.g., the non-final Greek sigma is also used as a
numerical symbol; the initial Arabic "jiim" is also an abbreviation for
"plural"; I don't have an example for Hebrew off the top of my head, but I
definitely recall that non-final forms may be used at the end of a word
(perhaps that was in an acronym) in either Hebrew or Yiddish.
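
As an illustration of an automatic rule and its limits, Python's str.lower()
picks the final sigma by context when lowercasing. This is just a sketch of
that one behaviour, not a description of how any keyboard or renderer works;
the variable names are mine.

```python
word = "ΟΔΟΣ"        # Greek capitals, sigma in word-final position
auto = word.lower()  # the final-sigma rule is applied automatically here

# A lone capital sigma (e.g. used as a symbol) has no preceding letter,
# so the same rule leaves it as the non-final form.
lone = "Σ".lower()

# Forcing the non-final form at the end of a word has to be done by hand.
manual = auto[:-1] + "\u03C3"

print(auto, lone, manual)
```

The automatic choice is right for ordinary words, but whoever wants the
non-final form at the end of a word has to override it explicitly, as in the
last assignment.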
This means that if contextual forms are implemented automatically, as in the
Arabic encodings, there must be a way to override the default behaviour,
i.e., the encoding must have the invisible characters called "zero width
joiner" and "zero width non-joiner", and the keyboard must provide keys for
these control characters.
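
In Unicode these two controls are U+200D and U+200C. A minimal sketch of how
they would sit in the text follows; the variable names are mine, and whether
the requested shape actually shows up depends on the renderer and font, which
this code cannot check.

```python
import unicodedata

ZWJ = "\u200D"   # ZERO WIDTH JOINER
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
JEEM = unicodedata.lookup("ARABIC LETTER JEEM")
BEH = unicodedata.lookup("ARABIC LETTER BEH")

# A lone jeem followed by ZWJ: asks the renderer for the initial shape
# even though no real letter follows it.
jeem_initial = JEEM + ZWJ

# ZWNJ between two letters that would normally join: asks the renderer
# to keep them unconnected.
unjoined = BEH + ZWNJ + JEEM

for s in (jeem_initial, unjoined):
    print([unicodedata.name(c) for c in s])
```

The letters themselves keep their single code point; only the invisible
control next to them changes which contextual form is requested.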
--
Marco