--- In qalam@yahoogroups.com, "Nicholas Bodley" <nbodley@...> wrote:
> On Sat, 19 Mar 2005 19:18:55 -0500, Richard Wordingham
> <richard.wordingham@...> wrote:

> > The discussion on Tamil is almost totally invalidated because of a
> > belief that /hoo/ is encoded <<ee>><<h>><<aa>>.

> [ I see doubled "angle brackets" surrounding transliterated letters and
> such. I'm wondering why they are doubled; there must be a good reason.]

Amateur linguist at work! Linguists use different brackets to
indicate at what level they are talking - / / for phonemes (the
underlying contrasting units in speech), [ ] for phones (the actual
sounds that emerge), < > for spelling, // // for archiphonemes
(phonemes except that the context denies the possibility of saying
precisely which one, as in the lack of contrast between /s//d/ and
/s//t/ at the start of an English word) and /// /// for morphemes
(elemental bits of a word with meaning, as in <meaning> = ///mean///
///ing///). The brackets may be omitted if the context does not
require them.

In this case I was using < > for the sequence of glyphs, and << >> for
the sequence of codepoints. I was bracketing every element as the
elements were not guaranteed to be represented by single characters.
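To make the glyph/codepoint distinction concrete, here is a small sketch using Python's standard unicodedata module. It assumes Unicode's own encoding of Tamil (not the other encoding under discussion), and the helper name `names` is just illustrative. In Unicode the syllable /hoo/ is stored consonant first, <<h>><<oo>>, and the canonical decomposition of <<oo>> gives <<h>><<ee>><<aa>>; only the rendered glyphs come out in the order <ee><h><aa>.

```python
import unicodedata

# Tamil /hoo/ in logical (codepoint) order: consonant first, then the vowel sign.
hoo = "\u0BB9\u0BCB"  # TAMIL LETTER HA + TAMIL VOWEL SIGN OO

def names(text):
    """Return the Unicode character name of every codepoint in text."""
    return [unicodedata.name(ch) for ch in text]

# Canonical decomposition splits the two-part vowel sign OO into EE + AA,
# but the consonant still comes first in the codepoint sequence.
decomposed = unicodedata.normalize("NFD", hoo)

print(names(hoo))
print(names(decomposed))
```

The second list keeps HA in first position; putting the EE glyph to the left of the consonant is the renderer's job, not the encoding's.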

> > Is their encoding published? It could get very unwieldy if they
> > have to add new combinations.
>
> That's one thing I don't remember from reading; it's more than late,
> else I'd go have a look. (Also have a huge backlog of e-mail.)
>
> They did mention plain 16-bit code points, which leaves a *lot* of
> space! 16 bits gives you 65,536 unique numbers. I would expect them
> to use one of the "higher" planes; no room in the BMP, any more,
> afaik, if they are thinking thousands.

I don't think they had any intention of being compatible with Unicode.
There's enough private use area to accommodate them in the
higher planes. I did start wondering if they should be offered an
equivalent of the Korean Hangul encoding area (as opposed to jamo
area), but I recall Michael Everson saying that that area has only a
political justification.
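For anyone wanting the numbers, here is a rough sketch of the arithmetic. The ranges are the standard Unicode private use areas and the Hangul constants are those of the Unicode Hangul syllable composition algorithm; the names `pua_sizes` and `compose_hangul` are mine, purely for illustration, and nothing here describes the other encoding itself.

```python
# 16-bit code points: the size of one plane such as the BMP.
BMP_SIZE = 2 ** 16

# Unicode private use areas as inclusive (first, last) code point ranges.
PUA_RANGES = {
    "BMP": (0xE000, 0xF8FF),
    "Plane 15": (0xF0000, 0xFFFFD),
    "Plane 16": (0x100000, 0x10FFFD),
}
pua_sizes = {name: last - first + 1 for name, (first, last) in PUA_RANGES.items()}

# Hangul syllables area: every leading/vowel/trailing jamo combination gets
# its own code point, computed arithmetically rather than listed one by one.
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28  # T_COUNT includes "no trailing consonant"
HANGUL_BASE = 0xAC00
HANGUL_SYLLABLES = L_COUNT * V_COUNT * T_COUNT

def compose_hangul(l_index, v_index, t_index=0):
    """Return the precomposed syllable for the given jamo indices."""
    return chr(HANGUL_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

print(BMP_SIZE, pua_sizes, HANGUL_SYLLABLES)
print(hex(ord(compose_hangul(18, 0, 4))))
```

The BMP private use area alone holds 6,400 code points, while each of the two higher-plane areas holds 65,534, which is why thousands of precomposed combinations would fit there comfortably.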

> I think I read that the newest Uniscribe(s) do shaping and joining of
> Indic scripts.

Indeed, and if you've been playing with the syllabic editor Suzanne
and I have been discussing, you should see some conjuncts on the
Devanagari keyboards. On the 'Chhahari extended keyboard', the three
rightmost light blue keys on the bottom row contain the conjuncts
k.sa, tra and jña, all defined as Unicode sequences.
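As an illustration of what "defined as Unicode sequences" means, here is a sketch with the usual Unicode spellings of those three conjuncts: consonant, virama, consonant. I am assuming the keyboard uses these standard sequences; the dictionary below is illustrative and not copied from the keyboard definition.

```python
import unicodedata

# Standard Unicode sequences for three common Devanagari conjuncts.
# Assumption: the keyboard's definitions match these usual spellings.
conjuncts = {
    "k.sa": "\u0915\u094D\u0937",  # KA + VIRAMA + SSA
    "tra": "\u0924\u094D\u0930",   # TA + VIRAMA + RA
    "jña": "\u091C\u094D\u091E",   # JA + VIRAMA + NYA
}

for label, sequence in conjuncts.items():
    codepoints = " ".join(f"U+{ord(ch):04X}" for ch in sequence)
    char_names = [unicodedata.name(ch) for ch in sequence]
    print(label, codepoints, char_names)
```

Each conjunct is three codepoints; whether it displays as a single ligature depends on the font and the shaping engine, not on the stored text.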

Richard.