Re: Unicode and Abugida Saga (was: wikipedia)

On 22/10/05, Doug Ewell <dewell@...> wrote:

> Andrew West <andrewcwest at gmail dot com> wrote:
>
> > I'm afraid SC UniPad didn't do a very good job -- in your "decomposed"
> > version all the small letters with oxia (e.g. U+1F71 GREEK SMALL
> > LETTER ALPHA WITH OXIA in πράγμαθ') remain undecomposed.
> >
>
> You're right, technically. UniPad's "decomposition" only converts
> between precomposed letters and combinations of base letter plus one or
> more combining marks. It does not perform true normalization, in the
> sense of converting a character like U+1F71 to its singleton equivalent
> U+03AC.
>
> In terms of what Suzanne was looking for, the example was probably
> sufficient, because I got the impression she was interested in the
> ability of her system to display precomposed letters vs. base letters
> with combining marks. Singleton decompositions don't change that.
>

Ah, but in this case they do, because U+03AC canonically decomposes
to <03B1 0301>. The result is that your "decomposed" text is still a
mixture of composed and decomposed Greek characters (18% of the text
is undecomposed), whereas if it were subjected to NFD normalization,
*all* characters would be fully decomposed into either base characters
or combining characters.
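
For anyone who wants to check the two-step chain for themselves, here is
a minimal sketch using Python's standard unicodedata module. The helper
names (one_step, nfd_code_points, fully_decomposed) are my own, made up
for illustration; this is not how UniPad works internally, just one way
to see the singleton mapping followed by the base + combining mark
mapping.

```python
import unicodedata

def one_step(ch):
    """Return the single-step canonical decomposition of ch as code points.

    Hypothetical helper: reads the raw decomposition field from the
    Unicode Character Database without applying it recursively.
    """
    fields = unicodedata.decomposition(ch).split()
    # Compatibility mappings start with a tag such as "<compat>";
    # those are not canonical, so leave the character unchanged.
    if not fields or fields[0].startswith("<"):
        return [ord(ch)]
    return [int(f, 16) for f in fields]

def nfd_code_points(text):
    """Return the code points of text after full NFD normalization."""
    return [ord(c) for c in unicodedata.normalize("NFD", text)]

def fully_decomposed(text):
    """True if NFD normalization would leave text unchanged."""
    return unicodedata.normalize("NFD", text) == text

if __name__ == "__main__":
    oxia = "\u1f71"  # GREEK SMALL LETTER ALPHA WITH OXIA
    print([f"{cp:04X}" for cp in one_step(oxia)])          # singleton step
    print([f"{cp:04X}" for cp in one_step("\u03ac")])      # base + mark step
    print([f"{cp:04X}" for cp in nfd_code_points(oxia)])   # both steps at once
```

Running fully_decomposed() over each character of a text is one rough way
to count how much of it is still precomposed.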

Andrew