Re: Searching for Multiscript Words

From: John Cowan
Message: 855
Date: 2002-09-24

ዳንኤል ያዕቆብ scripsit:

> I was just considering word boundaries and thought that a change of
> script should indicate the start of a new word (assume a space somehow
> vanished). But are there any exceptions to the rule?

In Kurdish written in Cyrillic, Q and W are Latin letters, while everything
else is Cyrillic. You may think Unicode is in error here (I do), but that's
the way it is currently encoded.
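
To make the failure concrete, here is a minimal sketch of the naive "new script, new word" rule from the question. It is only an illustration: Python's standard library does not expose the Unicode Script property, so the hypothetical helper `script_of` approximates it with the first word of the character name, and the sample string is just a Latin "w" followed by Cyrillic letters, not a checked Kurdish spelling.

```python
import unicodedata


def script_of(ch):
    # Rough stand-in for the Unicode Script property (an assumption of this
    # sketch): take the first word of the character name, e.g. "LATIN".
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def split_on_script_change(text):
    # Naive rule: start a new piece whenever the script label changes.
    pieces = []
    current = ""
    previous = None
    for ch in text:
        label = script_of(ch)
        if current and label != previous:
            pieces.append(current)
            current = ""
        current += ch
        previous = label
    if current:
        pieces.append(current)
    return pieces


# Latin "w" followed by two Cyrillic letters: one word in the mixed
# orthography, but the naive rule cuts it in two.
sample = "w\u0430\u0440"
pieces = split_on_script_change(sample)
```

With this rule the sample comes back as two pieces, so a real segmenter would need an exception list (here, Latin Q and W inside otherwise Cyrillic words) rather than a blanket split.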

Coptic has only six unique letters and otherwise recycles Greek. Unicode
is committed to making this go away eventually.

--
Evolutionary psychology is the theory           John Cowan
that men are nothing but horn-dogs,             http://www.ccil.org/~cowan
and that women only want them for their money.  http://www.reutershealth.com
        --Susan McCarthy (adapted)              jcowan@...
