On Dec 12, 2003, at 7:51 AM, Peter T. Daniels wrote:

> Then let the computer engineer not try to impose that solution on
> writing systems theory; or let them have consulted with experts in the
> field _before_ laying out the "Roadmap."
>

Well, the roadmap isn't writ in stone and is constantly under revision,
and linguistic experts in the field *are* consulted. This is
increasingly the case as we get into more obscure scripts where
it's impossible even to enumerate the basic units of the writing system
without consulting experts.

The problem, as Marco says, is that in the end Unicode is an
engineering solution to a practical problem, and as such has to break
with theoretical structure in order to get things to work. And,
because of the need for Unicode to interoperate with older engineering
solutions which have even *less* linguistic theory underlying them, it
gets saddled with a lot of unpleasant bric-a-brac. And because Unicode
needs to get solutions to tomorrow's problems in place yesterday, we
can't always wait for the experts to come to a complete consensus
before we proceed.

(The term "ideograph" is a case in point, as it happens.)

Certainly, it's wrong to impose the engineering solutions from Unicode
on writing systems theory; I agree entirely with this. Unicode
organizes like characters into blocks, and there is a strong tendency
to equate these blocks with scripts. And Unicode does define the
concept of "script," largely because as a formal schema it needs
precise definitions of the terms it uses, but also because we keep
getting requests from end-users to list Unicode "scripts." But I
wouldn't want Unicode's roadmap to be taken as anything more than a
first approximation of the number of "scripts" in the real world.
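To make the block/script distinction concrete, here is a minimal Python
sketch using only the standard unicodedata module (the particular
characters are my own illustrative choices): a single script spans
several Unicode blocks, so the two notions can't be equated.

```python
import unicodedata

# Three letters of the Latin script drawn from three different Unicode
# blocks: Basic Latin (U+0041), Latin-1 Supplement (U+00E9), and
# Latin Extended-A (U+0101). Their character names all begin with
# "LATIN" -- the script -- even though block boundaries separate them
# in the code space.
for ch in "A\u00E9\u0101":
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

Running this prints LATIN CAPITAL LETTER A, LATIN SMALL LETTER E WITH
ACUTE, and LATIN SMALL LETTER A WITH MACRON -- one script, three blocks.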

But at the same time, this means that we are sometimes two camps
separated by a common language, which we each use in slightly different
ways.
What to call the letters A through Z (and friends) is a case in point.
The character encoding community has adopted here the terminology of
typography, where "roman" is used for a style of type which contrasts
with "italic," and therefore perforce must use "Latin" for the script.
In a different context, I have no problems with "Roman" being used.

Meanwhile, one of the things that Unicode is fighting is the naive
assumption on the part of a lot of end-users that scripts/writing
systems/what have you are distinct with no overlap, and that there is a
strong link between language/writing system/place. This was one of the
reasons why ISO/IEC 10646 originally had Japanese kanji separate from
Chinese hanzi; there was a desire to be able to have a user tell from
the code-point alone whether a character was being used to write
Japanese or Chinese. In fact, we *still* get questions from people
asking how to do that. But that's not the way people communicate. I
have a picture of a sign I once found in the middle of Hong Kong's New
Territories written in a mixture of Latin/Roman, Devanagari, and
Chinese. Bangkok is rife with street signs in Thai, Chinese, and
Latin/Roman. Japanese texts freely quote Chinese authors writing
Chinese using Japanese forms for the characters. Orson Scott Card has
a friend make up a Chinese character to make his books look cooler.
And spoken language is even worse. I always get a kick out of the free
use of Chenglish on the streets of Hong Kong.
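The kanji/hanzi point is visible in a couple of lines of Python (again
a sketch using the standard unicodedata module): Han unification means
the code point itself carries no language information.

```python
import unicodedata

# U+6F22 is the character "Han" itself, used in Japanese text (as
# kanji) and Chinese text (as hanzi) alike. Its formal name says
# "UNIFIED": the code point alone cannot tell you which language is
# being written -- only surrounding context (or font choice) can.
ch = "\u6F22"
print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

This prints `U+6F22  CJK UNIFIED IDEOGRAPH-6F22` whether the text it
came from was Japanese or Chinese.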

People are people. It's the job of the computer engineer to enable
them to do what they want. It's the job of the theoretician to explain
how and why they're doing it.

========
John H. Jenkins
jenkins@...
jhjenkins@...
http://homepage.mac.com/jhjenkins/