Tex Texin wrote:
> Michael Everson wrote with respect to Urdu:
> > In Unicode terms it is the same script.
> Now that is an interesting comment, and I probably should
> know this, and a quick look didn't turn up the answer:
> What is the criteria by which Unicode determines what is
> in or out of a script?

My *outsider*'s impression is that there are no predefined criteria: each
case of possible "unification" of scripts is debated, and the decision
depends on a variety of factors, not least the political pressure of national
standards bodies.

Some unifications are borderline cases, e.g. Latin vs. Fraktur vs. Irish
(which have been unified) and Latin vs. "Old Italic" (which have been
disunified).

There is at least one case of unification -- that of the Greek and Coptic
alphabets -- on which the Unicode Consortium is probably changing its mind,
with a view to encoding a separate Coptic script.

> I realize Unicode labels characters with a script
> identification, but it hasn't occurred to me before
> this to ask the criteria. I had presumed the script
> was either obvious or determined by others, but
> of course there are these grey areas.
> I know the labels are from ISO 15924 (and I know you are
> involved with that) and that Unicode is the registration
> authority, but I didn't see any criteria for distinguishing
> scripts.
> How is it determined that Urdu is the same as Arabic?

In the case of the Urdu language, I would say that it is one of the
"obvious" cases. Nakanishi is the only source I have seen which treats Urdu
as a separate entry from Arabic, but the entry starts with something like
(quoting from memory) "The Urdu alphabet is in fact the same as the Arabic
alphabet, in the beautiful Nastaliq style". Probably, he only does this
because his book is based on samples of daily newspapers, and he had a very
good-looking sample of a Pakistani newspaper that he did not want to drop...

As for where the Unicode Standard says that the Urdu script is considered
the same as the Arabic script, that is in section 8.2, page 189 of the old
version 3.0 book: "The Arabic script is used for writing the Arabic language
and has been extended for representing a number of other languages, such as
Persian, *Urdu*, Pashto, Sindhi, and Kurdish. *Urdu* is *often* written with
the ornate Nastaliq script *variety*."
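
A rough way to see the same thing from the character side is to look at how
the letters that Urdu adds are named. The sketch below is only an
illustration: the selection of letters is my own, and a character name is
not the same thing as a script assignment (Python's standard unicodedata
module does not expose a script property, so the first word of the name is
used here as a loose stand-in).

```python
import unicodedata

# Letters used for Urdu but not for the Arabic language.
# This selection is an assumption made for illustration, not an official list.
urdu_letters = ["\u0679", "\u0688", "\u0691", "\u06BA", "\u06D2"]


def name_prefix(ch):
    """Return the first word of the character name, e.g. 'ARABIC'."""
    return unicodedata.name(ch).split()[0]


# Print each code point with its formal character name.
for ch in urdu_letters:
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))

# True if every letter above is named as an ARABIC character.
all_arabic = all(name_prefix(ch) == "ARABIC" for ch in urdu_letters)
print(all_arabic)
```

All of these are named "ARABIC LETTER ...", which fits the book's description
of Urdu as an extension of the Arabic script rather than a script of its own.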

_ Marco