On Thu, 19 Oct 2000 Marco.Cimarosti@... wrote:

[This is a follow-up to Marco's 2000.10.19 post on the unicode mailing
list. I can't seem to find the post at messages/unicode to give as a
reference, but it should be somewhere before the #4290s. -TC]

[MC]
> Of course the position of components has to be known, in one way or another.
> One way is to specify it *explicitly*, and IDS is a good example of how this
> could work: the dozen IDC operators (U+2FF0 to U+2FFB) can describe very
> precisely how components fit within the character square.
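As a rough illustration of how an IDS encodes position, the sketch below parses one into a tree. It assumes only what the operators themselves imply: an IDS is written in prefix order, U+2FF2 and U+2FF3 take three operands, and the other ten IDCs take two. The function name parse_ids is hypothetical, and it does not check whether the leaf components are sensible.

```python
# IDC operators U+2FF0..U+2FFB: U+2FF2 and U+2FF3 are ternary, the rest binary.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3  # left to middle and right
IDC_ARITY["\u2FF3"] = 3  # above to middle and below


def parse_ids(ids):
    """Parse a prefix-notation IDS into nested tuples (operator, operand, ...).

    Any character that is not an IDC is treated as a leaf component.
    """
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(ids):
            raise ValueError("truncated IDS: an operator is missing an operand")
        ch = ids[pos]
        pos += 1
        arity = IDC_ARITY.get(ch)
        if arity is None:
            return ch  # leaf: an already-encoded component
        # Operands are consumed left to right, in the order they are written.
        return (ch,) + tuple(parse() for _ in range(arity))

    tree = parse()
    if pos != len(ids):
        raise ValueError("extra characters after a complete IDS")
    return tree


# U+2FF1 (above/below): U+8279 over a nested U+2FF0 (left/right) of U+6C35, U+53EF
print(parse_ids("\u2FF1\u8279\u2FF0\u6C35\u53EF"))
```

The tree records which operator relates which parts, i.e. the relative arrangement only; it carries no proportions or offsets.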

[TC]
IDS is useful for the majority of cases, but I think there are still some
flaws with it--I am aware it comes straight from GB 13000.1, so Unicode is
not directly to blame.

[TC]
U+2FFB, the "OVERLAID" operator, is not fine-grained enough for a machine
to understand, but it is sufficient if both the human writing with it and
the human reading it are familiar with the character being described
(hopefully there is only one). For an extreme example, see the fifth
entry (Mojikyou
#084546) on the "shinji" page
(http://member.nifty.ne.jp/Gat_Tin/kanji/sinji.htm), part of the
"Kanji no shashin jiten" site
(http://member.nifty.ne.jp/Gat_Tin/kanji/kaindex.htm).
Perhaps it is not properly a "character", and it breaks some "rules" we
expect in character construction, but I think it shows how inventive some
characters can be; I see a number of others like it in _Hanyu Da Zidian_.
(Incidentally, the last character on the former webpage is one that I've
seen on "Abercrombie and Fitch" brand clothing, as well as on a T-shirt
worn by actress Jennifer Aniston in an episode of the TV sitcom
"Friends". It is amazing that anyone came up with that, given the number
of basic errors made in US pop culture usage of characters.)

[TC]
There are also no operators for performing transformations. Rotation,
for example, could create diao3 (Hanyu Da Zidian 10049.01; is this in
Unicode?) by rotating liao3 (U+4E86) 180 degrees. Nor is there deletion,
which can take you3 'to have' (U+6709), delete the middle two strokes, and
arrive at the Cantonese dialect character yau (Mand. mao3) 'to not have'
(U+5187), a very common character. Deletion could also generate U+4E52
and U+4E53 for 'ping pong' from bing1 'soldier' (U+5175), although they
could probably also be created with composition-based operators.
Another example I see is qian3 (Hanyu Da Zidian 10281.01), which is like
yan2 'speech' (U+8A00) but with the last stroke deleted. Deletion, at the
stroke level, would also be useful for creating "safe" versions of tabooed
characters. Yet another missing operator is reflection, which can turn
swastika wan4 U+5350 into sauvastika wan4 U+534D. (Perhaps rotation is
not strictly necessary, as two reflections can perform the task.)
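To make the missing operators concrete, here is a toy model. It assumes a glyph is a list of strokes, each stroke a list of integer (x, y) points on a hypothetical 0..8 grid with y growing downward. Reflection, rotation, and stroke deletion then become simple coordinate operations, and the assertion checks the remark that a 180-degree rotation equals two reflections. The glyph data and all function names are invented for illustration, not taken from any font, dictionary, or proposal.

```python
SIZE = 8  # hypothetical grid: coordinates run from 0 to SIZE


def reflect_h(glyph):
    """Mirror left-right: x -> SIZE - x."""
    return [[(SIZE - x, y) for x, y in stroke] for stroke in glyph]


def reflect_v(glyph):
    """Mirror top-bottom: y -> SIZE - y."""
    return [[(x, SIZE - y) for x, y in stroke] for stroke in glyph]


def rotate_180(glyph):
    """Rotate half a turn about the centre of the grid."""
    return [[(SIZE - x, SIZE - y) for x, y in stroke] for stroke in glyph]


def delete_strokes(glyph, indices):
    """Drop the strokes at the given 0-based positions (in stroke order)."""
    drop = set(indices)
    return [stroke for i, stroke in enumerate(glyph) if i not in drop]


# Invented four-stroke glyph, not real character data.
toy = [
    [(1, 1), (7, 1)],  # top horizontal
    [(4, 1), (4, 7)],  # vertical
    [(2, 4), (6, 4)],  # middle horizontal
    [(4, 7), (3, 8)],  # hook
]

# Two reflections (in either order) give the same result as one rotation.
assert rotate_180(toy) == reflect_h(reflect_v(toy))
print(delete_strokes(toy, [1, 2]))  # -> [[(1, 1), (7, 1)], [(4, 7), (3, 8)]]
```

Deleting "strokes 1 and 2" mimics the you3 to yau case only in spirit: a real system would need an agreed stroke numbering for every character, which is the stroke-level dependence the composition-based IDCs avoid.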

[TC]
Lest I be accused of dragging out bizarre characters, these may be found
in dictionaries, and some are in the 1992 edition of CNS 11643, which can
be seen at places like the CCDICT site
(http://www.chinalanguage.com/CCDICT/index.html).

[TC]
The composition-based operators don't work for all cases unless the
components one needs are already encoded somewhere, or unless perhaps one
works at the level of strokes; but that breaks the intuitive connection
and ease of reading that one seeks in using IDCs. I suppose one could
always define new "wen".


[MC]
> But, for the great majority of ideographs, this information could also be
> *implicit*, because some components (often corresponding to the "radical" in
> the dictionary, and normally being the first one in the ideograph, in writing
> order) often sit in a fixed default position within the ideograph (left,
> top, around, etc.), while the rest has to fit in the remaining free space
> (right, bottom, inside, etc.).

[TC]
I agree here, and there are some components, especially if you encode
radical and non-radical forms separately, that are only valid when they
are in the left or right halves of a character, etc.
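The implicit-position idea can be sketched as a lookup table of default slots per component, with an explicit position as the escape hatch for exceptions. The table, the position names, and compose() are all hypothetical; the characters used (water radical U+6C35 on the left, grass radical U+8279 on top, enclosure U+56D7 around, mouth U+53E3 usually on the left) are ordinary examples, not a proposed standard.

```python
# Hypothetical default slots for a few radical forms.
DEFAULT_POSITION = {
    "\u6C35": "left",      # water radical
    "\u8279": "top",       # grass radical
    "\u56D7": "surround",  # enclosure
    "\u53E3": "left",      # mouth, usually on the left
}

# Position -> (IDC operator, whether the radical is written first in the IDS).
POSITION_TO_IDC = {
    "left": ("\u2FF0", True),
    "right": ("\u2FF0", False),
    "top": ("\u2FF1", True),
    "bottom": ("\u2FF1", False),
    "surround": ("\u2FF4", True),
}


def compose(radical, rest, position=None):
    """Build a two-part IDS, using the radical's default slot unless overridden."""
    slot = position or DEFAULT_POSITION.get(radical)
    if slot is None:
        raise ValueError("no default position known; pass position explicitly")
    idc, radical_first = POSITION_TO_IDC[slot]
    return idc + (radical + rest if radical_first else rest + radical)


print(compose("\u6C35", "\u53EF"))                    # implicit: water radical on the left
print(compose("\u53E3", "\u79BE", position="right"))  # explicit: mouth on the right, as in U+548C
```

Only the exceptional case (here U+548C, whose radical U+53E3 sits on the right) needs an explicit position; the common cases carry no positional information at all.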


[MC]
> A few operators à la IDS are still needed for fixing special cases, i.e.
> when a "radical" sits in an unusual position.

[TC]
Oops, didn't see this when I wrote my speech above...


Thomas Chan
tc31@...