RE: CJK combining components

Thomas Chan wrote:

> I should first say that I'm not a regular user of post-1950's characters
> used in the PRC, so my perception of characters and their constituents is
> based on an understanding of the "traditional" forms.

Are you talking about the so-called "simplified" characters? These are the
hanzi that I was taught when I studied Chinese, so I am more familiar
with them than with the traditional ones.

My impression is that exactly the same combining rules are used in both
orthographies. Of course, some components are totally different, and many
hanzi use a smaller number of components, but the way they combine doesn't
change.

> I suppose it'd depend on what one considers to be a component in more
> recently created characters, such as some post-1950's PRC creations, as we
> don't have epigraphy and other sources to consult the etymology of the
> pieces. e.g., how many components are in dong1 'east' (U+4E1C) or che1
> 'cart' (U+8F66)? Is the right half of han2 'Korea' (U+97E9) one or
> many components? What about the right halves of zhuan3 'to turn' (U+8F6C)
> or chuan2 'to transmit' (U+4F20)? If they are made of more than one
> component, then I'd think they are "glued" together in new ways of
> assembly--perhaps some kind of overlapping or merger of certain strokes?

U+8F66 (车), strictly corresponding to traditional U+8ECA (車), is definitely
an atomic component (it's also a radical in all the dictionaries I have
seen). BTW, it is also one of the best examples of how "simplification" was
done: U+8F66 is clearly derived from a strongly cursive version of U+8ECA.

The other graphemes that you mention should probably be considered as atomic
components as well. Nevertheless, further analysis of shapes like the right
part of U+4F20 (传) could be possible, in a system that allows for overlaid
components.

But I don't understand why you see special combining rules here. Similar odd
compositions also exist in traditional hanzi. See, for instance, some
compounds of radical U+5F13 (弓): U+5F14, U+5F17, U+5F1F (弔, 弗, 弟).

> On this note, perhaps one might want to try to fit the more radical (and
> abortive) "Second Scheme" of PRC simplifications in the late 70's and
> early 80's into one's system for fanatic completeness.

Interesting; what is that? Do you have any on-line samples? Or can you scan
printed matter?

> Is there one for trios? A "macro" of sorts for a pyramid structure--a
> dozen or two occur in the AD 100 _Shuowen Jiezi_ (U+8AAA U+6587 U+89E3
> U+5B57)--would also be handy for some composition schemes (rather than
> a combination of a "top-to-bottom" and a "left-to-right" operator).
> Ditto for quads.
>
>    x      y y
>   x x     y y

Are you asking about a specific system? Which one?

Within the Unicode "Ideographic Description Characters", the only
3-component IDCs are U+2FF2 (side-by-side juxtaposition: left, middle,
right) and U+2FF3 (vertical stacking: above, middle, below).
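
For anyone who wants to check this against their own Unicode data, here is a
small Python sketch. It only assumes that the interpreter's unicodedata
module knows the Ideographic Description Characters block; the helper name
idc_names is mine, not anything standard.

```python
import unicodedata

# The two three-operand IDCs discussed above.
TERNARY_IDCS = ("\u2FF2", "\u2FF3")

def idc_names(chars):
    """Map each character to its official Unicode name, keyed by code point."""
    return {f"U+{ord(c):04X}": unicodedata.name(c) for c in chars}

ternary_idc_names = idc_names(TERNARY_IDCS)
for code_point, name in ternary_idc_names.items():
    print(code_point, name)
```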

The "pyramid" structure would indeed be a sequence like <TTB x1 LTR x2 x3>
in Unicode IDS (Ideographic Description Sequences).
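
As a rough illustration, the pyramid sequence can be spelled out in Python.
I am assuming here that my shorthand TTB stands for U+2FF1 (above to below)
and LTR for U+2FF0 (left to right); the function name pyramid is just made
up for this sketch.

```python
TTB = "\u2FF1"  # assumed mapping: "top-to-bottom" = IDC ABOVE TO BELOW
LTR = "\u2FF0"  # assumed mapping: "left-to-right" = IDC LEFT TO RIGHT

def pyramid(top, left, right):
    """Build the IDS <TTB top LTR left right> as a plain string."""
    return TTB + top + LTR + left + right

# Three mouths (U+53E3) stacked as a pyramid describe U+54C1.
ids_for_pin3 = pyramid("\u53E3", "\u53E3", "\u53E3")
print(ids_for_pin3)
```

Note that the result is a description of the character, five code points
long, and not the character U+54C1 itself.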

The quad structure is interesting, because it can be represented by two
competing sequences: <TTB LTR y1 y2 LTR y3 y4> vs. <LTR TTB y1 y3 TTB y2
y4>.

BTW, the last time I discussed the issue of "Han decomposition" on the
Unicode List, this fact of quads (and many other structures) having several
possible analyses was mentioned as one of the big dangers of the whole idea:
imagine searching your <TTB LTR y1 y2 LTR y3 y4> in a text file, and not
finding it because it was written as <LTR TTB y1 y3 TTB y2 y4>...
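
The problem is easy to reproduce. The Python sketch below parses the two
quad sequences and compares them, first as strings and then as layouts. The
layout comparison is purely my own assumption for illustration (each
operator splits its box into equal parts); nothing in Unicode defines such
an equivalence, and real glyphs are not divided so evenly. The letters A to
D stand in for y1 to y4.

```python
from fractions import Fraction

# Split direction and operand count for the four IDCs handled here
# (a deliberately small subset; the surround and overlay IDCs are ignored).
IDC = {
    "\u2FF0": ("h", 2),  # left to right
    "\u2FF1": ("v", 2),  # above to below
    "\u2FF2": ("h", 3),  # left to middle and right
    "\u2FF3": ("v", 3),  # above to middle and below
}

def parse(ids, pos=0):
    """Return (tree, next_pos); a tree is a component or (idc, [subtrees])."""
    ch = ids[pos]
    if ch not in IDC:
        return ch, pos + 1
    _, arity = IDC[ch]
    pos += 1
    children = []
    for _ in range(arity):
        child, pos = parse(ids, pos)
        children.append(child)
    return (ch, children), pos

UNIT_BOX = (Fraction(0), Fraction(0), Fraction(1), Fraction(1))

def layout(tree, box=UNIT_BOX):
    """Return a frozenset of (component, (x0, y0, x1, y1)) cells."""
    if isinstance(tree, str):
        return frozenset([(tree, box)])
    idc, children = tree
    direction, n = IDC[idc]
    x0, y0, x1, y1 = box
    cells = set()
    for i, child in enumerate(children):
        if direction == "h":
            w = (x1 - x0) / n
            sub = (x0 + i * w, y0, x0 + (i + 1) * w, y1)
        else:
            h = (y1 - y0) / n
            sub = (x0, y0 + i * h, x1, y0 + (i + 1) * h)
        cells |= layout(child, sub)
    return frozenset(cells)

def same_shape(a, b):
    """True if two IDS strings yield identical cells under the equal-split assumption."""
    return layout(parse(a)[0]) == layout(parse(b)[0])

rows_first = "\u2FF1\u2FF0AB\u2FF0CD"  # <TTB LTR y1 y2 LTR y3 y4>
cols_first = "\u2FF0\u2FF1AC\u2FF1BD"  # <LTR TTB y1 y3 TTB y2 y4>
text = "some text containing " + rows_first

print(cols_first in text)                  # plain substring search
print(same_shape(rows_first, cols_first))  # comparison by layout
```

The substring search misses the match, while the layout comparison treats
the two spellings as the same quad. Whether such a comparison is a sensible
basis for searching real text is a separate question.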

Adding a specific "QUAD" operator could sound like a solution to this
problem, but it isn't -- it just adds one more occasion for
"misspellings"...

About the "pyramid" structure: have you noticed that, in a lot of cases,
this structure is used with the *same* component repeated three times? I
always wondered why this is so common; does anyone have an "etymological"
explanation for this?

_ Marco