On Tue, 31 Oct 2000 Marco.Cimarosti@... wrote:

> Thomas Chan wrote:
> > I should first say that I'm not a regular user of post-1950's characters
> > used in the PRC, so my perception of characters and their constituents is
> > based on an understanding of the "traditional" forms.
>
> Are you talking about the so-called "simplified" characters? These are the
> hanzi that I've been taught when I studied Chinese, so I am more familiar
> with them than with the traditional ones.
> My impression is that exactly the same combining rules are used in both
> orthographies. Of course, some components are totally different, and many
> hanzi use a smaller number of components, but the way they combine doesn't
> change.

Yes, I'm talking about those, but I'd like to make a distinction between:

1) characters newly created in the post-1949 PRC (or in use in
Communist-controlled areas prior to 1949), such as wei4 'to guard'
(U+536B < U+885B) and xiang1 'village' (U+4E61 < U+9109).

2) characters created by converting cursive forms to print forms, such as
tou2 'head' (U+5934 < U+982D) and nan2 'difficult' (U+96BE < U+96E3).

3) characters that already existed during the Republican era (1912-1949)
and earlier as variants, "vulgar" forms (su2 U+4FD7), ancient forms,
specialist-use forms, etc., such as wan4 'ten thousand' (U+4E07 <
U+842C), guo2 'country' (U+56FD < U+570B), bao4 'to report' (U+62A5 <
U+5831), zhong4 'crowd' (U+4F17 < U+773E), and yin1 'feminine' (U+9634 <
U+9670). This category could probably be broken up even further.
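
For anyone who would rather see the characters than look up each code
point, the pairs in the three categories above can be expanded with a few
lines of Python. This is only a convenience sketch: the helper name
`annotate` is made up for illustration, and it assumes a terminal and font
that can display CJK text.

```python
import re

def annotate(text):
    """Put the actual character in front of every U+XXXX token in text."""
    def repl(match):
        # Convert the hexadecimal digits after "U+" into the character itself.
        return chr(int(match.group(1), 16)) + " " + match.group(0)
    return re.sub(r"U\+([0-9A-Fa-f]{4,6})", repl, text)

print(annotate("wei4 'to guard' (U+536B < U+885B)"))
print(annotate("tou2 'head' (U+5934 < U+982D)"))
print(annotate("guo2 'country' (U+56FD < U+570B)"))
```

The same function can be run over a whole message, since it leaves
everything other than the U+XXXX tokens untouched.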

It's the components and assembly of characters in the first and second
categories that I had questions about.


> > I suppose it'd depend on what one considers to be a component in more
> > recently created characters, such as some post-1950's PRC creations, as we
> > don't have epigraphy and other sources to consult for the etymology of the
> > pieces. E.g., how many components are in dong1 'east' (U+4E1C) or
> > che1 'cart' (U+8F66)? Is the right half of han2 'Korea' (U+97E9) one or
> > many components? What about the right halves of zhuan3 'to turn' (U+8F6C)
> > or chuan2 'to transmit' (U+4F20)? If they are made of more than one
> > component, then I'd think they are "glued" together in new ways of
> > assembly--perhaps some kind of overlapping or merger of certain strokes?
>
> U+8F66 (车), strictly corresponding to traditional U+8ECA (車), is definitely
> an atomic component (it's also a radical in all the dictionaries I have
> seen). BTW, it is also one of the best examples of how "simplification" was
> done: U+8F66 is clearly derived from a strongly cursive version of U+8ECA.
> The other graphemes that you mention should probably be considered as atomic
> components as well. Nevertheless, further analysis of shapes like the right
> part of U+4F20 (传) could be possible, in a system that allows for overlaid
> components.
> But I don't understand why you see special combining rules here. Similar odd
> compositions also exist in traditional hanzi. See, for instance, some
> compounds of radical U+5F13 (弓): U+5F14, U+5F17, U+5F1F (弔, 弗, 弟).

Are there any cases where multiple components in a "traditional" form have
merged into a new, single component in a "simplified" form?


> > On this note, perhaps one might want to try to fit the more
> > radical (and abortive) "Second Scheme" of PRC simplifications in the late
> > 70's and early 80's into one's system for fanatic completeness.
>
> Interesting; what is that? Do you have any on-line samples? Or can you scan
> printed matter?

I don't have any primary source materials, but there is a small section in
Ping CHEN's _Modern Chinese: History and Sociolinguistics_ (Cambridge:
Cambridge University Press, 1999) about an abortive "Second Scheme" in
December 1977 of 853 characters, of which 248 were to be used immediately,
and the remaining 605 for trial use. He gives only four examples.

a) gan3 'to feel' (U+611F) is changed to something that looks like U+5E72
on top of U+5FC3

b) yu2 'stupid' (U+611A) is changed to something that looks like U+4E8E
on top of U+5FC3

c) shi4 'thing' (U+4E8B) is changed to something that looks sort of like
U+310B

d) gao1 'tall' (U+9AD8) is changed to something that looks like U+310B
with a dot (like in the top of U+5BB6) over it
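
Since forms (a) and (b) are not encoded, one way to write them down is as
a description sequence. Below is a minimal sketch in Python, assuming the
Unicode "above to below" Ideographic Description Character (U+2FF1) and
taking the descriptions above at face value. The variable names are my
own, and the actual Second Scheme glyphs may differ in detail. Forms (c)
and (d) are left out because a "looks sort of like U+310B" shape does not
reduce to existing components in any obvious way.

```python
TTB = "\u2FF1"  # ⿱ IDEOGRAPHIC DESCRIPTION CHARACTER ABOVE TO BELOW

# Approximate descriptions only, based on "looks like X on top of Y".
gan3_second_scheme = TTB + chr(0x5E72) + chr(0x5FC3)  # U+5E72 over U+5FC3
yu2_second_scheme = TTB + chr(0x4E8E) + chr(0x5FC3)   # U+4E8E over U+5FC3

print("gan3:", gan3_second_scheme)
print("yu2: ", yu2_second_scheme)
```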


> > Is there one for trios? A "macro" of sorts for a pyramid structure--a
> > dozen or two occur in the AD 100 _Shuowen Jiezi_ (U+8AAA U+6587 U+89E3
> > U+5B57)--would also be handy for some composition schemes (rather than
> > a combination of a "top-to-bottom" and a "left-to-right" operator).
> > Ditto for quads.
> >
> >    x      y y
> >   x x     y y
>
> Are you asking about a specific system? Which one?
> Within the Unicode "Ideographic Description Characters", the only
> 3-component IDC's are for side by side juxtaposition and for vertical
> stacking.

No system in particular, although the IDS of Unicode (and GBK) are the
most interesting because they are the ones we might end up using.


> The "pyramid" structure would indeed be a sequence like <TTB x1 LTR x2 x3>
> in Unicode IDS (Ideographic Description Sequences).
> The quad structure is interesting, because it can be represented by two
> competing sequences: <TTB LTR y1 y2 LTR y3 y4> vs. <LTR TTB y1 y3 TTB y2
> y4>.
> BTW, the last time I discussed the issue of "Han decomposition" on the
> Unicode List, this fact of quads (and many other structures) having several
> possible analysis was mentioned as one of the big dangers of the whole idea:
> imagine searching your <TTB LTR y1 y2 LTR y3 y4> in a text file, and not
> finding it because it was written as <LTR TTB y1 y3 TTB y2 y4>...
> Adding a specific "QUAD" operator could sound like a solution to this
> problem, but it isn't -- it just adds one more occasion for
> "misspellings"...

If I had to, I would analyze the quad as two rows of two elements each,
rather than as two columns of two elements each, since in characters that
have a doubled form, a "left to right" arrangement seems to be more common
or more natural than a "top to bottom" one. Additionally, in characters
that have a tripled form, the dominant (perhaps the only?) arrangement
seems to be a "row" of one character on top of a "row" of two instances of
that character. Going from a single form to a doubled form to a tripled
form, and then to a quadrupled form (not all the intermediary forms
necessarily exist), it seems more consistent not to have to rearrange the
decomposition when "developing" the quadrupled form from the doubled form
via the tripled form.
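
To make the row-first progression concrete, here is a rough sketch in
Python. It uses the two Unicode IDCs that correspond to the LTR and TTB
operators quoted above (U+2FF0 and U+2FF1); every function name is
invented for illustration and belongs to no standard. The `normalize`
step is purely an assumption on my part: it rewrites a "two columns"
spelling into the "two rows" spelling, which is only meaningful when the
four pieces really do sit in a 2x2 grid.

```python
LTR = "\u2FF0"  # ⿰ IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT
TTB = "\u2FF1"  # ⿱ IDEOGRAPHIC DESCRIPTION CHARACTER ABOVE TO BELOW

def doubled(c):
    return LTR + c + c                    # c c

def tripled(c):
    return TTB + c + doubled(c)           # c on top of the doubled form

def quad_rows(c):
    return TTB + doubled(c) + doubled(c)  # two rows of two

def quad_columns(c):
    column = TTB + c + c
    return LTR + column + column          # two columns of two

def parse(ids, pos=0):
    """Parse a prefix-notation sequence that uses only the two binary IDCs."""
    ch = ids[pos]
    if ch in (LTR, TTB):
        left, pos = parse(ids, pos + 1)
        right, pos = parse(ids, pos)
        return (ch, left, right), pos
    return ch, pos + 1

def to_rows(node):
    """Rewrite LTR(TTB(a, c), TTB(b, d)) as TTB(LTR(a, b), LTR(c, d))."""
    if isinstance(node, str):
        return node
    op, a, b = node
    a, b = to_rows(a), to_rows(b)
    if (op == LTR and isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == TTB and b[0] == TTB):
        return (TTB, (LTR, a[1], b[1]), (LTR, a[2], b[2]))
    return (op, a, b)

def serialize(node):
    if isinstance(node, str):
        return node
    return node[0] + serialize(node[1]) + serialize(node[2])

def normalize(ids):
    tree, _ = parse(ids)
    return serialize(to_rows(tree))

tree = chr(0x6728)  # mu4 'tree'
print(doubled(tree), tripled(tree), quad_rows(tree), quad_columns(tree))
print(normalize(quad_columns(tree)) == quad_rows(tree))
```

With the row-first spelling, the doubled form appears unchanged inside
both the tripled and the quadrupled sequences, whereas the column-first
quad does not contain it at all. That is the "no rearranging" property
described above, and the two quad spellings being different strings is
exactly the search problem raised in the quoted message.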


> About the "pyramid" structure: have you noticed that, in a lot of cases,
> this structure is used with the *same* component repeated three times? I
> always wondered why this is so common; does anyone have an "etymological"
> explanation for this?

At first glance it seems to imply a progression of "multitude" of some
kind, like zhong4 'crowd' (U+4F17) composed of three ren2 'person'
(U+4EBA), or the well-known (and much-abused) mu4 'tree' (U+6728) >
lin2 'grove' (U+6797) > sen1 'forest' (U+68EE). However, there are also
ones like shan1 'odor of sheep' composed of three yang2 'sheep' (U+7F8A)
and biao1 'dogs running fast' (U+730B) composed of three quan3 'hound'
(U+72AC).


Thomas Chan
tc31@...