Jon Babcock wrote:
> The problem with describing a 'code' that depends solely on how a character
> appears in a particular place and time is there are too many variations. You
> end up with nothing better than something like the current version (3) of
> Unicode. Dozens, if not hundreds, of variations in the form of presentation,
> can take place (every font design will have its own, there will be national
> preferences) but the red thread throughout them all is that, well, each one is
> just another instance of the *same character*. I find the attempt to define a
> super class for each character more interesting at present than trying to
> account, in the 'code', for each and every variation.

It depends where you draw the line!

I agree that a good level of abstraction is needed, but this does not
necessarily have to be based (entirely) on etymological principles.

I prefer sticking to graphological distinctions normally perceived by the
modern user (including "popular etymologies" with no historical basis),
rather than relying on historical data that are known only to scholars. The
reason is that I am imagining a system intuitive enough to be used by
the average user.

However, using graph(olog)ical criteria does *not* mean that any tiny
difference in proportions, direction of strokes, etc. would be taken into
consideration; nor that the distinction should be based on any particular
font or handwriting style.

I have attached a few small samples of what I would intuitively consider
"common sense" (as a Windows BMP). Sorry: my handwriting is ugly even with a
brush, but with a mouse it's really terrible.

I think that you would probably disagree only with cases 2 and 6. The reasons
why I would consider these different are:

- They look so different...

- They are considered different "radicals" in many systems (e.g., modern PRC
dictionaries).

- In cases like 1 vs. 2, disunifying helps avoid those "positional
operators" that we were talking about (i.e., each component is specialized
for a specific position).

- In cases like 5 vs. 6, disunifying helps maintain distinctions that are
meaningful for users (in this case, the distinction between "simplified" and
"traditional" hanzi: if you unify those, the distinction would rely only on
the chosen font. So, in situations where I cannot control the font choice,
I would never know whether my readers will see simplified or traditional
components).
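
The last two points can be made concrete with a minimal sketch in Python. It is
only an illustration under stated assumptions: the component names, the
POSITION_OF table and the describe() helper are all hypothetical, and the
example characters are not necessarily the cases drawn in the attached bitmap.

```python
# Hypothetical illustration: names and table contents are assumptions,
# not part of any existing encoding or of the attached samples.

# Approach A: one unified "water" component, so the description needs an
# explicit positional operator to say where the component sits.
unified_description = ("LEFT_OF", "water", "every")  # e.g. a description of 海

# Approach B: disunified components, each specialised for one position.
# The position is implied by the component itself, so no operator is needed.
POSITION_OF = {
    "water_left": "left",        # 氵, the left-side form of 水
    "speech_left_trad": "left",  # 訁, traditional left-side form of 言
    "speech_left_simp": "left",  # 讠, simplified left-side form of 言
}

def describe(components):
    """Pair each component with the position it implies ("free" if none)."""
    return [(c, POSITION_OF.get(c, "free")) for c in components]

# 海 described without any positional operator.
sea = describe(["water_left", "every"])

# With disunified components, traditional 話 and simplified 话 are different
# sequences in the data itself, so the distinction does not depend on the font.
speak_trad = ["speech_left_trad", "tongue"]
speak_simp = ["speech_left_simp", "tongue"]
distinct_in_data = speak_trad != speak_simp
```

Under approach A the two forms of "speak" would share a single description,
and which one the reader sees would be decided by the font alone.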

_ Marco