[Thomas Chan (Re: CJK combining components (was RE: "Giga ...)) writes:]
>> On Sat, 28 Oct 2000, Jim Breen wrote:
>> > >> It would seem that any new system would not help in using the existing
>> > >> dictionaries. Or, do you mean to extend the coverage of the dictionaries you
>> > >> are making and provide an alternate system of organizing the kanji?
>> >
>> > Well, au contraire, it can lead directly into several dictionaries. For example, my
>> > dictionary files of the "JIS" kanji (6,355 + 5,801) have the indices for about half
>> > a dozen dictionaries and instructional books, including the Daikanwajiten. Thus with
>> > appropriate software you can use the fragments to identify an obscure kanji, and
>> > have immediate pointers to it in the dictionaries.
>>
>> I suppose "instructional books" says it all, but is such a system intended
>> for less advanced users?

I don't think "instructional books" says it all, at all. I know plenty
of people extremely fluent and literate in Japanese who don't
pretend to know the readings of the JIS 208 kanji (especially the ones
that don't really have any), and who find the multi-fragment technique
useful to access the raw information about them.

>> I occasionally use the "multi-radical" utility
>> in NJStar Communicator while typing, which seems similar to your system,
>> but I despair when I have to specify a character by chopping it up into
>> too many components, and not being able to specify basic positioning
>> information. As most characters (97% is one figure I've seen before--I
>> believe at the time of the Han dynasty U+6F22) are of the
>> signific-phonetic variety, it seems more intuitive (at least from a
>> Chinese language background) to describe a character like qun2 'dress'
>> (U+88D9) as signific yi1 'clothing' (U+8863) and phonetic jun1
>> 'lord' (U+541B), rather than chop the right half into 'mouth' and
>> other pieces.

Perhaps it's because I am not approaching this from a Chinese language
background, but that kanji, which is rarely used in Japanese, is easily
found from two fragments: U+53E3 and the left side of U+521D. This
combination matches 9 JIS X 0208 kanji.
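
To make the idea concrete, here is a minimal sketch of a multi-fragment
lookup as a set intersection. The index below is a hand-made toy with
only two fragments and a few kanji each; it is not the real data file,
and the name `lookup` is just for illustration.

```python
# Toy index: fragment -> set of kanji that contain it.
# Hand-made for illustration only; a real index covers every kanji in the set.
FRAGMENT_INDEX = {
    "口": {"裙", "裕", "君", "和"},  # U+53E3
    "衤": {"裙", "裕", "初", "被"},  # the left side of U+521D
}


def lookup(fragments, index=FRAGMENT_INDEX):
    """Return the kanji that contain every one of the given fragments."""
    sets = [index.get(fragment, set()) for fragment in fragments]
    if not sets:
        return set()
    # A kanji qualifies only if it appears under all selected fragments.
    return set.intersection(*sets)


if __name__ == "__main__":
    print(sorted(lookup(["口", "衤"])))
```

With this toy index only two kanji survive the intersection; against the
full set the same two fragments leave more candidates to choose from.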

>> Also, I'd like to ask how effective a decomposition system has been for
>> the ~6300 kanji of JIS X 0208--does it break down and yield too many
>> matches when the pool of characters is too large, like the SKIP system?
>> If so, what is this threshold? (I wonder how the 4 Corner system holds
>> up, as it is used for the near-50,000 characters in the _Dai Kanwa
>> Jiten_.)

For the benefit of the others on this list, I'll point out that Thomas &
I have already discussed this on the sci.lang.japan group. Yes, all
these coding/decomposition techniques are less effective when the pool
is large. For that reason, I provide in WWWJDIC both a stroke-count
range filter, and the ability to constrain the search to just the 1945
"jouyou kanji", or the full 12,000+ in JIS 208+212.
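
As a rough sketch of that kind of narrowing (not the actual WWWJDIC
code), a candidate list can be filtered by a stroke-count range and by
membership in a restricted set. The records, the `Kanji` class and the
`narrow` function are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kanji:
    char: str
    strokes: int
    jouyou: bool  # True if the kanji is in the jouyou list


# Sample records typed in by hand for illustration; not read from real files.
CANDIDATES = [
    Kanji("初", 7, True),
    Kanji("被", 10, True),
    Kanji("裕", 12, True),
    Kanji("裙", 12, False),
]


def narrow(candidates, min_strokes=None, max_strokes=None, jouyou_only=False):
    """Keep only the candidates inside the stroke range and the chosen set."""
    result = []
    for kanji in candidates:
        if min_strokes is not None and kanji.strokes < min_strokes:
            continue
        if max_strokes is not None and kanji.strokes > max_strokes:
            continue
        if jouyou_only and not kanji.jouyou:
            continue
        result.append(kanji)
    return result


if __name__ == "__main__":
    for kanji in narrow(CANDIDATES, min_strokes=11, max_strokes=13):
        print(kanji.char, kanji.strokes)
```

A stroke range rather than an exact count is deliberate in the sketch:
it tolerates being off by one or two when counting strokes in an
unfamiliar kanji.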

I don't use the Four-Corner system, so I don't know how it holds up. I
tend to use SKIP most of the time when there is a clear horizontal or
vertical division. I don't remember the enclosure or "other" rules that
well, so for those cases I go for the multi-fragment method.

Jim

--
Jim Breen [j.breen@... http://www.csse.monash.edu.au/~jwb/]
Computer Science & Software Engineering, Tel: +61 3 9905 3298
Monash University, Fax: +61 3 9905 3574
Clayton VIC 3168, Australia ジム・ブリーン@モナシュ大学