[Marco.Cimarosti@... (Re: CJK combining components: MOVING TO OTHER ML) writes:]
>> On the Unicode List (the Unicode Consortium's public mailing list:
>> http://www.unicode.org/unicode/consortium/distlist.html), we have been
>> recently discussing the *analysis* of Han logographs into smaller pieces
>> (variously named "components", "radicals", "hemigrams", "holograms", etc.),
>> and how (and whether) this analysis could be useful for encoding text on
>> computers, building software fonts, and other computer-related spin-offs.

One very useful application of decomposing CJK characters into component
parts is that the components can be used to carry out the "what
hanzi/kanji/hanja is that?" task.

Some years ago, the 6,355 kanji in JIS X 0208 were "decomposed" into
their visual elements for use in such software. A number of dictionary
packages use the technique. The targets for the decomposition are not
the 214 classical radicals; instead a set of common elements (which has
about a 70% overlap with the radicals) is used.
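
For anyone who wants to see the idea in code, here is a minimal sketch of
how such a lookup could work, assuming an inverted index from element to
kanji. The table below is a hand-made toy, not the actual decomposition
data, and the names DECOMPOSITION, build_index and lookup are made up for
this illustration; it is not a description of how any particular package
does it.

```python
from collections import defaultdict

# Toy decomposition table: kanji -> set of visual elements.
# Hypothetical, hand-made entries for illustration only; the real
# decomposition covers all of JIS X 0208 and may split these differently.
DECOMPOSITION = {
    "明": {"日", "月"},
    "時": {"日", "土", "寸"},
    "語": {"言", "五", "口"},
    "話": {"言", "舌", "口"},
    "林": {"木"},
    "校": {"木", "亠", "父"},
}


def build_index(decomposition):
    """Invert the table: element -> set of kanji containing that element."""
    index = defaultdict(set)
    for kanji, elements in decomposition.items():
        for element in elements:
            index[element].add(kanji)
    return index


def lookup(index, selected):
    """Return the kanji that contain every one of the selected elements."""
    selected = list(selected)
    if not selected:
        return set()
    # Start from the candidates for the first element, then narrow down.
    candidates = set(index.get(selected[0], set()))
    for element in selected[1:]:
        candidates &= index.get(element, set())
    return candidates


INDEX = build_index(DECOMPOSITION)
```

Each extra element the user picks shrinks the candidate set, so with this
toy table lookup(INDEX, ["言", "口"]) leaves two candidates and adding
"舌" narrows it to one.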

If you want to see this in action, it is implemented in my WWWJDIC
server (http://www.csse.monash.edu.au/~jwb/wwwjdic.html). It is the
"Multi-Radical Selection" option on the front menu. If your browser is
not Japanese-capable, start with the "Use Graphics.." option to engage
a mediator server which inserts images instead of raw Japanese.

Cheers

Jim
>>
>> Then I (Marco Cimarosti) wrote:
>> > Anyway. I think that everybody has probably had quite enough of these
>> > daydreams of mine et al. So, if anyone wishes to go on chatting
>> > about this, shouldn't we do it privately?
>>
>> And Jon Babcock replied:
>> > Yes, quite right.
>>
>> And Thomas Chan wrote (privately):
>> > If I may, I'd like to participate in the private discussion on CJK
>> > components.
>>
>> So, as there is quite a general consensus that the discussion is quite
>> off-topic for the Unicode List, but a few people would like to continue it,
>> I propose that we move it to another mailing list. The list is called Qalam,
>> and its owner agreed to host the discussion.
>>
>> Here you can find all the instructions to (un)subscribe to Qalam:
>> http://www.egroups.com/group/qalam
>>
>> Those who just want to peek at the discussion, but not take part, can read
>> the messages on the web:
>> http://www.egroups.com/messages/qalam
>>
>> Qalam members can retrieve the first part of the discussion from the
>> messages archive of the Unicode List:
>> http://www.egroups.com/messages/unicode
>>
>> The discussion branched from this message:
>> http://www.egroups.com/message/unicode/4263
>>
>> and specifically from this paragraph (Doug was talking about GCS, an odd
>> encoding technology from Taiwan):
>>
>> > An article in the October 12, 2000 issue of Linux Weekly News
>> > <http://lwn.net/bigpage.php3> tries to explain the benefit: "Many
>> > Asian characters are composites, made up of one or more simpler
>> > characters. Unicode simply makes a big catalog of characters, without
>> > recognizing their internal structure; GCS apparently handles things in
>> > a more natural manner." However, the article does not go on to specify
>> > just what is better, more efficient, or more "natural" about the GCS
>> > approach.
>>
>> The discussion then continued in the following messages whose subject begins
>> with either 'Re: "Giga Character Set": Nothing but noise' or 'Re: CJK
>> combining components'.
>>
>> When the discussion on Qalam is over, if it has resulted in anything of
>> interest for Unicoders, we can perhaps send a summary back to the Unicode
>> List.
>>
>> _ Marco
>>


--
Jim Breen [j.breen@... http://www.csse.monash.edu.au/~jwb/]
Computer Science & Software Engineering, Tel: +61 3 9905 3298
Monash University, Fax: +61 3 9905 3574
Clayton VIC 3168, Australia ジム・ブリーン@モナシュ大学