This came from the SuSE Linux English e-mail list. I thought it might be of
interest to some of you on the Old Norse List. I am doing my best to convert
to using UTF-8/16/32 exclusively in all my computing activities. I intend to
do this in such a way as to help others cooperate with my efforts. This is,
in my mind, a very important issue. UTF-X should make working with 'special'
characters much easier for everybody (once we learn how to use it).

Here's a brief description of what I've learned so far:

On my SuSE Linux 8 box here's what things look like:

(Sorry, I don't do Windoze.)

This shows the current locale settings:
<screen>
bash> locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=POSIX
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
bash>
</screen>
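
A quick way to check from a script whether the settings above really select
UTF-8 might look like the sketch below. It relies on the standard precedence
(LC_ALL overrides LC_CTYPE, which overrides LANG) and only inspects the locale
name. The helper names effective_ctype and is_utf8_locale are made up for this
illustration; running "locale charmap" asks the system directly and is the more
reliable check.

```shell
# Sketch only: effective_ctype and is_utf8_locale are made-up helper names.

# Pick the value that governs character encoding:
# LC_ALL overrides LC_CTYPE, which overrides LANG.
effective_ctype() {
    # $1 = LC_ALL, $2 = LC_CTYPE, $3 = LANG
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"
    else
        printf '%s\n' "$3"
    fi
}

# Succeed if the locale name carries a UTF-8 codeset suffix.
# This is a name-based guess, not a query of the C library.
is_utf8_locale() {
    case "$1" in
        *.UTF-8*|*.utf8*|*.UTF8*|*.utf-8*) return 0 ;;
        *) return 1 ;;
    esac
}

ctype=$(effective_ctype "${LC_ALL:-}" "${LC_CTYPE:-}" "${LANG:-}")
if is_utf8_locale "$ctype"; then
    echo "character encoding follows $ctype (UTF-8)"
else
    echo "character encoding follows ${ctype:-nothing set} (not UTF-8)"
fi
```

With the settings shown above (LC_ALL empty, LC_CTYPE taken from LANG), the
value that ends up being checked is en_US.UTF-8.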

To get these options I went into:
YaST2 -> System -> Sysconfig Editor -> Base-Administration -> Localization ->
rc_lang
and set the RC_LANG field to "en_US.UTF-8".
I probably also made a change in my KDE preferences, but I don't recall what
that might have been right now.

This is where the files defining en_US.UTF-8 live:
<screen>
bash> locate en_US.UTF-8
/usr/X11R6/lib/X11/locale/en_US.UTF-8
/usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose
/usr/X11R6/lib/X11/locale/en_US.UTF-8/XI18N_OBJS
/usr/X11R6/lib/X11/locale/en_US.UTF-8/XLC_LOCALE
bash>
</screen>

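This is how to learn the weird keystrokes necessary for typing the special
characters. (Note: I have not learned all about "dead keys" yet. As far as I
can tell, a dead key produces nothing by itself and instead modifies the next
character typed, while <Multi_key> is the separate "Compose" key, which on my
box is mapped to the right 'Win key'.)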
View the "Compose" file:
<screen>
less /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose [enter]
#
# $XFree86: xc/nls/Compose/en_US.UTF-8,v 1.1 2001/11/02 23:29:28 dawes Exp $
#
# latin alphabet
#
# abovedot
<dead_abovedot> <b> : "ḃ" U1e03
<dead_abovedot> <B> : "Ḃ" U1e02
<dead_abovedot> <c> : "ċ" cabovedot
<dead_abovedot> <C> : "Ċ" Cabovedot
...
...
<Multi_key> <Cyrillic_zhe> <comma> : "җ" U0497
<Multi_key> <Cyrillic_ZHE> <comma> : "Җ" U0496
<Multi_key> <U04af> <minus> : "ұ" U04b1
<Multi_key> <U04ae> <minus> : "Ұ" U04b0
</screen>

If you want to type an "eth", locate the line in the "Compose" file that
defines its key sequence:
<screen>
bash> grep eth /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose
<Multi_key> <d> <h> : "ð" eth
bash>
</screen>
On my box, <Multi_key> is the _right_ 'Win key', so this means: hit the right
'Win key' (don't hold it down), then hit 'd' followed by 'h'.
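
For looking up more than one character, the grep above could be wrapped in a
pair of small shell functions, roughly as sketched here. The names
compose_summary and compose_lookup are made up for this illustration, and the
sketch assumes the keysym name is the last word on the line, as it is in the
lines quoted above.

```shell
# Sketch only: compose_summary and compose_lookup are made-up helper names.

# Read Compose-format lines on stdin and print "keys -> character" for each
# line that defines a key sequence; comments and other lines are skipped.
compose_summary() {
    sed -n 's/^\(<[^:]*>\)[[:space:]]*:[[:space:]]*"\([^"]*\)".*$/\1 -> \2/p'
}

# Print the summary for every line whose last word is the given keysym name.
# $1 = keysym name (for example eth), $2 = path to a Compose file
compose_lookup() {
    awk -v name="$1" '$NF == name' "$2" | compose_summary
}
```

On my box, "compose_lookup eth /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose"
should then boil the line down to the keys and the character they produce.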

This is the home page of the Unicode Consortium:
http://www.unicode.org/

The 'O' with a tail (O with ogonek) is 0x01EA, and the lower case is 0x01EB.
I don't know how to type explicit escaped hex values as characters yet. This
is where the character is defined:
http://www.unicode.org/charts/PDF/U0180.pdf
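
This does not solve typing hex values at the keyboard, but a shell function can
at least produce a character from its hex code point by doing the UTF-8
encoding by hand. The following is a rough sketch; utf8_char is a made-up name,
and it does no validation (surrogates and out-of-range values are not
rejected).

```shell
# Sketch only: utf8_char is a made-up helper name.

# Print the UTF-8 bytes for one Unicode code point given in hex (e.g. 01EA).
utf8_char() {
    cp=$((0x$1))
    if [ "$cp" -lt 128 ]; then
        # 1 byte: 0xxxxxxx
        set -- "$cp"
    elif [ "$cp" -lt 2048 ]; then
        # 2 bytes: 110xxxxx 10xxxxxx
        set -- $((192 | (cp >> 6))) $((128 | (cp & 63)))
    elif [ "$cp" -lt 65536 ]; then
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        set -- $((224 | (cp >> 12))) $((128 | ((cp >> 6) & 63))) \
               $((128 | (cp & 63)))
    else
        # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        set -- $((240 | (cp >> 18))) $((128 | ((cp >> 12) & 63))) \
               $((128 | ((cp >> 6) & 63))) $((128 | (cp & 63)))
    fi
    # Emit each byte through a \0nnn octal escape, which printf %b understands.
    for byte in "$@"; do
        printf '%b' "\\0$(printf '%03o' "$byte")"
    done
}
```

For example, "utf8_char 01EA; utf8_char 01EB; echo" writes the upper and lower
case O with a tail, which a UTF-8 terminal with a suitable font should display.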

Here are some other interesting character sets:

Runic:
http://www.unicode.org/charts/PDF/U16A0.pdf

Gothic:
http://www.unicode.org/charts/PDF/U10330.pdf

International Phonetic Alphabet (IPA):
http://www.unicode.org/charts/PDF/U0250.pdf

Steven

---------- Forwarded Message ----------

Return-Path: <suse-linux-e-return-115281-hattons=speakeasy.net@...>
Delivered-To: hattons@...
Received: (qmail 2095 invoked from network); 17 Sep 2002 12:51:59 -0000

[snip the header junk]

Subject: [SLE] Re: UTF8/16 is GOOD!
Christopher Mahmood <ckm@...> writes:
> * Steven T. Hatton (hattons@...) [020913 20:44]:
>> Unfortunately, I wouldn't even know how to begin switching to UTF8 for
>> everything. Is Linux going in that direction at all? I never really
>> thought about this until I started working with a bunch of different XML
>> tools. What do others think about this?
>
> We have a multilingualization list devoted to this topic that you might
> be interested in--m17n@.... Send an email to m17n-subscribe@...

Yes, please subscribe to m17n@..., this is the best place to
discuss UTF-8.

--
Mike Fabian <mfabian@...> http://www.suse.de/~mfabian
睡眠不足はいい仕事の敵だ。