Re: UTF-8 vs. ISO-8859-1(5) ?

On Thursday 20 February 2003 07:45 pm, Arlie Stephens wrote:

> On Thu, Feb 20, 2003 at 06:21:20PM -0500, Steven T. Hatton wrote:
> > I've been informed that others may have difficulty reading my posts
> > because I've been using UTF-8 encoding. Is this true?
>
> Some of your posts arrive as illegible gobbledygook on my Linux system.

If you open an xterm and execute the `locale` command, you should get a
listing of your current settings. I'll bet good money you will see POSIX for
everything.
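
For anyone who would rather check from a script, here is a rough sketch of
the same idea in Python. The function names are my own invention, and it only
looks at three of the variables that `locale` reports, so treat it as an
illustration rather than a replacement for the real command.

```python
import os

def describe_locale(env=None):
    """Collect a few of the environment variables that `locale` reports."""
    env = os.environ if env is None else env
    names = ["LANG", "LC_ALL", "LC_CTYPE"]
    return {name: env.get(name, "") for name in names}

def looks_like_posix(settings):
    """True when every variable is unset or names the C/POSIX locale."""
    values = [v for v in settings.values() if v]
    return all(v in ("C", "POSIX") for v in values)

if __name__ == "__main__":
    settings = describe_locale()
    for name, value in settings.items():
        print(f"{name}={value}")
    print("POSIX only:", looks_like_posix(settings))
```

An unset variable counts as POSIX here, since that is what the system falls
back to when nothing else is configured.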

> > I don't do Windoze so I really don't understand why people would
> > not be using, or at least be able to use UTF-8, and switch between these
> > encodings as required.
>
> I have no idea what I would need to do to understand UTF-8. Given the
> number of people who apparently can't read it, I wouldn't want to do
> anything that might result in my system sending out messages in that form.

> I'm running Red Hat Linux 7.1, using mutt as my mail client, with North
> American settings.

My guess is that it's not that hard. If I get a message that I can't read
because of its encoding, I go to the "view|set encoding" menu and select the
encoding I think the sender used. Mutt probably has a command for that; I
use KMail.
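
What that menu does amounts to reinterpreting the same bytes under a
different charset. A small Python sketch of the effect follows; `view_as` is
a made-up helper for illustration, not anything KMail or Mutt actually
exposes.

```python
def view_as(raw: bytes, encoding: str) -> str:
    """Decode the raw message bytes the way a mail client would display them."""
    return raw.decode(encoding, errors="replace")

# Two Old Norse letters as they travel on the wire in a UTF-8 message.
raw = "þú".encode("utf-8")

# Read with the wrong charset, each letter turns into two junk characters.
print(view_as(raw, "iso-8859-1"))

# Read with the charset the sender actually used, the text comes back intact.
print(view_as(raw, "utf-8"))
```

The bytes never change; only the guess about what they mean does.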

> Why is UTF-8 better than whatever I'm getting by default? (One of the ISO
> standards; I'm having trouble remembering the number right now, but it's
> probably the ISO-8859-1 that you mention below.)

For one, if you use UTF-8, it doesn't matter what language or alphabet you
are reading; you don't need to switch between encoding systems. It's hard to
explain why that is so important. It just makes life much easier for doing
language processing. Here's an example:

http://www.hclrss.demon.co.uk/unicode/runic.html

Here's another example:

http://titus.fkidg1.uni-frankfurt.de/texte/etcd/ind/aind/ved/rv/mt/rv.htm
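
To make the point concrete, here is a minimal Python sketch that checks
which encodings can represent a given piece of text. The helper `can_encode`
and the sample strings are my own, chosen only for illustration.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Report whether every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

samples = {
    "Latin with Old Norse letters": "Þórr",
    "Runic": "ᚠᚢᚦ",
    "Devanagari": "अग्नि",
}

for label, text in samples.items():
    print(label,
          "| ISO-8859-1:", can_encode(text, "iso-8859-1"),
          "| UTF-8:", can_encode(text, "utf-8"))
```

ISO-8859-1 handles only the first sample, while UTF-8 handles all three
without any switching.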

What I don't really understand is why so many clients don't convert
seamlessly between encodings. I rarely have a problem with KMail.

I fully understand not wanting to mess with this stuff. I hate it! That's
why I want everything in UTF-8. That way, my programs don't have to guess
what kind of encoding to use. The problem for me is that XML defaults to
UTF-8 and Java works in Unicode internally, so when I use a different
encoding, I have to specify that somehow in the code. If I don't realize the
file is in the wrong encoding, I can spend hours debugging my parsers.
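
One way to avoid that kind of debugging session is to decode strictly with
the encoding you believe the file uses, so a mismatch fails immediately
instead of producing garbage further down the line. A minimal Python sketch,
with `decode_declared` as a hypothetical helper name:

```python
def decode_declared(data: bytes, declared: str = "utf-8") -> str:
    """Decode strictly, failing loudly if the bytes do not match the encoding."""
    try:
        return data.decode(declared)
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"input is not valid {declared} (bad byte at offset {exc.start})"
        ) from exc

# Bytes written by a program that used ISO-8859-1.
latin1_bytes = "Þórr".encode("iso-8859-1")

# Naming the real encoding works.
print(decode_declared(latin1_bytes, "iso-8859-1"))

# Assuming the UTF-8 default does not, and says so right away.
try:
    decode_declared(latin1_bytes)
except ValueError as err:
    print("rejected:", err)
```

This only catches the mismatch in one direction. Any byte sequence is valid
ISO-8859-1, so UTF-8 data read as ISO-8859-1 decodes without complaint and
simply looks wrong.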

The reason others who are programmers might want to use UTF-8 is that it
makes it possible to transcribe things such as the Nibelungenlied into
natural-looking text to be displayed online. It's probably worth it for
people who plan on working in archaic languages to gain some understanding
of this issue.

'Nuff said. I'll try to remember to send my ON text in ISO-8859-1.

--
STH
Hatton's Law: There is only One inviolable Law.