{This message should come to you encoded in ISO 8859-15}

Summary: Being strict about character encoding in Qalam messages.

On Tue, 22 Mar 2005 23:22:22 -0500, Peter T. Daniels
<grammatim@...> wrote:

>
> Richard Wordingham wrote:

>> Amateur linguist at work! Linguists use different brackets to
>> indicate at what level they are talking - / / for phonemes (the
>> underlying contrasting units in speech), [ ] for phones (the actual
>> sounds that emerge), < > for spelling, // // for archiphonemes
>> (phonemes except that the context denies the possibility of saying
>> precisely which one, as in the lack of contrast between /s//d/ and
>> /s//t/ at the start of an English word) and /// /// for morphemes
>> (elemental bits of word with meaning, as in <meaning> = ///mean///
>> ///ing///). The brackets may be omitted if the context does not
>> require it.
>
> Respectively / /, [ ], < > (better Ð ð), | |, { } in traditional
> notation.

That helps, and also mystifies.[1]

Something looks weird here. Peter Daniels's message was sent, according
to its full header, encoded in Latin-1, but when I read it, I saw
"(better Ð ð)" (quotes added here, of course), which looks improbable.
What I see after "better" is a capital eth (decimal 208), followed by a
space and a small eth (decimal 240). Surely, those are not delimiters?
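
For anyone who wants to check the two byte values, here is a minimal Python
sketch. It assumes only that the bytes on the wire were 208 and 240, as
described above; the variable names are mine, and it says nothing about
which characters the sender actually intended.

```python
import unicodedata

# The two byte values seen after "better" (decimal 208 and 240).
raw = bytes([208, 240])

# Decode them as Latin-1 (the charset declared in the header) and as
# ISO 8859-15 (the charset of this reply).
as_latin1 = raw.decode("latin-1")
as_latin9 = raw.decode("iso8859-15")

for ch in as_latin1:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Both charsets agree on these two positions.
print(as_latin1 == as_latin9)
```

Either way the result is the pair of eths, so the declared charset by itself
does not explain what was meant.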

OK: by any chance, were « and » meant?

If, perhaps, single guillemets (139 decimal and 155) were meant, please
send in utf-8!

Those numerical codes, 139 and 155, are specific to Windows-1252 (and
probably several other Windows 125x encodings). Unicode 3.0 says that 139
(U+008B) is ISO 6429 "partial line down" (my quotes), and, wonder of
wonders, 155 (U+009B) (!) is the famous "control sequence introducer" (!),
beloved by printer-control-code hackers of the past. My recollection is
that it was often entered from the keyboard as two sequential keystrokes,
<Esc>, then [ .
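
The two readings of bytes 139 and 155 can be compared with a short Python
sketch. The helper names `describe` and `seven_bit_form` are hypothetical,
invented for this illustration; the second one applies the usual ISO 6429
rule that a C1 control byte corresponds to ESC followed by the byte minus
0x40.

```python
import unicodedata

# Byte values 139 and 155, as discussed above.
raw = bytes([139, 155])

as_cp1252 = raw.decode("cp1252")   # Windows-1252: printing characters
as_latin1 = raw.decode("latin-1")  # Latin-1: maps to U+008B and U+009B


def describe(text):
    """Return (code point, general category, name) for each character."""
    return [
        (f"U+{ord(ch):04X}",
         unicodedata.category(ch),
         unicodedata.name(ch, "(control code, no character name)"))
        for ch in text
    ]


def seven_bit_form(c1_byte):
    """Two-byte 7-bit equivalent of a C1 control: ESC, then byte - 0x40."""
    if not 0x80 <= c1_byte <= 0x9F:
        raise ValueError("not a C1 control byte")
    return b"\x1b" + bytes([c1_byte - 0x40])


for row in describe(as_cp1252) + describe(as_latin1):
    print(row)

print(seven_bit_form(0x9B))  # ESC followed by "["
```

Under Windows-1252 the pair comes out as the single angle quotation marks;
under Latin-1 both are category Cc (control), and the 7-bit form of 155 is
the <Esc> [ pair.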

<mode chat>
Until now, I had thought that Microsoft had avoided redefining ISO 6429 C1
control codes that were likely to be commonly used. I dare say that CSI,
U+009B, might be the most commonly used ISO 6429 C1 control character.
Having two entirely different interpretations, one printing and the other
non-printing, must often have been a trap for the unwary. I still own a
dual-encoding printer (Seikosha MP-1300AI) that works with both Codepage
437 and Latin-1, selected somehow; I have forgotten the details. It has
a third mode that does a hex dump of all data sent to it. Ribbons are
probably impossible to find. </kitty mode>

(Btw, I'm using "delimiter" in the sense of computer programming and
engineering notation; I hope and trust it has the same meaning to most
Qalamites.)

I can't resist pointing out that the Opera browser's e-mail composer,
while just barely adequate (I'm being a bit kind, there) from the text
editor standpoint, has been internationalized really nicely, at least in
part. When I tried to enter codepoint 139/U+008B, the Opera composer
politely gave a short explanation, and offered to change my encoding to
utf-8.

(I had been using utf-8 as a default, but one correspondent I send to
fairly often, via a YahooGroups list, was receiving messages through old
software somewhere along the line that was transforming every line ending
into (iirc) "=20". I'll think about adding a polite suggestion to my .sig
about upgrading...)
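
As a possible explanation only: "=20" is how quoted-printable encoding
writes a space that falls at the end of a line, so software that fails to
decode quoted-printable would show it literally. The Python sketch below
uses the standard `quopri` module on a made-up sample line; it is an
assumption that this is what happened to those messages.

```python
import quopri

# A made-up sample line ending in a space before the newline.
line = b"A line with a trailing space \n"

# Quoted-printable protects the trailing space by writing it as "=20".
encoded = quopri.encodestring(line)

# A mail reader that understands quoted-printable restores the space.
decoded = quopri.decodestring(encoded)

print(encoded)
print(decoded == line)
```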

[1] For what it's worth, not a lot, I use [ ] to define search-engine
strings in messages (easier than saying "without quotes", the more-common
case); also occasionally for other delimiters, in preference to " "
followed by a disclaimer. I use { } for my in-line brief editorial
comments, because those are quite unlikely to be found in documents I
normally deal with. Then, of course, I use < > to delimit URLs/URIs.

My best to all,

--
Nicholas Bodley /*|*\ Waltham, Mass.
The curious hermit -- autodidact and polymath
who has three extra fingers on each hand
(Great for playing lutes and harps!)