On Wed, 22 Jun 2011 08:11:43 -0400, Mark E. Shoulson <mark@...> wrote:

> Huh. I sent this using the extended Arabic-Indic digit characters, but
> it looks like Yahoo groups' machinery somewhere along the line
> translated them back to ordinary ASCII digits. Not really right to
> refer to them as "non-text portions of this message", but maybe that's
> what that means...? Go figure.

Uh-oh. I18n is still quite unfinished. (One correspondent, until recently,
asked for ASCII only in e-mail. Mojibake is not totally restricted to
Japanese, either.)* I just posted a long semi-tutorial to another
Yahoogroup with specimens of both [Iranian] and [Arabic] numerals. Lord
only knows how they will be mangled. Subscribers to that list might
well have ancient mail software, or lack fonts. (Late thought:
The way is now clear for Unicode in URLs. We might be in for a real
mess, though I hope not, to say nothing of creative fraud.)

*Years ago, I was discussing i18n in a mailing list. Somebody had used
Microsoft "smart quotes", and UTF-8 (I guess) had created brief mojibake
out of them. Each time the text with the corrupted punctuation was copied
in subsequent messages, the number of mojibake characters created by the
"smart quotes" doubled! I think the runaway stopped at 16 characters per
quote; end of thread, or somebody substituted ASCII.
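
The doubling described above is consistent with one well-known failure mode: text encoded as UTF-8 and then misread as Latin-1 at every hop. What follows is only a sketch under that assumption, since the software and charsets in that old thread are not known. The function name mangle is made up for illustration.

```python
def mangle(text, rounds):
    """Repeatedly encode as UTF-8 and mis-decode the bytes as Latin-1.

    Returns the text after each round, starting with the untouched input.
    """
    stages = [text]
    for _ in range(rounds):
        text = text.encode("utf-8").decode("latin-1")
        stages.append(text)
    return stages


# Byte 0x93 is the left "smart quote" in Windows-1252.
# Misread as Latin-1 it becomes the single character U+0093.
quote = b"\x93".decode("latin-1")

lengths = [len(stage) for stage in mangle(quote, 4)]
print(lengths)  # [1, 2, 4, 8, 16]
```

Every character from U+0080 to U+00FF takes two bytes in UTF-8, and each of those bytes reads back as one Latin-1 character in the same range, so the count doubles on every pass. Plain ASCII text goes through unchanged.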

I use Gucharmap (not by choice, but afaik it's the only comprehensive
Unicode-access application (GNOME) in openSUSE).
I was able to transcribe the [Arabic] digits without trouble, but the
[Iranian] digits invoked the BiDi algorithm (! Arrgghh!)
When I copied them [en bloc](?) into the message text, their order was
reversed!

I'm not at all accustomed to using the BiDi algorithm, and, while I do
know what's going on, it can be horribly confusing if one is changing
between directions, especially unexpectedly, where numbers simply must be
rendered (and entered) LtoR. I won't guess where or why BiDi popped up
when I was trying to transcribe the [Iranian] numerals. When only
numerals are involved, it seems to me that to invoke RtoL behavior is
simply wrong.
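
One way to check what the Unicode data itself says about these digits is to look up their bidirectional class with Python's standard unicodedata module. This is a sketch only, assuming that [Arabic] means U+0660..U+0669 and [Iranian] means U+06F0..U+06F9. The helper name bidi_classes is made up, and the result says nothing about what Gucharmap or the mail client actually did.

```python
import unicodedata


def bidi_classes(first_code_point):
    """Return the set of bidi classes for the ten digits starting at first_code_point."""
    return {unicodedata.bidirectional(chr(first_code_point + i)) for i in range(10)}


for label, first in (
    ("ASCII digits", 0x0030),
    ("Arabic-Indic digits", 0x0660),
    ("Extended Arabic-Indic digits", 0x06F0),
):
    print(label, sorted(bidi_classes(first)))

# ASCII digits ['EN']
# Arabic-Indic digits ['AN']
# Extended Arabic-Indic digits ['EN']
```

EN is "European Number" and AN is "Arabic Number". Neither is a strong right-to-left class, and under the Unicode Bidirectional Algorithm the digits within a number keep their left-to-right order even in right-to-left text.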

(Of course, I'm resorting to inaccurate and unofficial, if concise, ad hoc
usage to designate the two different numeral sets. I know that.)
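
For reference, the official character names can be looked up with Python's standard unicodedata module. The sketch below assumes that [Arabic] corresponds to U+0660..U+0669 and [Iranian] to U+06F0..U+06F9. The helper to_ascii_digits is hypothetical; it only illustrates the kind of folding to ASCII that Mark's message seems to have suffered, not what Yahoo's machinery really does.

```python
import unicodedata

ARABIC_INDIC = "".join(chr(0x0660 + i) for i in range(10))
EXTENDED_ARABIC_INDIC = "".join(chr(0x06F0 + i) for i in range(10))


def to_ascii_digits(text):
    """Hypothetical helper: replace every Unicode decimal digit with its ASCII twin."""
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)  # None for non-digit characters
        out.append(str(value) if value is not None else ch)
    return "".join(out)


print(unicodedata.name(ARABIC_INDIC[0]))           # ARABIC-INDIC DIGIT ZERO
print(unicodedata.name(EXTENDED_ARABIC_INDIC[0]))  # EXTENDED ARABIC-INDIC DIGIT ZERO
print(to_ascii_digits(EXTENDED_ARABIC_INDIC))      # 0123456789
```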

Best regards,

--
Nicholas Bodley _.=|*|=._ Waltham, Mass.
Sign of a new trend? Looks really hopeful:
<www.fairindigo.com> Apparel made by
decently-paid workers, fair traded.