--- In qalam@yahoogroups.com, "Nicholas Bodley" <nbodley@...> wrote:
> On Wed, 22 Jun 2011 08:11:43 -0400, Mark E. Shoulson <mark@...> wrote:

> Uh-oh. I18n is still quite unfinished. (One correspondent, until recently,
> asked for ASCII only in e-mail. Mojibake is not totally restricted to
> Japanese, either.)* I just posted a long semi-tutorial to another
> Yahoogroup with specimens of both [Iranian] and [Arabian] numerals. Lord
> only knows how they will become mangled. Subscribers to that list might
> well have ancient mail software, as well, or lack fonts. (Late thought:
> The way is now clear for Unicode in URLs. I hope not, but we might be in
> for a real mess, not to ignore creative fraud.)

> *Years ago, I was discussing i18n in a mailing list. Somebody had used
> Microsoft "smart quotes", and UTF-8 (I guess) had created brief mojibake
> out of them.

Sounds like Yahoo was assuming the text was Windows-1252 and destructively straightening the quotes.
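Both kinds of damage are easy to reproduce in Python. Decoding the UTF-8 bytes of a curly quote as Windows-1252 gives the familiar "â€™" style of mojibake, and a separate "straightening" pass maps curly quotes to ASCII. This is only a sketch. Yahoo's real processing is not known, and the helper names below (`straighten_quotes`, `misread_utf8_as_cp1252`) are hypothetical.

```python
# Hypothetical helpers illustrating two ways curly quotes get damaged in mail.
# Nothing here reflects Yahoo's actual code.

# Map typographic ("smart") quotes to their ASCII look-alikes.
STRAIGHTEN = str.maketrans({
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
})


def straighten_quotes(text: str) -> str:
    """Destructively replace curly quotes with straight ASCII quotes."""
    return text.translate(STRAIGHTEN)


def misread_utf8_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode the bytes as Windows-1252.

    Bytes that are unassigned in Windows-1252 (e.g. 0x9D, the last byte of
    U+201D in UTF-8) become U+FFFD instead of raising an error.
    """
    return text.encode("utf-8").decode("cp1252", errors="replace")


if __name__ == "__main__":
    sample = "\u201cdon\u2019t\u201d"
    print(straighten_quotes(sample))              # "don't"
    print(ascii(misread_utf8_as_cp1252(sample)))  # escaped form of the mojibake
```

Straightening loses information but stays readable. The misdecoding turns each three-byte quote into three junk characters, which matches the "brief mojibake" described above.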

> I use Gucharmap (not by choice, but afaik it's the only comprehensive
> Unicode-access application (GNOME) in openSUSE.)
> I was able to transcribe the [Arabic] digits without trouble, but the
> [Iranian] digits invoked the BiDi algorithm (! Arrgghh!)
> When I copied them [en bloc](?) into the message text, their order was
> reversed!

They seem to work fine in LibreOffice (recently forked from OpenOffice by distrusters of Oracle) and Firefox. Incidentally, LibreOffice (and so, I presume, OpenOffice) can insert any character present in any of your fonts, unlike the Windows XP Character Map, which only accepted characters it thought were in Unicode.

There is one oddity, though. The 'extended' Arabic-Indic digits (U+06F0-U+06F9) have the BiDi class 'European Number' (EN), whereas the plain Arabic-Indic digits (U+0660-U+0669) have the BiDi class 'Arabic Number' (AN). The only case where I think this should make a difference is when a higher-level protocol makes right-to-left the default paragraph 'level'. Then, for $ followed by digits, the $ should appear on the right with plain Arabic-Indic digits and on the left with extended Arabic-Indic digits, because $ (class ET, 'European Terminator') joins an adjacent run of EN digits but not a run of AN digits. At least, that's what I calculate, and it's what LibreOffice does.
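Python's `unicodedata.bidirectional` reports the classes mentioned above. The sketch below is a deliberately cut-down version of the Unicode Bidirectional Algorithm (UAX #9). It covers a single line with no explicit embeddings, isolates or brackets, accepts only the classes L, R, EN, AN, ET and ON, and applies rules W5-W7, N1/N2, I1/I2 and L2. It is not a conforming implementation, and `resolve_levels` and `visual_order` are hypothetical names. It is just enough to check the $ calculation.

```python
from __future__ import annotations

import unicodedata

# Simplified sketch of a few UAX #9 rules; NOT a conforming implementation.
# Assumes one line, no explicit embeddings/isolates, no brackets, and only
# these bidi classes (so rules W1-W4, N0 and L1 have nothing to do).
SUPPORTED = {"L", "R", "EN", "AN", "ET", "ON"}


def resolve_levels(text: str, para_level: int) -> list[int]:
    types = [unicodedata.bidirectional(ch) for ch in text]
    unsupported = set(types) - SUPPORTED
    if unsupported:
        raise ValueError(f"sketch does not handle classes {sorted(unsupported)}")
    sos = "R" if para_level % 2 else "L"  # also serves as eos and embedding direction
    n = len(types)

    # W5: a sequence of ETs adjacent to an EN becomes EN.
    i = 0
    while i < n:
        if types[i] != "ET":
            i += 1
            continue
        j = i
        while j < n and types[j] == "ET":
            j += 1
        if (i > 0 and types[i - 1] == "EN") or (j < n and types[j] == "EN"):
            types[i:j] = ["EN"] * (j - i)
        i = j

    # W6: any remaining ET becomes ON (Other Neutral).
    types = ["ON" if t == "ET" else t for t in types]

    # W7: EN becomes L if the nearest preceding strong type (or sos) is L.
    last_strong = sos
    for k, t in enumerate(types):
        if t in ("L", "R"):
            last_strong = t
        elif t == "EN" and last_strong == "L":
            types[k] = "L"

    # N1/N2: a run of ON takes its neighbours' direction when they agree
    # (EN and AN count as R); otherwise it takes the embedding direction.
    def strong(t: str) -> str:
        return "L" if t == "L" else "R"

    i = 0
    while i < n:
        if types[i] != "ON":
            i += 1
            continue
        j = i
        while j < n and types[j] == "ON":
            j += 1
        before = strong(types[i - 1]) if i > 0 else sos
        after = strong(types[j]) if j < n else sos
        types[i:j] = [before if before == after else sos] * (j - i)
        i = j

    # I1/I2: assign implicit embedding levels.
    if para_level % 2 == 0:
        bump = {"L": 0, "R": 1, "EN": 2, "AN": 2}
    else:
        bump = {"L": 1, "R": 0, "EN": 1, "AN": 1}
    return [para_level + bump[t] for t in types]


def visual_order(text: str, para_level: int) -> str:
    """Rule L2: reverse runs from the highest level down to the lowest odd level."""
    levels = resolve_levels(text, para_level)
    chars = list(text)
    if not chars:
        return ""
    lowest_odd = min(levels) | 1  # smallest odd number >= the minimum level
    for level in range(max(levels), lowest_odd - 1, -1):
        i = 0
        while i < len(chars):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]
            i = j
    return "".join(chars)


if __name__ == "__main__":
    for ch in "$\u0660\u06f0":
        print(f"U+{ord(ch):04X} {unicodedata.bidirectional(ch)}")  # ET, AN, EN
    # Right-to-left paragraph (level 1); ascii() avoids terminal bidi reordering.
    print(ascii(visual_order("$\u0660\u0661\u0662", 1)))  # '$' last, i.e. on the right
    print(ascii(visual_order("$\u06f0\u06f1\u06f2", 1)))  # '$' first, i.e. on the left
```

With a left-to-right paragraph (level 0), both strings come out with the $ on the left, which agrees with the point that only a right-to-left default level exposes the difference.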

> I'm not guessing where/why BiDi popped up
> when trying to transcribe the [Iranian] numerals. Surely, when only
> numerals are involved, it seems to me that to invoke RtoL behavior is
> simply wrong.

See above for the bizarre effects. I hope the BiDi rules can be made more comprehensible; the current wording seems Byzantine.

Richard.