That sounds really interesting. Thanks a lot for the link.
Patrick Hall <pathall@...> wrote:
Hello,
I have been working on and off on some Javascript typing aids for
typing in various scripts through the browser (especially when the
user is on a public terminal, etc). Essentially, it applies a series
of string substitutions; here's an excerpt from the rules for
Esperanto (chosen because it's quite simple):
'cx' : 'ĉ',
'gx' : 'ĝ',
'hx' : 'ĥ',
'jx' : 'ĵ',
So that works fine: all strings on the left-hand side are converted to
the corresponding strings on the right-hand side (in order).
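
A minimal sketch of the in-order substitution described above. The names
esperantoRules and applyRules are hypothetical and are not taken from the
actual typing aid; the rule table holds only the four rules quoted in the
excerpt.

```javascript
// Hypothetical rule table: the four Esperanto rules from the excerpt,
// stored as ordered [from, to] pairs so that rule order is explicit.
const esperantoRules = [
  ["cx", "ĉ"],
  ["gx", "ĝ"],
  ["hx", "ĥ"],
  ["jx", "ĵ"],
];

// Apply every rule in order, replacing all occurrences of each
// left-hand side with its right-hand side.
function applyRules(text, rules) {
  let result = text;
  for (const [from, to] of rules) {
    result = result.split(from).join(to);
  }
  return result;
}

console.log(applyRules("cxehxo", esperantoRules)); // prints "ĉeĥo"
```

An array of pairs is used here instead of an object literal only to make
the ordering obvious; the original rules may well be stored differently.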
But as I'm sure many people here are aware, it's not always that
simple. As a test case for a more significant script, I tried building
an input method for Amharic, which is of course far more complex than
the short list of accented characters in Esperanto. For instance, in
some cases I had to use rules that included characters from the target
script on the left-hand side of the rule:
"q":"ቅ",
"ቅe":"ቀ",
"ቅu":"ቁ",
"ቅi":"ቂ",
"ቅa":"ቃ",
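
To show how rules with target-script characters on the left-hand side
chain together, here is a small sketch that re-applies the rules after
each keystroke. The names amharicRules and typeKeys are hypothetical, and
the Ethiopic values are an assumption (SERA-style: q gives ቅ, and a
following vowel rewrites it).

```javascript
// Hypothetical rule table; the Ethiopic values are assumed, not confirmed.
const amharicRules = [
  ["q", "ቅ"],
  ["ቅe", "ቀ"],
  ["ቅu", "ቁ"],
  ["ቅi", "ቂ"],
  ["ቅa", "ቃ"],
];

// Simulate typing: append one key at a time, then run every rule in
// order over the whole buffer, so output from an earlier keystroke can
// be matched by a later rule.
function typeKeys(keys, rules) {
  let buffer = "";
  for (const key of keys) {
    buffer += key;
    for (const [from, to] of rules) {
      buffer = buffer.split(from).join(to);
    }
  }
  return buffer;
}

console.log(typeKeys("q", amharicRules));  // prints "ቅ"
console.log(typeKeys("qa", amharicRules)); // prints "ቃ"
```

The second call works because "q" first becomes "ቅ", and the buffer "ቅa"
then matches the rule whose left-hand side starts with the target
character.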
With the help of Daniel Yacob I ended up with a fairly workable system.
But after a period of testing I became aware of a problem; there was a
particular Ethiopic character that my rules could never output. Such
problems are difficult to debug when dealing with a complex script.
So here's my question: is there any work out there on a formalization
of transliteration? How can one define a transliteration system in
such a way that it's provably complete?
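
Short of a formal proof, one rough way to catch a character that can never
be output is a bounded brute-force search. The sketch below is only an
illustration under stated assumptions: the function names, the rule
table, the key alphabet, and the target list are all hypothetical, and a
search up to a fixed input length shows only what is reachable within that
bound, not completeness in general.

```javascript
// Apply every rule in order, replacing all occurrences (as described above).
function applyRules(text, rules) {
  let result = text;
  for (const [from, to] of rules) {
    result = result.split(from).join(to);
  }
  return result;
}

// Collect every character that appears in the output of some key
// sequence of length 1..maxLength over the given alphabet.
function reachableCharacters(rules, alphabet, maxLength) {
  const reachable = new Set();
  let frontier = [""];
  for (let length = 1; length <= maxLength; length++) {
    const next = [];
    for (const prefix of frontier) {
      for (const key of alphabet) {
        const input = prefix + key;
        next.push(input);
        for (const ch of applyRules(input, rules)) reachable.add(ch);
      }
    }
    frontier = next;
  }
  return reachable;
}

// Report the target characters that no tested input produced.
function unreachableTargets(rules, alphabet, targets, maxLength) {
  const reachable = reachableCharacters(rules, alphabet, maxLength);
  return targets.filter((ch) => !reachable.has(ch));
}

// Assumed example data: five rules and the seven forms of one consonant.
const rules = [["q", "ቅ"], ["ቅe", "ቀ"], ["ቅu", "ቁ"], ["ቅi", "ቂ"], ["ቅa", "ቃ"]];
const alphabet = ["q", "e", "u", "i", "a"];
const targets = ["ቀ", "ቁ", "ቂ", "ቃ", "ቄ", "ቅ", "ቆ"];

console.log(unreachableTargets(rules, alphabet, targets, 2)); // prints [ 'ቄ', 'ቆ' ]
```

The number of inputs grows exponentially with maxLength, so this is
practical only for short key sequences and small alphabets.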
So far digging around has turned up a few good resources, particularly
Bill Poser's documentation for his Xlit package:
http://billposer.org/Software/XlitManual/Manual.html
It seems to me that if it were possible to write a "validator" for a
transliteration system, many existing schemes and scripts could
quickly be converted to a common format, which would help to reduce
duplicated effort.
Best regards,
-- Patrick Hall
Blogamundo