Re: Theory of transliteration?

--- In qalam@yahoogroups.com, "Patrick Hall" <pathall@...> wrote:

> I have been working on and off on some Javascript typing aids for
> typing in various scripts through the browser (especially when the
> user is on a public terminal, etc). Essentially, it applies a series
> of string substitutions;...

> For instance, in
> some case I had to use rules that included characters from the target
> script on the left hand side of the rule:
>
> "q":"á",
> "áe":"á",

> But after a period of testing I became aware of a problem; there was a
> particular Ethiopic character that my rules could never output. Such
> problems are difficult to debug when dealing with a complex script.

I'm sorry a general discussion of transliteration has created more
heat than light. However, I do have two observations that may be
helpful for your particular issue:

1) In general transliteration, you may need a disjunctor character
that distinguishes a digraph from an accidental combination. A common
example is digraphs in 'h' and sequences of consonant followed by 'h'.
If you have a disjunctor in your input scheme, for which the sole
transform is its final elimination as the last substitution, and your
substitutions are not context sensitive, then it should be easy to
check that you can generate all possible strings in the target script.

2) The outer loop of your processing seems to be over substitution
rules. If, however, your outer loop were over characters, and you
ordered the rules so that the rules with longest input were considered
first, then it would again be relatively easy to check that you could
generate all appropriate sequences in the target script. At its
simplest, you check that none of your encodings for characters is a
substring of a longer substituted substring or is the continuation of
the end of one of the longer strings. If your scheme fails this
check, then proving completeness may get complicated.
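
As a minimal sketch of the two observations above (not your actual
rules, nor the code behind my pages): the rule table below is
hypothetical, with Latin-to-Cyrillic pairs picked only for illustration
and "'" assumed as the disjunctor. The names applyRules, longestMatch
and inputsInsideLongerInputs are likewise made up for this example, and
the last function covers only the "substring of a longer string" half
of the check.

```javascript
// Hypothetical rules: Latin-to-Cyrillic pairs chosen only for illustration.
// "'" is the assumed disjunctor; its sole transform is deletion, listed last.
const rules = [
  ["sh", "ш"],
  ["s", "с"],
  ["h", "х"],
  ["a", "а"],
  ["'", ""],
];

// Observation 1: outer loop over rules, each applied globally, in order.
function applyRules(input) {
  let out = input;
  for (const [from, to] of rules) {
    out = out.split(from).join(to); // replace every occurrence of `from`
  }
  return out;
}

// Observation 2: outer loop over characters, longest input tried first.
const byLength = [...rules].sort((a, b) => b[0].length - a[0].length);

function longestMatch(input) {
  let out = "";
  let i = 0;
  while (i < input.length) {
    const rule = byLength.find(([from]) => input.startsWith(from, i));
    if (rule === undefined) {
      out += input[i]; // no rule applies: copy the character through
      i += 1;
    } else {
      out += rule[1];
      i += rule[0].length;
    }
  }
  return out;
}

// Partial check: list the inputs that occur inside a longer input.
function inputsInsideLongerInputs() {
  const inputs = rules.map(([from]) => from);
  return inputs.filter((a) =>
    inputs.some((b) => b.length > a.length && b.includes(a))
  );
}
```

With this table, both applyRules("sha") and longestMatch("sha") give
"ша", while "s'ha" gives "сха" in either function. The check reports
"s" and "h", which is exactly why the sequence "сх" needs the
disjunctor to be typable.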

My Javascript input pages work in the second way, but that is because
I translate characters as they are input. This effectively results in
dead keys when sequences are being processed. I like to process
characters as they are input as this allows one to mix scripts before
cutting and pasting to the target.
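
To illustrate the dead-key effect, here is a rough sketch of
translating characters as they arrive. It is an assumption about how
such a page could work, not the code from my pages: createTyper, feed
and flush are hypothetical names, and the table is the same
illustrative Latin-to-Cyrillic mapping rather than a real input scheme.

```javascript
// Hypothetical incremental translator: holds back input that could still
// grow into a longer sequence, which is what makes a key look "dead".
function createTyper(table) {
  const keys = Object.keys(table)
    .filter((k) => k.length > 0)
    .sort((a, b) => b.length - a.length); // longest input first
  let pending = "";

  // Emit as much of `pending` as can be decided. When `final` is true,
  // nothing more will be typed, so do not wait for longer sequences.
  function drain(final) {
    let out = "";
    while (pending.length > 0) {
      const couldGrow = keys.some(
        (k) => k.length > pending.length && k.startsWith(pending)
      );
      if (couldGrow && !final) break; // wait for the next keystroke
      const key = keys.find((k) => pending.startsWith(k));
      if (key === undefined) {
        out += pending[0]; // no sequence matches: pass one character through
        pending = pending.slice(1);
      } else {
        out += table[key];
        pending = pending.slice(key.length);
      }
    }
    return out;
  }

  return {
    feed(ch) {
      pending += ch;
      return drain(false);
    },
    flush() {
      return drain(true);
    },
  };
}

// Illustrative table only (an assumption, not a real input scheme).
const typer = createTyper({ sh: "ш", s: "с", h: "х", a: "а" });
const first = typer.feed("s");  // "" : "s" might become "sh", so nothing yet
const second = typer.feed("h"); // "ш"
```

Here the first keystroke produces no output because "s" could still be
the start of "sh"; the second keystroke resolves it. Calling flush()
when the user leaves the field emits whatever is still pending.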

Note that the specific example I gave of 'blao' and 'be:la:'
representing the same Thai text is not relevant to your way of
working. You are working with transliterations for people who
know the target script, so an input like 'eblA' would probably be
appropriate for your application, and might well be what one would
type using a real 'phonetic' keyboard.

Richard.