The Apostrophe, etc.

From: navako
Message: 1291
Date: 2005-09-19


Although I agree that this is not an insoluble problem, it is more complex
than this, Rett:

>> how do you come up with a program that can interpret and
>>correctly remove the apostrophe in PTS-style Pali?
>
> It's very simple, I believe. The program,
>
> 1) deletes the apostrophe
> 2) deletes any intervening space between the apostrophe and the immediately preceding or following word.

Firstly, you notice that there is already an "or" in your proposed solution.
You would have to create a program that can judge (based on the context of
the apostrophe) how to truncate the adjacent words, and how to combine which
vowel with which consonant.  Unicode Indic scripts do not have separate,
standing vowels (as you know), but involve an annoying sequence of
codepoints along the lines of "+ Hal Akuru + vowel + space" etc.  This
problem can be solved, but it would take a talented programmer to make sure
that it didn't produce the errors that a simple "find + replace" certainly
would (in parsing the entire Tipitaka).
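To make the "find + replace" worry concrete, here is a minimal Python
sketch of the two-step rule quoted above.  The function name naive_join is
made up for illustration, and the second input assumes the velar nasal is
typed as 'n (as I write it in this message); neither is taken from any
existing tool.

```python
import re

def naive_join(text: str) -> str:
    """Delete every apostrophe together with any whitespace on either side."""
    return re.sub(r"\s*'\s*", "", text)

# The example from the quoted message comes out as hoped.
print(naive_join("gacchaam' aha.m"))   # -> gacchaamaha.m

# But an apostrophe that is part of a letter (assumed here: 'n for the
# velar nasal) is deleted as well, silently changing the word.
print(naive_join("sa'ngha"))           # -> sangha
```

The rule has no way of telling an elision mark from an apostrophe used as a
diacritic, which is one reason it needs context before it can be trusted on
a whole corpus.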

Changing "... a.m u ..." into "... mu ..." in various Unicode scripts would
be another little challenge.  Yes, a talented computer programmer (working
with a patient Palicist) can come up with a definitive list of "if/then"
rules to solve the problem; however, getting two such people to work
together is a rare event.  Nobody has done it yet; thus, it is not as easy
as you may suppose.  I would encourage you to experiment with the available
tools for wrangling Romanized text into Unicode Asian scripts; there are
currently many difficulties.  All of these can be solved, but they haven't
been yet.

> For example: PTS gacchaam' aha.m >  gacchaamaha.m

You see, in this example the program would actually completely delete the
vowel "a" in converting to some scripts (e.g. Sinhalese), would leave it in
place in Roman, and would have other behaviours for other vowels in other
contexts.
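A rough Python sketch of what that means at the codepoint level, for
Sinhala only.  It assumes the text has already been converted word by word,
so that a final "m" is stored as consonant + al-lakuna (U+0DCA) and an
initial vowel as an independent vowel letter.  The function name and the
four-entry vowel table are illustrative, not a complete mapping.

```python
AL_LAKUNA = "\u0dca"  # Sinhala vowel-killer sign (hal kirima)

# Independent vowel letter -> dependent vowel sign.
# The short "a" is inherent in the consonant, so it maps to nothing at all.
DEPENDENT = {
    "\u0d85": "",        # a
    "\u0d86": "\u0dcf",  # aa
    "\u0d89": "\u0dd2",  # i
    "\u0d8b": "\u0dd4",  # u
}

def join_sinhala(prev_word: str, next_word: str) -> str:
    """Join two Sinhala-script words across an elided boundary."""
    if prev_word.endswith(AL_LAKUNA) and next_word[:1] in DEPENDENT:
        # Drop the al-lakuna, then attach the vowel as a sign (or as nothing).
        return prev_word[:-1] + DEPENDENT[next_word[0]] + next_word[1:]
    # Any other combination is left untouched in this sketch.
    return prev_word + next_word

m_final = "\u0db8" + AL_LAKUNA      # "m" with no vowel
aham = "\u0d85\u0dc4\u0d82"         # a + ha + niggahita = "aha.m"
joined = join_sinhala(m_final, aham)
print([hex(ord(c)) for c in joined])  # -> ['0xdb8', '0xdc4', '0xd82']
```

In the joined form the "a" has no codepoint of its own, while a following
"u" would turn into the sign U+0DD4; the Roman output keeps the letter in
both cases.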

To use an example of my own: you might consider that the velar 'n is
sometimes a superscript in Burmese (sometimes not), and so "decoding" PTS
Roman text into sequences of consonants in which the velar 'n combines with
the correct consonant "as a vowel" could be tricky.  Complex, compounded
sequences of consonants in different scripts relate to Romanized text in
different ways (e.g., there is no superscript velar 'n in Sinhalese).
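As a hedged illustration of that difference, the Python sketch below
encodes "velar 'n + following consonant" for the two scripts.  It covers
only the superscript (kinzi) case for Burmese and a plain in-line cluster
for Sinhala; the function name and the choice of cases are my own
assumptions, and deciding when Burmese does not use the superscript is
exactly the part left out.

```python
def velar_n_before(consonant: str, script: str) -> str:
    """Return the codepoint sequence for velar 'n followed by a consonant."""
    if script == "burmese":
        # Kinzi: NGA (U+1004) + ASAT (U+103A) + VIRAMA (U+1039),
        # drawn above the consonant that follows.
        return "\u1004\u103a\u1039" + consonant
    if script == "sinhala":
        # Velar nasal (U+0D9E) + al-lakuna (U+0DCA), written in line.
        return "\u0d9e\u0dca" + consonant
    raise ValueError(f"unsupported script: {script}")

# 'n + gha, as in the middle of "sa'ngha"
burmese = velar_n_before("\u1003", "burmese")
sinhala = velar_n_before("\u0d9d", "sinhala")
print(len(burmese), len(sinhala))  # -> 4 3
```

The same two Roman letters need a different number of codepoints, in a
different arrangement, depending on the target script.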

E.M.


--
A saying of the Buddha from http://metta.lk/
View Streaming Dhamma Video http://dharmavahini.tv/
The Bhikkhu who has retired to a lonely abode, who has calmed his mind, who
perceives the doctrine clearly, experiences a joy transcending that of men.
Random Dhammapada Verse 373
