RE: The Apostrophe, etc.

From: Phra Noah Yuttadhammo
Message: 1294
Date: 2005-09-19

Dear Friends,


-----------quote---------------
EM wrote:

Changing "... a.m u ..." into "... mu ..." in various unicode scripts would
be another little challenge.  Yes, a talented computer programmer (working
with a patient Palicist) can come up with a definitive list of "if/then"
variables to solve the problem --however, getting two such people to work
together is a rare event.  Nobody has done it yet; thus, it is not as easy
as you may suppose.  I would encourage you to experiment with the available
tools for wrangling Romanized text into Unicode Asian scripts --there are
currently many difficulties.  All of these can be solved --but they haven't
been, yet.

-----------endquote------------

Please see the following forum topic for an example of how we really can
find Palicists and programmers working together (I suppose I could be
considered both now :) ). 

http://www.lioncity.net/buddhism/index.php?showtopic=18767&st=0&p=262122

The kind of thing you are talking about we are already doing in JavaScript
with an off-line version of the PTS's PED and extra grammatical tools.  I
don't see what would be difficult about what you are saying.  If i.m and a.m
are single characters in a certain script, all you do is run a search for
i.m and a.m in the texts like this:

// Dots are escaped so that "." matches a literal dot, not any character.
var a = 'buddha.m di.t.thi.m';
a = a.replace(/a\.m/g, '&#xxxx;');
a = a.replace(/i\.m/g, '&#yyyy;');
(etc.)

Where xxxx is the Unicode code point for a.m in the particular script, and
yyyy is the one for i.m.  Then do another one for .t, and so on.  You can
look at my code if you download the PEDShell - I knew nothing about
JavaScript a month ago.
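
As a rough sketch of how the separate replace calls above could be folded
into one pass: the helper names (escapeRegExp, makeConverter) and the
bracketed tokens such as [AM] are my own placeholders, not real code points
and not taken from the PEDShell.  It assumes each Romanized sequence maps to
one fixed replacement, which will not hold wherever the result depends on
the neighbouring letters.

```javascript
// Escape regex metacharacters so that "." in the Romanized input is literal.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a converter from a table of { Romanized sequence: replacement }.
// Longer keys are tried first so that ".th" wins over ".t".
function makeConverter(table) {
  var keys = Object.keys(table).sort(function (x, y) {
    return y.length - x.length;
  });
  var pattern = new RegExp(keys.map(escapeRegExp).join('|'), 'g');
  return function (text) {
    return text.replace(pattern, function (match) {
      return table[match];
    });
  };
}

// Hypothetical table: the bracketed tokens are stand-ins for the real
// code points of whichever script is being targeted.
var convert = makeConverter({
  'a.m': '[AM]',
  'i.m': '[IM]',
  '.t': '[T]',
  '.th': '[TH]'
});

var result = convert('buddha.m di.t.thi.m');
// result is 'buddh[AM] di[T][TH][IM]'
```

Doing everything in one pass also means that the output of one replacement
is never re-scanned by a later one.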

Best wishes,

Yuttadhammo



-----Original Message-----
From: palistudy@yahoogroups.com [mailto:palistudy@yahoogroups.com] On Behalf
Of navako
Sent: Monday, September 19, 2005 1:33 PM
To: palistudy@yahoogroups.com
Subject: [palistudy] The Apostrophe, etc.


Although I agree that this is not an insoluble problem, it is more complex
than this, Rett:

>> how do you come up with a program that can interpret and correctly
>>remove the apostrophe in PTS-style Pali?
>
> It's very simple, I believe. The program,
>
> 1) deletes the apostrophe
> 2) deletes any intervening space between the apostrophe and the
> immediately preceding or following word.

Firstly, you notice that there is already an "or" in your proposed solution.

You would have to create a program that can judge (based on the context of
the apostrophe) how to truncate the adjacent words, and how to combine which
vowel with which consonant.  Unicode Indic scripts do not have separate,
standing vowels (as you know), but involve an annoying sequence of
codepoints along the lines of "+ Hal Akuru + vowel + space" etc.  This
problem can be solved, but it would take a talented programmer to make sure
that it didn't produce errors in the way that a simple "find + replace"
certainly would (in parsing the entire tipitaka).

Changing "... a.m u ..." into "... mu ..." in various unicode scripts would
be another little challenge.  Yes, a talented computer programmer (working
with a patient Palicist) can come up with a definitive list of "if/then"
variables to solve the problem --however, getting two such people to work
together is a rare event.  Nobody has done it yet; thus, it is not as easy
as you may suppose.  I would encourage you to experiment with the available
tools for wrangling Romanized text into Unicode Asian scripts --there are
currently many difficulties.  All of these can be solved --but they haven't
been, yet.

> For example: PTS gacchaam' aha.m >  gacchaamaha.m

You see, in this example the program would actually completely delete the
vowel "a" in converting to some scripts (e.g. Sinhalese), would leave it in
place in Roman, and would have other behaviours for other vowels in other
contexts.

To use an example of my own: you might consider that the velar 'n is
sometimes a superscript in Burmese (sometimes not), and so "decoding" PTS
Roman text into sequences of consonants in which the velar 'n combines with
the correct consonant "as a vowel" could be tricky.  Complex, compounded
sequences of consonants in different scripts relate to Romanized text in
different ways (e.g., there is no superscript velar 'n in Sinhalese).

E.M.


