--
Peter T. Daniels grammatim@...



----- Original Message ----
From: Richard Wordingham <richard@...>
To: qalam@yahoogroups.com
Sent: Saturday, November 25, 2006 5:07:36 PM
Subject: Re: Theory of transliteration?

--- In qalam@... com, "Peter T. Daniels" <grammatim@...> wrote:

> (preposing chevrons requires inserting hard line breaks as well)

Matter of taste. I'm happy if the paragraphs are left intact. A lot
depends on the mailer.

I've also included my reply to
http://tech.groups.yahoo.com/group/qalam/message/6695 here.

> Peter T. Daniels grammatim@...

>> From: Richard Wordingham <richard@...>

>> --- In qalam@... com, "Peter T. Daniels" <grammatim@...> wrote:

>> It did raise one interesting question. How are casing scripts
>> transliterated into non-casing scripts?

> MAYBE WITH A NOTE THAT SENTENCES AND PROPER NAMES BEGIN WITH A
CAPITAL? FOR ENGLISH, PROPER ADJECTIVES ALSO, FOR GERMAN COMMON NOUNS ALSO

That doesn't enable you to recover the original capitalisation of
'diesel' or 'doppler'. There are also ambiguous cases like 'the
Queen' and the third person pronouns for the divinity.

THEN YOU'LL HAVE TO COME UP WITH SOME DIACRITIC WHEN YOU TRANSLITERATE THE ROMAN ALPHABET INTO THAI.

> WHY WOULD YOU WANT TO EXCLUDE THE WORLD'S HANDFUL OF SAMARITANS FROM
READING SOMETHING IN ENGLISH? SAMARITAN IS MERELY A GRAPHIC VARIANT OF
HEBREW, SO THE CONVENTIONS ALREADY EXIST.

I was excluding freemasons who are native English speakers.

EH? WHAT'S FREEMASONRY GOT TO DO WITH SAMARITAN? THEIR LITTLE "SECRET ALPHABET" DOESN'T LOOK LIKE SAMARITAN HEBREW/ARAMAIC.

> A _TRANSCRIPTION_ TELLS YOU EXACTLY HOW TO PRONOUNCE IT.

Is there a technical term (as opposed to a derogatory term) for
something that resembles a transcription but drops tone marks, vowel
length, and merges a pair of vowels and a pair of consonants?

ASCII?

> A _TRANSLITERATION_ TELLS YOU EXACTLY HOW TO SPELL IT.

>> > A transliteration is a 1-to-1 correspondence between the characters
>> > of one script and the characters of another script.

>> I thought the key feature of a transcription was that it was
>> reversible, i.e. you could get back to the original. (I would allow
>> tagging used for 'context-sensitive' information to be lost.)

> ABSOLUTELY NOT.

> A TRANSLITERATION, HOWEVER, IS PERFECTLY REVERSIBLE.

Agreed. I accidentally wrote 'transcription' for
'transliteration'.
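
The reversibility point can be made concrete with a toy table. This is only a sketch: the five-letter Greek-to-Roman table is made up for illustration and is not any published standard, and the names TABLE, invert and transliterate are hypothetical. A transliteration in the strict 1-to-1 sense has an inverse table, so a round trip returns the original spelling; a table that folds case has no inverse.

```python
# Toy one-to-one table (hypothetical, for illustration only).
TABLE = {"α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e"}


def invert(table):
    """Return the inverse mapping, or raise if the table is not one-to-one."""
    inverse = {}
    for src, dst in table.items():
        if dst in inverse:
            raise ValueError(f"{inverse[dst]!r} and {src!r} both map to {dst!r}")
        inverse[dst] = src
    return inverse


def transliterate(text, table):
    """Map each character through the table.

    Characters outside the table pass through unchanged, so the round trip
    is only guaranteed for text written entirely in the table's source set.
    """
    return "".join(table.get(ch, ch) for ch in text)


roman = transliterate("βαδε", TABLE)
restored = transliterate(roman, invert(TABLE))

# A case-folding table is many-to-one, so no inverse exists:
# invert({"D": "d", "d": "d"}) raises ValueError.
```

This is the 'diesel' problem in miniature: once two source characters share a target, the original spelling cannot be recovered from the output alone.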

[I REPLIED FROM THE BOTTOM UP, SO AT THE BOTTOM THERE'S A NOTE THAT I DELETED ALL YOUR THAI & KHMER NOTES BECAUSE I DON'T KNOW ANYTHING ABOUT THEM (with a suggestion).]

>> a 1-to-1 correspondence of characters raises the question of what you
>> do with preposed vowels and multi-part vowels in Indic scripts. There

> IN INDIC, THERE IS ONLY ONE POSSIBLE PLACEMENT OF EACH VOWEL MARK,...

Thank you for that clarification. I'd been wondering whether CVC and
CCV aksharas could be written differently when the vowel went above

NO SUCH THING AS A CVC AKSHARA. AKSHARAS DON'T CORRESPOND TO SYLLABLES, JUST TO SEQUENCES OF CONSONANTS.

and the second consonant ascended to the baseline from which the
consonants hang. I know of a Khmer font where they are displayed
differently, but I wasn't sure whether that was a design flaw.

I have seen a difference in anusvara placement in a CVCV akshara.

A FORTIORI, NO SUCH THING AS A CVCV AKSHARA.

I've taken this as unwelcome evidence that in that writing style the
sole such combination of anusvara and final vowel was a vowel symbol
in its own right.

> ... EVEN THOUGH THE VOWEL DESIGNATED BY SUCH A VOWEL MARK ALWAYS
GOES AFTER THE LAST CONSONANT IN ITS GROUP. HENCE NO AMBIGUITY IN
EITHER DIRECTION.

The problem most clearly lies in the Indic 'o' vowel. In the Bengali
script, South Indian scripts and many SE Asian scripts, it is composed
of the preposed symbol for 'e' and the postposed symbol for 'a:'
(basically the length mark). An extra mark above, variously
interpreted, turns it into the 'au' vowel. For the two-part compound
vowel, do we have one symbol or two symbols? Unicode gives different
answers for different scripts:

HISTORICALLY, THE <O> IS AN <E> PLUS AN <A>, BUT SYNCHRONICALLY IT ISN'T. THE HISTORY ISN'T INTERESTING HERE. IT REPRESENTS NOTHING OTHER THAN AN /O/ PRONOUNCED AFTER THE CONSONANT(S) IT SURROUNDS.

one character: Khmer, Limbu
two characters: Myanmar, Thai, (Lao)
up to you: Bengali, Kannada, Malayalam, Oriya, Sinhala, Balinese

I've bracketed Lao because that only has a three part symbol.
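
The "one character" versus "up to you" distinction can be checked against the Unicode character database. The sketch below uses Python's standard unicodedata module; the helper name parts is hypothetical. It compares the Bengali vowel sign O, which has a canonical decomposition into the E sign plus the AA sign, with the Khmer vowel sign OO, which has none.

```python
import unicodedata


def parts(ch):
    """Code points of the canonical decomposition (NFD) of one character."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]


BENGALI_O = "\u09cb"  # BENGALI VOWEL SIGN O
KHMER_OO = "\u17c4"   # KHMER VOWEL SIGN OO

bengali_parts = parts(BENGALI_O)  # two code points: the E sign and the AA sign
khmer_parts = parts(KHMER_OO)     # a single code point, no decomposition

# The two Bengali spellings are canonically equivalent, so NFC
# recombines the two-part sequence into the single character.
recomposed = unicodedata.normalize("NFC", "\u09c7\u09be")
```

For a transliteration scheme this suggests normalising the input first, so that the one-symbol and two-symbol encodings of the same vowel are not treated as different spellings.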

<...>

However, are you prohibiting the reordering of symbols? That's a bit
difficult when some scripts advance characters in two directions,
possibly sometimes even with branching.

NO; YOU MENTIONED THE INDIC VOWELS THAT CAN GO ON ALL FOUR SIDES BUT ARE TRANSLITERATED FOLLOWING THE CONSONANTS.
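
A rough sketch of why reordering need not cost reversibility, using Thai, where the preposed vowels U+0E40 to U+0E44 are stored before the consonant they are written in front of. The function names are hypothetical, and the sketch assumes well-formed input with exactly one consonant after each preposed vowel; clusters and the rest of Thai orthography are ignored.

```python
# Thai vowels that are written (and stored) before their consonant.
PREPOSED = set("\u0e40\u0e41\u0e42\u0e43\u0e44")


def to_logical(text):
    """Move each preposed vowel to just after the character that follows it."""
    out = []
    i = 0
    while i < len(text):
        if text[i] in PREPOSED and i + 1 < len(text):
            out.append(text[i + 1])  # the consonant comes first
            out.append(text[i])      # then the vowel
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def to_visual(text):
    """Inverse of to_logical: put the vowel back in front of its consonant."""
    out = []
    for ch in text:
        if ch in PREPOSED and out:
            out.insert(len(out) - 1, ch)
        else:
            out.append(ch)
    return "".join(out)


written = "\u0e40\u0e01"       # SARA E + KO KAI, as written
spoken = to_logical(written)   # KO KAI + SARA E, consonant first
```

Because the placement rule is fixed, to_visual(to_logical(s)) returns s for input that meets the stated assumption, so the reordering itself introduces no ambiguity.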
>
>> For English a transliteration might choose
>> to differentiate homographs such as 'sow', 'lead' and 'read'. A
>
> THEN IT'S NOT A TRANSLITERATION.

Do you have a name for it?

I DON'T ACTUALLY KNOW WHAT YOU'RE PROPOSING!

>> secondary point is that it does allow one to reject sequences of
>> characters that do not appear in any way to be English.

I'M CUTTING OUT ALL YOUR THAI AND KHMER NOTES BECAUSE THEY DON'T MEAN ANYTHING TO ME. HAVE YOU TRIED RICHARD SPROAT, WHO DEALT WITH INDIC IN THE ANTWERP WORKSHOP JUST PUBLISHED IN WLL 9/1?

For your Chinese example, do you mean more than the pinyin
transcription? You would also need a disambiguating number for
homophones. Are there not variant readings even for Chinese? Would
you select one 'arbitrarily'?

TRUE. CHINESE GETS TRANSCRIBED RATHER THAN TRANSLITERATED. BUT SINCE THE TRANSCRIPTION CAN BE CONVERTED INTO CHARACTERS WITH ALMOST NO DOUBTFUL CASES, A SMALL AMOUNT OF CONTEXT SUFFICES TO DISAMBIGUATE IDENTICAL SYLLABLES.

What would be the Japanese analogue? Many (most?) kanji have variant
readings.

WE DON'T OFTEN SEE TRANSLITERATED JAPANESE, DO WE!