Re: Automatic clustering of languages

From: Richard Wordingham
Message: 48157
Date: 2007-04-01

--- In cybalist@yahoogroups.com, "mkelkar2003" <swatimkelkar@...> wrote:
>
> http://delivery.acm.org/10.1145/150000/146390/p339-batagelj.pdf?key1=146390&key2=6896835711&coll=&dl=ACM&CFID=15151515&CFTOKEN=6184618
>
> This is an unusual study in terms of the selection of languages. The
> positions of Persian and Albanian are notable.

Ah! The Sino-Celto-Albanian group!

> Persian is nowhere
> close to Indic languages.

The algorithms are too crude and the data too limited to pick out the
remoter connections. It didn't pick up Indo-European in general or
Austronesian. (It picked up Malay and Indonesian, but they're the
same language.)

The data, of course, is a mess. There are some very odd 'Sanskrit'
forms. There may also be some transcription effects - I was surprised
that Tunisian Arabic and Hebrew came out closer to each other than to Maltese, which
originated from Arabic. I suspect this may be due to ayin being
transliterated as zero for Arabic and Hebrew, as opposed to 'gtr' for
Maltese. ('tr' is their ASCIIfication of Maltese 'h with stroke' -
Planck's constant divided by two pi for many of us.)
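
To illustrate the suspected effect, here is a minimal sketch. It assumes
the clustering rests on something like a plain edit distance between
transcribed word forms, which is an assumption and may not match the
paper's actual measure. The word forms are hypothetical transcriptions of
the word for 'eye', chosen only because it begins with ayin; they are not
taken from the paper's data.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain edit distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical transcriptions (assumptions for illustration only).
arabic = "ayn"                # ayin written as zero
hebrew = "ayin"               # ayin written as zero
maltese_as_coded = "gtrajn"   # Maltese 'ghajn' with the digraph coded as 'gtr'
maltese_ayin_zero = "ajn"     # same word if the digraph were also dropped

print(levenshtein(arabic, hebrew))             # 1
print(levenshtein(arabic, maltese_as_coded))   # 4
print(levenshtein(arabic, maltese_ayin_zero))  # 1
```

On these made-up forms, coding the Maltese digraph as three letters moves
the Arabic-Maltese distance from 1 to 4, while Arabic-Hebrew stays at 1.
With only a short word list, a few such cases could be enough to shift
the clustering.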

Richard.