Re: Automatic clustering of languages

From: mkelkar2003
Message: 48167
Date: 2007-04-01

--- In cybalist@yahoogroups.com, "Richard Wordingham" <richard@...> wrote:
>
> --- In cybalist@yahoogroups.com, "mkelkar2003" <swatimkelkar@> wrote:
> >
> >
> > http://delivery.acm.org/10.1145/150000/146390/p339-batagelj.pdf?key1=146390&key2=6896835711&coll=&dl=ACM&CFID=15151515&CFTOKEN=6184618
> >
> > This is an unusual study in terms of the selection of languages. The
> > positions of Persian and Albanian are notable.
>
> Ah! The Sino-Celto-Albanian group!
>
> > Persian is nowhere
> > close to Indic languages.
>
> The algorithms are too crude and the data too limited to pick out the
> remoter connections. It didn't pick up Indo-European in general or
> Austronesian. (It picked up Malay and Indonesian, but they're the
> same language.)
>
> The data, of course, is a mess. There are some very odd 'Sanskrit'
> forms. There may also be some transcription effects - I was surprised
> that Tunisian Arabic and Hebrew came out closer to each other than to
> Maltese, which originated from Arabic. I suspect this may be due to
> ayin being transliterated as zero for Arabic and Hebrew, as opposed to
> 'gtr' for Maltese. ('tr' is their ASCIIfication of Maltese 'h with stroke' -
> Planck's constant divided by two pi for many of us.)
>
> Richard.
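
To make the transliteration point above concrete, here is a rough sketch.
It assumes the clustering rests on some string edit distance between
transliterated word forms (an assumption; the thread does not spell out
the metric), and it uses plain unit-cost Levenshtein distance as a
stand-in. The three strings are made-up placeholders, not real word
forms from the data set; they only show what happens when one source
writes a sound as nothing and another writes it as 'gtr'.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))          # distances from "" to b[:j]
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # delete ca
                            curr[j - 1] + 1,     # insert cb
                            prev[j - 1] + cost)) # substitute or match
        prev = curr
    return prev[-1]

# Hypothetical placeholder strings (NOT taken from the study's word lists).
variety_a = "saba"     # sound transliterated as zero
variety_b = "seba"     # sound transliterated as zero, one vowel differs
variety_c = "sabgtra"  # same sound written out as 'gtr'

for x, y in [(variety_a, variety_b), (variety_a, variety_c), (variety_b, variety_c)]:
    print(x, y, levenshtein(x, y))
```

In this toy case the two forms that drop the sound end up one edit
apart, while the form that spells it out sits three or four edits away,
purely because of the transcription convention.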


They claim that the results are similar to those of the better-known
study by Kruskal, Dyen and Black.

"We can mention that clusters we found with cluster analysis are very
close to the language families established in linguistics (Kruskal,
Dyen, and Black 1971)."

Tamil and Kannada, rather than Persian, are clustering with Hindi and
Sanskrit!

M. Kelkar