Re: Automatic clustering of languages

From: mkelkar2003
Message: 48171
Date: 2007-04-02

--- In cybalist@yahoogroups.com, "Richard Wordingham" <richard@...> wrote:
>
> --- In cybalist@yahoogroups.com, "mkelkar2003" <swatimkelkar@> wrote:
>
> > --- In cybalist@yahoogroups.com, "Richard Wordingham" <richard@> wrote:
>
> > They claim that the results are similar to those of the better-known
> > study by Kruskal, Dyen and Black.
> >
> > "We can mention that clusters we found with cluster analysis are very
> > close to the language families established in linguistics (Kruskal,
> > Dyen, and Black 1971)."
>
> The clusters they found were Slavic, Germanic, Romance and Indic.
> They say nothing about the clusters they didn't find.

Perhaps because those clusters are so weird.

> They should have been disappointed by the failure to cluster Arabic
> and Maltese.
> That puts the failure to pick up Indo-Iranian into context. Note also
> that their performance with Celtic (Welsh and Irish) is not good.
>
> > Tamil and Kannada are clustering with Hindi and Sanskrit! (rather
> > than Persian)
>
> Take another look! Look at the trees, not just at the orders in which
> the languages are listed in the results. The Dravidian languages are
> in the 'others' group.

Yes, but the 'others' group splits into 'Dravidian' and the rest, and
after a further split the rest divides into Greek and Lithuanian/Latvian,
the latter being very close to Sanskrit. Agreed that the Sanskrit coding
is not that good. They should have used 'pita' instead of 'baap'.
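To make the 'baap'/'pita' point concrete: under an insertion/deletion/
substitution metric (plain Levenshtein edit distance), a Hindi form copied
into the Sanskrit list scores as an exact match with Hindi. A minimal
Python sketch follows, assuming unit cost for every edit and normalization
by the length of the longer word; the romanizations are the ones used in
this thread, and the paper's actual weighting and transcription may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions (each costing 1) needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled into [0, 1] by the longer word (an assumed choice)."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


# Hindi 'baap' against a "Sanskrit" list that also has 'baap' vs. one with 'pita'.
print(normalized_distance("baap", "baap"))  # 0.0
print(normalized_distance("baap", "pita"))  # 1.0
```

One contaminated entry moves that word pair from maximally different to
identical, so a few such entries are enough to pull Hindi and "Sanskrit"
together in the averaged scores.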


> With the exception of the anomalous behaviour
> of Telugu under the insertion/deletion/substitution metric, the
> Dravidian languages, Tamil, Kannada, Telugu and Malayalam, form a
> subcluster within the 'others' group.
>
> Incidentally, I'm not sure that it is reasonable to say that Sanskrit
> was included in their list. Their Sanskrit list has several Hindi
> forms in it, which results in the similarity of Hindi and Sanskrit
> being overstated.
>
> Did you notice that English comes out as a North Germanic language?
>
> Richard.
>
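On reading the trees rather than the listing order: a dendrogram records
which clusters were merged and at what distance, and that merge history is
where the grouping lives. Below is a minimal Python sketch of average-linkage
(UPGMA-style) agglomerative clustering. The linkage rule is an assumption,
since the paper's method isn't stated here, and the labels A to D and their
distances are hypothetical, not taken from the paper.

```python
from itertools import combinations


def average_linkage(labels, dist):
    """Agglomerative clustering with average linkage (UPGMA-style).

    labels: leaf names, in whatever order a results table happens to list them
    dist:   dict mapping frozenset({a, b}) to the distance between leaves a and b
    Returns the tree as nested tuples, plus a log of (left, right, distance) merges.
    """
    # Each current cluster (a leaf name or a nested tuple) -> the leaves it contains.
    clusters = {label: (label,) for label in labels}

    def linkage(x, y):
        # Average linkage: mean distance over all leaf pairs across the two clusters.
        xs, ys = clusters[x], clusters[y]
        return sum(dist[frozenset((a, b))] for a in xs for b in ys) / (len(xs) * len(ys))

    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        x, y = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((x, y, linkage(x, y)))
        clusters[(x, y)] = clusters.pop(x) + clusters.pop(y)

    (tree,) = clusters  # the single remaining key is the root of the tree
    return tree, merges


# Hypothetical distances between four hypothetical languages A to D.
pairs = {("A", "B"): 0.2, ("C", "D"): 0.3, ("A", "C"): 0.8,
         ("A", "D"): 0.9, ("B", "C"): 0.7, ("B", "D"): 0.8}
dist = {frozenset(p): d for p, d in pairs.items()}

tree, merges = average_linkage(["A", "C", "B", "D"], dist)
print(tree)  # (('A', 'B'), ('C', 'D'))
for left, right, d in merges:
    print(left, "+", right, "at", round(d, 3))
```

Even though the input lists C before B, the tree pairs A with B and C with
D, which is why a flat listing can make two languages look adjacent when
the tree puts them in different subclusters.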