Re: Automatic clustering of languages

From: Richard Wordingham
Message: 48204
Date: 2007-04-02

--- In cybalist@yahoogroups.com, "Daniel J. Milton" <dmilt1896@...> wrote:

> OK, but accepting the words just as they find them, I still don't
> see how they got their results.
> The first cluster in table B is Maori-Persian-Finnish. The
> words are:
> MAORI PERSIAN FINNISH
> katoa hame kaikki
> kino bad huono
> hoopara shekam vatsa
> hiwahiwa siah mutsa
> iwi ostokhan luu
> maevao ruz paiva
> hemo mordan varjata
> inu nushidan juoda
>
> That's only half the list, but the other eight don't look any better.
> Would someone explain why their computer finds this a cluster under
> its insertion-deletion rules?

Persian doesn't actually cluster with 'Finno-Maori' in the other
metrics, so I'll just point out the Maori-Finnish commonalities (in
upper case):

Maori        Finnish   Distance (insertion/deletion)
KAtoa        KAikki     7
kiNO         huoNO      5
hoopArA      vAtsA      8
hiwAhiwa     mutsA     11
iwi          luu        6  - This is a good match!!
mAeVAo       pAiVA      5
hemo         varjata   11
inU          jUoda      6
poKORAringa  KORvA      8
haupA        syodA      8
heeki        muna       9
kaIkA        sIlmA      6
paaparA      isA        8
iKA          KAla       3
rIma         vIisi      7
wAe          jAlka      6
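
For anyone who wants to check the figures, here is a minimal sketch that
recomputes them. It assumes the insertion/deletion distance is the usual
one in which substitutions are not allowed, so that the distance equals
len(a) + len(b) - 2 * LCS(a, b), where LCS is the length of the longest
common subsequence. That is my reading of the metric, not necessarily the
authors' implementation. The names lcs_length, indel_distance and PAIRS
are made up for this illustration, and the words are taken as spelt in
the table, ignoring case.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_distance(a: str, b: str) -> int:
    """Insertions plus deletions needed to turn a into b (no substitutions)."""
    a, b = a.lower(), b.lower()
    return len(a) + len(b) - 2 * lcs_length(a, b)


# (Maori, Finnish) pairs exactly as spelt in the table above.
PAIRS = [
    ("katoa", "kaikki"), ("kino", "huono"), ("hoopara", "vatsa"),
    ("hiwahiwa", "mutsa"), ("iwi", "luu"), ("maevao", "paiva"),
    ("hemo", "varjata"), ("inu", "juoda"), ("pokoraringa", "korva"),
    ("haupa", "syoda"), ("heeki", "muna"), ("kaika", "silma"),
    ("paapara", "isa"), ("ika", "kala"), ("rima", "viisi"),
    ("wae", "jalka"),
]

distances = [indel_distance(m, f) for m, f in PAIRS]
total = sum(distances)

for (maori, finnish), d in zip(PAIRS, distances):
    print(f"{maori:<12} {finnish:<8} {d:>2}")
print("total:", total)  # total: 114
```

Under that assumption the per-word figures agree with the table. Note
that 'iwi' and 'luu' share no letters at all, yet score only 6 simply
because both words are short.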

I thought the key to the Finno-Maori cluster was that the words are
vowel-rich and draw on only a limited range of frequent consonants.
However, the distances above total 114 over the 16 words, whereas the
distance between Persian and 'Sanskrit' is 100. This certainly merits
investigation. I wonder if it is an effect of the clustering algorithm.

Richard.