Re: Automatic clustering of languages

--- In cybalist@yahoogroups.com, "Francesco Brighenti" <frabrig@...>
wrote:

> "BAD"
> Bengali/Oriya <kharap>, Hindi/Rajasthani <kharab>, are Perso-Arabic
> loans; why do the authors of the study regard them as native Indo-
> Aryan words?

If you are just doing similarity comparisons on languages, there is no
justification for excluding loan words. The question is rather
whether they are now the 'typical' words for the meaning.

> "BLACK"
> Sanskrit <ka:la> 'black' is attested much later than <kr.s.n.a>,
> which also means black; had the authors chosen the older term, they
> would have seen that it nicely clusters with its Slavic cognates
> such as <crn> etc.

I don't think they would. For example, Harvard-Kyoto 'kRSNa' has no
characters in common with, say, 'cernyj'. Under their preferred
insertion-deletion (no substitution) metric, the distance between
'krishna' and 'cernyj' is 9, while the distance between 'kala' and
'cernyj' is 10. The teaching materials mentioned in the parallel
thread used a much better metric, and it would probably have given
credit for having 'r...n' in common. Neither scheme gives credit for
the initial - unless of course you find and work with some ancient
transliteration using *<crishna>!
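
For anyone who wants to check the figures, here is a minimal Python
sketch. It assumes the insertion-deletion distance is the usual one:
the sum of the two string lengths minus twice the length of their
longest common subsequence. The function names are mine, not the
study's, and the comparison is case-sensitive, which is why the
Harvard-Kyoto spelling scores worst.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Matching characters extend the best subsequence so far.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop a character from one string or the other.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_distance(a: str, b: str) -> int:
    """Edit distance with insertions and deletions only (assumed metric)."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


for word in ("kRSNa", "krishna", "kala"):
    print(word, indel_distance(word, "cernyj"))
# kRSNa 11
# krishna 9
# kala 10
```

The only credit 'krishna' earns is for the subsequence 'rn'; 'kala'
and 'kRSNa' share nothing with 'cernyj', so their distances are just
the summed lengths.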

Richard.