Re: Automatic clustering of languages

From: mkelkar2003
Message: 48225
Date: 2007-04-04

--- In cybalist@yahoogroups.com, "Richard Wordingham" <richard@...> wrote:
>
> --- In cybalist@yahoogroups.com, "Francesco Brighenti" <frabrig@>
> wrote:
>
> > > If you are just doing similarity comparisons on languages, there is
> > > no justification for excluding loan words. The question is rather
> > > whether they are now the 'typical' words for the meaning.
>
> > Excuse my ignorance, but why should loan words, even those which have
> > in course of time become typical for a given meaning, be included in
> > these Swadesh-like lists created for the sake of historical
> > comparison? At a minimum, one should in this case consider carefully
> > the time depth at which the loan took place (e.g.: was it in the
> > prehistoric period? in the early historical period? three centuries
> > ago? etc.).
>
> If you are using Swadesh lists for glottochronology, then the answer
> is simple - they are examples of word replacement and help date the
> notional splits.
>
> For lexicostatistics, I would again say that they reflect the nature
> of the vocabulary. For example, the English words chosen to define
> the 100-word list include several North Germanic and Romance words -
> the following at least:
>
> North Germanic: bark, skin, egg, give
> Romance: person, mountain, round
>
> The verb 'die' might be of native origin, or a native word revitalised
> by Danish influence. The IE collection of word lists (Dyen et al.)
> marks 'bird' as a loanword, I don't know why. From its history and
> phonetics, 'big' looks North Germanic, but the North Germanic cognates
> are lacking. In my usage, native 'belly' survives only in set phrases
> - I would normally use a word of Greek origin, 'tummy', 'stomach' or
> even 'abdomen'. 'Breasts' is not the normal word in most Britons'
> speech - Romance(?) 'tits' is the usual (plural) word, though a lot of
> substitutes are used. Pushing things further back, 'path' is a
> Scythian loan, and 'long' looks like a Celtic loan.
>
> This is a fair reflection of the fact that English vocabulary has been
> heavily influenced by North Germanic and Romance. If one wants to
> exclude such features for some reason, it is generally better to use
> an older form of the language.
>
> > If one includes loan words like these in Swadesh-like lists such as
> > those used by "our" Slovenian researchers, will English and Arabic
> > cluster close to each other after they have put the data into their
> > shaker? :^)
>
> Well, 'die' and 'egg' probably helped English come out as aberrant
> North Germanic rather than aberrant West Germanic. Other contributory
> factors may have been the use of 'to' in the infinitives, possibly
> better matching the 'att' etc. of the Scandinavian forms, as opposed
> to the zero of the Dutch and German forms. The High German consonant
> shift may also have helped with this misclassification. Note that in
> the schemes presented, regular correspondences get no discount - each
> word pays the full cost of the sound change!

That *is* the correct way to classify with complete objectivity. If
regular correspondences are assumed to indicate genetic relation, then
these same genetic relations cannot be used to decide what is regular
and what is not regular. Comparative linguists have trapped
themselves in this kind of circular reasoning. That is why they need a
second law to explain the "irregularities" not explained by the first
one.
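For readers who want to see what "each word pays the full cost of the sound change" means in practice, here is a minimal sketch. It assumes a plain letter-level Levenshtein distance over spelled forms, scored meaning by meaning. The actual metric used in the clustering paper is not given in this thread, so `levenshtein`, `list_distance` and `free_pairs` are hypothetical names used only for illustration. The English t ~ German z correspondence (ten/zehn, two/zwei, tongue/Zunge) is charged again in every word unless it is explicitly made free:

```python
def levenshtein(a: str, b: str,
                free_pairs: frozenset[tuple[str, str]] = frozenset()) -> int:
    """Letter-level edit distance between two spellings.

    Substitutions listed in free_pairs cost 0. This is a hypothetical
    'discount' for a regular correspondence, not part of the original scheme.
    """
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        cur = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            sub = 0 if ca == cb or (ca, cb) in free_pairs else 1
            cur.append(min(prev[j] + 1,         # delete ca
                           cur[j - 1] + 1,      # insert cb
                           prev[j - 1] + sub))  # match or substitute
        prev = cur
    return prev[-1]


def list_distance(pairs: list[tuple[str, str]],
                  free_pairs: frozenset[tuple[str, str]] = frozenset()) -> float:
    """Mean length-normalised distance over a word list.

    Each meaning is scored on its own, so a recurring sound change is
    paid for separately in every word that shows it.
    """
    total = 0.0
    for a, b in pairs:
        total += levenshtein(a, b, free_pairs) / max(len(a), len(b))
    return total / len(pairs)


# English vs. German spellings (lower-cased) showing the t ~ z correspondence.
words = [("ten", "zehn"), ("two", "zwei"), ("tongue", "zunge")]
print(list_distance(words))                             # no discount
print(list_distance(words, frozenset({("t", "z")})))    # t ~ z made free
```

Without the discount, the t/z mismatch adds 1 to every one of the three words. Declaring it a regular correspondence removes that cost and lowers the overall score. Whether such a discount is legitimate, or circular, is exactly the point in dispute above.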

M. Kelkar

>
> Using spelling similarity with differences assessed word by word to
> identify genetic relationships is an attempt to automate naïve mass
> comparison. Need one say more?
>
> Richard.
>