[cybalist] Re: Computational Historical Linguistics

From: Gregory L. Eyink
Message: 2018
Date: 2000-04-03

--- In cybalist@egroups.com, "Mark Odegard" <markodegard@...> wrote:
> The topic has been talked to death elsewhere. What has to be
> understood (so far as I understand it) is that Don Ringe et al's work
> is producing ROOTLESS trees. When they proclaim Old English's location
> on this tree, they are being somewhat tongue-in-cheek: they are
> reporting what their computer runs turn out.



Dear Mark,

Thank you for your comment. I am unaware of these other discussions.
Are they online? Do you have URLs?

I understand that their algorithm produces rootless trees. What I am
really questioning is the value and meaning of such a model, HOWEVER
it is produced. If an evolutionary biologist has a theory of bird
origins which postulates that the maniraptorian clade divided into
aves, dromaeosaurs, troodontids, therizinosaurs, and oviraptors, then
he may or may not be correct, but I know exactly what he means.
According to the papers at the UPenn website, they propose that the
tree model should apply to linguistic families which are
geographically spreading, so that separating members should no longer
interact. Fine. However, when the UPenn tree shows that
Germano-Balto-Slavic and Indo-Iranian divided from a common ancestor,
what does this really mean? If I take their model literally, then I
must assume that either (i) Indo-Iranian and Balto-Slavic
independently underwent a satemic shift or (ii) the satem shift
occurred in the common proto-language and Germanic underwent a
retrograde centum shift! (Very unlikely phonologically.) It is not
just an issue with the placement of Germanic either. For example,
Armenian also has satemic features. The proper conclusion really seems
to be that the tree model is just not a valid representation of the
linguistic facts for the IE family. I still don't see the value of
using fancy mathematics to "correctly" produce a bad model! This
linguistic theory produces contradictions whose explanation requires
going outside the tree model itself, e.g. invoking areal change or
wave theory.

>
> Essentially, all they are saying is that OE (which is NOT a satem
> language) is nonetheless best placed inside the group which did
> undergo satemization. The literature I've read says there are
> incompletely explained peculiarities in Germanic which largely
> disappear if you posit a strong genetic (but pre-satemic) relationship
> with the B-S and I-I branches.
>


The UPenn group certainly emphasizes the anomalous position of
Germanic in the construction of the optimal tree. For example, they have
found that if they removed Germanic from the tree construction, then
they obtained a "perfect phylogeny", i.e. a tree for which all
linguistic characters employed were compatible. But does this mean
that this "perfect" tree should be regarded as having established the
validity of the tree concept, and as representing a historical fact?
There are many aspects of the "perfect" tree that are still very
controversial. For example, it supports the existence of a
Greco-Armenian proto-language. Yet very credible arguments have been
presented against such a hypothesis, e.g. by James Clackson in his
1994 Oxford monograph, "The Linguistic Relationship Between Armenian
and Greek." There is a lot hidden in the UPenn results in terms of the
linguistic characters employed and the values assigned to
compatibilities.
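
To make the term "perfect phylogeny" concrete, here is a minimal Python sketch of the standard pairwise compatibility test for two-state characters. Everything in it is an assumption for illustration: the taxa T1..T5 and the characters c1..c3 are invented toy data, not the UPenn codings. The shortcut "every pair compatible implies a perfect phylogeny exists" holds for binary characters only; real linguistic characters generally have more than two states, where the problem is harder.

```python
from itertools import combinations

# Toy data only: taxa T1..T5 and binary characters c1..c3 are invented
# for illustration and are NOT real linguistic codings.
CHARACTERS = {
    "c1": {"T1": 0, "T2": 0, "T3": 1, "T4": 1, "T5": 1},
    "c2": {"T1": 0, "T2": 0, "T3": 0, "T4": 1, "T5": 1},
    "c3": {"T1": 0, "T2": 1, "T3": 0, "T4": 1, "T5": 0},
}
ALL_TAXA = ["T1", "T2", "T3", "T4", "T5"]
WITHOUT_T4 = [t for t in ALL_TAXA if t != "T4"]


def pair_compatible(a, b, taxa):
    """Two binary characters fit on one tree unless all four state pairs occur."""
    seen = {(a[t], b[t]) for t in taxa}
    return len(seen) < 4


def incompatible_pairs(characters, taxa):
    """List every pair of characters that cannot share a tree."""
    return [
        (x, y)
        for x, y in combinations(sorted(characters), 2)
        if not pair_compatible(characters[x], characters[y], taxa)
    ]


def admits_perfect_phylogeny(characters, taxa):
    """For binary characters, pairwise compatibility is sufficient."""
    return not incompatible_pairs(characters, taxa)


print(incompatible_pairs(CHARACTERS, ALL_TAXA))
print(admits_perfect_phylogeny(CHARACTERS, ALL_TAXA))
print(admits_perfect_phylogeny(CHARACTERS, WITHOUT_T4))
```

In this toy set, character c3 conflicts with the other two only because of T4, so dropping T4 leaves a fully compatible set. That is the same kind of effect as removing one language from the data, and it shows why a "perfect" result obtained that way says nothing by itself about whether the tree is historically real.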


> Much of what they are doing seems to be 'tinkering'. They are
> attempting to find those linguistic features which can accurately
> predict known relationships, and then apply the same methodology to
> unknown relationships. One can only wish them success.

I wish any honest scientist well. I certainly did not intend my
remarks to be "nasty", just an honest criticism. However, my negative
remarks are a reaction against what I see as the UPenn group's
tendency to "oversell" their method, or to "intimidate" with
sophisticated mathematics. An example: In their IRCS report they claim
that one of the important results of their methodology is "the ability
to detect and handle loanwords that are not distinguishable from
cognates by traditional methods." This sounds really wonderful, right?
Isn't it great that they can feed well-known linguistic data into
their miraculous mathematical machine and get such striking
conclusions as the output? However, an examination of their work shows
otherwise. The above claim is based upon their difficulty in fitting
Germanic into the tree. They found that using linguistic characters
based upon phonology and morphology gave the tree in which Germanic
was grouped with Indo-Iranian and Balto-Slavic. However, if they used
characters based upon vocabulary, then Germanic was best grouped with
Italo-Celtic. To explain the discrepancy, they THEORIZED a non-tree
effect: that Germanic at an early stage had borrowed much of the
distinctive common Western vocabulary (e.g. Goth. `fisks', Lat.
`piscis', OIr. `iasc') from Italo-Celtic. This is not an automatic
output of their mathematical apparatus, but an independent speculation
on their part. It is also not the only possible explanation (e.g. the
items in question could be independent borrowings from a
western-European pre-IE substrate). What the example really
shows, again, is that the tree model breaks down. If the underlying
linguistic theory of separate development were correct, then it
wouldn't matter which set of linguistic characters was employed
(phonological-morphological vs. vocabulary): the same tree would
result. The fact that it doesn't just means that the tree model is
insufficient.

I can see some value in using the UPenn algorithms as a way of testing
the limits of validity of the tree model. I see a lot of their
conclusions as being not so different from what traditional linguistic
methods have produced using the same data, but perhaps better
quantified. For example, it could be useful to have "compatibility
scores" for different possible trees, or to see that different trees
result from different linguistic characters. This would all be
valuable if used correctly. However, this is not the UPenn attitude.
They take it as a CRITICISM of lexicostatistics that the
"best-informed mathematical linguist who attempted such work makes
notably modest and reserved claims for the method." Instead, they make
very arrogant and overblown claims for theirs. They claim to "resolve
longstanding open problems" such as the Indo-Hittite and Italo-Celtic
hypotheses. They boast that their method "has been able to construct
a robust evolutionary tree of the IE languages" whereas "traditional
methods failed." This is just not honest.
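
The "compatibility scores" mentioned above could be computed along the following lines. This is only a sketch under stated assumptions: the four taxa A..D, the two candidate trees, and the two character sets are invented toy data, and the score used (the number of binary characters whose split conflicts with no split of the tree) is one simple choice, not the UPenn group's actual procedure.

```python
# Toy data only: taxa, trees, and character sets are invented for
# illustration and do NOT reproduce any real linguistic coding.
TAXA = ["A", "B", "C", "D"]

# Each candidate tree is given by one side of each internal split.
TREE_X = [{"A", "B"}]  # groups A with B (and so C with D)
TREE_Y = [{"A", "C"}]  # groups A with C (and so B with D)

SET_P = {  # hypothetical "sound/morphology-like" characters
    "p1": {"A": 1, "B": 1, "C": 0, "D": 0},
    "p2": {"A": 0, "B": 0, "C": 1, "D": 0},
}
SET_L = {  # hypothetical "vocabulary-like" characters
    "l1": {"A": 1, "B": 0, "C": 1, "D": 0},
    "l2": {"A": 0, "B": 1, "C": 0, "D": 1},
}


def split_of(character, taxa):
    """Return the two sides of the bipartition a binary character induces."""
    ones = frozenset(t for t in taxa if character[t] == 1)
    return ones, frozenset(taxa) - ones


def splits_compatible(s, t):
    """Two splits can coexist on a tree iff some pairwise intersection is empty."""
    return any(not (a & b) for a in s for b in t)


def compatibility_score(characters, tree_splits, taxa):
    """Count characters whose split conflicts with no split of the tree."""
    full = frozenset(taxa)
    tree = [(frozenset(side), full - frozenset(side)) for side in tree_splits]
    return sum(
        all(splits_compatible(split_of(c, taxa), ts) for ts in tree)
        for c in characters.values()
    )


for name, chars in (("SET_P", SET_P), ("SET_L", SET_L)):
    for tree_name, tree in (("TREE_X", TREE_X), ("TREE_Y", TREE_Y)):
        print(name, tree_name, compatibility_score(chars, tree, TAXA))
```

With this toy data the two character sets prefer different trees, which is the situation described for Germanic. The scores quantify the disagreement; they do not explain it, and any explanation (borrowing, substrate, areal change) has to be supplied from outside the tree model.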