Computational Historical Linguistics

From: Gregory L. Eyink
Message: 1968
Date: 2000-03-30

Hi, everyone!

I'm back from my trip, with another question. Actually, this is a
comment concerning the work of the Computational Historical Linguistics
Project at the University of Pennsylvania
(http://www.cis.upenn.edu/~histling/home.html). Perhaps I am just a
grouchy physicist, but I don't see any value in what they have done.

As explained at their website, they use mathematically sophisticated
"tree construction algorithms" which, for a given set of data, "return
not only the optimal evolutionary tree (or trees) for the data
presented, but all trees that are close to optimal." Among the
surprising results that they proclaim at the site is that Old English
(representing Germanic) "falls somewhere within the Satem core!"

OK, this is not hard to understand. It is not so different from the
view espoused, for example, by Gamkrelidze and Ivanov, who posit a major
initial trunk division of the Indo-European tree into an Area A
(Anatolian, Tocharian, and Italic-Celtic-Illyrian) and an Area B
(Germanic-Balto-Slavic and Indo-Iranian-Greek-Armenian), with Anatolian
splitting off early from Area A. There are many well-known shared
characteristics to justify the Germano-Balto-Slavic grouping, such as
the famous "m" phoneme in dative/instrumental plural case endings, where
other IE languages show "bh". Even a cursory comparison of the noun and
verb paradigms of Gothic with those of Prussian or Lithuanian makes the
similarities clear, e.g. the Gothic 2nd person pronouns (thu-jut-jus) in
sing.-dual-pl. versus Lithuanian (tu-judu-jes).

What seems really strange, however, is the insistence on grouping
Germanic with the satem languages. What this really tells me is that the
idea of an Indo-European tree is an outmoded concept. Unlike in biology,
where genetics provides a basis for such an evolutionary tree, there is
none in linguistics. Distinct biological species cannot interbreed (even
here there are plenty of exceptional situations and grey areas), but in
linguistic development even languages from very different families can
mutually influence each other. For example, Turkish, a Ural-Altaic
language, has hugely influenced the vocabulary, and even the grammar, of
Armenian, an Indo-European language. Thus, the idea of simple
phylogenetic trees becomes lost, as branches of wholly different trees
may merge or influence each other. It is easy to conceive that the
evolutionary history is something like that posited by the UPenn tree,
with the satem shift a separate areal change that affected
Indo-Iranian-Armenian and Balto-Slavic but missed the (by then)
geographically isolated Hellenic and Germanic groups. It is this type of
areal effect, widely recognized in the "wave theory" for example, that
the crude model of a branching tree completely misses and distorts.

I think that what I am saying is not so different from the currently
accepted views in the field of historical linguistics. It thus amazes me
that the UPenn group would get funding for their project from a
reputable source such as NSF. The idea of an "IE tree" was fine in the
time of August Schleicher, but now we are more sophisticated in our
understanding of how languages develop. Forcing linguistic evolution to
fit a "branching tree" model is truly a Procrustean exercise. It seems
to me just an example of how sophisticated mathematics can be abused to
give an aura of "exactness" which obfuscates rather than enlightens.

OK, I am an outsider to historical linguistics. Can anyone tell me why
I am wrong, why the UPenn work is really great stuff?

Thanks!

Gregory

P.S. I would also be interested in people's ideas on the
Germano-Balto-Slavic "m" in the dat./ins. pl. cases, vs. the "bh" in the
rest of IE.