Re: Ringe's Work at U Penn

From: Piotr Gasiorowski
Message: 193
Date: 1999-11-06

----- Original Message -----
From: markodegard@...
To: cybalist@eGroups.com
Sent: Saturday, November 06, 1999 4:37 PM
Subject: [cybalist] Re: Ringe's Work at U Penn

I am very unsure of the details of Ringe's work, but essentially, they are constructing 'rootless trees' using a supercomputer. The data consists of lexical, phonological and grammatical material taken from well-attested languages. The object is to discern phylogenetic relationships, and measure the degree of relatedness between the individual data sets.

This is not glottochronology, in that chronological information plays only a small role. I might be wrong, but it seems they are having a computer do the huge 'eyeball comparison' that Greenberg used to demonstrate the Afro-Asiatic family of languages. This, then, would be the comparative method carried to its logical conclusion.

One interesting finding, illustrated on their web pages, is how difficult it is to place Germanic; so far, all they can say is that it is just outside the satemic core.

Apparently, the monograph for the initial work is more or less 'in publication'.

Mark Odegard.

What they do is computer-aided "cladistic analysis", that is, comparing various possible hypotheses about the genetic relations holding within a set of languages in order to find the most likely or the most parsimonious one (the one requiring the minimal number of evolutionary "jumps" from language to language). The technique and the programs were designed for biological classification, but historically the technique also owes much to the study of manuscript affiliations.
 
The program produces a so-called "cladogram" -- a tree diagram representing the best-fitting clustering of languages. A cladogram is in fact a hypothesis about the structure of cognacy relations in a set of languages. The diagram may be converted to a conventional family tree by establishing the "root node", that is, the position of the hypothetical protolanguage in the topology of the tree. The other nodes then show the order of branching within the family.
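
The rooting step can be illustrated with a minimal Python sketch. It is not taken from the Penn programs; the language labels A-D, the internal node names x and y, and the function name root_on_edge are all invented for the example. The unrooted cladogram is stored as an adjacency table, and choosing a branch for the root node turns it into a nested family tree.

```python
# Unrooted cladogram for four hypothetical languages A-D.
# "x" and "y" are unlabelled internal nodes (invented names).
adj = {
    "A": ["x"], "B": ["x"], "C": ["y"], "D": ["y"],
    "x": ["A", "B", "y"],
    "y": ["C", "D", "x"],
}

def root_on_edge(adj, u, v):
    """Place the root node on the branch between u and v.

    Returns the rooted tree as nested tuples; leaves are strings.
    """
    def subtree(node, parent):
        # Everything reachable from `node` without going back to `parent`.
        children = [n for n in adj[node] if n != parent]
        if not children:
            return node
        return tuple(subtree(child, node) for child in children)

    return (subtree(u, v), subtree(v, u))

# Same cladogram, two different rootings, two different family trees.
print(root_on_edge(adj, "x", "y"))  # (('A', 'B'), ('C', 'D'))
print(root_on_edge(adj, "A", "x"))  # ('A', ('B', ('C', 'D')))
```

With the root on the internal branch, the family splits into two sister groups; with the root on the branch leading to A, language A becomes the first to branch off. The cladogram itself does not say which of these is right.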
 
Neither part of the reconstruction is straightforward, and the first part is very complex computationally, because the number of possible trees to be generated and evaluated is very large even for a relatively small number of languages; hence the need to use computers (SUPERcomputers are not necessary for the kind of data historical linguists work with, and there are some effort-saving heuristic techniques that simplify the calculations).
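
To give a feel for the numbers: for n labelled languages, the count of distinct unrooted, strictly binary trees is (2n-5)!!, the product of the odd numbers from 3 up to 2n-5 (a standard combinatorial result). A small Python sketch, with a function name chosen only for this example:

```python
def count_unrooted_trees(n_languages: int) -> int:
    """Number of distinct unrooted binary trees on n labelled leaves.

    Uses the standard formula (2n-5)!! = 3 * 5 * ... * (2n-5), valid for n >= 3.
    For fewer than three leaves there is only one possible shape.
    """
    if n_languages < 3:
        return 1
    count = 1
    for odd in range(3, 2 * n_languages - 4, 2):
        count *= odd
    return count

for n in (4, 5, 10):
    print(n, count_unrooted_trees(n))
# 4 3
# 5 15
# 10 2027025
```

Ten languages already give over two million candidate trees, which is why exhaustive search quickly gives way to heuristics.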
 
One crucially important part of the procedure is the choice of "characters" (features) whose "states" (values) are to be determined for each reconstructed node of the tree. A feature which is "inherited" from a higher-order node represents an archaism, and the change of a feature as we move down a path in the tree represents an innovation. In a "good" tree, clusters of related languages should be characterised by shared innovations.
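
Counting innovations on a candidate tree can be sketched with Fitch's small-parsimony procedure, a standard method from biological cladistics. Whether the Penn programs score trees in exactly this way is not something stated here; the language labels A-D and the 0/1 character states below are invented for illustration.

```python
def fitch(tree, states):
    """Fitch small parsimony for one character on a rooted binary tree.

    tree   : nested 2-tuples, leaves are language names (strings)
    states : dict mapping each language to its state for the character
    Returns (candidate states at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The two branches can agree on a state: no extra change needed.
        return shared, left_cost + right_cost
    # The branches disagree: one innovation must have occurred here.
    return left_set | right_set, left_cost + right_cost + 1

# Hypothetical character: A and B show state 1, C and D show state 0.
states = {"A": 1, "B": 1, "C": 0, "D": 0}

tree_1 = (("A", "B"), ("C", "D"))  # groups the languages that share a state
tree_2 = (("A", "C"), ("B", "D"))  # splits them up

print(fitch(tree_1, states)[1])  # 1
print(fitch(tree_2, states)[1])  # 2
```

The first tree explains the data with a single innovation shared by a whole cluster, while the second needs two independent changes, so on this one character the first tree is the more parsimonious. A real analysis would sum such scores over many characters.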
 
Being designed for biology, the program assumes the validity of the family-tree model (the effects of borrowing and areal convergence are ignored, as all branches are taken to evolve independently) and ultimate monogenesis. It will yield a result even if you feed data from unrelated languages into it.
 
Piotr