Re: [tied]  Genetic Tree for Language Matching

From: Michal Milewski
Message: 13613
Date: 2002-05-02

----- Original Message -----
From: <x99lynx@...>
> No, Michal.  These are cladistic tree (at least they was generated by
> cladistic phylogenic software).
 
Sorry, but what you posted is in no way a classical cladistic tree. When you draw a cladistic tree, it cannot be based on arbitrarily selected similarities between the analysed groups, but exclusively on the position of their common ancestors. No classical cladistic tree can be formed if the groups you try to correlate include members that are more closely related (phylogenetically) to some members of other groups than to members of their own group, as is obviously the case when you try to compare selected geographical regions (these are the groups) containing different Y chromosome haplotypes (these are the members). I hope you agree with me that it is not true that every Y chromosome from Europe is more closely related to any other European Y chromosome than it is to any Y chromosome from other regions. Let me explain it using a very simple example. Let's compare the two trees shown below.
 
       \
        \
        /\
       /  \
      /   /\
     /   /  \
    /   /    \
   /   /      \
fish birds mammals
 
 
       \
        \
        /\
       /  \
      /   /\
     /   /  \
    /   /    \
   /   /      \
shark herring human
 
 
The first tree looks quite OK, as we all probably agree that fish, birds and mammals have common ancestors (so they are related). In fact, this is the most popular way people show the evolution of vertebrates (of course, the tree shows only a part of it). However, this can't be called a classical cladistic tree, since some fish (or even most of them) are more closely related to mammals and birds (share more recent common ancestors with them) than to other fish. The second tree is a typical cladistic tree - maybe it looks a little bit strange to a nonbiologist, but it reflects in a more precise way the evolution of fish and mammals (or rather of shark, herring and human). In this case we can say that EVERY herring is more closely related to EVERY mammal than it is to ANY shark.
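
To make the difference between the two trees concrete, here is a minimal Python sketch of the test I have in mind: a group belongs on a classical cladistic tree only if some single node of the tree has exactly that group as its descendants. The nested-tuple tree, the function names, and the extra "sparrow" leaf (standing in for birds) are my own assumptions for illustration, not anything taken from Underhill's paper.

```python
# Toy tree as nested tuples: (shark, (herring, (human, sparrow))).
# "sparrow" is a hypothetical extra leaf standing in for birds.
TREE = ("shark", ("herring", ("human", "sparrow")))


def leaves(node):
    """Return the set of leaf names found under a node."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def is_clade(tree, group):
    """True if some node of the tree has exactly `group` as its leaves."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_clade(child, group) for child in tree)


# "Fish" (shark + herring) is not a clade in this tree.
print(is_clade(TREE, {"shark", "herring"}))
# Herring together with human and sparrow is a clade.
print(is_clade(TREE, {"herring", "human", "sparrow"}))
```

A regional group like "Europe" would fail the same kind of test whenever the Y chromosomes found there do not all descend from one node that excludes chromosomes from other regions.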
 
A typical cladistic tree is also shown in fig. 1 of Underhill's work, and this figure should be the basis for reconstructing the evolution of human Y chromosomes (or the evolution of humans, and maybe their migrations, too). The trickiest part is of course the correlation between the evolution of Y chromosomal patterns and the migrations of human populations in the past (not to mention the correlation with the languages they spoke). But I'm still convinced that some estimates are possible. And of course you cannot do it using a maximum likelihood network like the one in fig. 2 of Underhill. Let's assume for a moment that the frequencies analyzed in that figure correspond to the Y chromosome haplotypes of all mammals found in different regions (an even better example would include all eukaryotic organisms, but unfortunately most of them lack a Y chromosome). How useful could such a correlation be when trying to reconstruct the evolution (and/or migrations) of the ancestors of all horses, mice, or whales living today? But this doesn't mean that by analysing all wild mice Y chromosomal haplotypes we wouldn't be able to quite precisely reconstruct an evolutionary tree leading to the Y chromosomes of most species (or strains, in case we analyze just one species) living today (and maybe even roughly assign the nodes to geographical regions).
 
 
> The nodes, branches and the haplogroups represent "defining mutations." 
 
If this is true, what is the "defining mutation" that defines the branch called "Europe" in fig. 2 of Underhill? And what about the branches called "Mideast" and "America"? And also, what is the "defining mutation" for the whole branch that includes the subbranches "Mideast", "Morocco", "Basque" and "Europe"?
 
 
> It's basic principle of cladistic phylogeny.  This is precisely how Ringe attempts
> to tree IE (by "innovations.")
 
 
Unfortunately I don't know exactly which linguistic features were analysed by Ringe. You can build a cladistic tree for IE languages based on just one feature, if you are really confident that the differences you observe in all separate branches correspond to a progressive process that never (or almost never) goes back. (Is there any feature that would meet this criterion? I suspect that it would be hard to reach a consensus on this matter, but I don't exclude such a possibility.) If you have two such linguistic features, and the trees based on their evolution are not consistent, it means that at least one of those features does not meet the criterion. Of course, the more features you analyse, and the more confident you are that they meet the criterion, the more reliable your final result is. However, there is one very important difference between such an analysis made for languages (and their features) and for regional populations (and their genetic markers). In the case of languages, you assume that a language is a unit that cannot be a descendant of two different ancestor languages (in case it inherits features from two or more languages, you always choose only one ancestor that seems to provide the "core" features - and defining these "core" features is critical for distinguishing related from unrelated languages), because otherwise it would be impossible to draw a classical cladistic tree. On the other hand, when working with arbitrarily selected regional populations of Y chromosomes, there are quite frequent situations in which we cannot select a "core" haplotype that would properly define the position of the population as a whole. Most of the analyzed populations are too large and heterogeneous (for example the European population) to treat them as independent units and assign them single positions in the cladistic tree.
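
The consistency check between two features can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not Ringe's actual procedure (which I don't know): each feature is treated as a single irreversible innovation, and the language names and feature sets below are entirely hypothetical. Under those assumptions, two innovations can sit on the same tree only if the sets of languages carrying them are nested or disjoint.

```python
def compatible(innovation_a, innovation_b):
    """Two irreversible innovations fit on one tree only if the sets of
    languages carrying them are nested or disjoint."""
    a, b = set(innovation_a), set(innovation_b)
    return a <= b or b <= a or a.isdisjoint(b)


# Hypothetical languages and features, for illustration only.
feature_1 = {"lang_A", "lang_B"}
feature_2 = {"lang_A", "lang_B", "lang_C"}
feature_3 = {"lang_B", "lang_C"}

# Nested sets: both features can mark branches of the same tree.
print(compatible(feature_1, feature_2))
# Partially overlapping sets: at least one feature breaks the criterion.
print(compatible(feature_1, feature_3))
```

When the check fails, it does not tell you which of the two features is the unreliable one, only that they cannot both reflect a process that never goes back.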
 

> (I don't know what you think this tree means, but yes it is a picture of the
> data, correlating the tree in figure one with the geographical distributions
> in figure two. So, yes, your Central Asia theory is rejected by the optimal
> cladistic tree correlating regions and haplotypes generated by the data.
 
 
Let me know whether you still support this view after reading what I wrote above.
 
 
> And that may surprise you only because this "network is consistent with the first
> two principal components capturing 18% of the variation present in the 116
> haplotypes."  In other words, this is the best view of direction of spread
> that could be generated from the data and it only captures 18% of the
> variances.  This low a participation is common in trees with a lot of random
> distribution.)
 
And this is exactly why this method is useless in this case (because of those numerous migrations in different directions that cannot be correctly interpreted using this kind of maximum likelihood network).
 

> And I've changed my mind.  The long Pakistani-Indian branch that off-shoots
> the "Mid East" branch is clearly Sumero-Elamite-Dravidian.  No doubt about
> it.  Just as these people felt their Y-Chromosomes changing, they definitely
> started a new language family.  Probably innovated a more manly, guttural set
> of sounds for use in their more aggressive verbal roots.
 
You don't need to be sarcastic. I've never claimed that my hypothesis is anything more than just a speculation about possible correlations between Y chromosomal markers, human migrations, and evolution of languages.
 
Regards,
 
Michal