Re: [tied] Interpreting the Y-Chromosome Research

From: Michal Milewski
Message: 13587
Date: 2002-04-30

----- Original Message -----
From: <x99lynx@...>
> Michal, first of all, I have to say I appreciate the effort you spent on
> this. And I hope that you don't take what I say below as being
> unappreciative of that effort.

Thanks for your kind words. Actually, I expected your criticism to be much
harsher. I am aware that this theory is rather speculative, but I am happy
that I got a few people more interested in this kind of data.

> The basic problem with Underhill's data is
> sample sizes and this affects specific conclusions severely.

Right. This is why I used some data from other papers to get a more complete
picture. Of course, with more information available in the future (more
populations studied, larger samples analysed, and more polymorphisms
detected), some elements will need to be redrawn. However, I think that the
basic concept will hold up (but maybe I'm wrong).

> (E.g., note that according to Underhill, the M46 mutation that you
> associate with Uralic is not even found in Europe.)

This is because their sample was based on selected subpopulations
(Sardinian, Italian, German and Basque). More detailed data on the
frequencies in different European populations can be found elsewhere (and I
cited some of these numbers). Of course, what you wrote suggests that a
similar misrepresentation is possible for other regions, and you are
absolutely right. This is why I didn't draw more specific conclusions from
the data for regions other than Europe. I hope future studies will allow us
to be more specific.

> The unreasonably large MODERN sample size for Central-Asia-Siberia pretty
> much assures that it will show up in many of Underhill's haplotypes. BUT it
> seems the researchers corrected for that when they built their "maximum
> likelihood network" where Central Asia-Siberia plays a minor role.

You are right again, but Central Asia plays a specific role in my hypothesis
not (only) because of the presence of specific haplotypes in this region,
but because it is the most likely place for the 09 lineage expansion,
although I agree that southern Asia (India and Pakistan) is also quite
possible and cannot be excluded. Actually, you may be right that southern
Asia is the more likely place for the 09 split, although the same cannot be
said of the later Amero-IE split. Do you agree? (Of course, I'm talking
about Y chromosomes, not languages, which would be much more speculative.)


> What is glaringly absent, however, is a sampling category called the
> "Near East," which may have changed the later distributions significantly.

I agree. There is, however, a recent paper on this issue. I haven't read it
yet, but it may give us some interesting details: Nebel A et al. (2001),
The Y chromosome pool of Jews as part of the genetic landscape of the Middle
East, Am J Hum Genet 69(5):1095. I hope their nomenclature is consistent
with that of Underhill; otherwise such papers are very hard to read.

> I'm going to post that map so the apparent implications are clear. My own
> conclusion is that this does not tell us much because the research was
> really designed to confirm an out-of-Africa hypothesis and is statistically
> invalid for more recent conclusions.

You are right. Their goal was to prove the out-of-Africa model. But that
doesn't mean some of their results are of no importance for other
questions. As for the "phylogenetic" tree you posted, it is rather useless
when trying to reconstruct the history of different lineages (I will explain
why later). More useful is their tree from Fig. 1 (and the data in Table 1).


> 1. No, there is NO evidence that the "09" group you mention either QUICKLY
> expanded OR QUICKLY split. We have NO information about the original
> population of this group before it mutated in any way. We have NO
> information about how quick it mutated or any real timeline. We have NO
> direct knowledge of where 09 originated or where it went. There is no
> mention of a "Near East split."

I don't agree that we have no clues at all. First, the 09 lineage shares a
quite recent common ancestor (just one node back) with three groups that are
dominant (or close to dominant) in the Near East and northern Africa
(the "89 only" group), in the Caucasus and Near East (the 172 lineage) and
in Europe (the 170 lineage), and with one group that is rare and found
mostly in Southern Asia (the 52 + 69 group). Knowing that this ancestor
originated in Africa, and that the 09 lineage dominates in SE Asia and
America and is also frequent in Europe and South Asia (although only
relatively recent sublineages dominate in the last three of these
locations), it seems more likely that the 09 lineage originated in the Near
East and moved eastward, where it spread in different directions. Which
other scenario do you consider more likely?


> 2. The Underhill phylogenic "tree is rooted with respect to non-human
> primate sequences." This means that the timespan of this tree is relative
> and not anchored to anything but an estimated rate of mutation based on
> out-of-Africa and that rate is not supplied in this study. And we are not
> talking about mutation in the entire organism, but in the Y-chromosome, so
> the expected rate is even more ephemeral. When the original "09" mutation
> occurred is not given, but potentially it very well could have happened
> 25,000 years ago or more.

Yes, this dating seems very likely. I would even push it further back, to
about 35,000-30,000 BC.
I think we should be rather thankful that the Y chromosome escapes the
crossing-over recombination that is characteristic of autosomes and X
chromosomes (and which would make this type of analysis impossible). In
addition, its mutation rate is much lower than that of mitochondrial DNA.
Thanks to that, it is the best genetic tool we can use today to uncover the
human past. But you are of course right to suggest that we should be aware
of the male-line bias associated with this kind of study.
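For anyone wondering how a tree "rooted with respect to non-human primate sequences" gets turned into years at all, here is one common estimator, shown only as an illustration. It is not necessarily what Underhill et al. did, and no particular rate is assumed. With the rho statistic, you count, for each of n sampled chromosomes in a clade, the number of mutations m_i separating it from the clade's founder haplotype, and average them. You then divide by an assumed mutation rate mu per generation (for the loci counted) and multiply by an assumed generation length g to get years:

```latex
\rho = \frac{1}{n}\sum_{i=1}^{n} m_i ,
\qquad
\hat{T}_{\text{years}} \approx \frac{\rho}{\mu}\, g
```

Because the estimate is inversely proportional to mu, halving the assumed rate doubles the date. So any date for 09, whether 25,000 or 35,000 years, is only as good as the rate behind it.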


> 3. Michal, it should be apparent that the sampling of this study favors
> finding varied haplotypes in Central Asia-Siberia. The sample from
> Central-Siberia was 184 males. The sample from Europe was 60. The
> sample from the Mid-East was only 24. Since there were 116 haplotypes in
> the study, the chance that the study missed prevalent haplotypes in
> Europe and especially in the Mid-East is high.

As I said earlier, my hypothesis is not based on frequencies observed in
Central Asia (or at least not only on them).
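To put a number on the sampling point above, here is a minimal sketch. It assumes, as a simplification, that each man in a regional sample independently carries a given haplotype with the same population frequency p. Under that assumption, the chance of seeing the haplotype at least once among n men is 1 - (1 - p)^n. The sample sizes are the ones quoted above. The frequencies (2%, 5%, 10%) and the function name detection_probability are hypothetical.

```python
def detection_probability(p: float, n: int) -> float:
    """Chance of observing a haplotype with population frequency p at least
    once in a sample of n men, assuming independent random draws (binomial
    model); this is a simplification, not how the study sampled."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a frequency between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    # Probability of missing it in every one of n draws is (1 - p) ** n.
    return 1.0 - (1.0 - p) ** n


# Regional sample sizes as quoted above; the frequencies are hypothetical.
samples = {"Central Asia-Siberia": 184, "Europe": 60, "Mid-East": 24}

for p in (0.02, 0.05, 0.10):
    for region, n in samples.items():
        prob = detection_probability(p, n)
        print(f"p={p:.2f}  {region:<22} n={n:<4} P(seen)={prob:.3f}")
```

Under these assumptions, a haplotype carried by 2% of Mid-Eastern men would be missed more often than not in a sample of 24 (the chance of seeing it is below 40%). The same haplotype would almost certainly turn up in a sample of 184. Real samples are not independent random draws, so treat this as an order-of-magnitude check only.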

> The ONLY reason you see "Central Asia" everywhere in the haplogroup tree is
> because the SAMPLES FROM CENTRAL ASIA/SIBERIA MAKE UP ALMOST 20% OF ALL THE
> SAMPLES from 21 regions sampled.

I don't agree. Could you point to the parts of my interpretation where
haplogroup frequency in Central Asia is the main argument for my claim that
the ancestors of the "IE people" were in Central Asia at some point?

> 3. No, it does NOT seem that this original "09" group "moved towards Central
> Asia." (In the Underhill study, "Central Asia" excludes "Pakistan + India"
> and the "Mid-East", but includes "Siberia.")

I agree. Let's say that the 09 group moved eastward (to either Southern or
Central Asia).


> In fact, to the extent that particular haplotype survives today (according
> to the Underhill table), the largest apparent modern concentration of that
> gene combination (marked 87 in Underhill) is in New Guinea (30%), with
> smaller ratios in Cambodia/Laos (5%), Hunza (5%) and "Central
> Asia-Siberia" (3%). Subsequent mutations of this founder gene do not
> affect that conclusion.

Sorry, but I think you are making a major mistake here (and you repeat it
later on several occasions). The fact that the modern haplogroup 87 carries
the initial 09 polymorphism and lacks the other polymorphisms that appeared
in sister branches does not mean that people from this haplogroup are more
"ancestral" than other carriers of the 09 polymorphism. Compare it with the
situation in linguistics. The fact that the Baltic languages possess the
largest number of features characteristic of PIE (whether or not that is
true) doesn't mean that the territory occupied by Lithuanians and Latvians
was the IE homeland. This is even more obvious in genetics, because the
mutations leading to polymorphisms are accidental and phenotypically neutral
(at least we assume they are), whereas there can be some correlation between
a migration pattern and the specific features of a language (and it will
not be limited to vocabulary).


> This means that when we locate the original unmutated "09" gene today, it
> is overwhelmingly located in Southeast Asia. If this was a plant or an
> animal, standard topographic distribution assumptions would place its origin
> in Southeast Asia. (Modern % of population is not the best way to judge
> ancestor populations but it is the only one that really allows any
> conclusions by this data. Mutations are assumed to happen at the same rate
> in any location.)

Again, I disagree. In this case there is something more than just the % of
population with given haplotypes. What makes this analysis unique, and much
more useful than similar studies of autosomal or mitochondrial DNA, is that
we are now able to build a tree showing the order in which new
polymorphisms appeared in different lineages (with an extremely high level
of certainty). So we don't just use the relative frequency of different
modern haplotypes in different regions; we can also reconstruct (at least
roughly) the sequence of events that led to the situation we observe today.
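The kind of reconstruction described above can be sketched in a few lines. This is a minimal illustration that assumes no recombination and no recurrent mutation, so any two markers are carried either by nested groups of haplotypes or by disjoint ones. Under that assumption, if every carrier of marker B also carries marker A, then A arose first. The function name infer_marker_tree and the example carrier sets are hypothetical. The marker and haplotype labels follow this thread, but the memberships are simplified for illustration and are not Underhill's table.

```python
from itertools import combinations


def infer_marker_tree(haplotypes: dict[str, set[str]]) -> dict[str, str]:
    """Order markers by nesting of their carrier sets (hypothetical helper).

    haplotypes maps a haplotype label to the set of derived markers it carries.
    Returns a map from each branch label to its parent branch label, or "root".
    Markers with identical carrier sets cannot be ordered, so they are merged
    into one branch label such as "M45/M74".
    """
    # For each marker, collect the haplotypes that carry it.
    carriers: dict[str, frozenset[str]] = {}
    for hap, markers in haplotypes.items():
        for m in markers:
            carriers[m] = carriers.get(m, frozenset()) | {hap}

    # Without recombination or recurrence, carrier sets must nest or be disjoint.
    for a, b in combinations(sorted(carriers), 2):
        sa, sb = carriers[a], carriers[b]
        if sa & sb and not (sa <= sb or sb <= sa):
            raise ValueError(f"{a} and {b} overlap without nesting")

    # Group markers that always co-occur: they sit on the same branch.
    branches: dict[frozenset[str], list[str]] = {}
    for m, s in carriers.items():
        branches.setdefault(s, []).append(m)
    label = {s: "/".join(sorted(ms)) for s, ms in branches.items()}

    parent: dict[str, str] = {}
    for s in branches:
        # The parent is the smallest carrier set strictly containing this one
        # (the containing sets form a chain, so the smallest is unique).
        supersets = [t for t in branches if s < t]
        parent[label[s]] = label[min(supersets, key=len)] if supersets else "root"
    return parent


# Hypothetical, simplified carrier sets; labels follow this thread only.
haps = {
    "89 only": {"M89"},
    "87": {"M89", "M9"},
    "111": {"M89", "M9", "M45", "M74"},
    "104": {"M89", "M9", "M45", "M74", "M173"},
    "108": {"M89", "M9", "M45", "M74", "M173", "M17"},
    "AI (M3)": {"M89", "M9", "M45", "M74", "M3"},
}

for branch, up in sorted(infer_marker_tree(haps).items()):
    print(f"{branch} <- {up}")
```

Markers that always occur together, like 45 and 74 here, collapse into one branch ("M45/M74") because the data cannot order them. Haplotype 87 never appears as a parent. The parents are marker branches, and 87 is simply the set of M9 carriers with no further detected mutation. That makes it a sister of the other M9 sublineages, not their ancestor.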


> If we step back A SINGLE NODE from your "09" group, to its immediate
> predecessor (marked haplotype "71") we find that it survives in a
> distribution that spans high concentrations in Morocco, the "Mid-east",
> Europe, India-Pakistan, and a smaller ratio in "Central Asia-Siberia."
> Despite the fact that "Central Asia-Siberia" has a small modern
> representation in haplotype "71", it is still larger than that region's
> small ratio of representation of the Haplotype "09 anchor" - haplotype 87.

As I pointed out above, a model in which a modern "anchor haplotype"
corresponds exactly (or even roughly, but more closely than its sister
haplotypes) to the ancestor of a larger haplogroup is incorrect and very
misleading. So your following conclusion:

> So
> Central Asia seems to have lost concentration from the "09" mutation, not
> gained it. In effect, 09 originally moved AWAY from Central Asia by this
> data.

seems incorrect as well.

> All this would strongly suggest that the original "09" haplotype was a
> peripheral visitor in "Central Asia-Siberia." However, "09" is probably so
> old (25,000BC?) that it has little or no bearing on the discussion.

Yes. I can agree that the 09 group could have expanded while moving to
places other than Central Asia, for example Southern Asia, but I maintain
that these two regions are the most likely centers of that spread. I also
think that if this event occurred as early as 35,000-25,000 BC, it is of
great importance for the concept of the Nostratic family, and hence for the
origin of IE.

> It seems to have directly contributed more to the populations of New
> Guinea, Cambodia and Japan than to anywhere else.

You forgot about China, America and Europe. (But I think that was rather a
consequence of putting too much emphasis on the 87 haplotype.)

> The real clue to how old the original "09" haplotype is comes from the
> fact that several nodes down a mutation in this branch shows a high ratio
> of presence in modern Amerind populations. If we associate this with the
> migration into America (15,000-8,000 BC?), the number of nodes separating
> it from the 09 anchor is actually greater than the nodes separating 09 from
> an African genesis.

I completely agree. I would assume that their ancestors left Africa in
45,000-40,000 BC (the third wave), reached Southern and/or Central Asia by
about 35,000-30,000 BC, and that the split between future Amerindians and
future Indo-Europeans took place in about 20,000-15,000 BC.

> And if you are tracing languages back that far, you've gone
> quite beyond the comparative method and even well beyond Nostratic.

I agree. This is what I was trying to say in one of my previous messages.

> Given all that about the 09 node, is there any better evidence of a
> "Central Asian-Siberian" source for the much later haplotype 104?
> (Michal's mutation M173.)

You forgot about the 45/74 node. But I would really like to hear some
alternative theories. Do you have one?


> Haplotype 104's immediate ancestor survives in Haplotype 111. Today, the
> rare haplotype 111 apparently finds its main concentration in modern
> American Indians, to a much smaller degree in Pakistan-India and to an even
> smaller degree in "Central Asia-Siberia." Whatever we make of that, the
> origin of 111 is as likely to be Pakistan-India (2%) as it is to be Central
> Asia (1%).

I think that the frequency of haplotype 111 is not that relevant here, but I
agree that Pakistan/India is a possible place of origin for the IE (173)
marker. However, a region farther north seems a more likely homeland for
the related "IE" and "AI" markers.

> Other haplotypes (e.g., 113) could not have given rise to 104. I would
> also question whether a better sampling of the "mid-East" would not have
> yielded a truer origin for 111.

The farther west, the more difficult it would be to explain the route of
the AI markers, but if we find that some of the 111-114 haplotypes (the
45/74 lineage without the 173 or 03 polymorphisms) are quite frequent
there, it may be reasonable to revise the location of the IE-AI split.

> A good date for this particular node is well before 8000BC - when
> migrations into America may have started. 111 would seem to have
> originated deep in the Ice Age - 15,000 BC. 111 is the founder haploid for
> 104 and all of Underwood's Haplogroup XI. Consistently, it seems in these
> particular Haplogroups, Pakistan + India seems the core of the new
> haplotypes, which makes sense in terms of its probable population density
> and therefore opportunity for mutation.

Again, I can't agree with the special role you assign to modern "founder
haplotypes". Do you agree with me that this is not

> In Underwood

:-)

>, modern Haplotype 104 (mutation M173 alone) has the highest
> ratio of concentration among "European" (50%) and "Basque" (60%)
> populations, with about 8% in the MidEast, about 6% in India-Pakistan,
> about 5% in "Central Asia-Siberia" and Morocco. In normal plant and animal
> distributions, this would suggest that the defining mutation occurred in
> Europe.

If the M173 mutation had appeared first in Europe, it would be very likely
that other related haplotypes (together with the original paternal
haplotype, since all other relatives would probably have carried the
unmutated original haplotype) had survived in this region. This is simply
not the case. And the association with the AI markers should not be
overlooked.


> A later mutation (M17) (haplotype 108) is the only M173-based group that
> shows strong eastern presence. It yields a 31% representation in
> Pakistan-India and a 16% ratio in Central Asia, again suggesting a
> Pakistan-India origin.

You should look at all haplotypes that include the 17 (M17) mutation, not
only haplotype 108, but you are right that the incomplete data on European
populations (eastern Europe was excluded) may suggest that Pakistan/India
is the Satem homeland. But I wouldn't allow myself to make such a statement
without checking the Eastern European frequencies first (which I did, as
you know).

> The modern distribution in Europe according to
> Underhill's sampling is 5%. (Compare the numbers you supplied.)

I explained it earlier in this message.


> Is it reasonable to connect any of this with language? Back in 25,000BC
> maybe.

As you mentioned earlier, M173 (which I call the IE marker) arose later than
its founder haplotype 111, which you date to about 15,000 BC, and the M17
(Satem) marker is even younger, so I think that these results are relevant
to the discussion about the origin of IE.

> But the difference between M46 and M173 was probably lost the minute
> two humans had to talk about how they were going to get something to eat
> that day.

You mean M45 and M173. Whatever the difference between them was, it is
obvious that they had the same father, yet their children and grandchildren
live very far apart today. So the brothers had to move in opposite
directions; there was no other way for their Y chromosomes to reach those
places. (This need not be true of languages, but it is nevertheless quite
likely that in this particular case the migration was associated with the
language.)

> The fact is that this Y-chrome trait is NOT necessarily correlated to
> ANY OTHER physical traits.

That is actually an advantage, since it means that those markers cannot be
subject to unknown selection pressure.

> Even if such an invisible genetic difference once
> meant something, it would have been swallowed up by necessity, culture and
> a common language.

So should we stop trying to correlate linguistic, historical, archaeological
and genetic data?

Regards,

Michal