==================================================================
I. Identification of proteins with accelerated evolution in the hominid lineage:
==================================================================
Following the criteria set in the above section, we identified 115 genes from GenBank and obtained 5 additional genes from our laboratory that were suitable for the rate analysis. Fig 1B shows the distribution of the acceleration index for the 120 genes. Results from each of the 120 genes are given in the Appendix. The mean acceleration index is 1.13 ± 0.54 and the median is 0.39. The distribution is skewed because no amino acid substitutions are found in the human lineage in about one-third (39/120 = 0.325) of the genes examined. A majority of the genes have an acceleration index < 3.2. Only two genes have an acceleration index significantly >1 (P < 0.003 and P < 0.001, respectively; binomial test). Because 120 tests were conducted, it was necessary to evaluate whether these could be false-positive cases. For this, we conducted a computer simulation. As described in the above section, our simulations were designed to examine the type-I error of the binomial test. The results suggest that the expected number of false-positive cases is <<1 for our sample of 120 genes (Table 1). Thus, our positive detection is unlikely to be due to a statistical artifact.
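As a rough illustration of why a discrete binomial test tends to be conservative, the following sketch computes the exact type-I error of a one-sided binomial test and scales it to 120 genes. It is not the simulation used in this study. It assumes that, under rate constancy, each substitution falls on the human branch with probability equal to that branch's share of the total human-mouse divergence time (5.5 MY out of 2 x 90 MY). The nominal level `ALPHA`, the per-gene substitution counts, and all function names are hypothetical.

```python
from math import comb


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def exact_size(n: int, p: float, alpha: float) -> float:
    """Null probability that the one-sided binomial P value is <= alpha."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if upper_tail(k, n, p) <= alpha
    )


# Assumed null share of substitutions on the human branch: 5.5 MY out of the
# 2 * 90 MY separating human and mouse (divergence times used in this article).
P_HUMAN = 5.5 / (2 * 90)
ALPHA = 0.01   # hypothetical nominal level, not taken from the article
N_GENES = 120

for n_subs in (1, 3, 10, 30):  # hypothetical total substitution counts per gene
    size = exact_size(n_subs, P_HUMAN, ALPHA)
    print(
        f"n = {n_subs:2d}: exact type-I error = {size:.5f}, "
        f"expected false positives in {N_GENES} genes = {N_GENES * size:.3f}"
    )
```

Because the P value can take only a few discrete values when a gene has few substitutions, the exact type-I error can never exceed the nominal level and is often well below it.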
The two positive cases, PRM2 and FOXP2, are listed in Table 2. PRM2 (protamine 2) is a DNA-binding protein that replaces histones in spermatogenesis. It has been shown to evolve rapidly in humans and chimpanzees and was suggested to be a likely target of sexual selection (WYCKOFF et al. 2000). Thus, it is not unexpected that PRM2 is identified in our analysis. However, the fact that both human and chimpanzee lineages experienced accelerated evolution (the acceleration indices of the two lineages are both significantly >1) suggests that the type of selection on PRM2 is probably not unique to humans. In contrast, FOXP2 has the highest acceleration index (63.4) of all genes examined, while the corresponding index for the chimpanzee lineage is 0 (Table 2), suggesting hominid-specific acceleration. We thus focus our analysis on FOXP2 in the remainder of the article.
==================================================================
II. Enhanced substitution rate of human FOXP2:
==================================================================
FOXP2 belongs to the winged helix/forkhead class of transcription factors (LAI et al. 2001; SHU et al. 2001). It is expressed in multiple fetal and adult tissues with a high expression in certain regions of the fetal brain (LAI et al. 2001; SHU et al. 2001). Mutations in the gene cause a severe speech and language disorder in affected individuals despite their adequate intelligence and opportunity for language acquisition, suggesting that FOXP2 is specifically involved in speech development (LAI et al. 2001). FOXP2 is a conserved protein, with only three amino acid differences (and a 1-amino-acid insertion/deletion) between human and mouse in its entire length of 715 amino acids (Fig 2). We sequenced the coding regions of the FOXP2 gene from the chimpanzee, pygmy chimpanzee, gorilla, and orangutan and determined that two of the three aforementioned substitutions occurred in the hominid lineage and no substitutions occurred in chimpanzees (Fig 2). As indicated in Table 2, the acceleration in the evolution of human FOXP2 is statistically significant. This significance is also obtained (P = 0.001-0.006) when we consider ranges of divergence times for the human-chimpanzee split at 4.0-7.0 MYA (CHEN and LI 2001; BRUNET et al. 2002; STAUFFER et al. 2002) and the primate-rodent split at 80-110 MYA (KUMAR and HEDGES 1998; ARCHIBALD et al. 2001; NEI et al. 2001).
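The sensitivity of the FOXP2 result to the assumed divergence times can be sketched as follows. This is a minimal reconstruction, not the authors' code. It assumes the rate-constancy test is a one-sided binomial test in which, under the null, each of the three human-mouse amino acid substitutions falls on the human branch with probability t_hc / (2 x t_pr), where t_hc is the human-chimpanzee and t_pr the primate-rodent divergence time. The function names are hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def foxp2_p_value(t_hc: float, t_pr: float,
                  human_subs: int = 2, total_subs: int = 3) -> float:
    """One-sided binomial P value for excess substitutions on the human branch.

    t_hc: human-chimpanzee divergence time (MY)
    t_pr: primate-rodent divergence time (MY)
    The human branch spans t_hc of the 2 * t_pr separating human and mouse.
    """
    p_human = t_hc / (2 * t_pr)
    return binomial_upper_tail(human_subs, total_subs, p_human)


# Divergence-time scenarios taken from the ranges quoted in the text.
scenarios = {
    "most favorable (4.0 / 110 MY)": (4.0, 110.0),
    "point estimates (5.5 / 90 MY)": (5.5, 90.0),
    "least favorable (7.0 / 80 MY)": (7.0, 80.0),
}
for label, (t_hc, t_pr) in scenarios.items():
    print(f"{label}: P = {foxp2_p_value(t_hc, t_pr):.4f}")
```

Under these assumptions the two extreme scenarios give P values that round to 0.001 and 0.006, in line with the range quoted above.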
The two amino acid substitutions in the human lineage are a Thr-to-Asn change at position 303 and an Asn-to-Ser change at position 325, both in exon 7. These substitutions are located in a broadly defined transcription repression domain (SHU et al. 2001 ; Fig 2), so it is possible that they affect the binding of FOXP2 with regulatory sequences of its target genes. If these substitutions are important to speech development, they should be fixed in normal humans and not be found in nonhuman organisms. Indeed, these substitutions are shared by all 32 normal humans surveyed (9 African Americans, 10 Caucasians, 9 Asians, and 4 Amerindians), but by none of the 29 nonhuman species examined. These species include a bird and 28 placental mammals from 12 representative orders (Fig 3). Interestingly, the Asn-to-Ser substitution also occurred independently in carnivores, suggesting that this substitution alone is not sufficient for the origin of speech and language.
===================================================================
III. Driving forces behind the accelerated evolution of human FOXP2:
===================================================================
It would be interesting to identify the driving force behind the two amino acid substitutions and the accelerated evolution of human FOXP2. There are three possibilities:
- enhanced mutation rate,
- relaxed purifying selection, and
- positive selection.
Because synonymous nucleotide changes are usually immune to selection, the rate of synonymous substitutions can be used to measure the mutation rate (NEI and KUMAR 2000 ). Using parsimony, we determined the number of synonymous substitutions in each branch of the FOXP2 gene tree of five hominoids and mouse (Fig 4). It can be seen that the number of synonymous substitutions in the human lineage (two) is smaller than that in the two chimpanzee lineages (three and four, respectively). The number of synonymous substitutions per MY is also smaller in the human lineage (2/5.5 MY = 0.36) than in the lineage before the human-chimpanzee separation ([2.5 + 4.5 + 127.5]/[90 MY x 2 - 5.5 MY] = 0.77 for the branches linking node A and mouse; see Fig 4). Thus, there is no indication of enhanced mutation rate at FOXP2 in the human lineage. This conclusion is strengthened as the true number of synonymous substitutions is likely to be higher than the parsimony estimate for the long branch leading to the mouse, but not for the short branches within hominoids. Use of Bayesian estimates of ancestral sequences confirmed this result. Furthermore, the ratio of nonsynonymous substitutions to synonymous substitutions in the human lineage (2/2 = 1; see Fig 4) is significantly greater than the ratio in the branches linking node A and mouse (1/[2.5 + 4.5 + 127.5] = 0.007; P < 0.002, Fisher's exact test; ZHANG et al. 1997 ), suggesting that the rate difference is due to a difference in selection. It is unlikely, however, that the functional constraint and purifying selection on FOXP2 has been relaxed in humans, as mutations show severe deleterious effects (LAI et al. 2001 ). Consistent with the existence of strong purifying selection, no amino acid polymorphisms in FOXP2 were found in a survey of 48 humans (NEWBURY et al. 2002 ).
------------------------------------------------------------------
Thus, positive selection remains as the most likely cause of the accelerated evolution of human FOXP2.
------------------------------------------------------------------
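The rate arithmetic and the Fisher's exact test used in the argument above can be checked with a short sketch. The substitution counts and divergence times are those quoted in the text. The test implementation is a generic one-sided hypergeometric tail and not necessarily the exact procedure of ZHANG et al. (1997). Rounding the fractional parsimony count of 134.5 synonymous substitutions down to 134 is an assumption made here only because the test needs integer counts.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is the top-left cell under the
    hypergeometric null with all margins fixed.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, row1)
    upper = min(row1, col1)
    return sum(
        comb(col1, x) * comb(total - col1, row1 - x) for x in range(a, upper + 1)
    ) / denom


# Synonymous substitutions per MY (counts and times from the text).
human_rate = 2 / 5.5
ancestral_syn = 2.5 + 4.5 + 127.5          # branches linking node A and mouse
ancestral_rate = ancestral_syn / (90 * 2 - 5.5)
print(f"human lineage: {human_rate:.2f} per MY")
print(f"node A to mouse: {ancestral_rate:.2f} per MY")

# Nonsynonymous/synonymous ratios in the two parts of the tree.
print(f"human N/S = {2 / 2:.0f}, node A to mouse N/S = {1 / ancestral_syn:.3f}")

# Rows: human lineage, other branches; columns: nonsynonymous, synonymous.
# 134.5 is rounded down to 134 (assumption, see lead-in).
p_value = fisher_exact_greater(2, 2, 1, 134)
print(f"Fisher's exact test (one-sided): P = {p_value:.4f}")
```

With these inputs the one-sided P value is below 0.002, consistent with the significance level reported in the text.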
We noted, however, that the rate ratio of nonsynonymous to synonymous substitutions per site is not >1 in the human FOXP2 lineage. This is likely due to the fact that FOXP2 is an overall conserved protein and many sites are under purifying selection. Under such circumstances, population genetic data may provide useful information on the evolutionary force. We therefore sequenced 9984 nucleotides in introns 6 and 7 of the FOXP2 gene from 10 humans (3 African-Americans, 3 Caucasians, 3 Asians, and 1 Amerindian) and one chimpanzee (Table 3). Introns 6 and 7 are adjacent to exon 7, where the two amino acid substitutions occurred in humans (Fig 2). By tight linkage to exon 7, these intron sequences may preserve information on the fixation process of the amino acid changes. For comparison, we also compiled available data on worldwide polymorphisms in other noncoding regions of the human genome that are at least 3000 nucleotides long and are not known to be under selection. We found that the level of polymorphism is lower in FOXP2 introns than in any other neutral noncoding regions examined (Table 4). An HKA neutrality test comparing the intra- and interspecific sequence variations between loci (HUDSON et al. 1987 ) yielded a very significant result when FOXP2 introns were compared with all other regions combined (P < 0.00001; Table 4). When these regions were compared individually with FOXP2, all indicated a lower-than-expected polymorphism in FOXP2 and four out of six cases showed statistical significance (Table 4). Mutation-rate variation among loci would not result in significant HKA test results (HUDSON et al. 1987 ). Population demographic changes cannot explain them either, because they would have affected all loci in a similar way (HUDSON et al. 1987 ). Rather, these comparisons suggest background selection and/or selective sweeps. 
Here background selection refers to purifying selection on deleterious mutations in tightly linked exons and selective sweep refers to quick fixation of advantageous mutations in these exons. These events, if recent enough, can lead to a reduced present-day polymorphism in introns 6 and 7 (MAYNARD SMITH and HAIGH 1974 ; CHARLESWORTH et al. 1993 ). Consistent with the HKA test results, Tajima's D (-1.36, P = 0.076) and Fu and Li's F* (-1.81, P = 0.064) are both negative for the FOXP2 intron data, although they are only marginally significant. Note that these tests are conservative as a recombination rate of zero was assumed in the coalescent simulation.
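For readers unfamiliar with Tajima's D, the statistic can be sketched as below using the standard formula (mean pairwise difference minus S/a1, divided by its estimated standard deviation). The two alignments are toy data invented for illustration, not the FOXP2 intron sequences. The significance levels reported above come from coalescent simulation, which this sketch does not attempt.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs: list[str]) -> float:
    """Tajima's D for a list of aligned, equal-length sequences."""
    n = len(seqs)
    length = len(seqs[0])
    # S: number of segregating (polymorphic) sites.
    s = sum(1 for i in range(length) if len({seq[i] for seq in seqs}) > 1)
    if n < 4 or s == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    # pi: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(u, v)) for u, v in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignment 1: three singleton variants (excess of rare alleles).
rare = ["ACGTACGTAC", "TCGTACGTAC", "ACGTCCGTAC", "ACGTACGTAG", "ACGTACGTAC"]
# Toy alignment 2: three variants at intermediate frequency (2 of 5 sequences).
common = ["TCGTCCGTAG", "TCGTCCGTAG", "ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAC"]

print(f"rare variants:   D = {tajimas_d(rare):.2f}")
print(f"common variants: D = {tajimas_d(common):.2f}")
```

An excess of rare variants, as expected after a recent selective sweep, gives a negative D, whereas variants at intermediate frequency give a positive D.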
If the nonneutral pattern of introns 6 and 7 is due to background selection, the selection intensity must be high, because weak background selection is known to be ineffective in reducing the polymorphic level. This suggests that the adjacent exons must be under strong functional constraints with no relaxed purifying selection, which would imply that positive selection is the only possible explanation for the accelerated protein evolution. If a relatively recent selective sweep caused the low polymorphism, at least one of the two amino acid changes in exon 7 must be advantageous because no other amino acid substitutions occurred in the evolution of human FOXP2 and no other functional genes are located within 100 kb of FOXP2 exon 7.
====================================================================
Taken together, unless positive selection is invoked, one cannot explain the accelerated evolution of FOXP2 protein and low polymorphism of introns simultaneously. The finding that FOXP2 is critical to speech and language development (LAI et al. 2001 ) does not by itself demonstrate the role of this gene in the origin of human speech, because the function of FOXP2 could have remained unchanged during human evolution while other speech-related genes changed. However, the revelation of significant acceleration and positive selection in human FOXP2 suggests functional and fitness relevance of the two amino acid substitutions and provides support for the role of this gene in the evolution of speech and language. Interestingly, the notion of selection is consistent with the belief that the origin of language is an adaptation (PINKER and BLOOM 1990 ; BOYD and SILK 2000 ).
====================================================================
In the future, it would be interesting to examine the exact functional effects of the two amino acid substitutions of human FOXP2 by in vitro assays of protein function as well as characterization of human phenotypes of reverse mutations.
If the lower-than-expected nucleotide diversity in FOXP2 introns suggested by HKA tests and D and F* statistics is indeed a result of a relatively recent selective sweep, the sweep probably occurred no earlier than 0.5 N generations ago, because the signal of a sweep is unlikely to last longer than that (SIMONSEN et al. 1995). Here N is the effective population size of humans and is generally thought to have been 10,000 (TAKAHATA 1993). Thus, the sweep would have occurred no earlier than 5000 generations, or 100,000 years, ago. This estimate is within the wide window of 40,000 years to 4 MYA during which human languages are believed to have emerged (BOYD and SILK 2000). A paleo-population genetic study (LAMBERT et al. 2002) may more accurately define the timing and process of the two amino acid substitutions in humans.
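The timing estimate above is simple arithmetic, sketched here so that the effect of other parameter values can be explored. The generation time of 20 years is not stated explicitly in the text. It is inferred from the equivalence of 5000 generations and 100,000 years, and the function name is hypothetical.

```python
def sweep_age_upper_bound(effective_size: float,
                          generation_years: float = 20.0,
                          signal_fraction: float = 0.5) -> tuple[float, float]:
    """Oldest sweep still expected to leave a detectable signal.

    effective_size:   effective population size N
    generation_years: assumed generation time in years (inferred, see lead-in)
    signal_fraction:  persistence of the sweep signal in units of N generations
    Returns (generations, years).
    """
    generations = signal_fraction * effective_size
    return generations, generations * generation_years


generations, years = sweep_age_upper_bound(10_000)
print(f"{generations:.0f} generations, {years:.0f} years")
```

Because the bound scales linearly with both N and the generation time, a larger assumed effective population size or a longer generation time would push the limit proportionally further back.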
-----------------------------------------------------------------
Perspective:
-----------------------------------------------------------------
In this study we focused on identification of proteins with accelerated evolution in the hominid lineage. Other strategies that may also be used in the search for genetic bases of uniquely human features include identifying human genes that are under positive selection, human-specific gene duplications, deletions or deactivations, and changes in gene expression (GAGNEUX and VARKI 2001; ENARD et al. 2002). Unlike these methods, our approach is useful when the phenotype-affecting genetic changes are simple amino acid substitutions. Our computer simulation showed that unless the substitution rate per sequence (nr) is high, our rate-constancy test is quite conservative. While this property somewhat reduces the power of our approach, it also makes our claims more secure. In other words, the positively identified cases are likely to be biologically meaningful. At present, only a small number of chimpanzee genes have been sequenced, and only 120 genes, or 0.35% of the genome, have been analyzed here. As the chimpanzee genome sequencing project (FUJIYAMA et al. 2002) proceeds, many more genes affecting uniquely human features may be found by this and other methods.
url:
http://www.genetics.org/cgi/reprint/162/4/1825