Man Wah So

MBB 452a

Prof. Mark Gerstein

Final Project

 

 

 

An Overview of Three Approaches to Construct Phylogeny Based On Entire Genomes

 

 

The advent of phylogenetic analysis based on molecular sequence information has radically transformed our perception of evolution.  Based upon rRNA sequence comparisons, Woese proposed the addition of a new taxon, the “domain,” above the level of kingdom in the old five-kingdom taxonomy and classified organisms into three domains (the Archaea, the Bacteria, and the Eucarya) according to their differences at the molecular level (8).  Although Woese’s idea has generally been accepted, many trees built from individual genes have given ambiguous results or have even failed to support the three-domain system (1, 2, 4, 5).  To complement the information obtained from traditional sequence comparisons of individual genes, researchers have consequently turned to entire genomes for phylogenetic reconstruction.  This paper describes three approaches, proposed by Sankoff, Snel, and Tekaia in the past decade, to inferring phylogeny from organellar and complete genomes.  The availability of complete genome sequences has not only provided alternative ways to assess Woese’s three-domain proposal but has also enriched our understanding of evolution at the molecular level.

 

In a 1992 paper, “Gene order comparisons for phylogenetic inference: Evolution of the mitochondrial genome,” Sankoff et al. investigated the feasibility of inferring evolutionary relationships from the macrostructure of entire genomes (6).  Evolutionary inference from molecular data has traditionally compared homologous versions of a single gene in different organisms, but such comparisons are limited because they rely on point mutations alone.  When nucleotide substitution rates are high, it is often difficult to distinguish genuine homology between related genes from noise.  Comparing gene orders rather than individual sequences offers a way around this problem.  At the genome level, chromosomal inversions, transpositions, insertions, and deletions, rather than nucleotide substitutions, are the major events that determine the evolutionary distance between organisms.  Sankoff therefore defined an evolutionary edit distance, E(a, b), as the minimum number of elementary events (inversions, transpositions, and deletions or insertions) necessary to change the gene order of one circular genome, a, into that of another, b.  He then assembled a database of sixteen mitochondrial gene orders from fungi and other eukaryotes and used the edit distances between them to construct trees (6).  Observing that these trees exhibit branching orders that correspond almost perfectly to accepted evolutionary knowledge, Sankoff concluded that the macrostructure of genomes contains meaningful information for phylogenetic reconstruction (6).
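
To make the idea of an edit distance on gene orders concrete, the following is a minimal sketch rather than the procedure used in (6).  It assumes that both genomes are circular, contain exactly the same genes once each, and carry no strand information, and it ignores insertions and deletions.  Under those assumptions it finds the minimum number of inversions and transpositions by exhaustive breadth-first search, which is practical only for a handful of genes.  The function names and the four-gene example are hypothetical.

```python
from collections import deque


def canonical(order):
    """Return one fixed representative of a circular, unoriented gene order."""
    candidates = []
    for seq in (tuple(order), tuple(reversed(order))):
        i = seq.index(min(seq))            # rotate so the smallest gene name comes first
        candidates.append(seq[i:] + seq[:i])
    return min(candidates)                 # choose between the two reading directions


def neighbours(order):
    """Yield every gene order reachable by one inversion or one transposition."""
    n = len(order)
    for start in range(n):
        rot = order[start:] + order[:start]    # let the affected segment begin at index 0
        for length in range(2, n):             # inversion: reverse a segment of 2..n-1 genes
            yield canonical(rot[:length][::-1] + rot[length:])
        for length in range(1, n - 1):         # transposition: move a segment elsewhere
            segment, rest = rot[:length], rot[length:]
            for pos in range(1, len(rest)):
                yield canonical(rest[:pos] + segment + rest[pos:])


def edit_distance(a, b):
    """Minimum number of inversions/transpositions turning order a into order b."""
    if sorted(a) != sorted(b):
        raise ValueError("this sketch assumes identical gene content (no indels)")
    start, goal = canonical(a), canonical(b)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        current, dist = queue.popleft()
        if current == goal:
            return dist
        for nxt in neighbours(current):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))


# Hypothetical four-gene circular genomes that differ by a single inversion.
genome_a = ("cox1", "cox2", "atp6", "cob")
genome_b = ("cox1", "atp6", "cox2", "cob")
print(edit_distance(genome_a, genome_b))   # → 1
```

Because the search space grows factorially with the number of genes, realistic genomes require heuristics or specialised algorithms; the sketch is meant only to illustrate what E(a, b) counts.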

 

Similarly, to address the inconsistencies observed among species phylogenies based on sequence comparisons of individual genes, Snel et al. turned to whole-genome trees.  In their 1999 paper, “Genome phylogeny based on gene content,” they presented an integrative view of genome analysis based on shared gene content and defined the similarity between two genomes as the number of genes that they have in common divided by their total number of genes (7).  In this approach, lists of pairs of homologous sequences were first compiled from a Smith-Waterman comparison (at the amino-acid level) of all the genes in two genomes, with an E-value cutoff of 0.01.  Then, pairs of genes that are each other’s “closest relative” in their respective genomes were selected to determine the number of genes shared between the two genomes (7).  Comparing the protein sequences encoded by thirteen completely sequenced genomes of unicellular species with one another, Snel observed that the number of genes two genomes have in common depends on their evolutionary distance (7).  When he derived a genome phylogeny from shared gene content, Snel found that the tree reflected the standard phylogeny based on rRNA sequence identity (7).  The study suggests that where horizontal gene transfer makes single-gene trees inconsistent, phylogenetic analysis based on gene content may provide a more representative view of the evolutionary differences among organisms at the molecular level (7).
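
The “closest relative” criterion and the resulting similarity score can be sketched as follows.  This is an illustration under stated assumptions, not the implementation used in (7): the best-hit tables are hypothetical inputs that would come from a prior sequence search, and dividing by the gene count of the smaller genome is an assumption made here, since the description above does not fix the denominator.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Pairs (a, b) where a's best hit is b AND b's best hit is a."""
    return {
        (a, b)
        for a, b in best_a_to_b.items()
        if best_b_to_a.get(b) == a
    }


def gene_content_similarity(best_a_to_b, best_b_to_a, n_genes_a, n_genes_b):
    """Shared genes divided by the size of the smaller genome (assumed normalisation)."""
    shared = len(reciprocal_best_hits(best_a_to_b, best_b_to_a))
    return shared / min(n_genes_a, n_genes_b)


# Hypothetical best-hit tables for two tiny genomes, A (4 genes) and B (5 genes).
best_a_to_b = {"a1": "b1", "a2": "b2", "a3": "b2"}
best_b_to_a = {"b1": "a1", "b2": "a2", "b3": "a1"}

print(sorted(reciprocal_best_hits(best_a_to_b, best_b_to_a)))
# → [('a1', 'b1'), ('a2', 'b2')]
print(gene_content_similarity(best_a_to_b, best_b_to_a, 4, 5))   # → 0.5
```

In the example, a3 points to b2 but b2 points back to a2, so that pair is not counted as shared; a tree would then be built from distances such as 1 minus the similarity.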

 

While evolutionary descent was the basis for both Sankoff’s and Snel’s studies, Tekaia instead attempted to derive genome phylogeny from a hierarchical classification of genomes.  In a 1999 paper, “The genomic tree as revealed from whole proteome comparisons,” Tekaia et al. constructed genomic trees from twenty completely sequenced genomes through whole-proteome comparisons, taking into account the predicted gene product content of each organism and the overall similarity between organisms (9).  First, the full set of predicted gene products of each completely sequenced organism was compared with itself and with that of every other organism.  Then, the proportion of ORFs in organism j that have at least one similar ORF in organism i, Tij, was determined for all possible pairs of n organisms to generate an n × n matrix.  Correspondence analysis was applied to the matrix of Tij values, and the resulting distances between organisms were used to construct genomic trees (9).  Although Tekaia’s trees were essentially phenograms, he named them genomic trees because of their resemblance to Woese’s and other sequence-based phylogenies.  The agreement between Tekaia’s trees, which reflect sequence divergence as well as gene acquisition and loss, and Woese’s rRNA-based tree, which reflects sequence divergence alone, suggests that, on average, the duplication and deletion events accumulated over evolutionary time are statistically similar in related organisms (9).
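
The construction of the Tij matrix can be sketched as below.  The input format is an assumption made for illustration: for each ordered pair (i, j), a set of the ORFs of organism j that have at least one similar ORF in organism i, as would be produced by a prior similarity search.  The organism names and counts are hypothetical, and the correspondence analysis step is not shown.

```python
def shared_orf_matrix(organisms, orf_counts, hits):
    """Build the n x n matrix T, where T[i][j] is the fraction of ORFs in
    organism j that have at least one similar ORF in organism i."""
    n = len(organisms)
    matrix = [[0.0] * n for _ in range(n)]
    for row, org_i in enumerate(organisms):
        for col, org_j in enumerate(organisms):
            matched = hits.get((org_i, org_j), set())   # ORFs of j matched in i
            matrix[row][col] = len(matched) / orf_counts[org_j]
    return matrix


# Hypothetical toy data: organism A has 4 ORFs, organism B has 5.
organisms = ["A", "B"]
orf_counts = {"A": 4, "B": 5}
hits = {
    ("A", "A"): {"a1", "a2"},          # ORFs of A with a similar ORF elsewhere in A
    ("A", "B"): {"b1", "b2", "b3"},    # ORFs of B with a similar ORF in A
    ("B", "A"): {"a1"},                # ORFs of A with a similar ORF in B
    ("B", "B"): set(),
}

for row in shared_orf_matrix(organisms, orf_counts, hits):
    print(row)
# → [0.5, 0.6]
# → [0.25, 0.0]
```

Note that the matrix is not symmetric in general: Tij is normalised by the ORF count of organism j, so genomes of different sizes give different values for Tij and Tji.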

 

Since Woese’s proposal of the three-domain classification scheme, the importance of understanding differences among organisms at the molecular level has been increasingly recognized in the study of evolution.  Where single-gene comparisons have failed, researchers have turned to comparisons of entire genomes for evolutionary inference; in turn, these studies are used to assess the validity of previously derived phylogenies.  Although all three studies discussed in this paper produced results that support Woese’s rRNA-based phylogeny, the validity of his three-domain proposal has yet to be confirmed.  In particular, whole-genome phylogenies to date have been based on only a limited number of available genome sequences, and hence their results may not be representative of the whole picture.  Only when more genome sequences are available will we be able to refine the details of phylogenetic trees and properly assess the evolutionary relationships among organisms on this planet.

 

 

References:

 

1.      Brown, J.R., and W.F. Doolittle.  1995.  Root of the universal tree of life based on ancient aminoacyl-tRNA synthetase gene duplications.  Proc. Natl. Acad. Sci. 92: 2441-2445.

2.      Cavalier-Smith, T.  1989.  Molecular phylogeny.  Archaebacteria and Archezoa.  Nature 339: 100-101.

3.      Doolittle, R.F.  1998.  Microbial genomes opened up.  Nature 392: 339-342.

4.      Forterre, P., N. Benachenhou-Lahfa, F. Confalonieri, M. Duguet, C. Elie, and B. Labedan.  1992.  The nature of the last universal ancestor and the root of the tree of life, still open questions.  Biosystems 28: 15-32.

5.      Gupta, R.S.  1998.  Protein phylogenies and signature sequences: A reappraisal of evolutionary relationships among Archaebacteria, Eubacteria, and Eukaryotes.  Microbiol. Mol. Biol. Rev. 62: 1435-1491.

6.      Sankoff, D., G. Leduc, N. Antoine, B. Paquin, B.F. Lang, and R. Cedergren.  1992.  Gene order comparisons for phylogenetic inference: Evolution of the mitochondrial genome.  Proc. Natl. Acad. Sci. 89: 6575-6579.

7.      Snel, B., P. Bork, and M.A. Huynen.  1999.  Genome phylogeny based on gene content.  Nat. Genet. 21: 108-110.

8.      Woese, C.R.  1990.  Towards a natural system of organisms: Proposal for the domains Archaea, Bacteria, and Eucarya.  Proc. Natl. Acad. Sci. 87: 4576-4579.

9.      Tekaia, F., A. Lazcano, and B. Dujon.  1999.  The genomic tree as revealed from whole proteome comparisons.  Genome Res. 9: 550-557.