Role of Bioinformatics in Viral Taxonomy and Phylogeny

Lara Ely

The contribution of bioinformatics to the phylogenetic taxonomy of viruses has been felt most keenly in recent decades. The earliest attempt to establish a universal method of naming and classifying viruses, made in 1930, relied primarily upon the symptoms produced by the virus during infection and the mode of transmission (Murphy and Kingsbury, 1990). This first meeting did not result in much progress. To fill the void created by the lack of consensus, many laboratories proposed their own nomenclatures and schemes for viral taxonomy. The confusion intensified over the ensuing decades as more viruses were discovered and more information was gleaned about the genomes and structures of already known strains (Murphy and Kingsbury, 1990).

By 1966, enough data had accumulated that a permanent council was established to standardize viral taxonomy and nomenclature. The International Committee on Nomenclature of Viruses (which later became the International Committee on Taxonomy of Viruses) sifted through the jumble of independently proposed taxonomies and nomenclatures and formalized the regulations for naming and classifying viruses.

Recently, the profusion of genomic and amino acid sequences for all types of viruses has allowed virologists to compare the sequences of one virus family with those of related and seemingly unrelated families. The resulting, and sometimes surprising, homologies in protein sequence or structure among virus families have given clearer insight into the phylogeny and, in turn, the evolution of viruses (Matthews, 1985; Strauss et al., 1990).

The most firmly grounded taxon for viruses is the family (Matthews, 1985). Dr. David Baltimore developed a classification scheme for viruses that was rooted in the type of genome packaged into the viral particle and the method by which mRNA is made.
The six categories are double stranded DNA; single stranded DNA of positive or negative sense; double stranded RNA; positive sense single stranded RNA; negative sense single stranded RNA; and the retroviruses, which also have a single stranded RNA genome but replicate through a double stranded DNA intermediate that integrates into the host genome. The philosophy behind the Baltimore classification scheme was expanded to include consideration of viral morphology, and the combination has been adopted as the criterion for delineating virus families (Matthews, 1985). It is assumed that related viruses will order their genes similarly and that, in the case of segmented viruses, the arrangement of genes on the segments will be homologous. Therefore, genome organization has become the major criterion for viral genera (Matthews, 1985).

The demarcation of species is the least clear among the lower taxa of viruses. The qualifications for a virus species must lie between those for genus and strain, but there is no general consensus as to exactly what they should be (Matthews, 1985).

Bioinformatic techniques come into play when determining the particular strain of a viral isolate. Minor changes in amino acid sequence that create slight differences in viral behavior have provided the basis for strain differentiation (Matthews, 1985). For example, the amino acid sequences of the two surface glycoproteins of the influenza virus, hemagglutinin (HA) and neuraminidase (NA), are used to track its antigenic drift and antigenic shift. Each year, the particular strain of flu that circles the globe differs slightly from the strain of the year before in the sequence of the epitopes located on HA and NA. This phenomenon is known as antigenic drift and ensures the continuing presence of influenza in the human population. Every so often, influenza completely overhauls the structure of one or both of its surface glycoproteins, producing something entirely foreign to our immune system.
This antigenic shift led to the lethal pandemics that have plagued humanity this century. For each outbreak of influenza, virologists compare the amino acid sequences of HA and NA from the new strain with those of previous years to determine how far the new strain has diverged. The Centers for Disease Control also uses these sequences to design the annual vaccine.

The heaviest use of bioinformatics is in the proposal of higher taxa and, concurrently, a phylogenetic tree of viruses. There is considerable contention that viruses cannot be classified into taxa higher than the family and that no single phylogenetic tree could encompass all viruses (Rybicki, 1990; Goldbach, 1992).

The first complication is the high probability that the various families of viruses arose in several discrete events (summarized in Strauss et al., 1990 and Matthews, 1991). There is a strong possibility that some bacteriophages arose from the circular extragenomic bacterial DNA molecules known as plasmids. The theory is that one of these plasmids gained the genes necessary to package its DNA into particles. Among the viruses of eukaryotes, parallels can be drawn between the behavior of retroviruses and that of certain retrotransposons, which can move about the genome and insert themselves in various places (Kingsman et al., 1991). More recently, it has been speculated that a group of RNA viruses, the potyviruses, may have arisen from a parasitic fungus (Ward et al., 1994). The appearance of large, complicated viruses such as the poxviruses might be explained by the degeneration of early cells, much in the way eukaryotic organelles are thought to have arisen (Fenner, 1979). There is even some debate surrounding the evolutionary relatedness of seemingly close groups such as the single stranded RNA viruses.
Some viral taxonomists believe that the method of replication in RNA viruses is sufficiently conserved that it probably evolved only once, and that all current RNA viruses descend from this early ancestor (Rybicki, 1990). Others argue that the relationship between phylogeny and replication strategy is tenuous and that viruses with similar methods of replication could have evolved more than once (Koonin and Dolja, 1993). Some argue that the probable polyphyletic origin of viruses renders higher taxonomy, and a single tree describing the evolutionary relationships of all viruses, unattainable (Rybicki, 1990; Goldbach, 1992). Their opponents contend that polyphyletic origins can be accommodated by creating a separate kingdom for viruses and using the category of phylum to distinguish the various evolutionary origins of viruses (Ward, 1993).

The second argument against higher classification of viruses is the ease with which viruses pilfer gene cassettes from their hosts and from other viruses. This is readily apparent among the retroviruses. Transduction is the term applied to the process by which retroviruses pick up host genes when their double stranded DNA provirus is excised from the host's genome. The propensity of viruses to incorporate foreign genes into their genomes precludes phylogeny based upon sequences of the entire genome. However, many contend that the evolutionary relationships of viruses can be deduced by comparing the amino acid sequences of ancient, conserved proteins, such as those involved in the virus's method of replication (Koonin and Dolja, 1993).

The group of ssRNA viruses is not only the largest, but also the one for which the most information is available (Francki et al., 1991). Therefore, the argument for higher viral taxonomy and the creation of phylogenetic trees is most advanced among virologists of this group. All RNA viruses require an RNA dependent RNA polymerase (RdRp) to replicate.
Therefore, RdRp plays a role much like that of 16S ribosomal RNA in bacterial taxonomy: its amino acid sequence is used as the basis for taxonomy among the RNA viruses (Matthews, 1985; Goldbach and Wellink, 1988; Strauss and Strauss, 1988; Habili and Symons, 1989; Strauss et al., 1990; Dolja and Carrington, 1992). Sequences of the proteins in the DNA polymerase complex have been put forth as a potential basis for phylogenetic analysis of DNA viruses (Braithwaite and Ito, 1993).

Applications of bioinformatics, such as amino acid sequence comparisons of RdRp, have revealed previously unconsidered relationships between virus families (Koonin and Dolja, 1993). There does not seem to be a clear delineation between the viruses that infect the various kingdoms of eukaryotes. However, there is a distinction between viruses that infect prokaryotes and those that infect eukaryotes (Matthews, 1985). Some virus families that infect vertebrates, such as the rhabdoviruses, also infect plants; they seem to have arisen in plants and passed to vertebrates through invertebrate intermediates as the animal kingdom evolved (Ward, 1993). Further sequence analysis of conserved motifs in ancient enzymes should serve to expand and refine our knowledge of viral evolution and phylogeny.

References

Braithwaite, D. K. & Ito, J. (1993), Compilation, alignment, and phylogenetic relationships of DNA polymerases. Nucl. Acids Res., 19, 217-226.

Dolja, V. V. & Carrington, J. C. (1992), Evolution of positive-strand RNA viruses. Sem. Virol., 3, 315-326.

Francki, R. I. B., Fauquet, C. M., Knudson, D. L. & Brown, F. (1991), Classification and Nomenclature of Viruses: Fifth Report of the International Committee on Taxonomy of Viruses. Springer-Verlag, Heidelberg, Berlin.

Goldbach, R. (1992), The recombinative nature of potyviruses: implications for setting up true phylogenetic taxonomy, in "Potyvirus Taxonomy" (Barnett, O. W.) (pp. 299-304). Springer-Verlag, Heidelberg, Berlin.

Goldbach, R. & Wellink, J.
(1988), Evolution of plus-stranded RNA viruses. Intervirology, 29, 260-267.

Habili, N. & Symons, R. H. (1989), Evolutionary relationship between luteoviruses and other RNA plant viruses based on sequence motifs in their putative RNA polymerases and nucleic acid helicases. Nucl. Acids Res., 17, 9543-9555.

Kingsman, A. J., Adams, S. E., Burns, N. R. & Kingsman, S. M. (1991), Retroelement particles as purification, presentation and targeting vehicles. Trends in Biotech., 9, 303-309.

Koonin, E. V. & Dolja, V. V. (1993), Evolution and taxonomy of positive-strand RNA viruses: implication of comparative analysis of amino acid sequences. Crit. Rev. Biochem. Molec. Biol., 28, 375-430.

Matthews, R. E. F. (1985), Viral taxonomy for the non-virologist. Ann. Rev. Microbiol., 39, 451-474.

Matthews, R. E. F. (1991), Plant Virology, 3rd Edition, Chapter 17 (pp. 635-682). Academic Press, New York.

Murphy, F. A. & Kingsbury, D. W. (1990), Virus Taxonomy, in "Virology" (Fields, B. N., Knipe, D. M., Chanock, R. M., Hirsch, M. S., Melnick, J. L., Monath, T. P. & Roizman, B.), 2nd Edition (pp. 9-35). Raven Press, Ltd., New York.

Rybicki, E. (1990), The classification of organisms at the edge of life or problems with virus systematics. South African J. Sci., 86, 182-186.

Strauss, J. H. & Strauss, E. G. (1988), Evolution of RNA viruses. Ann. Rev. Microbiol., 42, 657-683.

Strauss, E. G., Strauss, J. H. & Levine, A. J. (1990), Virus evolution, in "Virology" (Fields, B. N., Knipe, D. M., Chanock, R. M., Hirsch, M. S., Melnick, J. L., Monath, T. P. & Roizman, B.), 2nd Edition, Chapter 9 (pp. 167-190). Raven Press, Ltd., New York.

Ward, C. W. (1993), Progress towards a higher taxonomy of viruses. Res. Virol., 144, 419-453.

Ward, C. W., Weiller, G., Shukla, D. D. & Gibbs, A. J. (1994), Molecular evolution of potyviruses, the largest plant virus family, in "Molecular basis of viral evolution" (Gibbs, A. J., Calisher, C. H. & Garcia-Arenal, F.). Cambridge University Press, Cambridge.
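
As a supplementary illustration of the sequence comparisons described in the essay (for example, comparing HA and NA between influenza strains, or RdRp between virus families), the following minimal Python sketch computes a simple pairwise p-distance, that is, the proportion of differing residues across aligned positions, and reports the most similar pair. The isolate names and sequence fragments are invented for illustration and are not real viral sequences, and the function names are likewise hypothetical. The sketch assumes the sequences have already been aligned to equal length; a real analysis would typically rely on a multiple alignment program and a substitution model rather than raw identity.

```python
from itertools import combinations

# Toy, invented, pre-aligned amino acid fragments (NOT real viral sequences).
# "-" marks an alignment gap.
ALIGNED = {
    "isolate_A": "MKTIIALSYIFCLVFA",
    "isolate_B": "MKTIIALSYILCLVFA",
    "isolate_C": "MKAIIVLSY-LCQVFA",
}


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns rather than counting them as changes
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(aligned: dict) -> dict:
    """Map each unordered pair of names (sorted alphabetically) to its p-distance."""
    return {
        (n1, n2): p_distance(aligned[n1], aligned[n2])
        for n1, n2 in combinations(sorted(aligned), 2)
    }


def closest_pair(aligned: dict) -> tuple:
    """Return the pair of names with the smallest p-distance."""
    matrix = distance_matrix(aligned)
    return min(matrix, key=matrix.get)


if __name__ == "__main__":
    for (n1, n2), d in distance_matrix(ALIGNED).items():
        print(f"{n1} vs {n2}: p-distance = {d:.3f}")
    print("closest pair:", closest_pair(ALIGNED))
```

A distance table of this kind is only the starting point for tree building; clustering or neighbor-joining methods would then turn the distances into a tree, and the choice of method is left open here.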