Abola, E. E., Sussman, J. L., Prilusky, J. & Manning, N. O. (1997). Protein Data Bank archives of three-dimensional macromolecular structures. Meth. Enz. 277, 556-571.

Altman, R. & Gerstein, M. (1994). Finding an Average Core Structure: Application to the Globins. In Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 19-27, AAAI Press, Menlo Park, CA.

Altschul, S. F., Boguski, M. S., Gish, W. & Wootton, J. C. (1994). Issues in searching molecular sequence databases. Nature Genetics 6, 119-29.

Argos, P. (1988). An investigation of protein subunit and domain interfaces. Prot. Eng. 2, 101-113.

Arkin, I., Brunger, A. & Engelman, D. (1997). Are there dominant membrane protein families with a given number of helices? Proteins 28, 465-466.

Benner, S. A., Badcoe, I., Cohen, M. A. & Gerloff, D. L. (1994). Bona fide prediction of aspects of protein conformation. Assigning interior and surface residues from patterns of variation and conservation in homologous protein sequences. J Mol Biol 235, 926-58.

Benner, S. A., Cohen, M. A. & Gerloff, D. (1992). Correct structure prediction? Nature 359, 781.

Benner, S. A. & Gerloff, D. L. (1993). Predicting the conformation of proteins. Man versus machine. FEBS Lett 325, 29-33.

Berman, A. L., Kolker, E. & Trifonov, E. N. (1994). Underlying order in protein sequence organization. Proc Natl Acad Sci U S A 91, 4044-7.

Blaisdell, B. E., Campbell, A. M. & Karlin, S. (1996). Similarities and dissimilarities of phage genomes. Proceedings of the National Academy of Sciences of the United States of America 93, 5854-9.

Blattner, F. R., Plunkett, G., III, Bloch, C. A., Perna, N. T., Burland, V., Riley, M., Collado-Vides, J., Glasner, J. D., Rode, C. K., Mayhew, G. F., Gregor, J., Davis, N. W., Kirkpatrick, H. A., Goeden, M. A., Rose, D. J., Mau, B. & Shao, Y. (1997). The Complete Genome Sequence of Escherichia coli K-12. Science 277, 1453-1462.

Boberg, J., Salakoski, T. & Vihinen, M. (1992). Selection of a representative set of structures from Brookhaven Protein Data Bank. Proteins 14, 265-76.

Bork, P., Ouzounis, C., Sander, C., Scharf, M., Schneider, R. & Sonnhammer, E. (1992a). Comprehensive sequence analysis of the 182 predicted open reading frames of yeast chromosome III. Protein Science 1, 1677-1690.

Bork, P., Ouzounis, C., Sander, C., Scharf, M., Schneider, R. & Sonnhammer, E. (1992b). What's in a genome? Nature 358, 287.

Bowie, J. U. & Eisenberg, D. (1993). Inverted protein structure prediction. Curr Opin Struct Biol 3, 437-444.

Boyd, D., Schierle, C. & Beckwith, J. (1998). How many membrane proteins are there? Prot. Sci. 7, 201-205.

Brenner, S., Chothia, C. & Hubbard, T. (1998). Assessing Sequence Comparison Methods. Proc. Natl. Acad. Sci. USA (submitted).

Brenner, S., Chothia, C., Hubbard, T. J. P. & Murzin, A. G. (1996). Understanding Protein Structure: Using Scop for Fold Interpretation. Meth. Enz. 266, 635-642.

Brenner, S., Hubbard, T., Murzin, A. & Chothia, C. (1995). Gene Duplication in H. Influenzae. Nature 378, 140.

Brenner, S. E., Chothia, C. & Hubbard, T. J. (1997). Population statistics of protein structures: lessons from structural classifications. Curr Opin Struct Biol 7, 369-76.

Bryant, S. H. & Altschul, S. F. (1995). Statistics of sequence-structure threading. Curr Opin Struct Biol 5, 236-44.

Bult, C. J., White, O., Olsen, G. J., Zhou, L., Fleischmann, R. D., Sutton, G. G., Blake, J. A., Fitzgerald, L. M., Clayton, R. A., Gocayne, J. D., Kerlavage, A. R., Dougherty, B. A., Tomb, J.-F., Adams, M. D., Reich, C. I., Overbeek, R., Kirkness, E. F., Weinstock, K. G., Merrick, J. M., Glodek, A., Scott, J. L., Geoghagen, N. S. M., Weidman, J. F., Fuhrmann, J. L., Nguyen, D., Utterback, T. R., Kelley, J. M., Peterson, J. D., Sadow, P. W., Hanna, M. C., Cotton, M. D., Roberts, K. M., Hurst, M. A., Kaine, B. P., Borodovsky, M., Klenk, H.-P., Fraser, C. M., Smith, H. O., Woese, C. R. & Venter, J. C. (1996). Complete Genome Sequence of the Methanogenic Archaeon, Methanococcus jannaschii. Science 273, 1058-1073.

Casari, G., Andrade, M., Bork, P., Boyle, J., Daruvar, A., Ouzounis, C., Schneider, R., Tamames, J., Valencia, A. & Sander, C. (1995). Challenging times for bioinformatics. Nature 376, 647-648.

Chakrabartty, A., Kortemme, T. & Baldwin, R. L. (1994). Helix propensities of the amino acids measured in alanine-based peptides without helix-stabilizing side-chain interactions. Protein Science 3, 843-52.

Das, S., Yu, L., Gaitatzes, C., Rogers, R., Freeman, J., Bienkowska, J., Adams, R. M., Smith, T. F. & Lindelien, J. (1997). Biology's new Rosetta stone. Nature 385, 29-30.

Doolittle, R. F. (1997). A bug with excess gastric avidity. Nature 388, 515-6.

Dubchak, I., Holbrook, S. R. & Kim, S. H. (1993). Prediction of protein folding class from amino acid composition. Proteins 16, 79-91.

Dubchak, I., Muchnik, I., Holbrook, S. R. & Kim, S. H. (1995). Prediction of protein folding class using global description of amino acid sequence. Proc Natl Acad Sci U S A 92, 8700-4.

Eddy, S. R. (1996). Hidden Markov models. Curr. Opin. Struc. Biol. 6, 361-365.

Engelman, D. M., Steitz, T. A. & Goldman, A. (1986). Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins. Annual Review of Biophysics & Biophysical Chemistry 15, 321-53.

Felsenstein, J. (1989). PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics 5, 164-166.

Felsenstein, J. (1993). PHYLIP (Phylogeny Inference Package) version 3.5c. Department of Genetics, University of Washington, Seattle.

Fischer, D. & Eisenberg, D. (1997). Assigning folds to the proteins encoded by the genome of Mycoplasma genitalium. Proc Natl Acad Sci U S A 94, 11929-34.

Fleischmann, R. D., Adams, M. D., White, O., Clayton, R. A., Kirkness, E. F., Kerlavage, A. R., Bult, C. J., Tomb, J. F., Dougherty, B. A., Merrick, J. M., McKenney, K., Sutton, G., Fitzhugh, W., Fields, C., Gocayne, J. D., Scott, J., Shirley, R., Liu, L. I., Glodek, A., Kelley, J. M., Weidman, J. F., Phillips, C. A., Spriggs, T., Hedblom, E., Cotton, M. D., Utterback, T. R., Hanna, M. C., Nguyen, D. T., Saudek, D. M., Brandon, R. C., Fine, L. D., Fritchman, J. L., Fuhrmann, J. L., Geoghagen, N. S. M., Gnehm, C. L., McDonald, L. A., Small, K. V., Fraser, C. M., Smith, H. O. & Venter, J. C. (1995). Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269, 496-512.

Fraser, C. M., Gocayne, J. D., White, O., Adams, M. D., Clayton, R. A., Fleischmann, R. D., Bult, C. J., Kerlavage, A. R., Sutton, G., Kelley, J. M. et al. (1995). The minimal gene complement of Mycoplasma genitalium. Science 270, 397-403.

Gaasterland, T. & Sensen, C. W. (1996). Fully automated genome analysis that reflects user needs and preferences. A detailed introduction to the MAGPIE system architecture. Biochimie 78, 302-10.

Garnier, J., Gibrat, J. F. & Robson, B. (1996). GOR method for predicting protein secondary structure from amino acid sequence. Meth. Enz. 266, 540-553.

Garnier, J., Osguthorpe, D. & Robson, B. (1978). Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J. Mol. Biol. 120, 97-120.

Gerstein, M. (1997). A Structural Census of Genomes: Comparing Eukaryotic, Bacterial and Archaeal Genomes in terms of Protein Structure. J. Mol. Biol. 274, 562-576.

Gerstein, M. & Altman, R. (1995). Average core structures and variability measures for protein families: Application to the immunoglobulins. J. Mol. Biol. 251, 161-175.

Gerstein, M., Lesk, A. M., Baker, E. N., Anderson, B., Norris, G. & Chothia, C. (1993). Domain Closure in Lactoferrin: Two Hinges produce a See-saw Motion between Alternative Close-Packed Interfaces. J. Mol. Biol. 234, 357-372.

Gerstein, M., Lesk, A. M. & Chothia, C. (1994). Structural Mechanisms for Domain Movements. Biochemistry 33, 6739-6749.

Gerstein, M. & Levitt, M. (1996). Using Iterative Dynamic Programming to Obtain Accurate Pairwise and Multiple Alignments of Protein Structures. In Proc. Fourth Int. Conf. on Intell. Sys. Mol. Biol., pp. 59-67, AAAI Press, Menlo Park, CA.

Gerstein, M. & Levitt, M. (1997). A Structural Census of the Current Population of Protein Sequences. Proc. Natl. Acad. Sci. USA 94, 11911-11916.

Gerstein, M. & Levitt, M. (1998). Comprehensive Assessment of Automatic Structural Alignment against a Manual Standard, the Scop Classification of Proteins. Protein Science (in press).

Gibrat, J., Garnier, J. & Robson, B. (1987). Further developments of protein secondary structure prediction using information theory. J. Mol. Biol. 198, 425-443.

Goffeau, A., Barrell, B. G., Bussey, H., Davis, R. W., Dujon, B., Feldmann, H., Galibert, F., Hoheisel, J. D., Jacq, C., Johnston, M., Louis, E. J., Mewes, H. W., Murakami, Y., Philippsen, P., Tettelin, H. & Oliver, S. G. (1996). Life with 6000 Genes. Science 274, 546-567.

Goffeau, A. et al. (1997). The Yeast Genome Directory. Nature 387(Suppl.), 5-105.

Goffeau, A., Slonimski, P., Nakai, K. & Risler, J. L. (1993). How Many Yeast Genes Code for Membrane-Spanning Proteins? Yeast 9, 691-702.

Himmelreich, R., Hilbert, H., Plagens, H., Pirkl, E., Li, B. C. & Herrmann, R. (1996). Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae. Nucleic Acids Res 24, 4420-49.

Hobohm, U. & Sander, C. (1994). Enlarged representative set of protein structures. Protein Science 3, 522.

Hobohm, U., Scharf, M., Schneider, R. & Sander, C. (1992). Selection of representative protein data sets. Prot. Sci. 1, 409-417.

Hubbard, T. J. P., Murzin, A. G., Brenner, S. E. & Chothia, C. (1997). SCOP: a structural classification of proteins database. Nucleic Acids Res 25, 236-9.

Jones, S. & Thornton, J. (1996). Principles of protein-protein interactions. Proc. Natl. Acad. Sci. USA 93, 13-20.

Kaneko, T., Sato, S., Kotani, H., Tanaka, A., Asamizu, E., Nakamura, Y., Miyajima, N., Hirosawa, M., Sugiura, M., Sasamoto, S., Kimura, T., Hosouchi, T., Matsuno, A., Muraki, A., Nakazaki, N., Naruo, K., Okumura, S., Shimpo, S., Takeuchi, C., Wada, T., Watanabe, A., Yamada, M., Yasuda, M. & Tabata, S. (1996). Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions. DNA Res 3, 109-36.

Karlin, S. & Altschul, S. F. (1993). Applications and statistics for multiple high-scoring segments in molecular sequences. Proceedings of the National Academy of Sciences of the United States of America 90, 5873-7.

Karlin, S. & Burge, C. (1995). Dinucleotide relative abundance extremes: a genomic signature. Trends in Genetics 11, 283-90.

Karlin, S., Burge, C. & Campbell, A. M. (1992). Statistical analyses of counts and distributions of restriction sites in DNA sequences. Nucleic Acids Research 20, 1363-70.

Karlin, S., Mrazek, J. & Campbell, A. M. (1996). Frequent oligonucleotides and peptides of the Haemophilus influenzae genome. Nucleic Acids Research 24, 4263-4272.

Kaufman, L. & Rousseeuw, P. J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, New York.

King, R. D. & Sternberg, M. J. E. (1996). Identification and application of the concepts important for accurate and reliable protein secondary structure prediction. Prot. Sci. 5, 2298-2310.

Koonin, E. V., Mushegian, A. R. & Rudd, K. E. (1996). Sequencing and analysis of bacterial genomes. Curr Biol 6, 404-16.

Levitt, M. & Chothia, C. (1976). Structural patterns in globular proteins. Nature 261, 552-558.

Levitt, M. & Gerstein, M. (1998). A Unified Statistical Framework for Sequence Comparison and Structure Comparison. Proceedings of the National Academy of Sciences USA (in press).

Lipman, D. J. & Pearson, W. R. (1985). Rapid and sensitive protein similarity searches. Science 227, 1435-1441.

Medigue, C., Moszer, I., Viari, A. & Danchin, A. (1995). Analysis of a Bacillus subtilis genome fragment using a co-operative computer system prototype. Gene 165, GC37-51.

Metfessel, B. A., Saurugger, P. N., Connelly, D. P. & Rich, S. S. (1993). Cross-validation of protein structural class prediction using statistical clustering and neural networks. Protein Sci 2, 1171-82.

Murzin, A., Brenner, S. E., Hubbard, T. & Chothia, C. (1995). SCOP: A Structural Classification of Proteins for the Investigation of Sequences and Structures. J. Mol. Biol. 247, 536-540.

Netzer, W. J. & Hartl, F. U. (1997). Recombination of protein domains facilitated by co-translational folding in eukaryotes. Nature 388, 343-9.

Olsen, G. J., Woese, C. R. & Overbeek, R. (1994). The winds of (evolutionary) change: breathing new life into microbiology. J. Bacteriol. 176, 1-6.

Ouzounis, C., Bork, P., Casari, G. & Sander, C. (1995). New protein functions in yeast chromosome VIII. Protein Sci. 4, 2424-2428.

Pearson, W. R. (1996). Effective Protein Sequence Comparison. Meth. Enz. 266, 227-259.

Pearson, W. R. (1997). Identifying distantly related protein sequences. Comput Appl Biosci 13, 325-32.

Pearson, W. R. & Lipman, D. J. (1988). Improved Tools for Biological Sequence Analysis. Proc. Natl. Acad. Sci. USA 85, 2444-2448.

Rost, B. (1996). PHD: Predicting One-dimensional Protein Secondary Structure by Profile-Based Neural Networks. Meth. Enz. 266, 525-539.

Rost, B., Fariselli, P. & Casadio, R. (1996). Topology prediction for helical transmembrane proteins at 86% accuracy. Prot. Sci. 5, 1704-1718.

Rost, B., Fariselli, P., Casadio, R. & Sander, C. (1995). Prediction of helical transmembrane segments at 95% accuracy. Prot. Sci. 4, 521-533.

Rost, B. & Sander, C. (1992). Jury returns on structure prediction. Nature 360, 540.

Rost, B. & Sander, C. (1993). Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol. 232, 584-599.

Salamov, A. & Solovyev, V. (1995). Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignments. J. Mol. Biol. 247, 11-15.

Scharf, M., Schneider, R., Casari, G., Bork, P., Valencia, A., Ouzounis, C. & Sander, C. (1994). GeneQuiz: a workbench for sequence analysis. In Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 348-353, AAAI Press, Menlo Park, California.

Smith, C. K., Withka, J. M. & Regan, L. (1994). A thermodynamic scale for the beta-sheet forming tendencies of the amino acids. Biochemistry 33, 5510-7.

Stampf, D. R., Felder, C. E. & Sussman, J. L. (1995). PDBbrowse--a graphics interface to the Brookhaven Protein Data Bank. Nature 374, 572-4.

Tatusov, R. L., Altschul, S. F. & Koonin, E. V. (1994). Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks. Proc Natl Acad Sci U S A 91, 12091-5.

Tatusov, R. L., Koonin, E. V. & Lipman, D. J. (1997). A genomic perspective on protein families. Science 278, 631-7.

Tomb, J.-F., White, O., Kerlavage, A. R., Clayton, R. A., Sutton, G. G., Fleischmann, R. D., Ketchum, K. A., Klenk, H. P., Gill, S., Dougherty, B. A., Nelson, K., Quackenbush, J., Zhou, L., Kirkness, E. F., Peterson, S., Loftus, B., Richardson, D., Dodson, R., Khalak, H. G., Glodek, A., McKenney, K., Fitzgerald, L. M., Lee, N., Adams, M. D., Hickey, E. K., Berg, D. E., Gocayne, J. D., Utterback, T. R., Peterson, J. D., Kelley, J. M., Cotton, M. D., Weidman, J. M., Fujii, C., Bowman, C., Watthey, L., Wallin, E., Hayes, W. S., Borodovsky, M., Karp, P. D., Smith, H. O., Fraser, C. M. & Venter, J. C. (1997). The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature 388, 539-547.

Wall, L., Christiansen, D. & Schwartz, R. (1996). Programming Perl. O'Reilly and Associates, Sebastopol, CA.

Weiss, M. S., Abele, U., Weckesser, J., Welte, W., Schiltz, E. & Schulz, G. E. (1991). Molecular architecture and electrostatic properties of a bacterial porin. Science 254, 1627-30.

Wootton, J. C. (1994). Non-globular domains in protein sequences: automated segmentation using complexity measures. Comput Chem 18, 269-85.

Wootton, J. C. & Federhen, S. (1993). Statistics of local complexity in amino acid sequences and sequence databases. Computers and Chemistry 17, 149-163.

Wootton, J. C. & Federhen, S. (1996). Analysis of compositionally biased regions in sequence databases. Methods Enzymol 266, 554-71.