MB&B 452a / 752a2

Fall 1999



Quick Links

Survey, "Demographics", Projects, Intro. lecture [html] [pdf], Sequences lectures [html] [pdf], Databases lectures [html] [pdf], Structures lectures [html] [pdf], Simulation lectures [html] [pdf], Fun Reading, Use of Overheads

Brief Description

Bioinformatics describes the computational analysis of gene sequences and protein structures on a large scale. Topics include sequence alignment, biological database design, geometric analysis of protein structure, and macromolecular simulation. [Blue Book Entry]

Timing and Location

Meeting from 1:00-2:15 PM on Mondays and Wednesdays, in Bass 305.
MB&B Department, Bass building, Yale University, New Haven, CT 06520


Mark Gerstein
Bass 432A, Phone 203 432-6105, e-mail Mark.Gerstein@yale.edu

Office hours right after class

Handouts and readings with Joann Delvecchio <joann.delvecchio@yale.edu>
J W Gibbs 309C, 203 432-5566.

General Information

Course e-mail list at class@bioinfo.mbb.yale.edu, with a hypertext archive of course messages.

The bioinformatics module will follow a very similar progression to the course offered last spring.
(See, in particular, http://bioinfo.mbb.yale.edu/course/classes.)

Also, see other related on-line lectures.

Related Courses


Research Jobs in Bioinformatics

If you're really motivated, take a look at http://bioinfo.mbb.yale.edu/jobs.

Use of Overheads and Other Course Materials

If you want to use the overheads in your own course, feel free, as long as you give proper attribution.
(A number of the overheads were derived from related courses at Stanford and are so acknowledged.)
Most of the reading material is copyrighted and can NOT be freely distributed. It should not be accessible outside of Yale.

Things to Do


Please e-mail back to Mark.Gerstein@yale.edu
See overall "course demographics" based on survey plus first quiz.

Attendance, class participation


Two short quizzes in class, probably for a third of the class. SIMPLE multiple-choice questions that you should be able to answer from the lectures plus the main readings.

First quiz on Monday 8 November will cover Introduction and Sequences material.

Provisionally, second quiz on 1 December.

Final Project

At End of Reading Period. Provisionally, on Friday 10 December at noon
Turn in full printout to Joann in Gibbs 309C + upload
HTML, PDF, or TEXT document
Summarize and Review an area
Interpret and Analyze Data
Come up with a New Approach
Alignment Methods (sequence & structure)
Scoring Statistics (sequence & structure)
Protein Geometry (surfaces and volumes)
Databases (theory & application)
Also, Genomes, Pathways, Trees, Patterns, Docking, Modelling
~3-4 pages in total (no more than 1000 words or 4 pages)
Upload Details
Will be ported to course website and integrated with course materials.
See http://bioinfo.mbb.yale.edu/mbb452a/projects for upload form and location of project
Please test your upload before the deadline. You can repeatedly upload files. Only most recent will be used.
Please DO NOT send your project as a huge e-mail to Mark Gerstein.

"Hello World" in HTML


Overheads [html] [pdf 2.5Mb]


Overheads [html] [pdf 3.9Mb]

Sequence Topics

Basic Alignment via Dynamic Programming
Suboptimal Alignment
Gap Penalties
Similarity (PAM) Matrices
Multiple Alignment
Profiles, Motifs, HMMs
Local Alignment
Probabilistic Scoring Schemes
Rapid Similarity Search: Fasta
Rapid Similarity Search: Blast
Practical Suggestions on Sequence Searching
Transmembrane helix predictions
Secondary Structure Prediction: Basic GOR
Secondary Structure Prediction: Other Methods
Assessing Secondary Structure Prediction
Features of Genomic DNA Sequence
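
As a rough illustration of the first topic above (basic alignment via dynamic programming), here is a minimal sketch of a global alignment score calculation. It is not taken from the course overheads. The function name and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen only for the example. A real protein alignment would typically use a similarity matrix such as PAM and a more refined gap penalty.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of sequences a and b.

    Scoring values are illustrative assumptions (linear gap penalty).
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]

    # Aligning a prefix against an empty sequence costs one gap per residue.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Fill the table: each cell extends the best of three smaller alignments.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
    return score[n][m]
```

For example, global_alignment_score("ACGT", "AGT") returns 1 under these assumed scores: three matches and one gap. Recovering the alignment itself would need a traceback through the table, which is omitted here to keep the sketch short.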

Sequence Alignment Required Reading

[1] Chapter 3 from Gribskov, M. and Devereux, J. (1992). Sequence Analysis Primer. New York, Oxford University Press.
(Focus on dynamic programming section of this chapter.)

[2] Needleman, S. B. and Wunsch, C. D. (1970). "A general method applicable to the search for similarities in the amino acid sequence of two proteins." J. Mol. Biol. 48: 443-453.
(The original paper. Still pretty easy to read. Will be used in class.)

[3] Smith, T. F. and Waterman, M. S. (1981). "Identification of common molecular subsequences." J. Mol. Biol. 147: 195-197
(The original paper on local alignment. Not quite as easy to read, but introduces this important concept.)

[4] Altschul et al. (1997). "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Res. 25(17): 3389-3402.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/altschul-nar-blast2.pdf

Scoring Required Reading

[5] Altschul, S. F., Boguski, M. S., Gish, W. and Wootton, J. C. (1994). Issues in searching molecular sequence databases. Nature Genetics. 6(2): 119-29.
(Most important. A short overall review.)

[6] M Levitt & M Gerstein (1998). A Unified Statistical Framework for Sequence Comparison and Structure Comparison. Proceedings of the National Academy of Sciences USA 95: 5913-5920
** http://bioinfo.mbb.yale.edu/e-print/statframe-pnas-reprint.pdf
(Understand the concept of P-value and the framework for deriving scoring statistics.)

[7] Pearson, W. R. (1996). Effective Protein Sequence Comparison. Meth. Enz. 266: 227-259.
(Understand how the FASTA e-value is derived.)

Multiple Alignment Required Reading

[8] Eddy, S. R. (1996). "Hidden Markov models," Curr. Opin. Struc. Biol. 6, 361-365.

[9] Higgins, D. G., Thompson, J. D. & Gibson, T. J. (1996). "Using CLUSTAL for multiple sequence alignments," Methods Enzymol 266, 383-402.

Secondary Structure Prediction Required Reading

[10] Garnier, J., Gibrat, J. F. & Robson, B. (1996). "GOR method for predicting protein secondary structure from amino acid sequence," Methods Enzymol 266, 540-53.

[11] King, R. D. & Sternberg, M. J. E. (1996). "Identification and application of the concepts important for accurate and reliable protein secondary structure prediction," Prot. Sci. 5, 2298-2310.

Extra Sequences Reading

Frishman, D. and Argos, P. (1997). "The Future of Protein Secondary Structure Prediction Accuracy," Folding & Design 2: 159-62.
(Controversial idea: secondary structure prediction to 80%?)
** http://bioinfo.mbb.yale.edu/mbb452a/reading/frishman-fad-acc-secstr.pdf

M Gerstein (1998). "Measurement of the Effectiveness of Transitive Sequence Comparison, through a Third ‘Intermediate’ Sequence," Bioinformatics 14: 707-14.
** http://bioinfo.mbb.yale.edu/e-print/transcmp-bioinfo-reprint.pdf


Overheads [html] [pdf 4.1Mb]

Database Topics

Structuring Information in Tables
Keys and Joins
Complex RDB encoding
Indexes and Optimization
Forms and Reports
Clustering & Trees
Function Classification and Orthologs
The Genomic vs. Single-molecule Perspective
Folds in Genomes, shared & common folds
Genome Trees
Bulk Structure Prediction
Extent of Fold Assignment: the Bias Problem
Correcting for Biases with Sampling
Cross-tabulation, folds and functions
Analysis of Expression Data
Analysis of Other Whole Genome Datasets

Databases Required Reading

[12] M Gerstein & W Krebs (1998). "A Database of Macromolecular Movements," Nuc. Acid. Res. 26:4280-4290.
** http://bioinfo.mbb.yale.edu/e-print/molmovdb-nar-reprint.pdf

[13] Korth & Silberschatz, Database System Concepts
(CS book on databases; Read pages 1 to 65 [sections 1.0 to mid-3.2] and pages 97 to 108 [part of section 4.1]. Some of the information on SQL is available from the on-line link below.)
** http://bioinfo.mbb.yale.edu/course/private-xxxx/sqltut.htm

Genome Surveys Required Reading

[14] Fred Tekaia, Antonio Lazcano & Bernard Dujon (1999). "The Genomic Tree as Revealed from Whole Proteome Comparisons," Genome Res. 9:550-557
** http://bioinfo.mbb.yale.edu/mbb452a/reading/dujon-genomeres-proteometree.pdf

[15] H Hegyi & M Gerstein (1999). "The Relationship between Protein Structure and Function: a Comprehensive Survey with Application to the Yeast Genome," J Mol. Biol. 288: 147-164.
** http://bioinfo.mbb.yale.edu/e-print/foldfunc-jmb/text-ff.pdf

[16] M Gerstein & H Hegyi (1998). "Comparing Microbial Genomes in terms of Protein Structure: Surveys of a Finite Parts List," FEMS Microbiology Reviews 22: 277-304.
** http://bioinfo.mbb.yale.edu/e-print/surveys-fems-preprint.pdf

Extra Database Surveys Readings

M Gerstein (1998). "Patterns of Protein-Fold Usage in Eight Microbial Genomes: A Comprehensive Structural Census," Proteins 33: 518-534.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/proteins_33_518.pdf
(This is an example of the application of large-scale, database-style calculations.)

Tomb, J.-F., White, O., Kerlavage, A. R., Clayton, R. A., Sutton, G. G., Fleischmann, R. D., Ketchum, K. A., Klenk, H. P., Gill, S., Dougherty, B. A., Nelson, K., Quackenbush, J., Zhou, L., Kirkness, E. F., Peterson, S., Loftus, B., Richardson, D., Dodson, R., Khalak, H. G., Glodek, A., McKenney, K., Fitzgerald, L. M., Lee, N., Adams, M. D., Hickey, E. K., Berg, D. E., Gocayne, J. D., Utterback, T. R., Peterson, J. D., Kelley, J. M., Cotton, M. D., Weidman, J. M., Fujii, C., Bowman, C., Watthey, L., Wallin, E., Hayes, W. S., Borodovsky, M., Karp, P. D., Smith, H. O., Fraser, C. M. & Venter, J. C. (1997). "The complete genome sequence of the gastric pathogen Helicobacter pylori," Nature 388, 539-547.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/nature-hpylori.pdf
** http://bioinfo.mbb.yale.edu/course/private-xxxx/nature-sum-hpylori.html
** http://www.nature.com/Nature2/serve?SID=&CAT=NatGen&PG=pylori/pylori1.html
(This research article describes one of the recent genome sequences.)

Cavalli-Sforza, L. & Edwards, S. (1967). "Phylogenetic analysis: models and estimation procedures," Evolution 21, 550-570.

M Gerstein (1998). "How Representative are the Known Structures of the Proteins in a Complete Genome? A Comprehensive Structural Census," Folding & Design 3: 497-512.
** http://bioinfo.mbb.yale.edu/e-print/pdb-v-gen-folddes/fad-3-497-reprint.pdf

Fitch, W. M. (1971). "Toward defining the course of evolution: minimum change for a specific topology," Syst. Zool. 20, 406-416.

Short review by R Young at MIT on Gene Chips, http://bioinfo.mbb.yale.edu/mbb452a/reading/young-tigs-chips.pdf

Swofford et al. (1994). "Phylogeny reconstruction," In Molecular Systematics (2nd ed.), Sinauer Press.
(This book chapter is a good reference, though not necessary reading.)


Overheads [html] [pdf 4.8Mb]

Structures Topics

What Do Structures Look Like?
RMS Superposition
Structural Alignment by Iterated Dynamic Programming
Scoring Structural Similarity
Fold Library
Relation of Sequence Similarity to Structural and Functional Similarity
Protein Geometry
Calculation of Surface Area
Calculation of Volume
Standard Volumes and Radii

Structure Alignment Required Reading

[17] Holm, L. and Sander, C. (1993). Protein Structure Comparison by Alignment of Distance Matrices. J. Mol. Biol. 233: 123-138.
(A different method of structural alignment, which differs more from sequence alignment.)

[18] M Gerstein & M Levitt (1998). "Comprehensive Assessment of Automatic Structural Alignment against a Manual Standard, the Scop Classification of Proteins," Protein Science 7: 445-456.
** http://bioinfo.mbb.yale.edu/~mbg/preprint/ss-prsci.pdf
(Understand the method, not results, in this paper OR in Gerstein & Levitt (1996), below)

Geometry Required Reading

[20] M Gerstein & F M Richards, "Protein Geometry: Volumes, Areas, and Distances," (2000) chapter 22 of volume F of the International Tables for Crystallography ("Molecular Geometry and Features" in "Macromolecular Crystallography")
** http://bioinfo.mbb.yale.edu/e-print/geom-inttab/geom-inttab.pdf

Extra Structures Reading

Taylor, W. R. & Orengo, C. A. (1989). Protein Structure Alignment. J. Mol. Biol. 208, 1-22.

Kuntz, I. D. (1992). Structure-Based Strategies for Drug Design and Discovery. Science 257, 1078-1082.
(Docking. See link below for more information.)
** http://www.cmpharm.ucsf.edu/kuntz

Richards, F. M. (1977). Areas, Volumes, Packing, and Protein Structure. Ann. Rev. Biophys. Bioeng. 6, 151-76.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/richards-annrev-areas.pdf

Richards, F. M. (1974). The Interpretation of Protein Structures: Total Volume, Group Volume Distributions and Packing Density. J. Mol. Biol. 82, 1-14.
(Original Application of Voronoi Method to Proteins. See Int. Tabl. document above for more details on method.)

Pattabiraman, N., Ward, K.B. and Fleming, P.J. (1995) Occluded Molecular Surface: Analysis of Protein Packing, Journal of Molecular Recognition, 8:334-344
** http://bioinfo.mbb.yale.edu/course/private-xxxx/fleming-os.pdf
http://csbmet.csb.yale.edu/userguides/datamanip/os/os_descrip.html  -- OS

Joan Pontius, Jean Richelle, Shoshana J. Wodak (1996). Deviations from Standard Atomic Volumes as a Quality Measure for Protein Crystal Structures. Journal of Molecular Biology 264: 121-136.
** http://bioinfo.mbb.yale.edu/mbb452a/reading/wodak-jmb-volume.pdf

Barry Cipra (1998). “Packing Challenge Mastered At Last,” Science 281: 1267
** http://www.sciencemag.org/cgi/content/full/281/5381/1267

Simon Singh (1998). “Mathematics ‘Proves’ What the Grocer Always Knew,” New York Times (August 25).
** http://bioinfo.mbb.yale.edu/mbb452a/reading/nyt-sci-packproof.txt


Overheads [html] [pdf 5.5Mb]

Simulation Topics

Basic Forces: Electrostatics
VDW Forces
Bonds as Springs
Energy Minimization
Monte Carlo
Molecular Dynamics
Energy and Entropy
Parameter Sets
Number Density
Poisson-Boltzmann Equation
Lattice Models and Simplification

Simulation Required Reading

[21] M Gerstein & M Levitt (1998). "Simulating Water and the Molecules of Life," Scientific American 279: 100-105.
** http://bioinfo.mbb.yale.edu/geometry/sciam

[22] McCammon, J. A. & Harvey, S. C. (1987). Dynamics of Proteins and Nucleic Acids. Cambridge UP.

[23] Honig, B. & Nicholls, A. (1995). Classical electrostatics in biology and chemistry. Science 268, 1144-9.

Extra Simulation Readings

Information on Liquid Simulation Methods (excerpted from a thesis, 1992)

Levitt, M. (1983). Protein folding by restrained energy minimization and molecular dynamics. J Mol Biol 170, 723-64.

Allen, M. P. & Tildesley, D. J. (1987). Computer Simulation of Liquids. Clarendon Press, Oxford. (A good reference.)

Karplus, M. & McCammon, J. A. (1986). The dynamics of proteins. Sci. Am. 254, 42-51. (A good reference.)

Duan, Y. & Kollman, P. A. (1998). Pathways to a protein folding intermediate observed in a 1-microsecond simulation in aqueous solution. Science 282, 740-4.
** http://bioinfo.mbb.yale.edu/course/private-xxx/kollman-science-longsim.pdf
** http://www.sciencemag.org/cgi/content/abstract/282/5389/740

Sharp, K. (1999). Electrostatic Interactions in Proteins. In International Tables for Crystallography, International Union of Crystallography, Chester, UK.

Dill, K. A., Bromberg, S., Yue, K., Fiebig, K. M., Yee, D. P., Thomas, P. D. & Chan, H. S. (1995). Principles of protein folding--a perspective from simple exact models. Protein Sci 4, 561-602.

Franks, F. (1983). Water. The Royal Society of Chemistry, London. Pages 35-56.

"Fun" Pop Reading (Extra)

"Fathering life and other feats," Economist, 2 February 1999
(About synthesized M. genitalium)
** http://bioinfo.mbb.yale.edu/mbb452a/reading/mailbox-archive.mbx

"The Gutenberg Internet," Wall Street Journal, June 11, 1999
** http://bioinfo.mbb.yale.edu/mbb452a/reading/mailbox-archive.mbx

"The hot new job in agriculture is bioinformatics," Work Week, Wall Street Journal, August 17, 1999, A1
** http://bioinfo.mbb.yale.edu/mbb452a/reading/mailbox-archive.mbx

Antonio Regalado (1999), "Mining the Genome," MIT TechReview, Sept/Oct. issue.
** http://www.techreview.com/articles/oct99/regalado.htm
** http://bioinfo.mbb.yale.edu/mbb452a/reading/regalado-techrev-bioinfo.txt

Charles C. Mann, "Biotech Goes Wild," TechReview, July/August
** http://www.techreview.com/articles/july99/mann.htm

** http://bioinfo.mbb.yale.edu/mbb452a/reading/mailbox-archive.mbx

Economist, 6/28/99, "Science & Technology: Drowning in data"
** http://www.economist.com/editorial/freeforall/current/st2340.html
** http://bioinfo.mbb.yale.edu/mbb452a/reading/economist-bioinfo.txt

GEORGE JOHNSON, "Searching for the Essence of the World Wide Web," April 11, 1999
** http://www.nytimes.com/library/review/041199internet-ecosystem-review.html

HENRY FOUNTAIN, "Hiding Secret Messages Within Human Code," New York Times, June 22, 1999, F5
** http://bioinfo.mbb.yale.edu/mbb452a/reading/mailbox-archive.mbx

J L Weldon. "A Career in Data Modeling," Byte, June 1997
(Practical hands-on discussion of data modeling in commercial context, many of the same issues apply in bioinformatics.)
** http://www.byte.com/art/9706/sec7/art3.htm

J L Weldon. "Data Warehouse Building Blocks," Byte, January 1997
** http://www.byte.com/art/9701/sec7/art1.htm

J L Weldon. "Warehouse Cornerstones," Byte, January 1997
(Other, less relevant articles on some of the practical hardware issues in database design.)
** http://www.byte.com/art/9701/sec7/art2.htm

J L Weldon. "RDBMSes Get a Make-Over," Byte, April 1997
(Practical discussion of what an object database is.)
** http://www.byte.com/art/9704/sec7/art7.htm

Johnson, G. (1997). "Proteins Outthink Computers in Giving Shape to Life," New York Times. March 25, 1997, C1.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/nyt-casp2.html

L Hunter (ed), AI and Molecular Biology, AAAI Press (A new intro. text)
** http://www.aaai.org/Press/Books/Hunter/hunter-contents.html

L. Fisher (1999). "Surfing the Human Genome; Data Bases of Genetic Code Are Moving to the Web," New York Times. 09/20/99, C1

Langreth, R. (1997). "Scientists Unlock Sequence Of Ulcer Bacterium's Genes," Wall Street Journal. 7 August.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/wsj-hpylori.txt

Lisa Belkin, "Splice Einstein and Sammy Glick. Add a Little Magellan," New York Times Magazine, 08/23/98, Page 26 (Article on J C Venter)
** http://bioinfo.mbb.yale.edu/mbb452a/reading/mailbox-archive.mbx

M Gerstein (1999). "E-publishing on the Web: Promises, Pitfalls, and Payoffs for Bioinformatics," Bioinformatics 15: 429-431.
** http://bioinfo.mbb.yale.edu/e-print/epub-ed-bioinfo

MARLISE SIMONS, "Team of Scientists to Prepare a Rolodex of Life on Earth," New York Times, July 27, 1999, F2
** http://bioinfo.mbb.yale.edu/mbb452a/reading/mailbox-archive.mbx

N Wade, "Who'll Sequence Human Genome First? It's Up to Phred," New York Times, March 23, 1999, F2
** http://bioinfo.mbb.yale.edu/mbb452a/reading/mailbox-archive.mbx

NICHOLAS WADE, "Cambridge Lab Keeps Britain Ahead in Genome Stakes," New York Times, October 6, 1998
** http://www.nytimes.com/library/national/science/100698sci-sanger.html

NICHOLAS WADE, "Gains Are Reported in Decoding Genome," New York Times, May 22, 1999, A4
** http://bioinfo.mbb.yale.edu/mbb452a/reading/mailbox-archive.mbx

PAMELA LICALZI O'CONNELL "Beyond Geography: Mapping Unknown of Cyberspace," New York Times, September 30, 1999
** http://www.nytimes.com/library/tech/99/09/circuits/articles/30maps.html
** http://graphics.nytimes.com/library/tech/99/09/circuits/articles/30maps.2.jpg

Pollack, A. (1998). Drug Testers Turn to 'Virtual Patients' as Guinea Pigs. New York Times, Nov. 10
** http://www.nytimes.com/library/tech/98/11/biztech/articles/10health-virtual.html
** http://bioinfo.mbb.yale.edu/course/private-xxx/pollack-nytimes-bioinfo.html

Primer on Molecular Genetics from the DOE
** http://www.bis.med.jhmi.edu/Dan/DOE/intro.html

ROBERT LANGRETH, "CuraGen's Finds 55,000 Variations Of Genes, Auguring Tailored Drugs," Wall Street Journal, August 16, 1999
** http://bioinfo.mbb.yale.edu/mbb452a/reading/mailbox-archive.mbx

Steven Vogel, "Academically Correct Biological Science", American Scientist, November-December 1998
** http://www.amsci.org/amsci/issues/macroscope/macroscope98-11.html

Tanouye, E. & Langreth, R. (1998). "SmithKline-Glaxo Deal Driven By the Hunt for Human Genes," Wall Street Journal. February 2.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/wsj-drug-merge.txt

Wade, N. (1997). "Now Playing at a Nearby Lab : 'Revenge of the Fly People,'" New York Times. 05/20/97, C1.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/nyt-flybase.txt

Wade, N. (1997). "Scientists Map Ulcer Bacterium's Genetic Code," New York Times. August 7.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/nyt-hpylori.html

Wade, N. (1997). "Thinking Small Paying Off Big In Gene Quest," New York Times. 02/03/97, A1.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/nyt-pathogens-genomes.txt

WILLIAM K. STEVENS, "Rearranging the Branches on a New Tree of Life," August 31, 1999, F1
** http://bioinfo.mbb.yale.edu/mbb452a/reading/mailbox-archive.mbx


The DNA-mouse image is adapted from the GCB-98 homepage. What's wrong with the adaptation?

[home]  Lab Home