MB&B [47]47b4 - BIOINFORMATICS

Synopsis of Lectures

General

Course Projects, Mailing List, &c

Class 1

"What is Bioinformatics?"

Types of Molecular Biology Information. The Range of Calculations in Bioinformatics, Three Major Application Areas in Bioinformatics.

Lecture Notes [html-with-frames] [pdf]

Class 2

Sequence Alignment

Sequence Similarity, Sequence Comparison via Dynamic Programming.

Lecture Notes [html-with-frames] [pdf]
Extra Notes [pdf]

Blast Search

Basic: http://www.ncbi.nlm.nih.gov/cgi-bin/BLAST/nph-blast
Test Data: http://bioinfo.mbb.yale.edu/course/classes/c2-testdata.txt
(Look at Advanced Blast too)

Readings

(For next Monday)

Chapter 3 from Gribskov, M. and Devereux, J. (1992). Sequence Analysis Primer. New York, Oxford University Press.
(Focus on dynamic programming section of this chapter.)

Needleman, S. B. and Wunsch, C. D. (1970). "A general method applicable to the search for similarities in the amino acid sequence of two proteins." J. Mol. Biol. 48: 443-453.
(The original paper. Still pretty easy to read. Will be used in class.)

Smith, T. F. and Waterman, M. S. (1981). "Identification of common molecular subsequences." J. Mol. Biol. 147: 195-197
(The original paper on local alignment. Not quite as easy to read, but introduces this important concept.)

Class 3

Sequence Alignment II

Sequence Comparison via Dynamic Programming. Issues in Sequence Comparison. Mutation Matrix. Local vs. Global Alignment. Low-complexity Regions. Basic Structures.

Lecture Notes [html-with-frames] [pdf]
Extra Notes [pdf]

Links

Alignment Tutorial

Main Readings

Altschul, S. F., Boguski, M. S., Gish, W. and Wootton, J. C. (1994). Issues in searching molecular sequence databases. [Review]. Nature Genetics. 6(2): 119-29.
(Most important. A short overall review.)

M Levitt & M Gerstein (1998). A Unified Statistical Framework for Sequence Comparison and Structure Comparison. Proceedings of the National Academy of Sciences USA 95: 5913-5920
** http://bioinfo.mbb.yale.edu/e-print/statframe-pnas-reprint.pdf
(Understand the concept of P-value and the framework for deriving scoring statistics.)

Holm, L. and Sander, C. (1993). Protein Structure Comparison by Alignment of Distance Matrices. J. Mol. Biol. 233: 123-138.
(A different method of structural alignment, which differs more from sequence alignment.)

Other Readings

M Gerstein & M Levitt (1998). "Comprehensive Assessment of Automatic Structural Alignment against a Manual Standard, the Scop Classification of Proteins," Protein Science 7: 445-456.
** http://bioinfo.mbb.yale.edu/~mbg/preprint/ss-prsci.pdf
(Understand the method, not results, in this paper OR in Gerstein & Levitt (1996), below)

Pearson, W. R. (1996). Effective Protein Sequence Comparison. Meth. Enz. 266: 227-259.
(Understand how the FASTA e-value is derived.)

Altschul et al. (1997). "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Res. 25(17): 3389-3402.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/altschul-nar-blast2.pdf

M Gerstein & M Levitt (1996). "Using Iterative Dynamic Programming to Obtain Accurate Pairwise and Multiple Alignments of Protein Structures," in Proceedings of the Fourth International Conference on Intelligent Systems in Molecular Biology, 59-67 (Menlo Park, CA, AAAI Press, June 12-15).
** http://hyper.stanford.edu/~mbg/Align/ismb96

Class 4

Scoring Schemes

Blast, FASTA, Low Complexity Regions

Lecture Notes [html-with-frames] [pdf]

Class 5

Structural Alignment

Lecture Notes on Alignment [html-with-frames] [pdf]

Links

http://bioinfo.mbb.yale.edu/align -- structural alignments
Alignment Tutorial

Class 6

Databases

Normalization, Applications, Genome Censuses

Lecture Notes on Databases [html-with-frames] [pdf]
Lecture Notes on Databases II [html-with-frames] [pdf]

Links

http://bioinfo.mbb.yale.edu/MolMovDB -- sample database, illustrates reports
http://bioinfo.mbb.yale.edu/census/browser -- sample database, highlights table structure
http://bioinfo.mbb.yale.edu/ius/?MIval=links&page=course -- a mini-database form, add your own links!!

Main Readings

Tomb, J.-F., White, O., Kerlavage, A. R., Clayton, R. A., Sutton, G. G., Fleischmann, R. D., Ketchum, K. A., Klenk, H. P., Gill, S., Dougherty, B. A., Nelson, K., Quackenbush, J., Zhou, L., Kirkness, E. F., Peterson, S., Loftus, B., Richardson, D., Dodson, R., Khalak, H. G., Glodek, A., McKenney, K., Fitzegerald, L. M., Lee, N., Adams, M. D., Hickey, E. K., Berg, D. E., Gocayne, J. D., Utterback, T. R., Peterson, J. D., Kelley, J. M., Cotton, M. D., Weidman, J. M., Fujii, C., Bowman, C., Watthey, L., Wallin, E., Hayes, W. S., Borodovsky, M., Karp, P. D., Smith, H. O., Fraser, C. M. & Venter, J. C. (1997). "The complete genome sequence of the gastric pathogen Helicobacter pylori," Nature 388, 539-547.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/nature-hpylori.pdf
** http://bioinfo.mbb.yale.edu/course/private-xxxx/nature-sum-hpylori.html
** http://www.nature.com/Nature2/serve?SID=&CAT=NatGen&PG=pylori/pylori1.html
(This research article describes one of the recent genome sequences.)

Korth & Silberschatz, Database System Concepts
(CS book on databases. Read pages 1 to 65 [sections 1.0 to mid-3.2] and pages 97 to 108 [part of section 4.1]. Some of the information on SQL is available from the on-line link below.)
** http://bioinfo.mbb.yale.edu/course/private-xxxx/sqltut.htm

J L Weldon. "A Career in Data Modeling," Byte, June 1997, http://www.byte.com/art/9706/sec7/art3.htm
(Practical hands-on discussion of data modeling in a commercial context; many of the same issues apply in bioinformatics.)

M Gerstein & H Hegyi (1998). "Comparing Microbial Genomes in terms of Protein Structure: Surveys of a Finite Parts List," FEMS Microbiology Reviews 22: 277-304.
** http://bioinfo.mbb.yale.edu/e-print/surveys-fems-preprint.pdf

Other Readings

M Gerstein (1998). "Patterns of Protein-Fold Usage in Eight Microbial Genomes: A Comprehensive Structural Census," Proteins 33: 518-534.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/proteins_33_518.pdf
(This is an example of the application of large-scale, database-style calculations.)

Wade, N. (1997). "Scientists Map Ulcer Bacterium's Genetic Code," New York Times. August 7.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/nyt-hpylori.html

Langreth, R. (1997). "Scientists Unlock Sequence Of Ulcer Bacterium's Genes," Wall Street Journal. 7 August.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/wsj-hpylori.txt

J L Weldon. "RDBMSes Get a Make-Over," Byte, April 1997, http://www.byte.com/art/9704/sec7/art7.htm
(Practical discussion of what an object database is.)

J L Weldon. "Data Warehouse Building Blocks," Byte, January 1997, http://www.byte.com/art/9701/sec7/art1.htm

J L Weldon. "Warehouse Cornerstones," Byte, January 1997, http://www.byte.com/art/9701/sec7/art2.htm
(Other, less relevant articles on some of the practical hardware issues in database design.)

Tanouye, E. & Langreth, R. (1998). "SmithKline-Glaxo Deal Driven By the Hunt for Human Genes," Wall Street Journal. February 2.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/wsj-drug-merge.txt

Wade, N. (1997). "Now Playing at a Nearby Lab : 'Revenge of the Fly People,'" New York Times. 05/20/97, C1.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/nyt-flybase.txt

Johnson, G. (1997). "Proteins Outthink Computers in Giving Shape to Life," New York Times. March 25, 1997, C1.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/nyt-casp2.html

Wade, N. (1997). "Thinking Small Paying Off Big In Gene Quest," New York Times. 02/03/97, A1.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/nyt-pathogens-genomes.txt

Class 7

Trees

Lecture Notes on Trees [html-with-frames] [pdf]

Main Readings

Cavalli-Sforza, L. L. & Edwards, A. W. F. (1967). "Phylogenetic analysis: models and estimation procedures," Evolution 21, 550-570.

Other Readings

Fitch, W. M. (1971). "Toward defining the course of evolution: minimum change for a specific topology," Syst. Zool. 20, 406-416.

Swofford et al. (1994). "Phylogeny reconstruction," In Molecular Systematics (2nd ed.), Sinauer Press.
(This book chapter is a good reference, though not a necessary reading.)

Class 8

Geometry

Surfaces, volumes (+ structural alignment)

Lecture Notes on Geometry [html-with-frames] [pdf]
Further Lecture Notes on Geometry [html]

http://bioinfo.mbb.yale.edu/geometry

Main Readings

Richards, F. M. (1977). Areas, Volumes, Packing, and Protein Structure. Ann. Rev. Biophys. Bioeng. 6, 151-76.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/richards-annrev-areas.pdf

Other Readings

Richards, F. M. (1974). The Interpretation of Protein Structures: Total Volume, Group Volume Distributions and Packing Density. J. Mol. Biol. 82, 1-14.
(Original Application of Voronoi Method to Proteins. See draft document below for more details on method.)

M Gerstein & F M Richards, "Protein Geometry: Volumes, Areas, and Distances," a manuscript submitted for chapter 22 of volume F of the International Tables for Crystallography ("molecular geometry and features" in "macromolecular crystallography")
** http://bioinfo.mbb.yale.edu/e-print/geom-inttab/geom-inttab.pdf

Kuntz, I. D. (1992). Structure-Based Strategies for Drug Design and Discovery. Science 257, 1078-1082.
(Docking. See link below for more information.)
** http://www.cmpharm.ucsf.edu/kuntz

Pattabiraman, N., Ward, K.B. and Fleming, P.J. (1995) Occluded Molecular Surface: Analysis of Protein Packing, Journal of Molecular Recognition, 8:334-344
** http://bioinfo.mbb.yale.edu/course/private-xxxx/fleming-os.pdf

Class 9

Bioinformatics in Industry

Class 10

Summary, Multiple Alignment

Lecture Notes giving Overall Summary [html]

Multiple Alignment, Profiles, Patterns
Lecture Notes [html]

Main Readings

Eddy, S. R. (1996). "Hidden Markov models," Curr. Opin. Struc. Biol. 6, 361-365.

Higgins, D. G., Thompson, J. D. & Gibson, T. J. (1996). "Using CLUSTAL for multiple sequence alignments," Methods Enzymol 266, 383-402.

