Class 1, Mon. 1/12/98, Bass 405

Topics: "What is Bioinformatics?" Types of Molecular Biology Information.
Lecture Notes

Class 2, Wed. 1/14/98, 9:30-10:20


"What is Bioinformatics?" The Range of Calculations in Bioinformatics, Three Major Application Areas in Bioinformatics, Sequence Similarity, Sequence Comparison via Dynamic Programming.

Lecture Notes
Extra Notes

Blast Search

Chapter 3 from Gribskov, M. and Devereux, J. (1992). Sequence Analysis Primer. New York, Oxford University Press.
(Focus on dynamic programming section of this chapter.)

Needleman, S. B. and Wunsch, C. D. (1970). "A general method applicable to the search for similarities in the amino acid sequence of two proteins." J. Mol. Biol. 48: 443-453.
(The original paper. Still pretty easy to read. Will be used in class.)

Smith, T. F. and Waterman, M. S. (1981). "Identification of common molecular subsequences." J. Mol. Biol. 147: 195-197.
(The original paper on local alignment. Not quite as easy to read, but introduces this important concept.)

Class 3, Mon. 1/19/98


Sequence Comparison via Dynamic Programming. Issues in Sequence Comparison. Mutation Matrix. Local vs. Global Alignment. Low-complexity Regions. Basic Structures.

Lecture Notes
Extra Notes


Altschul, S. F., Boguski, M. S., Gish, W. and Wootton, J. C. (1994). Issues in searching molecular sequence databases. [Review]. Nature Genetics. 6(2): 119-29.
(Most important. A short overall review.) 

M Gerstein & M Levitt (1996). "Using Iterative Dynamic Programming to Obtain Accurate Pairwise and Multiple Alignments of Protein Structures," in Proceedings of the Fourth International Conference on Intelligent Systems in Molecular Biology, 59-67 (Menlo Park, CA, AAAI Press, June 12-15).
M Gerstein & M Levitt (1998). "Comprehensive Assessment of Automatic Structural Alignment against a Manual Standard, the Scop Classification of Proteins," Protein Science (in press).
(Understand the method, not the results, in this paper or in Gerstein & Levitt (1996), above.)

M Levitt & M Gerstein (1998). A Unified Statistical Framework for Sequence Comparison and Structure Comparison. Proceedings of the National Academy of Sciences USA (in press)
(Understand the concept of P-value and the framework for deriving scoring statistics.)

Holm, L. and Sander, C. (1993). Protein Structure Comparison by Alignment of Distance Matrices. J. Mol. Biol. 233: 123-138.
(A different method of structural alignment, which differs more from sequence alignment.)

Pearson, W. R. (1996). Effective Protein Sequence Comparison. Meth. Enz. 266: 227-259.
(Understand how the FASTA e-value is derived.)

Class 4, Wed. 1/21/98


Mathematical background on probability distributions, matrices, and vector products.

Class 5, Mon. 1/26/98


Scoring Schemes, Low Complexity Regions, Beginning Databases.

Required Reading

Tomb, J.-F., White, O., Kerlavage, A. R., Clayton, R. A., Sutton, G. G., Fleischmann, R. D., Ketchum, K. A., Klenk, H. P., Gill, S., Dougherty, B. A., Nelson, K., Quackenbush, J., Zhou, L., Kirkness, E. F., Peterson, S., Loftus, B., Richardson, D., Dodson, R., Khalak, H. G., Glodek, A., McKenney, K., Fitzegerald, L. M., Lee, N., Adams, M. D., Hickey, E. K., Berg, D. E., Gocayne, J. D., Utterback, T. R., Peterson, J. D., Kelley, J. M., Cotton, M. D., Weidman, J. M., Fujii, C., Bowman, C., Watthey, L., Wallin, E., Hayes, W. S., Borodovsky, M., Karp, P. D., Smith, H. O., Fraser, C. M. & Venter, J. C. (1997). "The complete genome sequence of the gastric pathogen Helicobacter pylori," Nature 388, 539-547.
(This research article describes one of the recent genome sequences.)

Korth & Silberschatz, Database System Concepts
(CS book on databases. Skim chapters 1 to 3; understand 4.1 in detail. Some of the information on SQL is available from the on-line link below.)
J L Weldon. "A Career in Data Modeling," Byte, June 1997, http://www.byte.com/art/9706/sec7/art3.htm
(Practical hands-on discussion of data modeling in a commercial context; many of the same issues apply in bioinformatics.)

Gerstein (1997). A Structural Census of Genomes: Comparing Eukaryotic, Bacterial and Archaeal Genomes in terms of Protein Structure. J. Mol. Biol. 274, 562-576.
(This is an example of the application of large-scale, database-style calculations.)

Extra Reading

Wade, N. (1997). "Scientists Map Ulcer Bacterium's Genetic Code," New York Times. August 7.
Langreth, R. (1997). "Scientists Unlock Sequence Of Ulcer Bacterium's Genes," Wall Street Journal. 7 August.
Gerstein, M. & Levitt, M. (1997). A Structural Census of the Current Population of Protein Sequences. Proc. Natl. Acad. Sci. USA 94, 11911-11916.
(Another similar example of the application of large-scale, database-style calculations.)

J L Weldon. "RDBMSes Get a Make-Over," Byte, April 1997, http://www.byte.com/art/9704/sec7/art7.htm
(Practical discussion of what an object database is.)

J L Weldon. "Data Warehouse Building Blocks," Byte, January 1997, http://www.byte.com/art/9701/sec7/art1.htm
J L Weldon. "Warehouse Cornerstones," Byte, January 1997, http://www.byte.com/art/9701/sec7/art2.htm
(Other, less relevant articles on some of the practical hardware issues in database design.)

Class 6, Wed. 1/28/98


