MB&B [47]47b3
BIOINFORMATICS

 

Overview of ON-LINE Documents
&
Synopsis of Classes

Class 1, M 1/12/98

Topics: "What is Bioinformatics?" Types of Molecular Biology Information.
Survey (please e-mail it back to Mark.Gerstein@yale.edu)
Lecture Notes [html-with-frames] [pdf]

Class 2, W 1/14/98

Topics

"What is Bioinformatics?" The Range of Calculations in Bioinformatics. Three Major Application Areas in Bioinformatics. Sequence Similarity. Sequence Comparison via Dynamic Programming.

Lecture Notes [html-with-frames] [pdf]
Administrative Info Page
Extra Notes [pdf]

Blast Search

Basic: http://www.ncbi.nlm.nih.gov/cgi-bin/BLAST/nph-blast
Test Data: http://bioinfo.mbb.yale.edu/course/classes/c2-testdata.txt
(Look at Advanced Blast too)

Readings

(For next Monday)

Chapter 3 from Gribskov, M. and Devereux, J. (1992). Sequence Analysis Primer. New York, Oxford University Press.
(Focus on dynamic programming section of this chapter.)

Needleman, S. B. and Wunsch, C. D. (1970). "A general method applicable to the search for similarities in the amino acid sequence of two proteins." J. Mol. Biol. 48: 443-453.
(The original paper. Still pretty easy to read. Will be used in class.)

Smith, T. F. and Waterman, M. S. (1981). "Identification of common molecular subsequences." J. Mol. Biol. 147: 195-197.
(The original paper on local alignment. Not quite as easy to read, but introduces this important concept.)
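As a companion to the dynamic-programming topic and the Needleman & Wunsch reading, here is a minimal sketch of global alignment. It assumes a simple match/mismatch score and a linear gap penalty (the function name and the default values `match=1`, `mismatch=-1`, `gap=-2` are illustrative choices, not the scoring used in the original paper): fill a matrix F where each cell holds the best score for aligning two prefixes, then trace back from the bottom-right corner.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty (illustrative scores).
    Returns (score, aligned_a, aligned_b)."""
    def s(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap            # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(a[i - 1], b[j - 1]),  # (mis)match
                          F[i - 1][j] + gap,                        # gap in b
                          F[i][j - 1] + gap)                        # gap in a

    # Traceback: recover one optimal path from F[n][m] back to F[0][0]
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, x, y = needleman_wunsch("ACGT", "AGT")
print(score, x, y)  # 1 ACGT A-GT
```

When several paths tie, this traceback prefers the diagonal move, so it returns just one of possibly many optimal alignments.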

Class 3, M 1/19/98

Topics

Sequence Comparison via Dynamic Programming. Issues in Sequence Comparison. Mutation Matrix. Local vs. Global Alignment. Low-complexity Regions. Basic Structures.
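To illustrate the local vs. global distinction, the sketch below follows the Smith & Waterman idea. It is the same recurrence as global alignment, except that every cell is floored at 0, so an alignment can begin anywhere, and the traceback starts from the highest-scoring cell rather than the corner. The function name and the scores (`match=2`, `mismatch=-1`, `gap=-2`) are illustrative assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty (illustrative scores).
    Returns (best_score, substring_of_a, substring_of_b) for the best local match."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]   # first row/column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 term lets a local alignment start fresh at any cell
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell ends the local alignment
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, a[i:best_pos[0]], b[j:best_pos[1]]


print(smith_waterman("TTTACGTTT", "GGACGTGG"))  # (8, 'ACGT', 'ACGT')
```

A global aligner would be forced to score the unrelated flanking T's and G's; the local version ignores them.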

Lecture Notes [html-with-frames] [pdf]
Extra Notes [pdf]

Links

Alignment Tutorial

Readings

(For next Wednesday, to be handed out later in the week.)

Altschul, S. F., Boguski, M. S., Gish, W. and Wootton, J. C. (1994). Issues in searching molecular sequence databases. [Review]. Nature Genetics. 6(2): 119-29.
(Most important. A short overall review.)

M Gerstein & M Levitt (1996). "Using Iterative Dynamic Programming to Obtain Accurate Pairwise and Multiple Alignments of Protein Structures," in Proceedings of the Fourth International Conference on Intelligent Systems in Molecular Biology, 59-67 (Menlo Park, CA, AAAI Press, June 12-15).
** http://hyper.stanford.edu/~mbg/Align/ismb96

M Gerstein & M Levitt (1998). "Comprehensive Assessment of Automatic Structural Alignment against a Manual Standard, the Scop Classification of Proteins," Protein Science (in press).
** http://bioinfo.mbb.yale.edu/~mbg/preprint/ss-prsci.pdf
(Understand the method, not the results, in this paper OR in Gerstein & Levitt (1996), above.)

M Levitt & M Gerstein (1998). A Unified Statistical Framework for Sequence Comparison and Structure Comparison. Proceedings of the National Academy of Sciences USA (in press)
** http://bioinfo.mbb.yale.edu/~mbg/preprint/stat-framework-pnas-preprint.pdf
(Understand the concept of P-value and the framework for deriving scoring statistics.)

Holm, L. and Sander, C. (1993). Protein Structure Comparison by Alignment of Distance Matrices. J. Mol. Biol. 233: 123-138.
(A different method of structural alignment, one that departs further from sequence alignment.)

Pearson, W. R. (1996). Effective Protein Sequence Comparison. Meth. Enz. 266: 227-259.
(Understand how the FASTA e-value is derived.)
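The P-value and e-value readings above rest on the extreme-value (Gumbel) distribution of best local-alignment scores. The sketch below uses the standard Karlin-Altschul form, in which the expected number of chance hits scoring at least S between sequences of lengths m and n is K·m·n·e^(−λS), and P = 1 − e^(−expected hits). It then converts P into a database-wide expectation E = N·P over N sequences, in the spirit of the FASTA e-value. The parameters λ and K depend on the scoring matrix and residue composition; the function names and any values you plug in are assumptions, not numbers from the readings.

```python
import math


def evd_pvalue(score, lam, K, m, n):
    """P(best local score >= score) for two random sequences of lengths m and n,
    under the extreme-value form P = 1 - exp(-K * m * n * exp(-lam * score))."""
    expected_hits = K * m * n * math.exp(-lam * score)
    return 1.0 - math.exp(-expected_hits)


def database_evalue(p, num_sequences):
    """Expected number of database sequences scoring this well by chance: E = N * P."""
    return num_sequences * p
```

Because P falls off exponentially with the score, a modest increase in raw score can turn a hit that is expected many times by chance into one that is expected far less than once.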

Class 4, W 1/21/98

Topics

Mathematical background on probability distributions, matrices, and vector products.

Lecture Notes [html-part-1] [html-part-2] [pdf]

Guest Instructor: Mark Wilson

Class 5, M 1/26/98

Topics

Scoring Schemes, Low Complexity Regions, Blast, FASTA.
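FASTA and Blast both begin by finding short exact (or high-scoring) word matches before doing any full dynamic programming. The sketch below shows only the simplest version of that seeding step, assuming exact k-letter words ("ktup" in FASTA terms). It indexes the query's words, scans the target, and counts hits per diagonal (target position minus query position). Real FASTA goes on to rescore the best diagonals with a substitution matrix, and Blast uses neighborhood words above a score threshold; neither step is shown, and the function name is hypothetical.

```python
from collections import defaultdict


def ktuple_diagonals(query, target, k=2):
    """Count exact k-letter word matches on each diagonal (offset = target_pos - query_pos)."""
    index = defaultdict(list)                 # word -> list of start positions in the query
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    counts = defaultdict(int)
    for j in range(len(target) - k + 1):
        for i in index.get(target[j:j + k], []):
            counts[j - i] += 1                # hits on the same diagonal share an offset
    return dict(counts)


print(ktuple_diagonals("ACGTAC", "TTACGTAC", k=3))  # {-2: 1, 2: 4}
```

A diagonal that collects many hits (offset 2 here) marks a candidate region worth aligning in full; an isolated hit is more likely to be chance.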

Lecture Notes [html-with-frames] [pdf]

Required Reading

(For next week; In URLs with private-xxxx, replace "xxxx" with alternate string.)

Tomb, J.-F., White, O., Kerlavage, A. R., Clayton, R. A., Sutton, G. G., Fleischmann, R. D., Ketchum, K. A., Klenk, H. P., Gill, S., Dougherty, B. A., Nelson, K., Quackenbush, J., Zhou, L., Kirkness, E. F., Peterson, S., Loftus, B., Richardson, D., Dodson, R., Khalak, H. G., Glodek, A., McKenney, K., FitzGerald, L. M., Lee, N., Adams, M. D., Hickey, E. K., Berg, D. E., Gocayne, J. D., Utterback, T. R., Peterson, J. D., Kelley, J. M., Cotton, M. D., Weidman, J. M., Fujii, C., Bowman, C., Watthey, L., Wallin, E., Hayes, W. S., Borodovsky, M., Karp, P. D., Smith, H. O., Fraser, C. M. & Venter, J. C. (1997). "The complete genome sequence of the gastric pathogen Helicobacter pylori," Nature 388, 539-547.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/nature-hpylori.pdf
** http://bioinfo.mbb.yale.edu/course/private-xxxx/nature-sum-hpylori.html
** http://www.nature.com/Nature2/serve?SID=&CAT=NatGen&PG=pylori/pylori1.html
(This research article describes one of the recent genome sequences.)

Korth & Silberschatz, Database System Concepts
(CS book on databases; Read pages 1 to 65 [sections 1.0 to mid-3.2] and pages 97 to 108 [part of section 4.1]. Some of the information on SQL is available from the on-line link below.)
** http://bioinfo.mbb.yale.edu/course/private-xxxx/sqltut.htm

J L Weldon. "A Career in Data Modeling," Byte, June 1997, http://www.byte.com/art/9706/sec7/art3.htm
(Practical, hands-on discussion of data modeling in a commercial context; many of the same issues apply in bioinformatics.)

Gerstein, M. (1997). A Structural Census of Genomes: Comparing Eukaryotic, Bacterial and Archaeal Genomes in terms of Protein Structure. J. Mol. Biol. 274, 562-576.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/genome-jmb-reprint.pdf
(This is an example of the application of large-scale, database-style calculations.)

Extra Reading

Wade, N. (1997). "Scientists Map Ulcer Bacterium's Genetic Code," New York Times. August 7.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/nyt-hpylori.html

Langreth, R. (1997). "Scientists Unlock Sequence Of Ulcer Bacterium's Genes," Wall Street Journal. 7 August.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/wsj-hpylori.txt

Gerstein, M. & Levitt, M. (1997). A Structural Census of the Current Population of Protein Sequences. Proc. Natl. Acad. Sci. USA 94, 11911-11916
** http://bioinfo.mbb.yale.edu/course/private-xxxx/census-pnas-reprint.pdf
(Another similar example of the application of large-scale, database-style calculations.)

J L Weldon. "RDBMSes Get a Make-Over," Byte, April 1997, http://www.byte.com/art/9704/sec7/art7.htm
(Practical discussion of what an object database is.)

J L Weldon. "Data Warehouse Building Blocks," Byte, January 1997, http://www.byte.com/art/9701/sec7/art1.htm
J L Weldon. "Warehouse Cornerstones," Byte, January 1997, http://www.byte.com/art/9701/sec7/art2.htm
(Other, less relevant articles on some of the practical hardware issues in database design.)

In-class Presentation Assignment

"Hello World" in HTML for Next Wednesday

Class 6, W 1/28/98

Topics

Structural Alignment. Protein Geometry: surfaces.

Lecture Notes on Alignment [html-with-frames] [pdf]
Lecture Notes on Geometry [html-with-frames] [pdf]
Further Lecture Notes on Geometry [html]

Links

http://bioinfo.mbb.yale.edu/align -- structural alignments
Alignment Tutorial
http://bioinfo.mbb.yale.edu/geometry -- surfaces

Class 7, M 2/2/98

Topics

Protein Geometry: volumes. Beginning Databases.

Lecture Notes on Geometry [html-with-frames] [pdf]
Lecture Notes on Databases [html-with-frames] [pdf]

Links

http://bioinfo.mbb.yale.edu/geometry -- volumes
http://bioinfo.mbb.yale.edu/MolMovDB -- sample database

Class 8, W 2/4/98

Topics

Databases II: Normalization, Applications, Genome Censuses
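Normalization means storing each fact once and linking tables by keys rather than repeating data. The sketch below, using Python's built-in `sqlite3` module, keeps fold names in their own table, refers to them from a protein table by `fold_id`, and then answers a small "genome census" question (how many proteins of each fold in each genome) with a join. The table layout, protein IDs, and genome names are hypothetical examples, not the schema of the course databases linked below.

```python
import sqlite3


def fold_census():
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Normalized design: each fold name is stored once; proteins point to it by key.
    cur.executescript("""
        CREATE TABLE fold (
            fold_id   INTEGER PRIMARY KEY,
            fold_name TEXT NOT NULL
        );
        CREATE TABLE protein (
            protein_id TEXT PRIMARY KEY,
            genome     TEXT NOT NULL,
            fold_id    INTEGER REFERENCES fold(fold_id)
        );
    """)
    cur.executemany("INSERT INTO fold VALUES (?, ?)",
                    [(1, "TIM barrel"), (2, "Rossmann fold")])
    cur.executemany("INSERT INTO protein VALUES (?, ?, ?)",   # hypothetical data
                    [("p1", "genomeA", 1), ("p2", "genomeA", 2), ("p3", "genomeB", 1)])
    # Census query: join the tables back together and count folds per genome.
    rows = cur.execute("""
        SELECT p.genome, f.fold_name, COUNT(*)
        FROM protein AS p JOIN fold AS f ON p.fold_id = f.fold_id
        GROUP BY p.genome, f.fold_name
        ORDER BY p.genome, f.fold_name
    """).fetchall()
    conn.close()
    return rows


print(fold_census())
```

Renaming a fold now means updating a single row in `fold`, rather than every protein record that mentions it.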

Lecture Notes on Databases [html-with-frames] [pdf]

Links

http://bioinfo.mbb.yale.edu/MolMovDB -- sample database, illustrates reports

http://bioinfo.mbb.yale.edu/census/browser -- sample database, highlights table structure

http://bioinfo.mbb.yale.edu/ius/?MIval=links&page=course -- a mini-database form, add your own links!!

Final Project

Plan: Summarize and review an area; interpret and analyze data; come up with a new approach.
Areas: Alignment methods (sequence & structure); scoring statistics (sequence & structure); protein geometry (surfaces and volumes); databases (theory & application). Also genomes, pathways, trees, patterns, docking, modelling.
Length: ~7 pages in total (2000 words).
Format: Turn in a printout plus a relative-link HTML document. It will be ported to the course website and integrated with the course materials; the final URL will look like http://bioinfo.mbb.yale.edu/course/projects/joe-bone.
Coding? Computer implementation and coding are possible (perl, fortran, java, c, ...), but documentation is still necessary!
Meet individually to discuss the project.

"Hello World" in HTML???

Required Reading

(Due Next Wednesday)

Richards, F. M. (1974). The Interpretation of Protein Structures: Total Volume, Group Volume Distributions and Packing Density. J. Mol. Biol. 82, 1-14.
(The original application of the Voronoi method to proteins. See the draft document below for more details on the method.)
** http://bioinfo.mbb.yale.edu/course/private-xxxx/vol-draft.pdf

Richards, F. M. (1977). Areas, Volumes, Packing, and Protein Structure. Ann. Rev. Biophys. Bioeng. 6, 151-76.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/richards-annrev-areas.pdf

Kuntz, I. D. (1992). Structure-Based Strategies for Drug Design and Discovery. Science 257, 1078-1082.
(Docking. See link below for more information.)
** http://www.cmpharm.ucsf.edu/kuntz

Extra Reading

Pattabiraman, N., Ward, K.B. and Fleming, P.J. (1995) Occluded Molecular Surface: Analysis of Protein Packing, Journal of Molecular Recognition, 8:334-344
** http://bioinfo.mbb.yale.edu/course/private-xxxx/fleming-os.pdf

Gerstein, M. & Chothia, C. (1996). Packing at the Protein-Water Interface. Proc. Natl. Acad. Sci. USA 93, 10167-10172.

Class 9, M 2/9/98

Topics

More Protein Geometry, Docking.
Questions!

Lecture Notes [html-with-frames]

Guest Instructor: Pat Fleming

Links

http://www.csb.yale.edu -- CSB Core
http://csbmet.csb.yale.edu/userguides/datamanip/os/os_descrip.html  -- OS
http://www.csb.yale.edu/people/core/pjf/templates -- Templates for Presentations

Class 10, W 2/11/98

Topics

Multiple Alignment, Profiles, Patterns
Questions!
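A profile turns a multiple alignment into position-specific scores. The sketch below assumes an ungapped DNA alignment for brevity (protein profiles use 20 residues and usually more elaborate sequence weighting and pseudocounts). It counts residues in each column, adds a pseudocount so that unseen residues are not given zero probability, and scores a new sequence by summing base-2 log-odds against a uniform background. Function names and parameter defaults are illustrative.

```python
import math
from collections import Counter


def build_profile(alignment, alphabet="ACGT", pseudocount=1.0):
    """Per-column residue frequencies from equal-length, ungapped aligned sequences."""
    profile = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment)
        total = len(alignment) + pseudocount * len(alphabet)
        profile.append({r: (counts[r] + pseudocount) / total for r in alphabet})
    return profile


def profile_score(profile, seq, background=0.25):
    """Sum of log2(profile frequency / background frequency) over aligned positions."""
    return sum(math.log2(col[r] / background) for col, r in zip(profile, seq))


prof = build_profile(["ACGT", "ACGA", "ACGT"])
print(round(prof[0]["A"], 3))  # 0.571, i.e. (3 + 1) / (3 + 4)
```

Positive scores mean the sequence looks more like the aligned family than like background; conserved columns contribute most.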

Lecture Notes [html]

Guest Instructor: Hedi Hegyi

Required Reading

Cavalli-Sforza, L. & Edwards, S. (1967). "Phylogenetic analysis: models and estimation procedures," Evolution 21, 550-570.

Eddy, S. R. (1996). "Hidden Markov models," Curr. Opin. Struct. Biol. 6, 361-365.

Fitch, W. M. (1971). "Toward defining the course of evolution: minimum change for a specific topology," Syst. Zool. 20, 406-416.

Higgins, D. G., Thompson, J. D. & Gibson, T. J. (1996). "Using CLUSTAL for multiple sequence alignments," Methods Enzymol 266, 383-402.
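The Fitch (1971) reading gives the minimum number of changes needed to explain one aligned site on a fixed tree topology. The sketch below implements only the bottom-up pass: at each internal node, take the intersection of the children's state sets if it is non-empty, otherwise take their union and count one change. The top-down pass that assigns actual ancestral states is omitted. Representing a rooted binary tree as nested Python pairs is an assumption of this sketch.

```python
def fitch(tree):
    """Bottom-up Fitch parsimony for a single site.

    tree: a leaf state (a one-letter str) or a pair (left, right) of subtrees.
    Returns (candidate state set at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {tree}, 0
    left_set, left_cost = fitch(tree[0])
    right_set, right_cost = fitch(tree[1])
    common = left_set & right_set
    if common:                                   # children can agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1   # one change needed here


print(fitch((("A", "C"), ("A", "A"))))  # ({'A'}, 1)
```

Running this for every column of an alignment and summing the costs gives the parsimony length of the topology; comparing that length across topologies is how a parsimony tree search ranks them.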

Extra Reading

Swofford et al. (1994). "Phylogeny reconstruction," In Molecular Systematics (2nd ed.), Sinauer Press.
(This book chapter is a good reference, though not necessary reading.)

Bork, P. & Gibson, T. J. (1996). "Applying motif and profile searches," Methods Enzymol 266, 162-84.

Class 11, M 2/16/98

Topics

Trees I

Guest Instructor: J Kim

Lecture Notes on Trees [html-with-frames] [pdf]

Class 12, W 2/18/98

Topics

Trees II

Guest Instructor: J Kim

Lecture Notes on Trees [html-with-frames] [pdf]

Class 13, M 2/23/98

Topics

Molecular Simulation, Presentation of Short Summary Talks
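Molecular simulation advances atomic positions by integrating Newton's equations in small time steps. The sketch below shows velocity Verlet, a standard integrator in molecular dynamics, applied to a toy one-dimensional harmonic spring rather than a real force field. The function name, spring constant, mass, and time step are illustrative assumptions.

```python
def velocity_verlet(x, v, force, mass, dt, steps):
    """Integrate m * x'' = force(x) with velocity Verlet; returns the final (x, v)."""
    f = force(x)
    for _ in range(steps):
        x = x + v * dt + 0.5 * (f / mass) * dt * dt   # position update
        f_new = force(x)                              # force at the new position
        v = v + 0.5 * (f + f_new) / mass * dt         # average old and new forces
        f = f_new
    return x, v


k, m = 1.0, 1.0                                       # toy spring constant and mass
x, v = velocity_verlet(1.0, 0.0, lambda x: -k * x, m, dt=0.01, steps=100)
energy = 0.5 * m * v * v + 0.5 * k * x * x            # starts at 0.5
print(x, energy)  # x is close to cos(1.0); energy stays close to 0.5
```

Velocity Verlet is popular in simulations because its total energy does not drift steadily over long runs, unlike a naive Euler update.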

Lecture Notes on Simulation [html-with-frames] [pdf]
Student talks

Class 14, W 2/25/98

Topics

Overall Summary, Presentation of Short Summary Talks II

Lecture Notes giving Overall Summary [html]
Student talks

Important, "Required" Reading

(due whenever! In URLs with private-xxxx, replace "xxxx" with alternate string.)

McCammon, J. A. & Harvey, S. C. (1987). Dynamics of Proteins and Nucleic Acids. Cambridge UP.

** Information on Liquid Simulation Methods (excerpted from a thesis, 1992)

Extra, "fun" Reading

(due whenever!)

Tanouye, E. & Langreth, R. (1998). "SmithKline-Glaxo Deal Driven By the Hunt for Human Genes," Wall Street Journal. February 2.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/wsj-drug-merge.txt

Wade, N. (1997). "Now Playing at a Nearby Lab: 'Revenge of the Fly People,'" New York Times. May 20, C1.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/nyt-flybase.txt

Johnson, G. (1997). "Proteins Outthink Computers in Giving Shape to Life," New York Times. March 25, C1.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/nyt-casp2.html

Wade, N. (1997). "Thinking Small Paying Off Big In Gene Quest," New York Times. February 3, A1.
** http://bioinfo.mbb.yale.edu/course/private-xxxx/nyt-pathogens-genomes.txt

