MB&B 447b / 747b, Spring 1999: BIOINFORMATICS

Computational analysis of gene sequences and protein structures on a large scale. Topics include sequence alignment, biological database design, comparative genomics, geometric analysis of protein structure, and macromolecular simulation. (Blue Book entry)

To be offered in the **2nd** half of the spring term
as a "module," meeting from 1:00-2:15 PM on Mondays
and Wednesdays in Bass 205. (Some classes may also be held on
Fridays.)

**First meeting: Bass 205, 1:00-2:15 PM, Monday 3/22/99.**

Mark Gerstein, MB&B Department, Bass 432A, Yale University, New Haven, CT 06520

Phone: 203 432-6105, E-mail: Mark.Gerstein@yale.edu

Handouts and readings are available from Kate Tatham <kathleen.tatham@yale.edu>, Bass 336, 203 432-8990.

- A **synopsis** of each of the lectures: http://bioinfo.mbb.yale.edu/mbb447-99/lectures.htm
- Detailed discussion of course **assignments**, mailing lists, survey, etc.: http://bioinfo.mbb.yale.edu/mbb447-99/todo.htm
- If you want to take or audit the course, you need to fill out the **survey**.
- The course will follow a very similar progression to the bioinformatics course offered **last spring**. (See, in particular, http://bioinfo.mbb.yale.edu/course/classes.)
- Also, see other related on-line lectures.

This course will provide an overview of bioinformatics, the application of computational methods to interpret the rapidly expanding amount of biological information. Following the natural flow of this information in the cell, the course will begin with the analysis of gene sequences and progress to the study of protein structures. The classic dynamic programming method of sequence alignment will be presented first, followed by how it can be extended to allow rapid searching and scoring of the thousands of sequences in a genome. This leads naturally to the question of how large amounts of biological information can be intelligently organized into a database. Discussion of sequence-structure relationships will form the bridge to protein structure, with particular emphasis on statistically based "predictions" of secondary structure. For the analysis of 3D structures, mathematical constructions such as Voronoi polyhedra will be presented for calculating simple geometric quantities: distances, angles, axes, areas, and volumes. Finally, it will be shown how these simple quantities relate to the basic properties of proteins, which leads to a brief overview of the more physical calculations possible on protein structures, namely molecular dynamics and Monte Carlo simulation.
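To give a flavor of the dynamic programming alignment mentioned above, here is a minimal sketch of global alignment in the Needleman-Wunsch style. It assumes a simple +1 match / -1 mismatch score and a linear gap penalty of -2; the function name `global_align` and these scoring values are hypothetical choices for illustration, not the scoring scheme used in the lectures. Practical protein alignment usually uses a substitution matrix (e.g. PAM or BLOSUM) and affine gap penalties instead.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b by dynamic programming.

    Scores are illustrative assumptions: `match` for identical residues,
    `mismatch` otherwise, and a linear penalty `gap` per gap position.
    Returns (best_score, aligned_a, aligned_b).
    """
    def pair_score(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # S[i][j] holds the best score for aligning a[:i] with b[:j].
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        S[0][j] = j * gap  # b[:j] aligned entirely against gaps

    # Fill the table: each cell extends the best of three smaller alignments.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # pair a[i-1] with b[j-1]
                S[i - 1][j] + gap,  # a[i-1] against a gap
                S[i][j - 1] + gap,  # b[j-1] against a gap
            )

    # Trace back from S[n][m] to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return S[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `global_align("ACGT", "AGT")` fills a 5 x 4 score table and returns a score of 1 with `ACGT` aligned over `A-GT` (three matches, one gap). The table costs O(nm) time and memory, which is why genome-scale searching relies on faster heuristics built on the same idea.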

- General Overview
- Sequences I: Alignment via Dynamic Programming
- Sequences II: Multiple Alignment and Consensus Patterns
- Sequences III: Scoring Schemes and Matching Statistics
- Sequences IV: Secondary Structure Propensities and Prediction
- Structures I: Basic Protein Geometry and Least-Squares Fitting
- Structures II: Calculation of Volume and Surface
- Structures III: Structural Alignment
- Structures IV: Molecular Dynamics & Monte Carlo
- Databases I: Relational Database Concepts
- Databases II: Protein Domains and Modules
- Databases III: Clustering and Trees
- Databases IV: Large-scale Censuses and Genome Comparisons
- Summary Lecture

Readings will be excerpted from a number of original research papers. In addition, sections from the following books will be used:

- Sequence Analysis Primer by Gribskov & Devereux
- Database System Concepts by Korth & Silberschatz
- Dynamics of Proteins & Nucleic Acids by McCammon & Harvey

- Approximately 25-30 pages of reading will be required each week.
- Students will be evaluated on the basis of:
  - A Final Paper/Project (can include programming)
  - Class Attendance and Participation
  - An in-class talk extending the contents of a lecture and some of the reading
- For examples, see some of last year's projects and talks.

- The course is keyed towards first-year MB&B graduate students and advanced MB&B undergraduates.
- Students should have:
  - A basic knowledge of biochemistry and molecular biology.
  - A knowledge of basic quantitative concepts, such as single-variable calculus, some probability and statistics, and basic programming skills.
- These requirements are captured by the following prerequisite statement:
"Prerequisites: Biol. 122b and Mathematics 115 or permission of the instructor."

- Fill out this year's survey!!
- Also, take a look at a survey of last year's attendees and a statistical analysis of the survey results (available in HTML and PDF).

If you're really motivated, take a look at http://bioinfo.mbb.yale.edu/jobs.

The DNA-mouse image is adapted from the GCB-98
homepage. What's wrong with the adaptation?