Detailed Outline of Lectures for MB&B 447b3 (Bioinformatics)

(Rough Draft as of 7/12/97)
  1. General Overview
  2. What is Bioinformatics?
    What one should know?

    What is a DNA sequence? an amino acid sequence?
    What does a protein look like in 3D?
    Molecules as information carriers.

  3. Sequences I: Alignment via Dynamic Programming
  4. Detailed description of a Needleman-Wunsch implementation (using Perl).
    Comparison of the results with a canned program; examples of sequence families.

    Local vs Global Alignment

    [Gribskov, 1992 #993]
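
    As a rough illustration of the dynamic programming idea, here is a minimal
    sketch of global (Needleman-Wunsch style) alignment. It is written in Python
    rather than the Perl used in the lecture, and the scoring values (match=1,
    mismatch=-1, linear gap=-1) are assumptions chosen only for the example, not
    the course's actual parameters.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Scoring values are illustrative.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        # substitution score for a[i-1] against b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align two residues
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

    A local (Smith-Waterman style) variant would differ mainly in clamping each
    cell at zero and starting the traceback from the highest-scoring cell.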

  5. Sequences II: Multiple Alignment and Consensus Patterns
  6. What does one do with the alignment of many things?

    HMMs, Profiles, multiple alignment.
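
    One simple thing to do with an alignment of many sequences is to summarize
    each column. The sketch below builds a per-column residue frequency table
    (a very crude profile) and a consensus string. The function names, the gap
    character '-', the 'x' placeholder and the 0.5 threshold are all
    assumptions made for illustration; real profiles and HMMs add weighting,
    pseudocounts and gap modelling that are not shown here.

```python
from collections import Counter


def column_profile(alignment):
    """Per-column residue frequencies for equal-length aligned sequences.

    Gap characters ('-') are ignored when computing frequencies.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c != "-")
        total = sum(counts.values())
        profile.append({res: n / total for res, n in counts.items()} if total else {})
    return profile


def consensus(alignment, threshold=0.5):
    """Most common residue per column, or 'x' if none reaches the threshold."""
    out = []
    for freqs in column_profile(alignment):
        if not freqs:
            out.append("-")  # column made up entirely of gaps
            continue
        # sorted() makes ties resolve alphabetically, so the result is deterministic
        residue, freq = max(sorted(freqs.items()), key=lambda kv: kv[1])
        out.append(residue if freq >= threshold else "x")
    return "".join(out)
```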

  7. Sequences III: Scoring schemes and Matching statistics
  8. How does one assess the statistical validity of a databank match?

    What is a p-value (or an e-value)?

    Discuss scoring schemes (extreme value distribution)

    [Altschul, 1994 #1249]
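
    To make the e-value and p-value concrete, the following sketch uses the
    standard Karlin-Altschul form for ungapped local alignment scores,
    E = K * m * n * exp(-lambda * S), together with P = 1 - exp(-E). The
    parameters lam and k depend on the scoring scheme and residue frequencies;
    no particular values are implied by the outline, so they are left as
    arguments here.

```python
import math


def expected_hits(score, query_len, db_len, lam, k):
    """E-value: expected number of chance hits scoring at least `score`.

    lam and k are the scoring-system-dependent statistical parameters.
    """
    return k * query_len * db_len * math.exp(-lam * score)


def p_value(e_value):
    """Probability of at least one chance hit, P = 1 - exp(-E)."""
    # expm1 keeps precision when E is very small, where P is close to E
    return -math.expm1(-e_value)
```

    For small E the two numbers nearly coincide, which is why e-values and
    p-values are often quoted interchangeably for strong matches.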

  9. Sequences IV: Secondary Structure Propensities and Prediction
  10. Secondary Structure, TM-helices

    The wall: why is tertiary structure so hard?

  11. Structures I: Basic Protein Geometry and Least-Squares Fitting
  12. how to represent an atom (vector)

    how to represent a line (calculating a helix axis in 3D)

    how to measure the change in a vector (gradient)

    how to do a least squares fit of two structures

    Press et al. chapter on fitting (just skip)

    Boas (chapter 3 and chapter 6, to 257)

    Hoel chapter of fitting a line
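
    The simplest case of least-squares fitting, fitting a straight line
    y = slope * x + intercept to points, can be sketched as follows. This is
    only the one-dimensional analogue: superimposing two structures also
    requires finding an optimal rotation, which is not shown. The function name
    fit_line is an assumption for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through the points (xs[i], ys[i]).

    Returns (slope, intercept) minimizing the sum of squared vertical residuals.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```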

  13. Structures II: Calculation of Volume and Surface
  14. Calculating area and volumes of protein structures

    how to represent a plane, how to represent a solid, how to calculate an area
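
    One common numerical way to estimate accessible surface area is to scatter
    test points on an expanded sphere around each atom and count those not
    buried inside a neighbouring atom's expanded sphere (the Shrake-Rupley
    style of approach). The outline does not say which method the lecture uses,
    so treat the sketch below as one possible illustration; the 1.4 A probe
    radius, the point count and the (x, y, z, radius) atom tuples are
    assumptions.

```python
import math


def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-spiral construction)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n      # evenly spaced in z
        r = math.sqrt(1.0 - z * z)
        phi = i * golden_angle
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


def accessible_area(atoms, probe=1.4, n_points=500):
    """Per-atom accessible surface area for atoms given as (x, y, z, radius)."""
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe                  # radius of the probe-centre sphere
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            buried = False
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                limit = rj + probe
                if (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < limit * limit:
                    buried = True
                    break
            if not buried:
                exposed += 1
        # Each test point stands for an equal share of the sphere's area
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_points)
    return areas
```

    The triple loop is quadratic in the number of atoms; a practical version
    would restrict the inner loop to nearby atoms.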

  15. Structures III: Structural Alignment
  16. Aligning sequences on the basis of 3D structure. If the dynamic programming does not converge, what do you do?

    other approaches: T & O, H & S

  17. Structures IV: Molecular Dynamics & Monte Carlo Methods
  18. McCammon & Harvey (chapter 1 to chapter 4, focusing on 35 to 47)

    McCammon & Harvey (47 to 60)?

    A & T, 110-118

  19. Databases I: Relational Database Concepts
  20. The key concepts:

    SQL, Join, Key, object-oriented DBMS, Normalization, Foreign Key, Cross Product, Natural Join as "where" selection on cross product, joins as array referencing (in Perl and dbm), views, transactions, Forms & reports [user views]

    Select {columns} from {huge cross-product of tables} where {row-selection is true}
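
    The idea of a join as a "where" selection on a cross product can be tried
    out directly with Python's built-in sqlite3 module. The two tables, their
    column names and the rows are hypothetical, invented only for this example.
    The three queries are written differently but are intended to select the
    same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE protein (protein_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE structure (
    structure_id TEXT PRIMARY KEY,
    protein_id   INTEGER REFERENCES protein(protein_id),  -- foreign key
    resolution   REAL
);
""")
conn.executemany("INSERT INTO protein VALUES (?, ?)",
                 [(1, "myoglobin"), (2, "lysozyme")])
conn.executemany("INSERT INTO structure VALUES (?, ?, ?)",
                 [("s1", 1, 1.6), ("s2", 1, 2.0), ("s3", 2, 1.9)])

# The raw cross product pairs every protein row with every structure row.
cross_count = conn.execute("SELECT COUNT(*) FROM protein, structure").fetchone()[0]

# Join expressed as row selection on the cross product.
rows_where = conn.execute("""
    SELECT p.name, s.structure_id
    FROM protein p, structure s
    WHERE p.protein_id = s.protein_id
    ORDER BY s.structure_id
""").fetchall()

# The same join with explicit JOIN ... ON syntax.
rows_join = conn.execute("""
    SELECT p.name, s.structure_id
    FROM protein p JOIN structure s ON p.protein_id = s.protein_id
    ORDER BY s.structure_id
""").fetchall()

# Natural join: matches on the shared column name (protein_id).
rows_natural = conn.execute("""
    SELECT name, structure_id
    FROM protein NATURAL JOIN structure
    ORDER BY structure_id
""").fetchall()
```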

    Why PDB format is BAD!

    Korth & Silberschatz (skim chapters 1 to 3, understand 4.1 in detail, skim 6.1 and 13.1)

  21. Databases II: Protein Domains and Modules
  22. What are the units of biological information?

    Motifs, modules, domains, in terms of sequence and structure

    ProDom, scop (on-line), Doolittle papers, CATH, average core papers (on-line)

  23. Databases III: Clustering and Trees
  24. Guest Lecture by Jungyong Kim

    The types of trees: parsimony, maximum likelihood, UPGMA
    Methods of clustering: UPGMA, single-linkage
    Evolutionary implications
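
    As an illustration of UPGMA, here is a minimal sketch that repeatedly merges
    the two closest clusters and averages distances weighted by cluster size.
    The input format (a list of names plus a symmetric distance matrix) and the
    nested-tuple tree output are assumptions made for the example.

```python
def upgma(names, matrix):
    """UPGMA clustering.

    names:  list of leaf labels
    matrix: symmetric distances, matrix[i][j] between names[i] and names[j]
    Returns (tree, merges): tree is a nested tuple of labels, merges is a list
    of (subtree, height) in the order the clusters were joined.
    """
    trees = {i: name for i, name in enumerate(names)}   # cluster id -> subtree
    sizes = {i: 1 for i in trees}                       # cluster id -> leaf count
    dist = {(i, j): matrix[i][j] for i in trees for j in trees if i < j}
    next_id = len(names)
    merges = []

    def d(a, b):
        return dist[(min(a, b), max(a, b))]

    while len(trees) > 1:
        # Closest pair; ties are broken by cluster ids so the result is deterministic
        i, j = min(dist, key=lambda pair: (dist[pair], pair))
        height = dist[(i, j)] / 2.0
        new_size = sizes[i] + sizes[j]
        # Size-weighted average distance from the merged cluster to every other one
        for k in trees:
            if k not in (i, j):
                dist[(k, next_id)] = (sizes[i] * d(i, k) + sizes[j] * d(j, k)) / new_size
        dist = {p: v for p, v in dist.items() if i not in p and j not in p}
        new_tree = (trees[i], trees[j])
        for old in (i, j):
            del trees[old]
            del sizes[old]
        trees[next_id] = new_tree
        sizes[next_id] = new_size
        merges.append((new_tree, height))
        next_id += 1

    (root,) = trees.values()
    return root, merges
```

    Single-linkage clustering follows the same loop but takes the minimum of the
    two old distances instead of their weighted average.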

  25. Databases IV: Large-scale Censuses and Genome Comparisons

    Ortholog Families, pathways

  27. Summary Lecture