What is Bioinformatics?
What one should know?
What is a DNA sequence? An amino acid sequence?
What does a protein look like in 3D?
Molecules as information carriers.
Detailed description of a Needleman-Wunsch implementation (using Perl).
Comparison of the results with a canned program; examples of sequence
families.
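As a point of reference for the implementation discussed above, here is a minimal Needleman-Wunsch sketch. It is written in Python rather than Perl, and the scoring values (match = 1, mismatch = -1, linear gap = -2) are illustrative assumptions, not the scheme used in the lecture.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). The scoring parameters are
    illustrative assumptions, not values taken from the course.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, needleman_wunsch("ACGT", "AGT") aligns ACGT against A-GT. When several alignments share the best score, the traceback returns only one of them.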
Local vs Global Alignment
[Gribskov, 1992 #993]
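To make the local versus global distinction concrete, the sketch below computes a Smith-Waterman style local alignment score. The only changes from the global recurrence are that every cell is floored at zero and the answer is the maximum over the whole matrix. The scoring values are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman recurrence, linear gaps).

    Scoring parameters are illustrative assumptions.
    """
    best = 0
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 lets an alignment restart anywhere; that is what
            # makes the result local rather than global.
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

With these parameters a shared four-letter stretch embedded in unrelated flanking sequence scores 8 regardless of what surrounds it, whereas a global alignment would be penalized for the flanks.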
What does one do with the alignment of many things?
HMMs, Profiles, multiple alignment.
How does one assess the statistical validity of a databank match?
What is a p-value (or an e-value)?
Discuss scoring schemes (extreme value distribution)
[Altschul, 1994 #1249]
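The relationship between a score, an e-value and a p-value under the extreme value distribution can be sketched as follows. It uses the standard Karlin-Altschul form E = K m n exp(-lambda S) and P = 1 - exp(-E); the function names are hypothetical, and K and lambda must come from the scoring scheme in use (no real values are assumed here).

```python
import math


def e_value(score, query_len, db_len, lam, k):
    """Expected number of chance hits scoring at least `score`.

    E = K * m * n * exp(-lambda * S). `lam` and `k` depend on the scoring
    matrix and gap penalties; the caller must supply them.
    """
    return k * query_len * db_len * math.exp(-lam * score)


def p_value(e):
    """Probability of at least one chance hit, P = 1 - exp(-E)."""
    return -math.expm1(-e)  # numerically stable form of 1 - exp(-e)
```

For small E the two quantities are nearly equal (P is approximately E), which is why they are often used interchangeably for strong matches.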
Secondary Structure, TM-helices
The wall: why is tertiary structure so hard?
How to represent an atom (vector)
How to represent a line (calculating a helix axis in 3D)
How to measure the change in a vector (gradient)
How to do a least squares fit of two structures
Press et al. chapter on fitting (just skip)
Boas (chapter 3 and chapter 6, to 257)
Hoel (chapter on fitting a line)
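As a warm-up for the fitting readings, here is a minimal sketch of an ordinary least squares fit of a straight line y = slope * x + intercept. It covers only the line-fitting case; superposing two structures additionally requires finding an optimal rotation, which is not attempted here. The function name is hypothetical.

```python
def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept.

    `points` is a sequence of (x, y) pairs. Minimizes the sum of squared
    vertical residuals using the closed-form normal equations.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Points lying exactly on y = 2x + 1 recover slope 2 and intercept 1 up to floating-point rounding.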
Calculating area and volumes of protein structures
How to represent a plane, how to represent a solid, how to calculate an area
http://hyper.stanford.edu/~mbg/SurfVolTalk/new96/svt.00.html
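The basic vector operations behind area and volume calculations can be sketched with points stored as (x, y, z) tuples. This shows only the two elementary building blocks, the area of a triangle and the volume of a tetrahedron; it is not the surface or volume algorithm from the linked talk, and the helper names are hypothetical.

```python
import math


def sub(u, v):
    """Vector difference u - v."""
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def dot(u, v):
    """Scalar (dot) product."""
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def cross(u, v):
    """Vector (cross) product; it is normal to the plane spanned by u and v."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def triangle_area(a, b, c):
    """Half the length of the cross product of two edge vectors."""
    n = cross(sub(b, a), sub(c, a))
    return 0.5 * math.sqrt(dot(n, n))


def tetrahedron_volume(a, b, c, d):
    """One sixth of the absolute scalar triple product of three edges."""
    return abs(dot(sub(b, a), cross(sub(c, a), sub(d, a)))) / 6.0
```

The cross product of two edge vectors also gives the normal of the plane through a, b and c, which is one common way to represent that plane.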
Aligning sequences on the basis of 3D structure. The dynamic programming does not converge; what do you do?
http://bioinfo.mbb.yale.edu/Align/ismb96
Other approaches: T & O, H & S
McCammon & Harvey (chapter 1 to chapter 4, focusing on 35 to 47)
McCammon & Harvey (47 to 60)?
http://bioinfo.mbb.yale.edu/Geometry/mbg-phd/phd-ch6.html
A & T, 110-118
The key concepts:
SQL, Join, key, object-oriented DBMS, Normalization, Foreign Key, Cross Product, Natural Join as "where" selection on cross product, joins as array referencing (in Perl and dbm), views, transactions, Forms & reports [user views]
Select {columns} from {huge cross-product of tables} where {row-selection is true}
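The idea that a join is a "where" selection on a cross product can be sketched by doing the same query two ways: once by filtering an explicit cross product, and once with SQL through Python's built-in sqlite3 module. The table names, column names and rows below are made-up placeholders, not data from the course.

```python
import sqlite3
from itertools import product

# Hypothetical toy tables: (id, name) and (protein_id, fold_name).
PROTEINS = [(1, "protA"), (2, "protB"), (3, "protC")]
FOLDS = [(1, "foldX"), (2, "foldY"), (2, "foldZ")]


def join_by_cross_product(proteins, folds):
    """Form every (protein, fold) pair, then keep rows where the keys match."""
    return sorted(
        (name, fold_name)
        for (pid, name), (fk, fold_name) in product(proteins, folds)
        if pid == fk  # the "where" clause
    )


def join_by_sql(proteins, folds):
    """The same query expressed as SELECT ... FROM ... WHERE."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE protein (id INTEGER PRIMARY KEY, name TEXT)")
    con.execute(
        "CREATE TABLE fold ("
        "protein_id INTEGER REFERENCES protein(id), fold_name TEXT)"
    )
    con.executemany("INSERT INTO protein VALUES (?, ?)", proteins)
    con.executemany("INSERT INTO fold VALUES (?, ?)", folds)
    rows = con.execute(
        "SELECT protein.name, fold.fold_name "
        "FROM protein, fold "
        "WHERE protein.id = fold.protein_id"
    ).fetchall()
    con.close()
    return sorted(rows)
```

Both functions return the same rows. Note that protC, which has no matching foreign key in the fold table, drops out of the result, and protB appears twice because two fold rows reference it.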
Why the PDB format is BAD!
http://bioinfo.mbb.yale.edu/~mbg/clippings-u/sqltut.htm
Korth & Silberschatz (skim chapters 1 to 3, understand 4.1 in detail,
skim 6.1 and 13.1)
What are the units of biological information?
Motifs, modules, domains, in terms of sequence and structure
ProDom, SCOP (on-line), Doolittle papers, CATH, average core papers (on-line)
http://hyper.stanford.edu/~mbg/Align
Guest Lecture by Jungyong Kim
The types of trees: parsimony, maximum likelihood, UPGMA
Methods of clustering: UPGMA, single-linkage
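The difference between the two clustering methods named above comes down to how the distance between two clusters is defined. The sketch below is a naive agglomerative procedure, assuming a symmetric distance matrix given as a list of lists; the function names and the "single"/"upgma" option strings are hypothetical.

```python
from itertools import combinations


def cluster_distance(c1, c2, dist, linkage):
    """Distance between two clusters of leaf indices."""
    pairs = [dist[i][j] for i in c1 for j in c2]
    if linkage == "single":
        return min(pairs)               # closest pair of members
    if linkage == "upgma":
        return sum(pairs) / len(pairs)  # unweighted average over all pairs
    raise ValueError("unknown linkage: " + linkage)


def agglomerate(dist, linkage="upgma"):
    """Repeatedly merge the two closest clusters.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of leaf indices. This recomputes all
    cluster distances at every step, so it is only suitable for small inputs.
    """
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        d, c1, c2 = min(
            (cluster_distance(x, y, dist, linkage), x, y)
            for x, y in combinations(clusters, 2)
        )
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2, d))
    return merges
```

The merge history is enough to draw the tree: each merge is an internal node, and in UPGMA the merge distance fixes the height of that node (conventionally half the distance).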
Evolutionary implications
http://bioinfo.mbb.yale.edu/census
Ortholog Families, pathways