BIOINFORMATICS
Sequences

Sequence Topics (Contents)

Molecular Biology Information: Protein Sequence

Aligning Text Strings

Dynamic Programming

Step 1 -- Make a Dot Plot (Similarity Matrix)

A More Interesting Dot Matrix

Step 2 -- Start Computing the Sum Matrix

Step 3 -- Keep Going

Step 4 -- Sum Matrix All Done

Step 5 -- Traceback

Step 6 -- Alternate Tracebacks

Suboptimal Alignments

Suboptimal Alignments II

Gap Penalties

Step 2 -- Computing the Sum Matrix with Gaps

All Steps in Aligning a 4-mer

Key Idea in Dynamic Programming

Similarity (Substitution) Matrix

Where do matrices come from?

More on this…

Amino Acid Frequencies of Occurrence

Principles of Scoring Matrix Construction, in detail

Principles of Scoring Matrix Construction, in detail #2

Principles of Scoring Matrix Construction, in detail #3

Different Matrices are Appropriate at Different Evolutionary Distances

Change in Matrix with Evolutionary Distance

Other Matrices: How to score the exchange of two amino acids in an alignment?

The BLOSUM Matrices

Local vs. Global Alignment

Modifications for Local Alignment

End of Class 1

Transitive Sequence Comparison

Multiple Sequence Alignments

Progressive Multiple Alignments

Problems with Progressive Alignments

Popular Multiple Alignment Programs

Profiles, Motifs, HMMs

Profiles

Profiles formula for position M(p,a)

Profiles formula for entropy H(p,a)

C1Q - Example

Clustal Alignment

Motifs

Prosite Pattern -- EGF like pattern

EGF Profile Generated for SEARCHWISE

HMMs

Markov Models

Hidden Markov models

More HMMs

Example: simple fully interconnected model (N=3)

Scoring by Brute Force Method

Sequence profile elements

HMM sequence profiles

Result: HMM sequence profile

Different topologies

Algorithms

Modules

The Score

Score in Context of Other Scores

P-value in Sequence Matching

Objective is to Find Distant Homologues

Coverage vs. Error Rate

P-values

What Distribution Really Looks Like

EVD Fits

Extreme Value vs. Gaussian

EVD #2

End of Class 2

Explicit Form of the P-value in terms of Extreme Value Distribution

Use Sequence Scores to Validate

Significance Depends on Database Size

Low-Complexity Regions

Computational Complexity

FASTA

Join together query lookups into diagonals and then a full alignment

Basic Blast

Blast: Extension of Hash Hits

Blasting against the DB

Analytic Score Formalism for Blast

Blast2: Gapped Blast

Ψ-Blast

PSI-Blast

Practical Issues on DNA Searching

General Protein Search Principles

Overview

What does secondary structure prediction try to accomplish?

Some TM scales: GES, KD

How to use GES to predict proteins

Graph showing Peaks in scales

Removing Signal sequences

Ex. P(i,a) probability that residue i has secondary structure a

End of Class 3

Statistics Based Methods: Persson & Argos

Refinements: Charge on the Outside, Positive Inside Rule

Refinements: MaxH

GOR: Simplifications

Basic GOR

More GOR

Directional Information

Types of Residues

GOR IV

Assessment

Training and Testing Set

Is 100% Accuracy Possible?

Types of Secondary Structure Prediction Methods

GOR Semi-parametric Improvements

Multiple Sequence Methods

DSC -- an improvement on GOR

Conservation, k-nn

Neural Networks

More NN

Yet more methods…

Mail Servers and Web Forms

Additional Features of DNA sequences in Genomes

Gene finding

Genetic Code

Splicing

Alternative Splicing: Multiple Proteins from One Gene

Promoters

References

End of Class 4 with 15 minutes left