BIOINFORMATICS Sequences

11/14/99



Table of Contents

BIOINFORMATICS Sequences

Sequence Topics (Contents)

Molecular Biology Information: Protein Sequence

Aligning Text Strings

Dynamic Programming

Step 1 -- Make a Dot Plot (Similarity Matrix)

A More Interesting Dot Matrix

Step 2 -- Start Computing the Sum Matrix

Step 3 -- Keep Going

Step 4 -- Sum Matrix All Done

Step 5 -- Traceback

Step 5 -- Traceback

Step 6 -- Alternate Tracebacks

Suboptimal Alignments

Suboptimal Alignments II

Gap Penalties

Step 2 -- Computing the Sum Matrix with Gaps

All Steps in Aligning a 4-mer

Key Idea in Dynamic Programming

Similarity (Substitution) Matrix

How to score the exchange of two amino acids in an alignment?

Where do matrices come from?

Principles of Scoring Matrix Construction, in detail

Principles of Scoring Matrix Construction, in detail #2

Principles of Scoring Matrix Construction, in detail #3

Different Matrices are Appropriate at Different Evolutionary Distances

Change in Matrix with Evolutionary Distance

The BLOSUM Matrices

Local vs. Global Alignment

Modifications for Local Alignment

Local vs. Global Alignment

Transitive Sequence Comparison

Multiple Sequence Alignments

Progressive Multiple Alignments

Problems with Progressive Alignments

Popular Multiple Alignment Programs

C1Q

Clustal Alignment

Profiles, Motifs, HMMs

Motifs

Prosite Pattern -- EGF like pattern

Profiles

EGF Profile Generated for SEARCHWISE

HMMs

Modules

The Score

Score in Context of Other Scores

P-value in Sequence Matching

Objective is to Find Distant Homologues

P-values

What Distribution Really Looks Like

EVD Fits

Extreme Value vs. Gaussian

EVD #2

Explicit Form of the P-value in terms of Extreme Value Distribution

Use Sequence Scores to Validate

Significance Depends on Database Size

Low-Complexity Regions

Computational Complexity

FASTA

Join together query lookups into diagonals and then a full alignment

Basic Blast

Blast: Extension of Hash Hits

Blasting against the DB

Analytic Score Formalism for Blast

Blast2: Gapped Blast

Blast2: Gapped Blast

PSI-Blast

Practical Issues on DNA Searching

General Protein Search Principles

Overview

What does secondary structure prediction try to accomplish?

Some TM scales: GES, KD

How to use GES to predict proteins

Graph showing Peaks in scales

Removing Signal sequences

Ex. Pr(S) probability that residue j has secondary structure i

Statistics Based Methods: Persson & Argos

Refinements: Charge on the Outside, Positive Inside Rule

Refinements: MaxH

GOR: Simplifications

Basic GOR

Directional Information

Types of Residues

GOR IV

Assessment

Training and Testing Set

Is 100% Accuracy Possible?

Types of Secondary Structure Prediction Methods

GOR Semi-parametric Improvements

Multiple Sequence Methods

DSC -- an improvement on GOR

Neural Networks

More NN

Yet more methods…

Mail Servers and Web Forms

Additional Features of DNA sequences in Genomes

Genetic Code

Splicing

Alternative Splicing: Multiple Proteins from One Gene

Promoters

References

References

References

References

Author: Office97

Email: Mark.Gerstein@yale.edu

Home Page: http://bioinfo.mbb.yale.edu