Molecular Biophysics & Biochemistry 447b3 / 747b3 Bioinformatics

Mark Gerstein

Class 6a, 1/28/98

Yale University

Molecular Biology Information: Macromolecular Structure

DNA/RNA/Protein
- Almost all protein

Molecular Biology Information: Protein Structure Details

Statistics on Number of XYZ triplets
- 200 residues/domain -> 200 CA atoms, separated by 3.8 Å
- Avg. residue is Leu: 4 backbone atoms + 4 sidechain atoms, 150 Å³
  - => ~1600 xyz triplets (= 8 × 200) per protein domain
- 10K known domains, ~300 folds

ATOM 2 O ACE 0 10.432 30.832 60.722 1.00 50.35 1GKY 68
ATOM 3 CH3 ACE 0 8.876 29.767 59.226 1.00 50.04 1GKY 69
ATOM 4 N SER 1 8.753 29.755 61.685 1.00 49.13 1GKY 70
ATOM 5 CA SER 1 9.242 30.200 62.974 1.00 46.62 1GKY 71
ATOM 6 C SER 1 10.453 29.500 63.579 1.00 41.99 1GKY 72
ATOM 7 O SER 1 10.593 29.607 64.814 1.00 43.24 1GKY 73
ATOM 8 CB SER 1 8.052 30.189 63.974 1.00 53.00 1GKY 74
ATOM 9 OG SER 1 7.294 31.409 63.930 1.00 57.79 1GKY 75
ATOM 10 N ARG 2 11.360 28.819 62.827 1.00 36.48 1GKY 76
ATOM 11 CA ARG 2 12.548 28.316 63.532 1.00 30.20 1GKY 77
ATOM 12 C ARG 2 13.502 29.501 63.500 1.00 25.54 1GKY 78
...
ATOM 1444 CB LYS 186 13.836 22.263 57.567 1.00 55.06 1GKY1510
ATOM 1445 CG LYS 186 12.422 22.452 58.180 1.00 53.45 1GKY1511
ATOM 1446 CD LYS 186 11.531 21.198 58.185 1.00 49.88 1GKY1512
ATOM 1447 CE LYS 186 11.452 20.402 56.860 1.00 48.15 1GKY1513
ATOM 1448 NZ LYS 186 10.735 21.104 55.811 1.00 48.41 1GKY1514
ATOM 1449 OXT LYS 186 16.887 23.841 56.647 1.00 62.94 1GKY1515
TER 1450 LYS 186 1GKY1516
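The ATOM records above can be read with a few lines of Python. This is a minimal sketch that assumes the whitespace-separated layout shown in this excerpt; real PDB files use fixed column positions, and a plain whitespace split breaks when adjacent fields run together (as in `1GKY1510`). The names `Atom`, `parse_atom` and `ca_distance` are hypothetical. Applied to the CA atoms of SER 1 and ARG 2, it reproduces the ~3.8 Å CA-CA spacing quoted earlier.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str      # atom name, e.g. "CA"
    res_name: str  # residue name, e.g. "SER"
    res_seq: int   # residue number
    x: float
    y: float
    z: float


def parse_atom(line: str) -> Atom:
    """Parse one ATOM record in the whitespace-separated layout of this excerpt (not fixed-column PDB)."""
    f = line.split()
    if f[0] != "ATOM":
        raise ValueError("not an ATOM record: " + line)
    return Atom(int(f[1]), f[2], f[3], int(f[4]), float(f[5]), float(f[6]), float(f[7]))


def ca_distance(a: Atom, b: Atom) -> float:
    """Euclidean distance (in Angstroms) between two atoms."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


records = [
    "ATOM 5 CA SER 1 9.242 30.200 62.974 1.00 46.62 1GKY 71",
    "ATOM 11 CA ARG 2 12.548 28.316 63.532 1.00 30.20 1GKY 77",
]
ca1, ca2 = (parse_atom(r) for r in records)
print(f"CA-CA distance: {ca_distance(ca1, ca2):.2f} A")  # CA-CA distance: 3.85 A
```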

Sperm Whale Myoglobin

Structure Comparison:
- Alignment
- Rigid-Body Movements
- Superposition
- Significance

Structural Alignment of Two Globins

Immunoglobulin Alignment (Harder)

Automatically Comparing Protein Structures

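Given 2 Structures (A & B), 2 Basic Comparison Operations:
- Given an alignment, optimally superimpose B onto A (minimize the RMS)
- Find an alignment between A and B based on their 3D coordinates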

RMS Superposition (1)

RMS Superposition (2): Distance Between an Atom in 2 Structures

RMS Superposition (3): RMS Distance Between Aligned Atoms in 2 Structures
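For two structures whose atoms are already paired up (atom i of A with atom i of B), the per-atom distance d_i = |a_i - b_i| and the RMS distance sqrt((1/N) Σ d_i²) can be computed directly. A minimal sketch in plain Python; the function names are hypothetical and the two coordinate lists are assumed to be aligned and of equal length.

```python
import math


def atom_distance(a, b):
    """Distance between the same atom in structures A and B (coordinates in Angstroms)."""
    return math.dist(a, b)


def rms_distance(coords_a, coords_b):
    """RMS distance over aligned atom pairs: sqrt of the mean of d_i^2."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty lists of aligned coordinates")
    sq = [atom_distance(a, b) ** 2 for a, b in zip(coords_a, coords_b)]
    return math.sqrt(sum(sq) / len(sq))


A = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
B = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(rms_distance(A, B))  # 1.0: every aligned pair is 1 A apart
```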

RMS Superposition (4): Rigid-Body Rotation and Translation of One Structure (B)
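A rigid-body move applies the same rotation R and translation t to every atom of B, b_i' = R·b_i + t, so distances inside B are unchanged and only its placement relative to A changes. A small sketch with hypothetical helper names, using a rotation about the z axis as the example rotation:

```python
import math


def rotate_z(theta):
    """3x3 rotation matrix (list of rows) for a rotation of theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def move_rigid(coords, R, t):
    """Apply b' = R b + t to every atom; R is 3x3, t is a 3-vector."""
    return [
        tuple(sum(R[k][j] * p[j] for j in range(3)) + t[k] for k in range(3))
        for p in coords
    ]


B = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
B_moved = move_rigid(B, rotate_z(math.pi / 2), (1.0, 2.0, 3.0))
# The rigid-body move preserves internal distances within B.
print(math.dist(B[0], B[2]), math.dist(B_moved[0], B_moved[2]))
```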

RMS Superposition (5): Optimal Movement of One Structure to Minimize the RMS
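The slides do not say how the optimal rotation is found; one standard method is Kabsch's SVD-based least-squares fit, sketched below with NumPy under that assumption. Centering both structures on their centroids fixes the optimal translation; the rotation then comes from the SVD of the 3x3 covariance matrix, with a sign correction so that R is a proper rotation rather than a reflection. The names `kabsch_fit` and `rms_after_fit` are hypothetical.

```python
import numpy as np


def kabsch_fit(A, B):
    """Return (R, t) minimizing the RMS between A and R @ b + t; A, B are (N, 3) aligned atoms."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (B - cb).T @ (A - ca)                  # 3x3 covariance of the centered coordinates
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0   # d = -1 would mean a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = ca - R @ cb
    return R, t


def rms_after_fit(A, B):
    """RMS between A and B after moving B by the optimal rigid-body fit."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    R, t = kabsch_fit(A, B)
    B_fit = B @ R.T + t
    return float(np.sqrt(np.mean(np.sum((A - B_fit) ** 2, axis=1))))


# B is an exact rigid-body copy of a helix-like CA trace A (rotated 30 degrees, then shifted).
theta = np.deg2rad(30)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
step = np.deg2rad(100)
A = np.array([[2.3 * np.cos(step * i), 2.3 * np.sin(step * i), 1.5 * i] for i in range(8)])
B = A @ Rz.T + np.array([5.0, -2.0, 1.0])
print(rms_after_fit(A, B))  # essentially zero
```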

Alignment (1): Make a Similarity Matrix (Like Dot Plot)

Structural Alignment (1b): Make a Similarity Matrix (Generalized Similarity Matrix)

PAM(A,V) = 0.5
- Applies at every position

S(aa @ i, aa @ J)
- Specific Matrix for each pair of residues i in protein 1 and J in protein 2
- Example: a Y near the N-terminus matches any C-terminal residue (Y at J=2)

S(i,J)
- Doesn’t need to depend on a.a. identities at all!
- Just need to make up a score for matching residue i in protein 1 with residue J in protein 2
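The three levels above differ only in what the score function is allowed to look at. A minimal sketch: build S(i,J) from any function of the two positions. The tiny substitution table is hypothetical except for the PAM(A,V) = 0.5 entry quoted above, and the position-only score is invented purely to illustrate that S(i,J) need not depend on amino-acid identity.

```python
def similarity_matrix(len1, len2, score):
    """S[i][J] = score(i, J) for residue i of protein 1 and residue J of protein 2."""
    return [[score(i, j) for j in range(len2)] for i in range(len1)]


seq1, seq2 = "AVY", "VAYK"

# (a) Classic substitution matrix: the score depends only on the two amino acids.
TOY_PAM = {("A", "A"): 2.0, ("V", "V"): 4.0, ("Y", "Y"): 10.0, ("A", "V"): 0.5}  # hypothetical except A/V


def pam(a, b):
    """Symmetric lookup in the toy table; unlisted pairs score -1 (an arbitrary choice)."""
    return TOY_PAM.get((a, b), TOY_PAM.get((b, a), -1.0))


S_pam = similarity_matrix(len(seq1), len(seq2), lambda i, j: pam(seq1[i], seq2[j]))

# (b) Fully general: any made-up score for pairing position i with position J,
#     here a hypothetical preference for residues at similar relative positions.
S_pos = similarity_matrix(len(seq1), len(seq2),
                          lambda i, j: 1.0 - abs(i / len(seq1) - j / len(seq2)))
```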

Structural Alignment (1c*): Similarity Matrix for Structural Alignment

Structural Alignment
- Similarity Matrix S(i,J) depends on the 3D coordinates of residues i and J
- d = distance between the CA atoms of residues i and J
- S(i,J) = 100 / (5 + d²)
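The structural similarity score 100 / (5 + d²) can be computed for every residue pair from the current CA coordinates. A minimal sketch assuming distances in Å; `structural_similarity` is a hypothetical name. The score is always positive, peaks at 100/5 = 20 for coincident atoms, and falls off smoothly with distance, so it changes whenever structure B is moved.

```python
import math


def structural_similarity(ca_a, ca_b):
    """S(i, J) = 100 / (5 + d^2), d = CA-CA distance (Angstroms) between residue i of A and J of B."""
    return [[100.0 / (5.0 + math.dist(a, b) ** 2) for b in ca_b] for a in ca_a]


ca_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
ca_b = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
S = structural_similarity(ca_a, ca_b)
# Coincident atoms score 100/5 = 20; a pair with d^2 = 5 scores 100/10 = 10.
```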

Threading
- S(i,J) depends on how well the amino acid at position i in protein 1 fits into the 3D structural environment at position J of protein 2

Alignment (2): Dynamic Programming, Start Computing the Sum Matrix

cell(R,C)                      { Old value: 1 or 0 in a dot plot, S(R,C) in general }
+ Max[
    cell(R+1, C+1),            { Diagonally down, no gaps }
    cells(R+1, C+2 to C_max),  { Down a row, making col. gap }
    cells(R+2 to R_max, C+1)   { Down a col., making row gap }
  ]

Alignment (3):Dynamic Programming, Keep Going

Alignment (4): Dynamic Programming, Sum Matrix All Done

Alignment (5): Traceback
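The sum-matrix recurrence and the traceback can be sketched together as below. This is a minimal reading of the slides under stated assumptions: the row-gap move lands in column C+1 (symmetric with the column-gap move landing in row R+1), there is no gap penalty inside the recursion (gaps are charged afterwards in the final score), the traceback starts from the largest cell, and ties prefer the diagonal (no-gap) move. The helper names are hypothetical. The usage example scores identities 1 and mismatches 0, as in a dot plot.

```python
def next_cells(r, c, n, m):
    """Cells reachable from (r, c): diagonal first, then column gaps, then row gaps."""
    if r + 1 >= n or c + 1 >= m:
        return []
    return ([(r + 1, c + 1)]
            + [(r + 1, c2) for c2 in range(c + 2, m)]
            + [(r2, c + 1) for r2 in range(r + 2, n)])


def sum_matrix(S):
    """Fill the sum matrix from the bottom-right corner: cell = S(R,C) + best reachable cell."""
    n, m = len(S), len(S[0])
    M = [[0.0] * m for _ in range(n)]
    for r in range(n - 1, -1, -1):
        for c in range(m - 1, -1, -1):
            M[r][c] = S[r][c] + max((M[i][j] for i, j in next_cells(r, c, n, m)), default=0.0)
    return M


def traceback(M):
    """Start at the largest cell, then repeatedly step to the best reachable cell."""
    n, m = len(M), len(M[0])
    r, c = max(((i, j) for i in range(n) for j in range(m)), key=lambda rc: M[rc[0]][rc[1]])
    path = [(r, c)]
    while True:
        cands = next_cells(r, c, n, m)
        if not cands:
            return path
        r, c = max(cands, key=lambda rc: M[rc[0]][rc[1]])  # max keeps the first (diagonal) on ties
        path.append((r, c))


seq1, seq2 = "ACGT", "AGT"
S = [[1.0 if a == b else 0.0 for b in seq2] for a in seq1]
M = sum_matrix(S)
path = traceback(M)
print(M)     # [[3.0, 1.0, 0.0], [2.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
print(path)  # [(0, 0), (2, 1), (3, 2)]: A-A, G-G, T-T with C opposite a gap
```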

In Structural Alignment, Not Yet Done (Step 6*)

Use Alignment to LSQ Fit Structure B onto Structure A
- However, movement of B will now change the Similarity Matrix

This Violates Fundamental Premise of Dynamic Programming
- How residue i is aligned can now affect the previously optimal alignment of residues 1 to i-1

Structural Alignment (7*), Iterate Until Convergence

1 Make Sim. Matrix
2 Align via Dyn. Prog.
3 RMS Fit Based on Alignment
4 Move Structure B
5 Re-compute Sim. Matrix
6 If changed from #1, GOTO #2
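The iteration can be sketched end to end with NumPy. This is a compact illustration under several assumptions: the similarity matrix is the 100 / (5 + d²) CA-distance score, built once before the first alignment and rebuilt after each move; the alignment uses a sum-matrix recurrence without an internal gap penalty, with the row-gap move landing in column C+1; the RMS fit uses Kabsch's SVD method (one standard choice, not confirmed by the slides); convergence means the similarity matrix no longer changes within `np.allclose` tolerance; and `max_iter` is a hypothetical safety cap. The fit needs at least three non-collinear aligned pairs.

```python
import numpy as np


def sim_matrix(A, B):
    """S(i, J) = 100 / (5 + d^2) from the current CA coordinates (N x 3 and M x 3)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return 100.0 / (5.0 + d2)


def dp_align(S):
    """Sum matrix (no gap penalty inside the recursion) plus traceback -> list of (i, J) pairs."""
    n, m = S.shape
    M = np.zeros((n, m))
    for r in range(n - 1, -1, -1):
        for c in range(m - 1, -1, -1):
            best = 0.0
            if r + 1 < n and c + 1 < m:
                # row R+1 from column C+1 on (diagonal or col. gap), or column C+1 below row R+1 (row gap)
                best = max(M[r + 1, c + 1:].max(), M[r + 2:, c + 1].max(initial=0.0))
            M[r, c] = S[r, c] + best
    r, c = np.unravel_index(np.argmax(M), M.shape)
    path = [(int(r), int(c))]
    while r + 1 < n and c + 1 < m:
        cands = ([(r + 1, c + 1)] + [(r + 1, j) for j in range(c + 2, m)]
                 + [(i, c + 1) for i in range(r + 2, n)])
        r, c = max(cands, key=lambda rc: M[rc])
        path.append((int(r), int(c)))
    return path


def kabsch(P, Q):
    """Rotation R and translation t minimizing the RMS between P and Q @ R.T + t."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    U, _, Vt = np.linalg.svd((Q - cq).T @ (P - cp))
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cp - R @ cq


def structural_align(A, B, max_iter=20):
    A, B = np.asarray(A, float), np.array(B, dtype=float)
    S = sim_matrix(A, B)                              # make sim. matrix
    path, rms = [], float("nan")
    for _ in range(max_iter):
        path = dp_align(S)                            # align via dyn. prog.
        ia, ib = map(list, zip(*path))
        R, t = kabsch(A[ia], B[ib])                   # RMS fit based on alignment
        B = B @ R.T + t                               # move structure B
        rms = float(np.sqrt(((A[ia] - B[ib]) ** 2).sum(axis=1).mean()))
        S_new = sim_matrix(A, B)                      # re-compute sim. matrix
        if np.allclose(S_new, S):                     # unchanged -> converged
            break
        S = S_new
    return path, rms


# Usage: B is a helix-like CA trace A, rotated 10 degrees about z and shifted by 1 A.
step, theta = np.deg2rad(100), np.deg2rad(10)
A = np.array([[2.3 * np.cos(step * i), 2.3 * np.sin(step * i), 1.5 * i] for i in range(10)])
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
path, rms = structural_align(A, A @ Rz.T + np.array([1.0, 0.0, 0.0]))
print(path, rms)  # identity alignment, RMS essentially zero
```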

Score S at End Just Like SW (Smith-Waterman) Score, but Also Have Final RMS

$$S = \sum_{\text{aligned } (i,j)} S(i,j) \; - \; n\,G$$

- S(i,j) = similarity matrix score for aligning i and j
- Sum is carried out over all aligned i and j
- n = number of gaps (assuming no gap extension penalty)
- G = gap penalty
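Given the aligned pairs from a traceback, the final score is the summed similarity minus one penalty G per gap. A minimal sketch, assuming a gap is any step in the path that skips one or more rows or columns, and that unaligned ends are not charged (the slides do not say); `alignment_score` is a hypothetical name. The example aligns ACGT with AGT, with C opposite a gap, using a 0/1 dot-plot matrix.

```python
def alignment_score(S, path, G):
    """Sum of S(i, j) over aligned pairs minus n * G, where n = number of internal gaps."""
    total = sum(S[i][j] for i, j in path)
    n_gaps = sum(
        1 for (i0, j0), (i1, j1) in zip(path, path[1:]) if (i1 - i0, j1 - j0) != (1, 1)
    )
    return total - n_gaps * G


seq1, seq2 = "ACGT", "AGT"
S = [[1.0 if a == b else 0.0 for b in seq2] for a in seq1]
print(alignment_score(S, [(0, 0), (2, 1), (3, 2)], G=0.5))  # 2.5 = 3 matches - 1 gap * 0.5
```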

Scores from Structural Alignment Are Distributed Just Like Those from Sequence Alignment (Extreme Value Distribution, E.V.D.)
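Because scores follow an extreme value distribution, the significance of a score x can be estimated from the Gumbel tail P(S ≥ x) = 1 - exp(-exp(-λ(x - μ))). A minimal sketch, assuming λ and μ have already been fitted to scores from pairs of unrelated structures; the parameter values below are hypothetical and chosen only for illustration.

```python
import math


def evd_pvalue(x, lam, mu):
    """P(score >= x) under a Gumbel (type I extreme value) distribution, scale 1/lam, location mu."""
    # -expm1(-y) equals 1 - exp(-y) but stays accurate when y is tiny (very high scores).
    return -math.expm1(-math.exp(-lam * (x - mu)))


LAM, MU = 0.3, 20.0  # hypothetical fitted parameters
for score in (20.0, 30.0, 40.0):
    print(score, evd_pvalue(score, LAM, MU))
```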