Amino Acid Substitution Matrices (continued)
Principles of Scoring Matrix Construction
The Dayhoff Matrix: Proteins evolve through a succession
of independent point mutations that are accepted in a population and
can subsequently be observed in the sequence pool.
(Dayhoff, M.O. et al. (1978) Atlas of Protein Sequence and Structure,
Vol. 5, Suppl. 3. National Biomedical Research Foundation, Washington, D.C.,
U.S.A.)
First step: Pair Exchange Frequencies
A PAM (Percent Accepted Mutation)
is one accepted point mutation on the path between two sequences, per 100
residues.
Second step: Frequencies of Occurrence
Amino acid frequencies:
Residue 1978 1991
L 0.085 0.091
A 0.087 0.077
G 0.089 0.074
S 0.070 0.069
V 0.065 0.066
E 0.050 0.062
T 0.058 0.059
K 0.081 0.059
I 0.037 0.053
D 0.047 0.052
R 0.041 0.051
P 0.051 0.051
N 0.040 0.043
Q 0.038 0.041
F 0.040 0.040
Y 0.030 0.032
M 0.015 0.024
H 0.034 0.023
C 0.033 0.020
W 0.010 0.014
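Frequencies of occurrence like those tabulated above can be obtained by simple counting. The following is a minimal sketch, assuming the input is a plain list of protein sequence strings; the function name and the toy sequences are hypothetical and are not taken from the Dayhoff atlas.

```python
from collections import Counter

def amino_acid_frequencies(sequences):
    """Return the relative frequency of each residue across all sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

# Hypothetical toy input, not real protein data.
toy_sequences = ["ALGA", "GLSA"]
freqs = amino_acid_frequencies(toy_sequences)

for aa in sorted(freqs):
    print(aa, round(freqs[aa], 3))
```

The frequencies sum to 1 by construction, which is the property the later steps rely on.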
Third step: Relative Mutabilities
Relative mutabilities of amino acids:
Residue 1978 1991
A 100 100
C 20 44
D 106 86
E 102 77
F 41 51
G 49 50
H 66 91
I 96 103
K 56 72
L 40 54
M 94 93
N 134 104
P 56 58
Q 93 84
R 65 83
S 120 117
T 97 107
V 74 98
W 18 25
Y 41 50
All values are taken relative to alanine, which is arbitrarily
set at 100.
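The rescaling to alanine = 100 can be sketched as follows. This is a simplification: it assumes the raw mutability of a residue is just its number of observed changes divided by its number of occurrences, and the counts below are invented for illustration only.

```python
def relative_mutabilities(changes, occurrences, reference="A"):
    """Raw mutability = changes / occurrences, rescaled so the reference is 100."""
    raw = {aa: changes[aa] / occurrences[aa] for aa in changes}
    scale = 100.0 / raw[reference]
    return {aa: raw[aa] * scale for aa in raw}

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
toy_changes = {"A": 40, "N": 60, "W": 10}
toy_occurrences = {"A": 1000, "N": 1000, "W": 1000}

rel = relative_mutabilities(toy_changes, toy_occurrences)
for aa in sorted(rel):
    print(aa, round(rel[aa]))
```

Because only ratios matter, choosing a different reference residue would change every number in the table but not the ranking.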
Fourth step: Mutation Probability Matrix
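The probability that the amino acid in row i
of the matrix will replace the amino acid in column j is the
mutability of amino acid j, multiplied by the pair exchange
frequency for ij, divided by the sum of all pair exchange
frequencies for amino acid j:
M(i,j) = lambda * m(j) * A(i,j) / sum over k of A(k,j), for i not equal to j,
where m(j) is the relative mutability, A(i,j) is the pair exchange frequency,
and lambda is a proportionality constant that sets the evolutionary distance
(1 PAM for the basic matrix). The diagonal entry M(j,j) = 1 - lambda * m(j)
is the probability that amino acid j remains unchanged.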
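The construction can be sketched in a few lines of Python. The three-residue alphabet and all counts below are hypothetical. The scaling constant (called `lam` here) is an assumption about how the distance is fixed: it is chosen so that, averaged over the residue frequencies, one residue in 100 changes, which corresponds to 1 PAM.

```python
def mutation_probability_matrix(exchange, mutability, freqs, pam=1.0):
    """Build M where M[i][j] is the probability that residue j is replaced by i.

    exchange[i][j]: symmetric pair exchange counts (i != j)
    mutability[j]:  relative mutability of residue j
    freqs[j]:       frequency of occurrence of residue j
    """
    aas = sorted(mutability)
    # Total number of exchanges involving residue j.
    col_sum = {j: sum(exchange[i][j] for i in aas if i != j) for j in aas}
    # Scale so that the expected fraction of changed residues is pam / 100.
    lam = (pam / 100.0) / sum(freqs[j] * mutability[j] for j in aas)
    M = {i: {} for i in aas}
    for j in aas:
        for i in aas:
            if i == j:
                M[i][j] = 1.0 - lam * mutability[j]
            else:
                M[i][j] = lam * mutability[j] * exchange[i][j] / col_sum[j]
    return M

# Hypothetical toy data for a three-letter alphabet.
toy_exchange = {
    "A": {"G": 30, "S": 50},
    "G": {"A": 30, "S": 20},
    "S": {"A": 50, "G": 20},
}
toy_mutability = {"A": 100, "G": 50, "S": 120}
toy_freqs = {"A": 0.40, "G": 0.35, "S": 0.25}

M = mutation_probability_matrix(toy_exchange, toy_mutability, toy_freqs)
for i in sorted(M):
    print(i, [round(M[i][j], 5) for j in sorted(M[i])])
```

Each column of `M` sums to 1, since every residue must either stay the same or be replaced by one of the others.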
Last step: the log-odds matrix
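Each entry is the logarithm to base 10 of the ratio between the observed
frequency of a pair and the frequency expected by chance: a value of +1
means that the corresponding pair has been observed 10 times more
frequently than expected by chance.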
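As a rough illustration, the log-odds score can be computed by dividing each mutation probability by the frequency of the replacing residue and taking the base-10 logarithm. The two-residue matrix below is hypothetical and is constructed so that one entry comes out at +1.

```python
import math

def log_odds(M, freqs):
    """S[i][j] = log10(M[i][j] / f_i), with M[i][j] the probability j -> i."""
    return {i: {j: math.log10(M[i][j] / freqs[i]) for j in M[i]} for i in M}

# Hypothetical two-residue example: W is rare (5%), A is common (95%).
toy_freqs = {"A": 0.95, "W": 0.05}
toy_M = {
    "A": {"A": 1.0 - 0.5 * 0.05 / 0.95, "W": 0.5},
    "W": {"A": 0.5 * 0.05 / 0.95, "W": 0.5},
}

S = log_odds(toy_M, toy_freqs)
for i in sorted(S):
    print(i, [round(S[i][j], 3) for j in sorted(S[i])])
```

In this toy case W is retained ten times more often than its background frequency would suggest, so its diagonal score is +1, while the off-diagonal scores are negative and symmetric.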
The most commonly used matrix is the matrix from the 1978 edition of the
Dayhoff atlas, at PAM 250: this is also frequently referred to as the MDM78
PAM250 matrix.
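A matrix for a larger distance such as PAM 250 is obtained by multiplying the 1 PAM mutation probability matrix by itself the corresponding number of times. The sketch below does this with repeated squaring on a hypothetical two-state matrix; the numbers are illustrative, not Dayhoff's.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_power(M, n):
    """Raise a square matrix to a non-negative integer power by squaring."""
    size = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    base = [row[:] for row in M]
    while n > 0:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result

# Hypothetical 1 PAM-like matrix; each column sums to 1.
pam1 = [[0.99, 0.02],
        [0.01, 0.98]]

pam250 = mat_power(pam1, 250)
for row in pam250:
    print([round(x, 4) for x in row])
```

After many multiplications every column approaches the same equilibrium composition, which is why matrices for large PAM distances carry less specific information about individual substitutions.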