# Alignment Tutorial

 The Original Needleman-Wunsch Algorithm Which Can be Easily Adapted to Structural Alignment Mark Gerstein, 1998 http://bioinfo.mbb.yale.edu/Align/align-tutorial.html http://bioinfo.mbb.yale.edu/course M Gerstein & M Levitt (1996). “Using Iterative Dynamic Programming to Obtain Accurate Pairwise and Multiple Alignments of Protein Structures,” in Proceedings of the Fourth International Conference on Intelligent Systems in Molecular Biology, 59-67 (Menlo Park, CA, AAAI Press, June 12-15). http://hyper.stanford.edu/~mbg/Align/ismb96 Needleman, S. B. and Wunsch, C. D. (1971). "A general method applicable to the search for similarities in the amino acid sequence of two proteins." J. Mol. Biol. 48: 443-453 Step 1 -- Make a Dot Plot (Similarity Matrix) Put 1's where characters are identical. A B C N Y R Q C L C R P M A 1 Y 1 C 1 1 1 Y 1 N 1 R 1 1 C 1 1 1 K C 1 1 1 R 1 1 B 1 P 1 Step 2 -- Start Computing the Sum Matrix new_value_cell(R,C) <= cell(R,C) { Old value, either 1 or 0 } + Max[ cell (R+1, C+1), { Diagonally Down, no gaps } cells(R+1, C+2 to C_max),{ Down a row, making col. gap } cells(R+2 to R_max, C+2) { Down a col., making row gap } ] Use this Excel Formula: "=IF(M28="",0,1)+MAX(N45,N46:N47,O45:Y45)" Good Excel Formulas Inserted in Step 4 (below) A B C N Y R Q C L C R P M A 1 Y 1 C 1 1 1 Y 1 N 1 R 1 1 C 1 1 1 K C 1 1 1 R 1 2 0 0 B 1 2 1 1 1 1 1 1 1 1 1 0 0 P 0 0 0 0 0 0 0 0 0 0 0 1 0 Step 3 -- Keep Going A B C N Y R Q C L C R P M A 1 Y 1 C 1 1 1 Y 1 N 1 R 5 4 3 3 2 2 0 0 C 3 3 4 3 3 3 3 4 3 3 1 0 0 K 3 3 3 3 3 3 3 3 3 2 1 0 0 C 2 2 3 2 2 2 2 3 2 3 1 0 0 R 2 1 1 1 1 2 1 1 1 1 2 0 0 B 1 2 1 1 1 1 1 1 1 1 1 0 0 P 0 0 0 0 0 0 0 0 0 0 0 1 0 Step 4 -- Sum Matrix All Done Alignment Score is 8 matches. A B C N Y R Q C L C R P M A 8 7 6 6 5 4 4 3 3 2 1 0 0 Y 7 7 6 6 6 4 4 3 3 2 1 0 0 C 6 6 7 6 5 4 4 4 3 3 1 0 0 Y 6 6 6 5 6 4 4 3 3 2 1 0 0 N 5 5 5 6 5 4 4 3 3 2 1 0 0 R 4 4 4 4 4 5 4 3 3 2 2 0 0 C 3 3 4 3 3 3 3 4 3 3 1 0 0 K 3 3 3 3 3 3 3 3 3 2 1 0 0 C 2 2 3 2 2 2 2 3 2 3 1 0 0 R 2 1 1 1 1 2 1 1 1 1 2 0 0 B 1 2 1 1 1 1 1 1 1 1 1 0 0 P 0 0 0 0 0 0 0 0 0 0 0 1 0 Step 5 -- Traceback A B C N Y - R Q C L C R - P M A Y C - Y N R - C K C R B P A B C N Y R Q C L C R P M A 8 7 6 6 5 4 4 3 3 2 1 0 0 Y 7 7 6 6 6 4 4 3 3 2 1 0 0 C 6 6 7 6 5 4 4 4 3 3 1 0 0 Y 6 6 6 5 6 4 4 3 3 2 1 0 0 N 5 5 5 6 5 4 4 3 3 2 1 0 0 R 4 4 4 4 4 5 4 3 3 2 2 0 0 C 3 3 4 3 3 3 3 4 3 3 1 0 0 K 3 3 3 3 3 3 3 3 3 2 1 0 0 C 2 2 3 2 2 2 2 3 2 3 1 0 0 R 2 1 1 1 1 2 1 1 1 1 2 0 0 B 1 2 1 1 1 1 1 1 1 1 1 0 0 P 0 0 0 0 0 0 0 0 0 0 0 1 0 Step 6 -- Alternate Paths A B C - N Y R Q C L C R - P M A Y C Y N - R - C K C R B P A B C N Y R Q C L C R P M A 8 7 6 6 5 4 4 3 3 2 1 0 0 Y 7 7 6 6 6 4 4 3 3 2 1 0 0 C 6 6 7 6 5 4 4 4 3 3 1 0 0 Y 6 6 6 5 6 4 4 3 3 2 1 0 0 N 5 5 5 6 5 4 4 3 3 2 1 0 0 R 4 4 4 4 4 5 4 3 3 2 2 0 0 C 3 3 4 3 3 3 3 4 3 3 1 0 0 K 3 3 3 3 3 3 3 3 3 2 1 0 0 C 2 2 3 2 2 2 2 3 2 3 1 0 0 R 2 1 1 1 1 2 1 1 1 1 2 0 0 B 1 2 1 1 1 1 1 1 1 1 1 0 0 P 0 0 0 0 0 0 0 0 0 0 0 1 0

Last Updated on 1/28/98
By Mark Gerstein
Email: Mark.Gerstein@yale.edu