README
~~~~~~
for 'code-mbg' library code (1.1)


A  Overview
   ~~~~~~~~

This is "library" source code for doing a variety of calculations on
protein structures and sequences, including calculating surface areas
and volumes, superimposing two structures, calculating helix axes,
finding H-bonds and contacts, and computing sequence similarity.

The primary use of this code is for volume or surface calculations.
If you use it for calculating volumes, please cite the following
references:

  M Gerstein, J Tsai & M Levitt (1995). "The volume of atoms on the
  protein surface: Calculated from simulation, using Voronoi
  polyhedra," J. Mol. Biol. 249: 955-966.

  Y Harpaz, M Gerstein & C Chothia (1994). "Volume Changes on Protein
  Folding," Structure 2: 641-649.

If you use the code for other purposes (such as calculating surface
areas or helix axes), please cite:

  M Gerstein (1992). "A Resolution-Sensitive Procedure for Comparing
  Protein Surfaces and its Application to the Comparison of
  Antigen-Combining Sites," Acta Cryst. A48: 271-276.


B  Copying, Building, Using...
   ~~~~~~~~~~~~~~~~~~~~~~~~

This code was assembled by Mark Gerstein. Much of it was written by
Mark Gerstein, but there are substantial contributions from Yehouda
Harpaz, Jerry Tsai, David Hinds, and others.

This code is copyright 1995. You are free to use it for whatever
academic calculations you wish. However, you are asked to:

  1 -- Cite the references above when you use the programs for
       published work;
  2 -- Keep this statement with the programs;
  3 -- Not incorporate this code into any commercial programs without
       obtaining explicit permission from the author.

Contact Mark Gerstein if you have any questions or difficulties.

Everything here is written in C and builds with make under Unix. It
has so far been tested on DEC Alphas, SGI Indigos, and an i486 running
Linux.
To get everything going, just type 'make' in the top-level directory;
all the libraries and executables should then be built and tested.

The actual library source is contained in the subdirectory 'src-lib'.
The subdirectory 'data' contains various data files pertinent to the
calculations, e.g. standard radii and volumes for atoms. (See the
paper for more explanation of the parameters.)

The calculations of most interest to people will be those relating to
Voronoi volumes. For these, a few quick demonstration programs are
built from the library code. They are contained in the subdirectory
'src-prog'.

The programs for calculating surfaces and volumes are based on Fred
Richards' original surface and volume programs, written in Fortran.
These are available from Fred Richards at Yale. (See F M Richards
(1974), J. Mol. Biol. 82: 1-14; F M Richards (1977), Annu. Rev.
Biophys. Bioeng. 6: 151-176.)


C  Descriptions of the sample executables
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Descriptions of some of the executables in the subdirectory 'src-prog'
follow. Sample runs are in the directory 'sample-runs'. (To keep the
distribution from getting too big, the sample run for the
full-dump-polyhedra program is abbreviated.)

In this discussion, the following conventions are used:

  in.pdb   =  input file in PDB format ("-" for stdin)
  out.pdb  =  output file in PDB-like format, with extra columns and
              results written into various columns
  [-arg]   =  an optional argument

------------
calc-surface.exe -i in.pdb > out.pdb
------------

                                                               * Surface *
ATOM      1  N   ARG     1      32.231  15.281 -13.143  0.00  0.00   50.43

(Surface is in square Angstroms.)

-----------
calc-volume.exe -i in.pdb [-method N] [-RichardsRadii] > out.pdb
-----------

The first optional argument selects the method used: the normal
Voronoi, method B, the radical plane, or a modified method B.
Including the second optional argument causes Richards' radii to be
used for the atoms; the default is the radii of Chothia. (See paper
for discussion.)

                                                                * Volume *
ATOM      2  CA  ARG     1      32.184  14.697 -11.772  0.00  0.00   15.25  0

(Volume is in cubic Angstroms. If the volume isn't calculable, it is
set to -1.00.)

--------------
show-2rad-refV.exe -i in.pdb [-sv ref-vol.dat] > out.pdb
--------------

The optional argument "-sv" specifies a file for the reference volumes.

                                                       R-Cov R-VDW  V-Ref
ATOM      1  N   ARG     1      32.231  15.281 -13.143  0.70  1.65  13.63

  R-Cov = covalent radius (A)        (Different parameters are possible;
  R-VDW = VDW radius (A)              see JMB paper above.)
  V-Ref = standard reference volume (cubic A) from the analysis of the
          interiors of proteins (see JMB paper above for discussion)

--------------
dump-polyhedra.exe -i in.pdb > out.vects
--------------

This dumps the vertices of the Voronoi polyhedron for each atom in a
format suitable for import into the graphics program O. Here is a
section of out.vects.

  DRAW_OBJECT_WritePoly t ChangeThisToTotalLines 80
  Begin_object WritePoly
  ! Beginning Atom C ARG 1
  Move 38.9047 18.7797 -13.1947
  Line 32.9549 13.8252 -13.0297
  Move 32.9549 13.8252 -13.0297
  Line 32.4831 13.2205 -12.7607
  Move 32.4831 13.2205 -12.7607
  Line 32.8308 11.9338 -10.8669
  Move 32.8308 11.9338 -10.8669
    .
    .
    .
  Line 34.1948 12.8591 -10.6045
  Move 33.1069 12.4968 -10.2644
  Line 32.3484 12.5938 -10.1057
  Move 32.3484 12.5938 -10.1057
  Line 32.8308 11.9338 -10.8669
  Move 32.8308 11.9338 -10.8669
  Line 33.2656 12.1306 -10.6793
  Move 33.1069 12.4968 -10.2644
  Line 33.2656 12.1306 -10.6793
  ! volume= 14.1673 MaxDistSq= 7.5539
  ! Ending Atom 2 : C ARG 1
  ! ATOM      3  C   ARG     1      33.438  13.890 -11.387 14.17  7.55
  ... and so on ...
  End_Object

To use this with O, do the following:

  1. Change the '80' on the first line of the output to the number of
     lines in out.vects.
  2. Inside O, type 'read_formatted out.vects' to create an O
     datablock 'draw_object_writepoly'.
  3. Then type 'draw_object draw_object_writepoly' to draw this
     datablock to the screen using the O graphics descriptor language.
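
The Move/Line records in out.vects are plain text, so they can also be
read back for use outside O. What follows is only a minimal sketch in
C and is not part of the 'code-mbg' library. The function name
parse_vect_record is hypothetical. It assumes one record per line, with
each record a keyword ('Move' or 'Line') followed by three coordinates,
as in the excerpt above, and it assumes the usual pen-plotter reading
in which 'Move' repositions and 'Line' draws to the given point.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of code-mbg): parse one line of out.vects.
 * Returns 1 and fills *is_draw and xyz[] if the line is a Move or Line
 * record; returns 0 for anything else (header, "!" comments, End_Object).
 * Assumption: Move repositions (*is_draw = 0), Line draws (*is_draw = 1). */
int parse_vect_record(const char *line, int *is_draw, double xyz[3])
{
    char word[8];
    double x, y, z;

    assert(line != NULL && is_draw != NULL && xyz != NULL);

    /* %7s is bounded so an over-long first token cannot overflow word[]. */
    if (sscanf(line, " %7s %lf %lf %lf", word, &x, &y, &z) != 4)
        return 0;

    if (strcmp(word, "Move") == 0)
        *is_draw = 0;
    else if (strcmp(word, "Line") == 0)
        *is_draw = 1;
    else
        return 0;

    xyz[0] = x;
    xyz[1] = y;
    xyz[2] = z;
    return 1;
}
```

Under these assumptions, each 'Line' record, together with the position
left by the record before it, gives one edge of the polyhedron.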

-------------------
full-dump-polyhedra.exe -i in.pdb > out.dat
-------------------

Here out.dat contains a full specification of the polyhedron for each
atom, including the area of each face and the vertices constituting
it. This full specification is very useful for quantifying
inter-atomic contacts and generating the Delaunay tessellation (see
below). For instance, the polyhedron description for the CA of ARG 1
(atom ID 2) is shown below. Atoms are specified by the ISER number
(the first number after the "ATOM").

  FullDumpPoly(): BEGIN polyhedron for following atom, which has ID 2.
  ATOM      2  CA  ARG     1      32.184  14.697 -11.772  0.00  0.00
  DumpAFace(): BEGIN face 0
  -- Face between atom 2 and neighbour 1, which are separated by 1.491 A
  -- List of 5 vertices: number, derived from atoms (4 IDs), coord. (x,y,z)
     0     2    6    1    5    31.6946  13.6065 -13.1026
     1     2    6    1    3    32.5716  13.1079 -13.2849
     2     2   33    1    3    34.5954  17.0962 -11.5166
     3     2  872    1   33    31.6729  18.7648 -10.9061
     4     2  872    1    5    29.9060  17.9477 -11.3147
  -- Face-Centroid= 32.0881 16.1046 -12.0250
  -- Distance of face to central atom: 1.4334
  -- Face-Area= 8.4958 Pyramid-Volume= 4.0592
  DumpAFace(): END face 0
  DumpAFace(): BEGIN face 1
  -- Face between atom 2 and neighbour 5, which are separated by 1.531 A
  -- List of 5 vertices: number, derived from atoms (4 IDs), coord. (x,y,z)
     0     2    6    1    5    31.6946  13.6065 -13.1026
  .... and so on .... and on ....

***  ***  ***
This program effectively describes the Delaunay tessellation.
***  ***  ***

The Delaunay tessellation is formed by connecting the two atoms that
determine each face (which is why it is dual to the Voronoi
polyhedra). Specifically, if you look at the output, you will see a
lot of lines like:

  -- Face between atom 2 and neighbour 5, which are separated by 1.549 A
  -- Face between atom 2 and neighbour 3, which are separated by 1.497 A
  -- Face between atom 7 and neighbour 438, which are separated by 3.455 A

The two atom numbers on each line (e.g.
2 and 5 for the first line above) are serial numbers (ISER in PDB
terminology) of a pair of atoms connected in the Delaunay
tessellation. So if you take the output above and draw a vector
between atoms 2 and 5, 2 and 3, and 7 and 438, you will begin to build
up the tessellation.


D  Location of the code
   ~~~~~~~~~~~~~~~~~~~~

This file and all the code are located at the following URLs:

  1  ftp://hyper.stanford.edu/pub/mbg/SurfaceVolumes/code-mbg.tar.Z

     This URL is a tar archive that contains the source code plus
     sample output and scripts for checking out the programs. To
     extract files from the archive, use the command
     'uncompress -c code-mbg.tar.Z | tar xvf -'

  2  ftp://hyper.stanford.edu/pub/mbg/SurfaceVolumes/code-mbg

     This URL has everything in the above archive expanded into
     directories. It also has pre-built libraries and executables for
     DEC Alpha, Silicon Graphics, and i486 Linux computers. (These
     binaries are stored in the subdirectories lib-alpha, lib-sgi,
     lib-linux, bin-alpha, bin-sgi, and bin-linux. Note that 'lib' and
     'bin' are just arbitrary pointers to an appropriate pair of these
     directories.)

Mark Gerstein / 25 October 1995 / Stanford, CA

.. _ .. _ ..  .. _ .. _ ..  .. _ .. _ ..  .. _ .. _ ..  .. _ .. _ ..  .. _ ..


E  Other References
   ~~~~~~~~~~~~~~~~

Regarding surfaces and volumes, you may also want to look at:

  M Gerstein & C Chothia (1996). "Packing at the Protein-Water
  Interface," PNAS 93: 10167-10172.

  M Gerstein & R M Lynden-Bell (1993). "What is the natural boundary
  for a protein in solution?" J. Mol. Biol. 230: 641-650.


F  History
   ~~~~~~~

25 October 1995 -- Version 1.0, released code

3 May 1996 -- Version 1.1
  * Corrected bug that prevented the calculation of hydrogen volumes
  * Corrected bug (I think) that led to the first atom type being
    given a zero radius. To make doubly sure this doesn't happen,
    just make this first atom type a dummy type ("JUNK").

7 June 1996 -- Version 1.101
  * Added a few references to this README

26 November 1996 -- Version 1.102
  * Fixed typing of GLY CA (was C4H, now C4HH)

24 December 1996 -- Version 1.103
  * Updated this README file to better document the calculation of
    Delaunay tessellations
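

G  Sketch: reading tessellation edges from out.dat
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Section C describes how the "Face between atom ... and neighbour ..."
lines written by full-dump-polyhedra.exe give the edges of the
tessellation. As an illustration only, one way to pick those lines out
of out.dat in C is sketched here. This is not part of the 'code-mbg'
library: the function name parse_face_line is hypothetical, and the
line format is assumed to be exactly the one shown in the sample output
in section C. It is also an assumption that the same edge can be
reported from both atoms' polyhedra, so the sketch stores each pair in
ascending order to make duplicates easy to spot.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of code-mbg): recognise a line of the form
 *   -- Face between atom 2 and neighbour 5, which are separated by 1.549 A
 * On success returns 1 and stores the two ISER numbers and the distance
 * in Angstroms. Returns 0 for any other line. */
int parse_face_line(const char *line, int *atom, int *neighbour, double *dist)
{
    int a, b;
    double d;

    assert(line != NULL && atom != NULL && neighbour != NULL && dist != NULL);

    /* A space in the format matches any run of whitespace (or none). */
    if (sscanf(line,
               " -- Face between atom %d and neighbour %d,"
               " which are separated by %lf",
               &a, &b, &d) != 3)
        return 0;

    /* Store the pair in ascending order so that the same edge seen from
     * either atom compares equal (an assumption about the output). */
    *atom      = (a < b) ? a : b;
    *neighbour = (a < b) ? b : a;
    *dist      = d;
    return 1;
}
```

Calling this on every line of out.dat and keeping the pairs for which
it returns 1 yields a list of atom pairs to connect.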