
Appendix A Computer Implementation

I Introduction

Protein structures are exceedingly complex. A structure can contain more than 10,000 atoms. Consequently, computers are essential for visualizing, analyzing, and understanding protein structure.

Much of my effort over the past three years has gone into developing and using software tools to study protein structure. In developing this software, I put great effort into expressing computational concepts concisely and elegantly, using the languages and tools of modern computer science. Because reinventing the wheel for each project is wasteful, I also made a point of integrating this software with existing programs and approaches. What follows is a list and synopsis of the major pieces of software developed during my PhD.

II Molecular Simulation
A Monte-Carlo Program (MC4)

The Monte Carlo program for water simulation discussed in chapters 7, 8, and 9 was written from scratch in ANSI C (Harbison & Steele, 1987). Its design was guided by the models provided by Allen & Tildesley (1987), Press et al. (1988), and X-PLOR (Brünger, 1990; Brünger et al., 1987). As shown in the code fragment in Table A-1, it essentially consists of four nested loops, of which the outermost and innermost can be parallelized.

The outermost loop cycles over "runs." As discussed in chapters 7 and 9, each "run" was a completely independent heat-cool-run portion of the overall simulation and could be executed on a separate processor; the results were combined at the end. The next loop is over the steps in a run. At each step, a loop is made over the water molecules, and a Monte-Carlo move is attempted on each one. The innermost loop is over all the neighbors of the water molecule being moved; it can be vectorized on an array or pipeline processor. A Verlet-style neighbor list is used, and it stores actual C pointers rather than array indices. The whole program thus has a COVI (concurrent outer, vector inner) structure, which is highly efficient on moderately parallel computers (Alliant, 1991). Water configurations generated by the program can be analyzed in situ, written to disk in CHARMM DCD format (Brooks et al., 1983), or output as vector lists for graphics display.
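The pointer-based neighbor list and the innermost loop described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the MC4 source: the Water struct, the function names, and the toy 1/r^12 pair energy are hypothetical stand-ins, periodic boundaries are ignored, and each list is terminated by a NULL pointer playing the role of END_LIST in Table A-1.

```c
#include <stdlib.h>

/* Hypothetical molecule record: only one reference site per water is used. */
typedef struct Water {
    double x, y, z;
} Water;

/* Squared distance between two waters (periodic boundaries ignored). */
static double dist2(const Water *a, const Water *b)
{
    double dx = a->x - b->x, dy = a->y - b->y, dz = a->z - b->z;
    return dx * dx + dy * dy + dz * dz;
}

/* Build a Verlet-style neighbor list. For each water i, list[i] becomes a
 * NULL-terminated array of pointers to every other water closer than
 * rlist (the interaction cutoff plus a skin). Returns 0 on success and -1
 * if an allocation fails (earlier lists are not freed in this sketch). */
int build_neighbor_lists(Water *w, int n, double rlist, Water **list[])
{
    double r2max = rlist * rlist;
    for (int i = 0; i < n; i++) {
        /* at most n - 1 neighbors plus the NULL terminator */
        Water **p = malloc((size_t)n * sizeof *p);
        if (p == NULL)
            return -1;
        list[i] = p;
        for (int j = 0; j < n; j++)
            if (j != i && dist2(&w[i], &w[j]) < r2max)
                *p++ = &w[j];
        *p = NULL;                    /* end-of-list sentinel */
    }
    return 0;
}

/* Innermost loop: walk the pointer list of the molecule being moved and
 * sum a toy repulsive pair energy 1/r^12 over neighbors inside rcut.
 * The real water potential would replace this term. */
double neighbor_energy(const Water *w, Water **nbr, double rcut)
{
    double e = 0.0, rc2 = rcut * rcut;
    for (Water **p = nbr; *p != NULL; p++) {
        double r2 = dist2(w, *p);
        if (r2 < rc2) {
            double inv6 = 1.0 / (r2 * r2 * r2);
            e += inv6 * inv6;
        }
    }
    return e;
}
```

In a Verlet scheme the list radius is the interaction cutoff plus a "skin," and the lists need to be rebuilt only after some molecule has moved farther than half the skin, so most Monte-Carlo steps simply reuse the existing pointers.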

B Configuration Analysis Program (DCD3)

The simulation analysis program converted a set of Monte-Carlo configurations (or a molecular-dynamics trajectory) into a set of images of probability density. It was written so that it could either be called from within the Monte-Carlo program as a subroutine or run as a standalone program reading DCD-format files. As an intermediate step, the program writes out the probability density in MRC-image/CCP4-map file format (CCP4, 1991). For final output, the program scales and thresholds these maps, converts them to PostScript images (Adobe, 1985), and embellishes them with atom positions and other annotations.
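Binning configurations into a probability-density map might look like the following C sketch. The grid size NG, the cubic box with one corner at the origin, and the function names are assumptions for illustration; DCD3's actual grid, DCD reading, and MRC/CCP4 output are not reproduced.

```c
#include <stddef.h>

#define NG 8   /* hypothetical number of grid points along each axis */

/* Bin the n atom positions of one configuration (xyz holds x,y,z triples)
 * into a cubic histogram covering [0, box) on each axis. Atoms outside
 * the box are ignored. Call once per configuration to accumulate. */
void accumulate_density(double hist[NG][NG][NG], const double *xyz, size_t n, double box)
{
    for (size_t a = 0; a < n; a++) {
        int idx[3] = {0, 0, 0}, inside = 1;
        for (int k = 0; k < 3 && inside; k++) {
            double f = xyz[3 * a + k] / box;     /* fractional coordinate */
            if (f < 0.0 || f >= 1.0) {
                inside = 0;
            } else {
                idx[k] = (int)(f * NG);
                if (idx[k] >= NG)                /* guard against rounding */
                    idx[k] = NG - 1;
            }
        }
        if (inside)
            hist[idx[0]][idx[1]][idx[2]] += 1.0;
    }
}

/* Divide every voxel by the total count so the map sums to 1 and each
 * voxel holds the probability of finding an atom there. Returns the total. */
double normalize_density(double hist[NG][NG][NG])
{
    double total = 0.0;
    for (int i = 0; i < NG; i++)
        for (int j = 0; j < NG; j++)
            for (int k = 0; k < NG; k++)
                total += hist[i][j][k];
    if (total > 0.0)
        for (int i = 0; i < NG; i++)
            for (int j = 0; j < NG; j++)
                for (int k = 0; k < NG; k++)
                    hist[i][j][k] /= total;
    return total;
}
```

Scaling and thresholding for display would then operate on the normalized map, one section at a time.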

III Structure Comparison
A Reciprocal Space Comparisons (PACK, FILL, and CORR)

The reciprocal space comparisons discussed in chapter 10 were done with a suite of programs connected by a C-shell script. Using standard CCP4 programs, a density map is generated from a set of atomic coordinates. A short C program (PACK) reads in this map, thresholds it, and converts it to a compressed ASCII format. A short program (FILL) written in Emacs Lisp (Stallman, 1986) implements the region-growing procedure that fills the map to make an envelope. The envelope is converted back to CCP4 map format by PACK and then Fourier transformed using standard crystallographic routines. A second C program (CORR) reads the resulting LCF file of transform phases and amplitudes and computes the statistics discussed in chapter 10.
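FILL was written in Emacs Lisp; purely as an illustration, one plausible version of a region-growing fill is sketched below in C. It assumes that "filling the map" means closing internal cavities: empty voxels connected to the map border are grown into an exterior region, and everything else becomes the envelope. The function name, the voxel layout (x varying fastest), and the use of face (6-)connectivity are assumptions, not details taken from FILL.

```c
#include <stdlib.h>

/* Offsets to the six face-connected neighbors of a voxel. */
static const int nbr6[6][3] = {
    {1, 0, 0}, {-1, 0, 0}, {0, 1, 0}, {0, -1, 0}, {0, 0, 1}, {0, 0, -1}
};

/* map holds nx*ny*nz voxels (x fastest): 1 = above the density threshold,
 * 0 = below. Empty voxels reachable from the border through empty voxels
 * are exterior; every other voxel is set to 1, giving a solid envelope.
 * Returns 0 on success, -1 on allocation failure. */
int fill_envelope(unsigned char *map, int nx, int ny, int nz)
{
    size_t n = (size_t)nx * ny * nz, top = 0;
    unsigned char *outside = calloc(n, 1);
    size_t *stack = malloc(n * sizeof *stack);   /* each voxel pushed once */
    if (outside == NULL || stack == NULL) {
        free(outside);
        free(stack);
        return -1;
    }

    /* Seed the region with every empty voxel on the map border. */
    for (int z = 0; z < nz; z++)
        for (int y = 0; y < ny; y++)
            for (int x = 0; x < nx; x++) {
                size_t v = ((size_t)z * ny + y) * nx + x;
                int border = x == 0 || y == 0 || z == 0 ||
                             x == nx - 1 || y == ny - 1 || z == nz - 1;
                if (border && !map[v]) {
                    outside[v] = 1;
                    stack[top++] = v;
                }
            }

    /* Grow the exterior region through empty face neighbors. */
    while (top > 0) {
        size_t v = stack[--top];
        int x = (int)(v % (size_t)nx);
        int y = (int)(v / (size_t)nx % (size_t)ny);
        int z = (int)(v / ((size_t)nx * ny));
        for (int k = 0; k < 6; k++) {
            int a = x + nbr6[k][0], b = y + nbr6[k][1], c = z + nbr6[k][2];
            if (a < 0 || b < 0 || c < 0 || a >= nx || b >= ny || c >= nz)
                continue;
            size_t w = ((size_t)c * ny + b) * nx + a;
            if (!map[w] && !outside[w]) {
                outside[w] = 1;
                stack[top++] = w;
            }
        }
    }

    /* Everything not reached from the border lies inside the envelope. */
    for (size_t v = 0; v < n; v++)
        map[v] = !outside[v];

    free(outside);
    free(stack);
    return 0;
}
```

An explicit stack is used instead of recursion so that large maps cannot overflow the call stack.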

B Graphics and Fitting

• A number of general-purpose coordinate and rotation-matrix manipulation routines were developed and implemented as C libraries (UTIL). These performed the screw-axis, area- and volume-subtraction, and rotational-decomposition calculations (chapter 4 and Appendix D).

• The fit-all procedure and other fits were done with scripts for X-PLOR or PINQ (Lesk, 1986a).

• A Lisp interpreter (MLISP), implemented in C, was written as a front end for many of the programs. It used a recursive-descent parser and a complete garbage collector. Various operations, such as matrix decomposition, were implemented as Lisp primitives.

• Many different graphics programs were used: FRODO (Jones, 1985), O (Jones et al., 1991), INSIGHT (Dayringer et al., 1986), MOLSCRIPT (Kraulis, 1991), and ARTPLOT (Lesk & Hardman, 1982). A preprocessor (PP) was written in LEX (Kernighan & Pike, 1984) to convert general drawing commands or Brookhaven headers into graphics commands for these programs.
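MLISP's reader is not reproduced here, but a recursive-descent parser for Lisp s-expressions can be sketched in a few dozen lines of C. The Cell layout and the function names are hypothetical; the grammar covers only lists and symbols, allocation failures are not checked, and nothing is freed (in MLISP, unreachable cells would have been reclaimed by the garbage collector).

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Minimal cons-cell representation (hypothetical, not MLISP's layout). */
typedef struct Cell {
    enum { ATOM, PAIR } tag;
    char *name;                 /* ATOM: symbol text */
    struct Cell *car, *cdr;     /* PAIR: head and tail; NULL cdr ends a list */
} Cell;

static const char *src;         /* cursor into the text being parsed */

static void skip_space(void)
{
    while (isspace((unsigned char)*src))
        src++;
}

static Cell *parse_expr(void);

/* list := '(' expr* ')'  (the opening parenthesis is already consumed) */
static Cell *parse_list(void)
{
    skip_space();
    if (*src == ')') {          /* end of list: nil */
        src++;
        return NULL;
    }
    if (*src == '\0')           /* unterminated list: stop */
        return NULL;
    Cell *c = calloc(1, sizeof *c);
    c->tag = PAIR;
    c->car = parse_expr();      /* one recursive call per grammar rule */
    c->cdr = parse_list();
    return c;
}

/* expr := list | atom */
static Cell *parse_expr(void)
{
    skip_space();
    if (*src == '(') {
        src++;
        return parse_list();
    }
    size_t len = strcspn(src, "() \t\n\r");   /* symbol runs to a delimiter */
    Cell *c = calloc(1, sizeof *c);
    c->tag = ATOM;
    c->name = malloc(len + 1);
    memcpy(c->name, src, len);
    c->name[len] = '\0';
    src += len;
    return c;
}

/* Parse the first s-expression in text. */
Cell *read_sexpr(const char *text)
{
    src = text;
    return parse_expr();
}
```

Each grammar rule maps onto one C function, which is what makes the parser recursive descent: nested lists are handled simply by parse_expr and parse_list calling each other.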

Table A-1 Parallel Implementation of the Monte-Carlo Program

Code schematic showing the overall construction of the simulation program. Loops over R and P can be parallelized; those over S and W cannot be.

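for (R = 0; R < num_runs; R++) {
    if (fork() == 0) {                           /* child process: one independent run */
        for (S = 0; S < num_steps; S++) {
            T = heat_cool_run(S);                /* temperature for this step */
            for (W = 0; W < num_waters; W++) {
                for (P = nbr_list[W]; *P != END_LIST; P++) {
                    /* interaction of water W with neighbor *P */
                } /* P */
            } /* W */
            analysis(results[R], config, S);
        } /* S */
        exit(0);                                 /* child must not re-enter the R loop */
    }
} /* R */

while (wait(NULL) > 0)                           /* wait for every run to finish */
    ;

/* results[] must be shared memory or files, since fork() copies memory */
for (R = 0; R < num_runs; R++)
    totals = accumulate(results[R], totals);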
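Because fork() gives each child a private copy of the parent's memory, the child processes in Table A-1 cannot return results simply by writing into an array. The following C sketch shows one way to execute independent runs in parallel and combine their results at the end, with each child reporting through its own pipe. The names run_simulation and parallel_runs, the 64-run limit, and the single double result per run are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for one independent heat-cool-run simulation:
 * it simply returns a number derived from the run index. */
static double run_simulation(int run)
{
    return (double)(run + 1);
}

/* Execute num_runs simulations in separate child processes (the R loop)
 * and sum their results in the parent. Returns 0 on success, -1 on error.
 * This sketch assumes num_runs <= 64 and does not clean up on error. */
int parallel_runs(int num_runs, double *total)
{
    int fds[64][2];
    if (num_runs < 0 || num_runs > 64)
        return -1;
    *total = 0.0;
    for (int r = 0; r < num_runs; r++) {
        if (pipe(fds[r]) != 0)
            return -1;
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {                          /* child: one run */
            double result = run_simulation(r);
            close(fds[r][0]);
            ssize_t nw = write(fds[r][1], &result, sizeof result);
            _exit(nw == (ssize_t)sizeof result ? 0 : 1);
        }
        close(fds[r][1]);                        /* parent keeps the read end */
    }
    for (int r = 0; r < num_runs; r++) {         /* combine results at the end */
        double result;
        if (read(fds[r][0], &result, sizeof result) != (ssize_t)sizeof result)
            return -1;
        close(fds[r][0]);
        *total += result;
    }
    while (wait(NULL) > 0)                       /* reap every child */
        ;
    return 0;
}
```

Shared memory or per-run output files would work equally well; pipes are used here only because they need nothing beyond basic POSIX calls.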