# Topics in the Simulation of Biological Macromolecules

### 1. Introduction

Macromolecular simulation is an important tool in molecular biology. Applications range from refinement of X-ray and NMR structures to structure-based drug design. Limitations of macromolecular simulation as well as strategies to circumvent these limitations are addressed. While a rigorous model is theoretically desirable, a large concern in the simulation of complex biological macromolecules is to keep the speed and effort of computation in a practical range. A variety of methods that address the latter problem are selectively discussed in this paper. After a short review of simulation methods various attempts to reduce the computation time in particular simulation applications are reviewed. These comprise approaches to simplify the physical model, approaches from statistics and bioinformatics, and improvements and simplifications in the computational algorithms. Finally the use of macromolecular simualtion in structure-based design is outlined, which demonstrate how powerful simulation despite its shortcomings can be.

### 2. A Brief Overview of Simulation Methods

#### 2.1. Molecular Dynamics and Monte Carlo Algorithms

The most widely used methods in macromolecular simulation are molecular dynamics and Monte Carlo simulation. The approach in a molecular dynamics simulation of macromolecules consists of numerically integrating Newton's equations of motion for the assembly of atoms, thereby following their trajectories in time. A potential function describes the interaction between atoms and molecules.

Instead of actually simulating the course of events over time, Monte Carlo simulations randomly generate a set of states of a macromolecular system. The Monte Carlo formalism guarantees that this set of states follows a Boltzmann distribution. This allows the calculation of thermodynamic properties of the system.

Monte Carlo simulations are sometimes superior to molecular dynamics simulations for study of equilibrium properties in that drastic, sometimes unphysical moves of the molecules can be implemented to explore the configuration space in a very efficient way. A drawback of Monte Carlo simulations is that they can only model equilibrium situations. Therefore Monte Carlo methods cannot be employed to gain information on time-dependent processes (such as for instance diffusion).

#### 2.2. Potential Energy Functions

A good overview of the possible interactions between molecules (see table 1) can be found in Prausnitz (1986). The nature of electrostatic forces is well understood. Electrostatic forces arise either between two electric point charges (which can be described by Coulomb's law) or between permanent electric dipoles. In the presence of an electric field nonpolar molecules experience a polarization of its charge distribution that gives rise to an induced dipole. Dispersion forces between nonpolar molecules are due to the interaction of instantaneous dipoles. For repulsive forces no satisfactory description exists to date. In practice, the Lennard-Jones potential is widely used, which assumes that repulsive forces are proportional to the 12th power of the inverse distance. In addition to electrostatic interactions and interactions modeled by the Lennard-Jones potential, chemical effects such as hydrogen bonding add important contributions to intermolecular potentials. The forces arising from chemical bonds can for instance be modeled as springs.

 Nature of forces Proportional to (r: intermolecular distance) Attractive forces Electrostatic Induction Dispersion . r^(-1) r^(-6) r^(-6) Repulsive forces r^(-12) (in Lennard-Jones model) Chemical forces Hydrogen bonds Electron Donor /Electron Acceptor Complexes - -

Table 1. Intermolecular forces (Prausnitz 1986).

### 3. Some Issues in Physical Modeling

#### 3.1. Solvent Modeling

Explicit modeling of the solvent is based on a complete microscopic description of the solvent molecules. Explicit approaches are often difficult to implement in practice because calculations of the solvent behavior consume the majority of computer time. For instance, the number of atoms required to model the solvent explicitly in the simulation of a protein in solution is about one order of magnitude larger than the number of atoms in the protein itself (van Gunsteren 1997).

Therefore the solvent is often modeled in an implicit approach. The solvent is treated in a macroscopic fashion and is for instance represented as a Newtonian liquid with a dielectric constant (Koczack 1995). A disadvantage of treating the solvent as a continuum is that water molecules that play a structural role in the macromolecule are usually not considered.

#### 3.2. Modeling Chemical Reactions

Models based on molecular dynamics or Monte Carlo simulations treat atoms classically and are therefore strictly only an approximation to the actual behavior of atoms as described by quantum mechanics (Schrödinger equation). However, molecular mechanics models provide realistic simulations of enzymatic systems in practice. The drawback is that only quantum mechanics can appropriately describe the formation and breaking of chemical bonds. Simulating an enzymatic system solely on a strictly theoretical or semi-empirical quantum mechanical description is often too time consuming. Therefore hybrid models for enzymatic systems with chemical reactions are employed that model a small part around the active site with quantum mechanics and the rest of the structure with classical molecular mechanics (see Field 1990).

### 4. Modeling with Methods from Statistics and Bioinformatics

#### 4.1. Statistically derived potentials

Instead of directly modeling physical interatomic and intermolecular forces, potentials can also be derived from the knowledge of protein crystal structures (Jernigan 1996). This approach has gained importance since the number of known structures has increased vastly in recent years so that statistical errors should become increasingly smaller. Instead of calculating the forces of individual atoms, often a mean force that acts at the centroid of the side chain is calculated. For example the following expression for the energy parameter e is widely used for the interaction between two residues of a protein (see Godzik 1995):

eps(i,j)/(kT) = -ln(nobs(i,j)/nexp(i,j))

(k: Boltzmann constant, T: temperature)

Here, nobs(i,j) is the observed number of contacts between pairs of amino acids i and j. This quantity can be computed using data from the Protein Data Bank. nexp(i,j) is the number of expected contacts between pairs of amino acids. Strictly, this equation assumes that Boltzmann statistics apply, which requires that the system is at thermodynamic equilibrium. Structure derived potentials provide an effective potential for interactions between atoms that comprise a variety of effects. It is for instance interesting to note that these potentials can result in attractive forces between two positively charged residues (Miyazawa 1996). Although there should be electrostatic repulsion between the two residues, the effective force is attractive due to the clustering of charged groups because of hydrophobic packing of the protein.

#### 4.2. A Special Approach to Structure Prediction: Homology Modeling

One of the biggest challenges in macromolecular simulation is the prediction of protein folding. A method to determine the three-dimensional structure of a protein not from physical simulation but from a related modeling approach is briefly outlined here. The idea of homology or comparative modeling is to use a known 3D structure as a template for the structure of a target sequence. Schematically the following four sequential steps are performed (Sanchez 1997). First, the proteins of known structures that exhibit sequence homology with the target sequence must be found. Second, these homologous proteins must be sequence aligned with the target protein. Third, a structure for the target protein is computed based on the known structures of the homologous proteins. Fourth, the model is evaluated and the second and third steps can be iteratively repeated until the evaluation criteria are satisfied. (For an overview of current methods for the computation of structures in comparative structure modeling see: Sanchez 1997.) Remarkably, the root mean square (rms) error of structures predicted by comparative modeling can be kept below 2 Angstroms (Sanchez 1997). Sali et al. (1997) demonstrated that good homology modeling strategies can result in an rms error of one Angstrom and less for about 90% of the main chain atoms if the target sequence is at least 40% identical to the template. Larger rms errors occur mainly in sidechains and loop regions. For less than 40% sequence identity the usefulness of homology modeling is reduced. An application of homology modeling is to derive initial model structures in X-ray crystallographic refinement. Comparative modeling also seems to be a compelling approach for the analysis of genome databases (Gerstein 1997). The number of currently known protein sequences is on the order of 150,000 while the number of known protein structures is on the order of 5,000. It is estimated (Rost 1996) that about one third of the known protein sequences are related to the known structures of proteins deposited in databases. The success of homology structure predicition largely depends on the correct alignment of the target sequence with the template. Therefore research aiming at improving the current alignment methods will also be helpful for comparative structure prediction.

### 5. Special Algorithms

#### 5.1. Reduction of the Conformational Space

A major determinant in the computation time consumed in macromolecular modeling is the size of the configuration space. For instance, for a macromolecule with a collection of N atoms the number of degrees of freedom is at least 3N (for 3 spacial dimensions) if all atoms are regarded as independent of one another (see Rice 1994).

##### 5.1.1. Torsion Angle Molecular Dynamics (TAD)

A special approach (torsion angle molecular dynamics) to reduce the number of variables that constitute the configurational space is described by Rice et al. (1994). The configurational space of a macromolecule can be restricted if chemical information such as bond lengths is considered and, in particular, if only ideal covalent geometries are allowed. Most of the conformational variability in a protein can in fact be described by its torsion angles, which reduces the total degrees of freedom approximately ten-fold. The motion of the protein can then be readily described using equations from classical mechanics which are commonly used in robotics.

##### 5.1.2. Lattice Models

Another strategy to reduce the conformational space for the search of the global minimum of the objective function is to implement a lattice representation of the conformational space. In highly simplified models of macromolecules the individual atoms are only allowed to assume certain points in space, for instance in a cubic lattice (Hoffmann 1997). These models can only provide qualitative insights into the behavior of proteins or nucleic acids. Studies with more sophisticated lattice models have been performed and compared with experimental results. For instance, Kolinski et al. (1994) report the structure prediction of the small globular protein crambin with a root mean square (rms) error between 2 and 4 Angstroms compared with its crystal structure. One constraint for lattice models is that they can only be explored by Monte Carlo algorithms.

#### 5.2. Simulated Annealing

Another difficulty in molecular dynamics arising from a large configuration space (i.e., a large amount of adjustable parameters) is the fact that the objective function most likely has multiple local minima. It is a challenge for the optimization algorithm to find the global minimum of the target function rather than just a local minimum. Simulated annealing represents one attempt to overcome this problem (Brunger 1997). In simulated annealing a fixed temperature is assigned to the system. Local conformational transitions across barriers in the target function are possible when those barriers are lower than the kinetic energy of the system as specified by the temperature. On the other hand, local conformational changes that decrease the value of the objective function, tend to increase the kinetic energy of the system. Therefore a friction or heat dissipation term is included in the equations describing the motions of the atoms (temperature coupling). Ideally, temperature coupling prevents the system from leaving the global minimum configuration but allows it to move away from local minima.

#### 5.3. The Problem of the Short Time Scale

There is a variety of processes in which proteins are involved that take place in time ranges that cannot be fully simulated with conventional macromolecular modeling techniques. For instance, enzyme-mediated biological reactions do not only require the close proximity of the reactants but also the proper orientation of the reactants relative to each other. Many of these reactions are diffusion-limited and take considerably longer than the nanosecond range that is accessible to simulation. The time scale of protein folding events also remains a problem to computational investigation. Folding takes place in a period ranging from milliseconds to several minutes for many proteins. The computer simulation of the folding of a protein from a completely random initial configuration to the final native structure would consume a huge amount of computation time. An interesting algorithm that addresses this problem was recently presented by Rojnuckarin et al. (1998). The authors examine a simplifying diffusion-collision model of a four helix bundle protein. The complete folding process of this protein takes place in a few seconds, but the complete simulation of the folding process with Brownian dynamics would require approximately 10,000 years of CPU time on a current supercomputer. However, a modified method, called weighted ensemble Brownian dynamics, can be employed to simulate the folding process in several days on a current supercomputer. This is possible because weighted ensemble Brownian dynamics calculates the mean passage time of an event and furthermore samples events that lead to "productive collisions" even if the probability of these events is very small. The authors report that the calulated folding times for the helices of the four helix bundle are on the order of magnitude of experimentally measured folding times, which is probably due to inaccuracies of the model. It is important to note that the proposed algorithm can significantly reduce the computation time for the simulation that otherwise could not have been carried out.

### 6. A Special Application: Macromolecular Modeling and Combinatorial Chemistry

In the previous paragraphs many of the limitations of macromolecular modeling have been discussed. Although some strategies to circumvent these limitations have been explained, there still remain major difficulties. However, one application in which macromolecular modeling is of great importance is rational drug design. At first sight, rational drug design on the basis of structural modeling seems only to epitomize the limitations of the modeling efforts. It is an extremely difficult problem to correctly predict receptor-ligand interactions. A lot of effort has been concentrated on the development of accurate docking algorithms (Kuntz 1994); however, it remains very difficult to predict the free energy change associated with binding (Salemme 1997). On the other hand this does not mean that there are no applications for molecular modeling in rational drug design yet. A very promising approach seems to be the combination of structure-based drug design and combinatorial chemistry (Salemme 1997). Combinatorial chemistry allows the parallel synthesis of a great number of compounds derived from a few common building blocks. This can be effectively combined with structure based-design methods especially in screening a combinatorial chemistry library for certain structural feaatures of compounds. For instance, it can be tested how well a compound might function as a ligand molecule. It is important to note here that less precise methods of structure prediction are often better able to serve the purpose of selecting a certain number of compounds for further investigation of their potential use as a drug (Martin 1997). Another important combination of structure based design and combinatorial chemistry are so called virtual libraries. The size of combinatorial chemistry libraries created with current laboratory techniques is limited to about 106 compounds; automatic synthesis of compounds in successive rounds is even limited to about only 103 compunds per round (van Gunsteren 1997, Salemme 1997); however, computational methods could in principle survey a library with the number of compounds on a much higher order of magnitude. The bigger libraries of merely computational structures could be screened to aid in finding a subset of compounds that will actually be chemically synthesized. Combined application of combinatorial chemistry and structural design also facilitates the assessment of bioavailibility and safety of potential drugs. Structure-based drug design tends to neglect these issues because of its concentrated focus on the isolated receptor- ligand system. For instance, a potent drug might also bind to targets other than the primarily envisaged receptor molecule. However, bioavailability and safety can be tested with compounds created by combinatorial chemistry.

### 7. Conclusion

Macromolecular modeling is a useful technique in determining biomolecular structures and properties in addition to conventional structure determination methods such as X-ray crystallography and NMR. Some of the drawbacks of macromolecular modeling as well as some strategies that deal with these drawbacks have been outlined. In the refinement of experimental structures macromolecular modeling is an established technique. Despite its limitations there are promising applications in rational drug desgin, for instance the creation of virtual compund libraries in conjunction with automated combinatorial chemistry.

### 8. References

Brunger AT, Adams PD, Rice LM: New applications of simulated annealing in X-ray crystallography and solution NMR. Structure 1997, 5:325-336.

Field MJ, Bash PA, Karplus M: Combined quantum-mechanical and molecular mechanical potential for molecular-dynamics simulations. J. Comp. Chem. 1990, 11:700-733.

Gerstein M, Levitt M: A structural census of the current population of protein sequences. Proc. Natl. Acad. Sci. USA 1997 94:11911-11916.

Godzik A, Kolinski A, Skolnick J: Are proteins ideal mixtures of amino acids - analysis of energy parameter sets. Protein Sci. 1995, 4:2107-2177.

Hoffmann A, Sommer JU, Blumen A: Computer simulations of asymmetric block copolymers. J. Chem Phys 1997, 107:7559-7570.

Jernigan RL, Bahar I: Structure-derived potentials and protein simulations. Curr. Opin. Struct. Biol. 1996, 6:195-209.

Koczak RE, d'Mello MJ, Subramaniam S: Computer modeling of electrostatic steering and orientational effects in antibody-antigen association. Biophys. J. 1995, 68:807-814.

Kolinski A, Skolnick J: Monte-Carlo simulation of protein-folding .2. application to protein-A, rop, and crambin. Proteins 1994, 18:353-366.

Kuntz ID, Meng CE, Shoichet BK: Structure-based molecular design. Accounts Chem. Res. 1994, 27:117-123.

Martin EJ, Spellmeyer DC, Critchlow RE, Blaney JM: Does combinatorial chemistry obviate coomputer-aided drug design? Rev. Comput. Chem. 1997, 10:75-100.

Miyazawa S, Jernigan RL: Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading. J. Mol. Biol. 1996, 256:623-644.

Prausnitz JM, Lichtenthaler RN, de Azevedo EG: Molecular thermodynamics of fluid- phase equilibria. Second edition, Prentice-Hall, Englewood Cliffs, NJ, 1986.

Rice LM, Brunger AT: Torsion angle dynamics: reduced variable conformational sampling enhances crystallographic structure refinement. Proteins 1994, 19:277-290.

Rojnuckarin A, Kim S, Subramanian S: Brownian dynamics simulation of protein folding: Access to milliseconds time scale and beyond. Proc. Natl. Acad. Sci. USA 1998, 95:4288-4292.

Rost B, Sander C: Bridging the protein sequence-structure gap by structure predicitions. Annu. Rev. Biophys. Biomol. Struct. 1996, 25:113-136.

Salemme FR, Spurlino J, Bone R: Serendipity meets precision: the integration of structure-based drug design and combinatorial chemistry for efficient drug discovery. Structure 1997, 5:319-324.

Sali A, Potterton L, Yuan F, Van Vlijmen H, Karplus M: Evaluation of comparative protein modeling by MODELLER. Proteins 1993, 17:355-362.

Sanchez R, Sali A: Advances in comparative protein-structure modeling. Curr. Opin. Struct. Biol. 1997, 7:206-214.

Van Gunsteren WF, Weiner PK, Wilkinson AJ (eds.): Computer simulation of biomolecular systems. Volume 3, Kluwer Academic Publishers, Dordrecht, 1997. A hybrid approach of molecular dynamics and Monte Carlo 16