John Newman
MB&B452a
Mark Gernstein
12/10/1999

Protein Docking and Rational Drug Design

 The development and testing of new drugs is a hugely expensive process. Following a single candidate drug from identification through human trials and F.D.A. approval can require six to twelve years and several billion dollars. The need to make the procedure faster and more cost-effective has never been more obvious, as the race continues to field effective anti-HIV and anti-tumor drugs, and especially as concerns grow about the potential public health consequences of the spread of multi-drug-resistant bacteria. There are six broad steps in drug development: discovery and lead generation, lead optimization, in vitro and in vivo assays, toxicology trials, human safety trials, and human efficacy trials. Each step can require 1-3 years, while the investment grows exponentially from one step to the next. (1)

 Modern crystallographic and computational methods promise to dramatically cut the time and effort required for the first two steps. Traditionally, new drug candidates are identified by screening vast numbers of random compounds and peptides for a few that have a desired activity, such as binding securely to HIV reverse transcriptase. The development of automated high-throughput screening has aided this process, but it can still take a full year to screen a normal library of 500,000 small-molecule candidates. A large number of minor variants of the identified candidates must then be screened in an attempt to optimize activity or minimize non-specific interactions.

 A structure-based computational approach may be a better way. If the crystal structure of the target is known, the surfaces of its active site and potential allosteric inhibitor binding sites can be modeled. Small molecules and peptides can then be screened virtually, by searching for candidates that are predicted to pack tightly against and interact well with the target surface. This approach offers two enormous advantages: the speed of screening is limited only by, and scales with, computing power (which increases at a pace far exceeding the rate of improvement in high-throughput screening); and candidates can easily be modified, conformationally and structurally, to create better matches.

 Irwin Kuntz, at the University of California, San Francisco, has developed a computer program called DOCK that is used to rapidly find the best fit of a pair of structures. It begins with the crystal coordinates of the target protein, from which a molecular surface is calculated for the active site. (3) There are several algorithms for accomplishing this; since the surface need be calculated only for the active site, the Lee & Richards algorithm, more precise though more computationally intensive than the Shrake & Rupley algorithm, could be used. An axis of view across the active site is chosen, and the site is sectioned perpendicular to this axis. A contact surface is drawn over each atom at its van der Waals radius, and a probe sphere with a radius similar to that of a water molecule, or perhaps even a hydrogen atom, is “rolled” along the surface to fill in reentrant spaces. (4) Once the molecular surface is defined, spheres are generated to fill the active site, forming a “negative image” of the site. The centers of these spheres are potential locations for ligand atoms. These potential locations are then aligned to the actual locations of atoms in the candidate ligand, and a best fit is found. The fit is then scored by shape (a simple proximity evaluation based on an approximation to the Lennard-Jones potential), electrostatic potential, and force-field potential. (3)
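
 The shape term of the scoring step can be illustrated with a minimal sketch. This is not DOCK's actual scoring code: it simply sums a standard 12-6 Lennard-Jones term over all ligand-receptor atom pairs. The function names and the values of r_min, eps, cutoff, and r_floor are assumptions chosen for illustration, not parameters taken from DOCK.

```python
import math


def lj_pair(r, r_min=4.0, eps=0.1):
    """12-6 Lennard-Jones energy for one atom pair at distance r (angstroms).

    r_min is the distance of the energy minimum and eps the well depth;
    both defaults are illustrative assumptions, not DOCK parameters.
    """
    ratio6 = (r_min / r) ** 6
    return eps * (ratio6 * ratio6 - 2.0 * ratio6)


def shape_score(ligand, receptor, cutoff=10.0, r_floor=0.5):
    """Sum pair energies over every ligand/receptor atom pair within cutoff.

    ligand and receptor are lists of (x, y, z) coordinates in angstroms.
    Lower scores mean a better (tighter, non-clashing) fit.
    """
    total = 0.0
    for a in ligand:
        for b in receptor:
            r = math.dist(a, b)
            if r > cutoff:
                continue  # ignore distant pairs, as a cheap approximation
            # r_floor keeps the repulsive term finite for overlapping atoms
            total += lj_pair(max(r, r_floor))
    return total


# Two hypothetical receptor atoms lining a pocket along the x axis.
receptor = [(0.0, 0.0, 0.0), (8.0, 0.0, 0.0)]

good_pose = [(4.0, 0.0, 0.0)]   # ligand atom at r_min from both receptor atoms
clash_pose = [(1.0, 0.0, 0.0)]  # ligand atom overlapping the first receptor atom

print("good pose score: ", shape_score(good_pose, receptor))
print("clash pose score:", shape_score(clash_pose, receptor))
```

 With these assumed parameters, a ligand atom sitting at the preferred contact distance from its neighbors scores lower (better) than one that overlaps a receptor atom, which is the behavior a proximity-based shape score is meant to capture.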

 The scores returned by DOCK are not perfect predictions of how well a candidate will actually work. In a typical screen, 100-200 high-scoring candidates are visually examined in more detail by human investigators, and 10-50 are selected for in vitro testing. Of these, usually 2-20% actually show inhibition at micromolar concentrations. A more stringent optimization can often improve the best candidates to show inhibition at 3-5 µM concentrations. It is in this optimization phase that the shortcuts used to allow DOCK to screen large numbers of candidates in a reasonable time become apparent. The algorithms assume a rigid target and a rigid ligand, ignore potentially coordinating water molecules and counterions, and simplify the evaluation of interaction energies. The net result is that DOCK has difficulty finding the proper ligand conformation and discriminating among interaction modes of similar energy. Still, an improved build of DOCK is capable of predicting the positions of ligand atoms to within 1-2 Å of the crystal structure values. DOCK can also be combined with other methods of structure analysis to yield more precise results. The original screen for thymidylate synthase inhibitors produced candidates that were inhibitory at 900 µM concentrations. Optimization by applying the crystal structure of a weak inhibitor and searching for similar molecules in a chemical structure database produced a revised candidate that inhibits its target at 3 µM concentrations. (2)
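
 One common way to express agreement such as "within 1-2 Å of the crystal structure" is the root-mean-square deviation (RMSD) between predicted and observed atom positions. The source does not say which measure was used, so the sketch below is an assumption for illustration only; the function name and the example coordinates are hypothetical.

```python
import math


def rmsd(predicted, reference):
    """Root-mean-square deviation between two equal-length coordinate lists.

    Each list holds (x, y, z) tuples in angstroms, with atoms in the same
    order. No superposition is performed: both sets are assumed to be in
    the receptor's coordinate frame already.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


# Hypothetical three-atom ligand: crystal positions and a docked pose
# displaced by 1.5 angstroms along y.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
docked = [(0.0, 1.5, 0.0), (1.5, 1.5, 0.0), (3.0, 1.5, 0.0)]

print("RMSD (angstroms):", rmsd(docked, crystal))
```

 Because a docked pose and the crystal structure share the receptor's frame of reference, the deviation is computed directly, without first superimposing the two ligands.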

 Many of DOCK’s limitations stem from heuristics and shortcuts designed to save computation time. Currently, an average UNIX workstation could screen a large library for docking candidates in about a week; a supercomputer or parallel cluster could do so in a day. Advances in the cost-effectiveness of computing power, however, could quickly shorten screening time or allow the incorporation of significantly more complicated algorithms. (2)

 Crystallography is now emerging as the most important bottleneck in structure determination and rational drug design. Programs such as DOCK rely on a vast database of crystal structure data to serve as the basis for computational alignment. Structures for potential ligands, which are most often small molecules that are not terribly difficult to crystallize, are less of a problem than structures for large, often multisubunit proteins. Crystal structures of similar proteins can be used as a guide to help find docking candidates for an uncrystallized target, but with a substantial sacrifice in precision that would complicate later optimization steps. This limitation would most obviously assert itself in designing new agents to counteract antibiotic-resistant bacteria. A small structural or conformational change in an important bacterial protein could be sufficient to render it unrecognizable to a drug. Without a detailed description of the new structure, trial and error based on the old structure would be the only way to design adapted drugs - not a particularly cost-effective approach. (2)

 DOCK represents an important step towards that holy grail of biomedicine: rational drug design, eventually on demand. Within a few decades a computer running a much more advanced program like DOCK could sit in a doctor’s office and almost instantaneously churn out a number of strong inhibitor candidates for any target of clinical interest. One could even imagine a database of toxicological and human safety data that could be accessed to quickly clear a new candidate for patient use; or perhaps a reverse-DOCK approach of screening the candidate against the active surfaces of every known human protein to search for non-specific interactions. A huge computational task, to be sure; but the cost-effectiveness of computational power will only increase for the foreseeable future. The bottleneck is the target structure. There are few apparent alternative technologies to crystallization for true structure determination, and prediction of 3D structure from an amino acid sequence (the only data that can currently be rapidly collected from an unknown gene or protein) is still in its infancy. The future of rational medicine will depend on research into protein folding prediction; but as our database of protein structures steadily grows, as an understanding of the intermediate steps of folding develops, and as computational power increases to meet the need of comparing the energies of thousands or millions of possible final structures, this final hurdle will also be overcome.

References
(1) Dimasi, J.A., N.R. Bryant, and L. Lasagna. 1991. New drug development in the United States from 1963 to 1990. Clin. Pharmacol. Ther. 50: 471-486.
(2) Kuntz, I.D. 1992. Structure-based strategies for drug design and discovery. Science 257: 1078-1082.
(3) Kuntz, I.D. et al. 1999. How DOCK Works. http://www.cmpharm.ucsf.edu/kuntz/dock35/dock_demo2.html
(4) Gernstein, Mark. 1999. Yale University MB&B 452a course lecture notes.