Coarse Representations for the Essential Features of the Protein Surface

Mark Gerstein (mbg@cb-iris.stanford.edu)

The protein surface is usually represented (and viewed) in terms of thousands of intersecting atoms (spheres). The great amount of detail in such a representation obscures the overall shape of the surface and creates computational problems for many popular docking and surface-matching schemes. Two approaches are presented for removing unnecessary detail and representing the essential features of the protein surface.

(1) Reverse-crystallography: The protein surface is represented in terms of a resolution-dependent Fourier series. This approach allows for hierarchical, resolution-sensitive shape matching and very efficient docking.

(2) The hydration surface: The protein surface is defined by the second shell of water molecules surrounding it. The hydration surface is similar to the commonly used molecular surface, but the "probe" water positions are determined in the course of a molecular simulation rather than purely geometrically, so it is argued that this surface is more chemically meaningful.

MMBJC seminar, Beckman 402, 3:30 PM, Wednesday 8 June 1994

The following references summarize this work:

1 M Gerstein (1992). A Resolution-Sensitive Procedure for Comparing Protein Surfaces and its Application to the Comparison of Antigen-Combining Sites, Acta Crystallographica A48: 271-276.
2 M Gerstein & R Lynden-Bell (1993). Simulation of Water around a Model Protein Helix. 1. Two-dimensional Projections of Solvent Structure, Journal of Physical Chemistry 97: 2982-2990.
3 M Gerstein & R Lynden-Bell (1993). What is the Natural Boundary for a Protein in Solution? Journal of Molecular Biology 230: 641-650.
4 M Gerstein & R Lynden-Bell (1993). Simulation of Water around a Model Protein Helix. 2. The Relative Contributions of Packing, Hydrophobicity, and Hydrogen-Bonding, Journal of Physical Chemistry 97: 2991-2999.

1 Coarse Representations for the Essential Features of the Protein Surface
  Mark Gerstein
  R Lynden-Bell (Cambridge Chemistry)
  M Levitt (Stanford Struc. Biol.)
  8 June 1994
  Overheads Available Electronically with URL
  ftp://cb-iris.stanford.edu/pub/mbg/CoarseSurf/CoarseSurf.94Jun8.talk.{ps, word.rtf, word.hqx, txt, abstract}

2 Coarse Representations for the Essential Features of the Protein Surface
  Surface representations are important for:
    docking & rational drug design
    understanding molecular recognition
  Customary representations:
    VDW surface: all the nooks & crannies
    Richards accessible surface & molecular surface: smooth away some of the detail with a probe sphere
  [Slide 01: dot surface]
  [Slide 02: VDW surface]
  [Slide 03: Connolly surface]
  My new representations:
    1 Reverse-Crystallography: represent the protein surface in terms of a Fourier series
    2 A Hydration Surface: represent the protein surface in terms of the positions of water molecules in a molecular simulation

3 Reverse-Crystallography
  Procedure
    Threshold the VDW potential and fill in to create an envelope $f(x)$ (1 inside, 0 outside)
    Fourier transform the envelope: $F(s) = \mathrm{FFT}[f(x)]$
      $x$ = 3D position in Cartesian space
      $s$ = 3D position in frequency space
  Resolution-Sensitive Shape Comparisons
    Compare transforms vs resolution $S$
    Compare to a reference: $R(s) = \mathrm{FFT}[r(x)]$
    Comparison metrics:
      1. straight difference: $D(S) = (F-R)(F-R)^*$
      2. correlation: $C(S) = \frac{\delta F\,\delta R^* + \delta F^*\,\delta R}{2\,\sigma_F\,\sigma_R}$, where $\delta F = F - \bar{F}$

4 Implementation
  Immunoglobulin CDR regions
    Surface diversity with a common structural framework
    McPC603, 17/9, D1.3, HyHEL-10
    REI is the reference
  Cut out loops, threshold, put in a P1 cell, and transform
  [Slide 04: Outline]
  [Slide 05: D & C vs S]
  Want to detect similarities in overall features, such as similar clefts
  Implicit scaling in the correlation, so:
    D is best for very dissimilar shapes (compare a bird's face with a person's face)
    C is best for similar shapes (compare two people's faces)
  $D(0) = (V - V_R)^2$

5 Discussion of Resolution-Sensitive Procedure
  In the future, basis sets other than rectangular ones can be used
    Spherical harmonics $Y_{lm}$ (slow, but no choice of origin or reference necessary)
    Basis functions designed for discrete data, e.g., the Hadamard transforms from signal compression
  Hierarchical Representation
    A few numbers sum up the overall low-res shape
      Automatic shape classification
    Other terms describe high-res texture (cf. fractal measures of surface roughness)
  Very Efficient for Docking....

6 Docking
  Want to compute a score for many different relative orientations of two bodies
    Score = correlation = $c(t)$
    Orientations in terms of a 3D translation $t$ & a 3D rotation $R$
    This involves a 6D search.
  Can use the FFT to effectively reduce docking to a 3D search by eliminating the translation search
    Katchalski-Katzir et al. (1992). PNAS 89: 2195
    Real-space correlation is a reciprocal-space multiplication
  [Insert 06: Katchalski]
  N = number of grid points on an axis

7 Hydration surface
  The protein surface should be defined in terms of the environment in which it resides, i.e. in water.
  Where is the protein surface chemically?
    The protein has a region of influence through its H-bonds & hydrophobic hydration.
    What is the natural boundary of a protein in solution?
  Richards tried to take solvent into account with the Molecular & Accessible Surfaces
    Roll a simple probe sphere (radius 1.4 Å) on the surface of the protein.
    Locus of probe-sphere centers is the accessible surface (1971).
    Molecular surface = Contact Surface + Reentrant Surface (1977)
    Implemented by Connolly (1983)

8 The Plan
  Propose to use the 2nd shell of water molecules in a simulation to define a hydration surface
    Use real water as the probe sphere
    Effectively a smoothed surface
  1 Characterize water around a single helix.
    Density, orientation, and energy.
    Represent results structurally
  2 Competition between packing and H-bonding
    Compare the water distribution to that of simple solvents, such as Ar (l)
  3 Use the 2nd shell to define a natural boundary (hydration surface)
    When are 2 helices together & apart?
  4 Think about constructing & comparing hydration surfaces of real proteins

9 Simple Model Systems
  1 or 2 polyalanine α-helices
  14 residues of alanine, -N-Cα(-Cβ)-C(=O)-, where Cβ is the methyl (Me) group
  in a 22.219 × 22.219 × 20.93 Å cell containing 321 waters
  Very hydrophobic for a protein surface, but our interest is in hydrophobic hydration
  Exploit symmetry for averaging
  [Slide S2: Single Helix]
  [Slide S11: TwoSys]

10 Monte-Carlo Method
  Want to calculate average densities, energies, dipoles, quadrupoles, &c
    $\langle \rho(x,y) \rangle = \frac{1}{Z} \int_{\text{phase space}} \rho(x,y)\, e^{-\beta H(q^N)}\, dq^N$
  Sample states randomly:
    from a uniform distribution & weight them by the Boltzmann factor ("Forever" integration).
    from a Boltzmann distribution & weight uniformly (Monte-Carlo integration).
  The Metropolis rule tells how to sample according to the Boltzmann distribution.
  Constructed a highly parallel MC program for water in the NVT ensemble

11 Simple Pair Potential
  Coulomb + Lennard-Jones
    $E = \sum_{ij} \left( \frac{q_i q_j}{r_{ij}} + \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right)$
  Parameters
    CHARMM on 14 residues of alanine
    321 TIP3 waters
      3-site model consistent with the protein
      H: q = +0.4e
      O: q = -0.8e, σ = 3.2 Å, ε = 0.6 kJ/mol
    Other non-polarizable potentials (TIP4P) make little difference.

12 Projections
  Straight
    Projection down the axis of the helix
    $g(R,\phi) = \langle g(\mathbf{r}) \rangle_z$, where $g(\mathbf{r})$ is the oxygen density in 3D
  Helical
    Projection along the path of the polypeptide chain
    Or unwind the helix, then project straight
    $g_{HLX}(R,\phi) = \langle g(R, \phi + 2\pi z/p, z) \rangle_z$, with $2\pi/p$ = 360°/5.4 Å = 100°/1.5 Å
    Apply to the atoms in the helix & get 4 quadrants:
      B = Backbone (NCαC)
      C = Carbonyl (=O)
      M = Methyl (Cβ)
      G = Gap
  Show results in gray-level graphs (black is high) rather than featureless 1D plots

13 Oxygen Density
  Straight
    3 shells: 6.3 Å, 1.3; 4.6 Å, 1.1; 9 Å
  Helical
    3 peaks in Methyl & Backbone
    Inner 2 peaks merge in Carbonyl to 1 big peak (3.5 vs 1.4)
  Water not compressed
  Compare to the Richards-Connolly Molecular Surface
    1.4 Å sphere rolling on the surface
  Argon hydration surface
  Use to tell when 2 helices touch.
  [Slides: S3 Ost; S4 ZO; S5 Z0-3d; lj321-Ost; lj272-Ost; S12 Oseq]

14 Hydration Surface = 2nd shell = boundary relative to bulk
  Orientation of waters & energetic effects die off after the 1st shell
  Structural Aspects of Hydrophobicity
    Density in the gap is low (0.23)
    Are narrow clefts more hydrophobic by virtue of their shape?
  Qualifications?
    Electrostatic Interactions
      Longer-ranged than the 2nd shell (DNA 20 Å)
      Our model system does not have strongly charged groups.

15 Long Calculation but Parallel Program
  $2 \times 10^6$ MC cycles for >900 waters
    equivalent to 4 ns MD (taking 2 fs MD = 1 MC cycle)
    100 days CPU on 1 i860 (4.5 s/cycle)
    200 days on an iris (R3000)
    10 days on 8-16 i860s (Alliant = 24 i860s)
  Coarse-grained Parallelism (COVI)
    concurrent MC runs & then sum results
    heat to 800 K, slow cool, then do averages
    ideal for networks of workstations (Linda)

  for(R=0; R< num_runs ; R++)
    if (fork() == IS_CHILD) {
      for(S=0; S