Marlynn H. Wei

December 13, 2000


Discovery of Novel Antibiotics: Application of Structural and Functional Genomics


Complete genome sequences of bacterial organisms have revolutionized the search for antibiotics.  The completion of ~30 bacterial whole-genome sequences and ongoing sequencing projects for over 100 microbial organisms will allow researchers to find novel therapeutic targets in innovative ways (1).  The search for new antibiotics can be assisted by computational methods such as homology-based analyses, structural genomics, motif analyses, and prediction of protein-protein interactions, as well as by experimental functional genomics (1).  The greatest obstacle is the sheer volume of data in the genome sequences.  The genome sequence of a microbial pathogen catalogs every gene product that could be relevant to the host-parasite interaction or serve as a potential antibiotic drug target (2).  Therefore, scientists interested in discovering antibiotics must extract useful information from genomes through comparative, functional, or structural genomics in order to simplify drug target selection.


The advent of bacterial whole-genome sequences and the establishment of useful genomic analyses come at a crucial time for antibiotic development.  Increased resistance to commonly used antibiotics, a growing prevalence of infections, and the emergence of new pathogenic organisms challenge the current use of antibiotic therapy (3).  Resistance is more likely when newly introduced antibiotics are chemically similar to ones already rendered ineffective.  Therefore, new antimicrobial compounds should ideally have novel mechanisms of action.  Effective drug targets are selected based on several important criteria: they must be necessary for bacterial survival or growth, highly conserved across either a broad or a narrow range of pathogens, absent or very different in humans, and understood biochemically (3).  Currently, commonly used antibiotic drugs target species-specific genes, unique enzymes, and membrane transporters (4).  Antibiotics act through several different mechanisms: inhibiting cell wall or membrane synthesis, protein synthesis, membrane transport, or nucleic acid replication (5).  The availability of whole genomes of many pathogenic bacteria speeds up drug target selection by revealing novel genes both within these established functional categories and in new ones.
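
The selection criteria listed above can be read as a simple filter over candidate genes.  The following is a minimal sketch under stated assumptions, not an established pipeline: the gene names, the field names, and the 30% identity cutoff for "very different in humans" are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    gene: str
    essential: bool                 # needed for bacterial survival or growth
    pathogens_with_ortholog: int    # number of pathogens in the panel that carry it
    human_identity: float           # best % identity to a human protein (0.0 if none)
    biochemically_understood: bool  # function characterized well enough to assay

def is_promising(c, min_pathogens=1, max_human_identity=30.0):
    """Apply the four criteria; both thresholds are illustrative assumptions."""
    return (
        c.essential
        and c.pathogens_with_ortholog >= min_pathogens
        and c.human_identity <= max_human_identity
        and c.biochemically_understood
    )

# Hypothetical candidates, not real annotations.
candidates = [
    Candidate("geneA", True, 12, 0.0, True),
    Candidate("geneB", True, 10, 78.0, True),   # too similar to a human protein
    Candidate("geneC", False, 15, 5.0, True),   # not essential
]

shortlist = [c.gene for c in candidates if is_promising(c)]
print(shortlist)
```

Raising min_pathogens biases the shortlist toward broad-spectrum targets, while leaving it at 1 also admits narrow-spectrum ones.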


The analysis of open reading frames in bacterial sequences makes all genes and gene products possible drug targets (6).  Scientists must therefore isolate the genes that are essential to cell survival or growth, which would be most effective as antibiotic targets.  Traditionally, new genes necessary for bacterial survival or virulence were discovered through random mutagenesis of the bacterial genome followed by phenotyping (2).  However, scientists can now use automated comparisons of bacterial genomes to categorize genes and the proteins they encode.  Primary sequence comparison programs, like BLAST or PSI-BLAST, can infer gene functions from sequence homology.  Sequence homology is also used to determine clusters of orthologous groups (COGs).  COGs are groups of orthologous genes shared by evolutionarily distant organisms.  These orthologous families of genes are prime candidates as targets for broad-spectrum antimicrobial agents (2).
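
One common, simplified way to propose orthologs between two genomes is the reciprocal best hit: gene a in genome A and gene b in genome B are paired when each is the other's top-scoring match.  The sketch below assumes that pairwise similarity scores (for example, bit scores from a sequence comparison program) have already been computed; the gene names and scores are hypothetical, and this two-genome test is only a building block, not the full COG construction.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs in which a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)

# Hypothetical similarity scores keyed by (query gene, subject gene).
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
          ("a2", "b2"): 180.0, ("a3", "b1"): 60.0}
b_vs_a = {("b1", "a1"): 245.0, ("b2", "a2"): 175.0, ("b2", "a1"): 88.0}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)  # a3 is left out: its best hit b1 prefers a1
```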


Sequence homology-based methods have disadvantages, however.  About 25-40% of the genes in a bacterial genome usually have no matches among known genes (6).  Furthermore, sequence homology rests on the assumption that similar sequences share similar functions, a presupposition that does not hold in the many cases where similar sequences are structurally and functionally diverse.


Therefore, one must turn to methods that do not rely on sequence homology.  One way of assessing function is gene expression profiling with cluster analysis.  Cluster analysis takes gene expression data measured with microarray technology and organizes genes into functional groups (7).  The function of an unknown gene can be estimated from the general pathways or metabolic functions of the other genes in the same or nearby clusters.  However, assigning gene function by cluster analysis is subject to inaccuracies as well.  First, the notion of function itself is ambiguous, and functions are often misannotated.  Second, some proteins have multiple functions and, likewise, some functions require multiple proteins (8).
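
The idea of estimating an unknown gene's function from genes with similar expression can be illustrated with a correlation measure.  The sketch below is a simplified nearest-neighbour variant, not the hierarchical clustering of reference 7: it assigns an unannotated gene the function of the annotated gene whose expression profile has the highest Pearson correlation with it.  The gene names and expression values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical expression levels across four conditions.
profiles = {
    "ribosomal_gene": [1.0, 2.0, 3.0, 4.0],
    "cell_wall_gene": [4.0, 3.0, 1.0, 0.5],
    "unknown_orf":    [0.9, 2.2, 2.8, 4.1],
}
annotations = {
    "ribosomal_gene": "protein synthesis",
    "cell_wall_gene": "cell wall synthesis",
}

def guess_function(gene):
    """Borrow the annotation of the most strongly correlated annotated gene."""
    nearest = max(annotations, key=lambda g: pearson(profiles[gene], profiles[g]))
    return annotations[nearest]

print(guess_function("unknown_orf"))
```

Such an assignment is only a hypothesis about function and inherits any error in the annotations it borrows from.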


Therefore, structural genomics has been suggested as a better method of drug target selection.  A protein's function is more directly a consequence of its structure than of its sequence (3, 8).  One way to assign function to an unknown gene is to compare the 3D structure of its product against a protein structure database.  Structural homologs tend to share functions.  Furthermore, a good drug target would be structurally different or nonexistent in humans.  Checking for structural homology against a database of human protein structures would indicate whether an antibiotic against that target might also interfere with human functions.


Scientists can use phylogenetic groups that are based on the specific folds shared by organisms.  These fold and sequence families in bacterial pathogens can be useful antibiotic targets (9).  One can find a fold common to an entire phylogenetic group in order to target all of its organisms with a broad-spectrum antibiotic.  Alternatively, one can find a fold that is unique to one particular pathogen as a target for an effective narrow-spectrum antibiotic.  Structural methods are ideal for the selection of drug targets.  However, structural databases are incomplete, since high-quality protein crystals are difficult to grow, which hinders x-ray crystallography (10).  Nuclear magnetic resonance offers an alternative route to 3D structure determination.  In addition, computational modeling is approaching accurate functional predictions based on alignment of amino acid sequences (11).
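
The broad-spectrum and narrow-spectrum strategies described above amount to set operations over a table of which folds occur in which organism.  The sketch below assumes such a table is already available from fold assignments; the organism names and fold labels are hypothetical placeholders, and folds also present in humans are excluded in both cases.

```python
# Hypothetical fold occurrence table: organism -> set of fold labels.
fold_table = {
    "pathogen_A": {"fold1", "fold2", "fold3"},
    "pathogen_B": {"fold1", "fold2", "fold4"},
    "pathogen_C": {"fold1", "fold2", "fold5"},
}
human_folds = {"fold2", "fold9"}

def broad_spectrum_folds(table, human):
    """Folds present in every pathogen of the group but absent in humans."""
    shared = set.intersection(*table.values())
    return shared - human

def narrow_spectrum_folds(table, target, human):
    """Folds found only in the target pathogen and absent in humans."""
    others = set().union(*(folds for name, folds in table.items() if name != target))
    return table[target] - others - human

print(broad_spectrum_folds(fold_table, human_folds))
print(narrow_spectrum_folds(fold_table, "pathogen_A", human_folds))
```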


Motif analysis is another strategy for identifying potential antibiotic targets among genes with unknown functions.  Many databases, including PROSITE, can be searched for motifs in a sequence (2).  The motifs may indicate the approximate biochemical function of the gene.  Gene fusion analysis is a newer computational method that infers protein interactions from genome sequences.  Proteins that interact with each other tend to have homologs in other organisms that are joined into a single protein chain.  This method would give additional functional information about target proteins (2).
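
A motif search of this kind can be sketched by translating a PROSITE-style pattern into a regular expression.  The example uses the well-known P-loop (ATP/GTP-binding site motif A) pattern; the converter handles only the basic pattern syntax (x, square and curly brackets, repeat counts, and terminal anchors), and the protein sequence is a hypothetical fragment made up for illustration.

```python
import re

def prosite_to_regex(pattern):
    """Convert basic PROSITE pattern syntax into a Python regular expression."""
    p = pattern.rstrip(".")
    p = p.replace("{", "[^").replace("}", "]")   # {ABC}: any residue except A, B, C
    p = p.replace("(", "{").replace(")", "}")    # x(4) or x(2,4): repeat counts
    p = p.replace("x", ".")                      # x: any residue
    p = p.replace("<", "^").replace(">", "$")    # N- and C-terminal anchors
    return p.replace("-", "")                    # drop element separators

p_loop = "[AG]-x(4)-G-K-[ST]"
regex = prosite_to_regex(p_loop)

# Hypothetical protein fragment, not taken from a real genome.
sequence = "MSLAGPSGSGKSTLLR"
match = re.search(regex, sequence)

print(regex)
if match:
    print(match.start(), match.group())
```

A hit suggests only an approximate biochemical role, here nucleotide binding, and short patterns like this one can also match by chance.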


Finally, drug targets can be characterized further using experimental functional genomics: DNA microarrays, large-scale protein interaction mapping, and proteomics (2).  Genes that are functionally related are assumed to have similar gene expression profiles.  Protein synthesis patterns are also useful for analyzing the effect that certain antimicrobial drugs would have on particular essential or important proteins (12).


The use of computational methods and expression profiling points to the need for a nonredundant, complete database of structural and functional annotations of the proteins from known pathogenic bacterial genomes and, once it is completed, the human genome.  The organization, accuracy, and accessibility of such databases are crucial in the hunt for novel antibiotics.  Perhaps a program could be designed specifically to highlight antibiotic drug targets in query sequences.  This program would scan structural databases and other bacterial genomes for homology and similar folds.  The program could be complemented by a central, tailored database that reorganizes data for the most efficient search for novel antibiotic targets.  For example, the database could hold an entry for each protein or gene that is essential to certain bacterial species, recording the protein's phylogenetic group, its 3D structure, structurally homologous proteins, and whether any similar protein exists in humans.  It could also use foreign keys to connect to other databases that catalogue which known antibiotics and inhibitors are used against similar targets.
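
The proposed database can be sketched as two relational tables linked by a foreign key.  This is only one possible layout under stated assumptions: the table names, column names, and rows below are hypothetical, and an in-memory SQLite database stands in for a real, shared resource.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
CREATE TABLE target (
    target_id          INTEGER PRIMARY KEY,
    gene_name          TEXT NOT NULL,
    organism           TEXT NOT NULL,
    essential          INTEGER NOT NULL,   -- 1 if required for survival or growth
    phylogenetic_group TEXT,
    fold_class         TEXT,               -- label of the 3D fold, if known
    human_homolog      INTEGER NOT NULL    -- 1 if a similar human protein exists
);
CREATE TABLE known_inhibitor (
    inhibitor_id  INTEGER PRIMARY KEY,
    compound_name TEXT NOT NULL,
    target_id     INTEGER NOT NULL REFERENCES target(target_id)
);
""")

# Hypothetical rows for illustration only.
conn.execute("INSERT INTO target VALUES (1, 'geneA', 'pathogen_A', 1, 'group1', 'fold1', 0)")
conn.execute("INSERT INTO target VALUES (2, 'geneB', 'pathogen_A', 1, 'group1', 'fold2', 1)")
conn.execute("INSERT INTO known_inhibitor VALUES (1, 'compoundX', 1)")

# Essential targets without a human counterpart, with any known inhibitor.
rows = conn.execute("""
    SELECT t.gene_name, k.compound_name
    FROM target AS t
    LEFT JOIN known_inhibitor AS k ON k.target_id = t.target_id
    WHERE t.essential = 1 AND t.human_homolog = 0
""").fetchall()
print(rows)
```

Keeping inhibitors in their own table lets one target carry several known compounds without duplicating its structural annotation.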


In conclusion, functional and structural characterization, together with highly efficient management systems for the resulting data, is integral to antibiotic drug-hunting.  Databases of these complete microbial genomes should be constructed carefully and interconnected with other public and private databases.  Such databases, containing structural and functional annotation of whole-genome sequences of pathogenic bacteria, will make the search for new antibiotics highly effective and efficient.  One should note that although whole-genome sequences brighten the prospects for new antibiotics, many obstacles to developing an effective antibiotic remain.  Most target sites identified would be cytoplasmic and therefore difficult to reach past the bacterial cell envelope (13).  The Food and Drug Administration is also reluctant to approve new antibiotics that use novel mechanisms and drug targets (20).  Thus, although comparative, functional, and structural genomics speed up the process of drug development, many obstacles still stand in the way of an effective and approved antibiotic.


Word count: 1278




1. Loferer, H. (2000) Mining bacterial genomes for antimicrobial targets. Molecular Medicine Today.  6: 470-474.

2. Hood, D.W.  (1999) The utility of complete genome sequences in the study of pathogenic bacteria.  Parasitology. 118: S3-S9.

3. Rosamond, J., Allsop, A. (2000) Harnessing the power of the genome in the search for new antibiotics.  Science.  287 (5460): 1973-6.

4. Galperin, M.Y., Koonin, E.V.  (1999) Searching for drug targets in microbial genomes.  Current Opinion in Biotechnology.  10: 571-8.

5. Adams, G. Lecture notes, Microbiology, 29 Feb. 2000.

6. Smith, D.R.  (1996) Microbial pathogen genomes: new strategies for identifying therapeutics and vaccine targets.  Trends in Biotechnology.  8: 290-3.

7. Eisen, M.B., Spellman, P.T., Brown, P.O., Botstein, D.  (1998) Cluster analysis and display of genome-wide expression patterns.  Proc. Natl. Acad. Sci. U.S.A. 95 (25): 14863-14868.

8. Gerstein, M., Jansen, R. (2000) The current excitement in bioinformatics—analysis of whole-genome expression data: how does it relate to protein structure and function? Current Opinion in Structural Biology. 10: 574-584.

9. Gerstein, M. (2000) Integrative database analysis in structural genomics.  Nature Structural Biology. Suppl.: 960-3.

10. Holm, L., Sander, C. (1993) Protein structure comparison by alignment of distance matrices.  Journal of Molecular Biology.  233: 123-138.

11. Grigoriev, I.V., Kim, S-H. (1999) Detection of protein fold similarity based on correlation of amino acid properties. Proc. Natl. Acad. Sci. U.S.A. 96: 14318-14323.

12. Frosch, M., Reidl, J.  (1998) Genomics in infectious diseases: approaching the pathogens.  Trends in Microbiology. 6 (9): 346-9.

13. Kotra, L.P., Vakulenko, S., Mobashery, S. (2000) From genes to sequences to antibiotics: prospects for future developments from microbial genomics.  2:651-658.