TA Weikai Li
A substantial number of researchers are content to analyze gene expression at the transcript level and extrapolate those data to the protein level. This is not sufficient. There are a number of reasons why proteomics must stand on its own [1, 2]. Verification of a gene product by proteomic methods is an important first step in “annotating the genome.” Features that are not apparent from the DNA sequence, such as isoforms and post-translational modifications, can be determined only by proteomic methodologies. Moreover, because mRNA levels do not necessarily correlate with protein levels, it may be crucial to assess protein expression directly. The localization of gene products, which is often difficult to deduce from the sequence, can be determined experimentally. Mechanisms such as regulation of protein function by proteolysis, recycling, and isolation in cell compartments affect gene products, not genes. Finally, protein-protein interactions and the molecular composition of cellular structures can be determined only at the protein level.
One of the principal tools of proteomics is two-dimensional gel electrophoresis (2DE) [3, 4]. This technology has been in existence for more than two decades and resolves proteins based on isoelectric point and molecular weight. Even at its current level of refinement, however, 2DE cannot resolve more than approximately 1,000 proteins. As a result, only the most abundant proteins are visualized if a crude lysate is used. Affinity-based purification strategies are therefore often employed to obtain the desired set of proteins before electrophoresis is performed. 2DE is in many ways like DNA microarrays; instead of giving a transcript expression pattern, 2DE produces a protein expression pattern. The pattern can then be analyzed by computer systems such as Quest and Melanie, which can align gel images, assign molecular cluster indexes (MCIs), and gauge the relative abundance of proteins at selected spots.
Only recently have large-scale databases emerged in the public domain, offering annotated 2DE data obtained under different cellular conditions for various organisms. The Yeast Protein Database (YPD; http://www.proteome.com/databases/index.html) is one of the best and longest-established large-scale databases. In the years to come, we will likely see a major drive to generate proteome databases for more organisms as well as for a wide variety of human tissues.
The other key tool of proteomics is mass spectrometry (MS) [3, 4]. It is through the integration of 2DE and MS that proteomics achieves its greatest power. In many ways, modern mass spectrometry has replaced the classical technique of Edman degradation, even in traditional protein chemistry. Protein identification proceeds in two steps. In the first step, the gel-separated proteins are digested into peptides by sequence-specific proteases, and the eluted peptide mixture is analyzed by matrix-assisted laser desorption/ionization to produce a mass spectrum, or “peptide-mass fingerprint.” The second step relies on the fragmentation of individual peptides in the mixture to gain sequence information; here, electrospray ionization is performed in conjunction with tandem mass spectrometry. Both the mass spectrum and the sequence information can be searched against databases to identify proteins.
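The peptide-mass fingerprint search described above can be illustrated with a minimal sketch. Several things here are assumptions made for illustration and do not come from the cited work: trypsin is taken as the sequence-specific protease (cleaving after K or R unless the next residue is P), masses are neutral monoisotopic masses rather than measured ion m/z values, the scoring is a simple count of matched peaks, and the function names, tolerance, and toy sequences are hypothetical.

```python
# Monoisotopic amino acid residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_digest(sequence):
    """Split a protein after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def rank_candidates(observed_masses, database, tolerance=0.2):
    """Rank database proteins by how many observed masses they explain."""
    scores = []
    for name, sequence in database.items():
        theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
        matched = sum(
            any(abs(obs - theo) <= tolerance for theo in theoretical)
            for obs in observed_masses
        )
        scores.append((name, matched))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy example: hypothetical sequences, not real proteins.
toy_db = {"protein_A": "MAGKSTRPLLKVEWR", "protein_B": "GASPKDDEEFRHHYK"}
observed = [peptide_mass(p) for p in tryptic_digest(toy_db["protein_A"])]
ranking = rank_candidates(observed, toy_db)
```

Real search engines also account for missed cleavages, modifications, and the statistical significance of a match; this sketch omits all of these.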
Applications to cancer research
The use of DNA microarrays to study cancer is as established as the technology itself [5, 6]. Transcriptome data are used not only to classify different types of cancer, but also to shed light on known and unknown cancer genes: proto-oncogenes, oncogenes, and tumor suppressor genes. Proteome data, on the other hand, are not as pervasive, largely due to technological limitations. However, with the steady advancements in the tools mentioned above, “cancer proteomics” is becoming a reality. The second half of this paper discusses some projects that exemplify the research being conducted in the joint field of proteomics and oncology.
Page et al. present the most extensive study to date of the protein expression map (PEM) of the normal human breast, which can be compared to the PEMs of breast cancer cells in further studies [7]. Normal human luminal and myoepithelial breast cells were used in two-dimensional gel proteome studies. A total of 43,302 proteins were detected across 20 samples, and a master image for each cell type, comprising a total of 1,738 unique proteins, was derived. Differential analysis detected 170 proteins that were elevated two-fold or more between the cell types (including muscle-specific enzyme isoforms and contractile intermediate filaments, as well as a large number of cytokeratin subclasses and isoforms), and 51 of these were annotated by tandem mass spectrometry. A further 134 nondifferentially regulated proteins were also annotated from the two breast cell types. The study clearly demonstrates the capability of proteomics to identify proteins and determine their relative abundance.
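A two-fold differential analysis of the kind described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used by Page et al.: spot volumes are assumed to be already normalized and matched across gels by a shared spot identifier, replicate gels are summarized by their mean, and the function name, data layout, and toy values are hypothetical.

```python
from statistics import mean


def differential_spots(luminal, myoepithelial, fold=2.0):
    """Return spots whose mean volume differs by at least `fold` between
    two cell types.

    Each argument maps a spot ID to a list of normalized spot volumes
    (one value per replicate gel). The returned ratio is luminal over
    myoepithelial, so values <= 1/fold mean higher in myoepithelial cells.
    """
    hits = {}
    for spot in luminal.keys() & myoepithelial.keys():
        a = mean(luminal[spot])
        b = mean(myoepithelial[spot])
        if a <= 0 or b <= 0:
            continue  # a ratio is undefined for absent spots
        ratio = a / b
        if ratio >= fold or ratio <= 1 / fold:
            hits[spot] = ratio
    return hits


# Toy data with hypothetical spot IDs and volumes.
luminal = {"s1": [4.0, 4.0], "s2": [1.0, 1.0], "s3": [1.0, 1.2]}
myoepithelial = {"s1": [1.0, 1.0], "s2": [3.0, 3.0], "s3": [1.0, 1.0]}
elevated = differential_spots(luminal, myoepithelial)
```

In this toy data, s1 is four-fold higher in luminal cells, s2 is three-fold higher in myoepithelial cells, and s3 falls below the threshold.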
A strategy for proteomic analysis of human tumors is described by Emmert-Buck et al. [8]. Normal squamous epithelium and corresponding tumor cells from two patients with esophageal cancer were procured by laser-capture microdissection and studied by two-dimensional polyacrylamide gel electrophoresis (2D-PAGE). Profiles obtained from 50,000 cells resolved approximately 675 distinct proteins (or isoforms). Comparison of the microdissection protein profiles pinpointed 17 proteins with tumor-specific alterations, including ten that were uniquely present in the tumors and seven that were observed only in normal epithelium. Two of the altered proteins were characterized by mass spectrometry and immunoblot analysis, and were identified as cytokeratin 1 and annexin I. In the future, comparison of microdissection protein profiles may be used to analyze suspected malignant tumors and diagnose patients with cancer.
On the clinical level, Voss et al. showed this month that the correlation of large-scale protein expression profiles with clinical data could be used to gain insights into the molecular aspects of cancer [9]. To demonstrate this in a pilot study, they systematically compared the protein expression patterns obtained by two-dimensional gel electrophoresis with clinical features in human B-cell chronic lymphocytic leukemia, a disease characterized by broad clinical variability. Statistical methods were devised to analyze the spot patterns from 24 patient samples. This analysis allowed the identification of proteins that clearly discriminated between patient groups with defined chromosomal characteristics, or whose expression levels correlated with clinical parameters such as patient survival.
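One simple way to look for spots that discriminate between two patient groups is to rank them by a two-sample test statistic. The sketch below uses Welch's t-statistic purely as an assumed stand-in; the specific statistical methods devised by Voss et al. are not described here, and the function names, data layout, and toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples (each needs >= 2 values)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if standard_error == 0:
        return None  # no variation in either group; statistic undefined
    return (mean(a) - mean(b)) / standard_error


def rank_discriminating_spots(group_a, group_b):
    """Rank spots by |t| between two patient groups.

    Each argument maps a spot ID to the list of spot intensities measured
    in the patients of that group.
    """
    results = []
    for spot in sorted(group_a.keys() & group_b.keys()):
        t = welch_t(group_a[spot], group_b[spot])
        if t is not None:
            results.append((spot, t))
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)


# Toy data: hypothetical intensities for two spots in two patient groups.
with_aberration = {"spot1": [1.0, 2.0, 3.0], "spot2": [5.0, 6.0, 7.0]}
without_aberration = {"spot1": [7.0, 8.0, 9.0], "spot2": [6.0, 5.0, 7.0]}
ranking = rank_discriminating_spots(with_aberration, without_aberration)
```

With hundreds of spots and only 24 patients, any such ranking would need correction for multiple testing before a spot is treated as a genuine discriminator.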
All of the aforementioned experiments can and will be performed on a large scale, increasing the need for databases dealing with cancer proteomics. Indeed, Nelson et al. have already begun to integrate proteomic information, in the form of 2DE and MS data, into their database for prostate cancer, the Prostate Expression Database (PEDB; http://www.mbt.washington.edu/PEDB) [10]. They predict that “ultimately the analysis of protein level and function, comprehensively termed the proteome, provides a more accurate assessment of gene expression activity [than the transcriptome].” Other future steps include more widespread use of protein chips to assay cancer genes, and the development of machine learning models that can diagnose cancer and predict its course based on proteomic profiles.
1. Banks RE, Dunn MJ, Hochstrasser DF, et al. Proteomics: new perspectives, new biomedical opportunities. The Lancet 2000; 356(9243): 1749-1756.
2. Celis JE, Kruhoffer M, Gromova I, et al. Gene expression profiling: monitoring transcription and translation products using DNA microarrays and proteomics. FEBS Letters 2000; 480(23892): 2-16.
3. Anderson NL, Matheson AD, Steiner S. Proteomics: applications in basic and applied biology. Current Opinion in Biotechnology 2000; 11: 408-412.
4. Celis JE, Ostergaard M, Jensen NA, et al. Human and mouse proteomic databases: novel resources in the protein universe. FEBS Letters 1998; 430(20292): 64-72.
5. Marx J. DNA arrays reveal cancer in its many forms. Science 2000; 289(5485): 1670-1672.
6. Golub TR, Slonim DK, Tamayo P, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999; 286: 531-537.
7. Page MJ, Amess B, Townsend RR, et al. Proteomic definition of normal human luminal and myoepithelial breast cells purified from reduction mammoplasties. Proc. Natl. Acad. Sci. 1999; 96: 12589-12594.
8. Emmert-Buck MR, Gillespie JW, Paweletz CP, et al. An approach to proteomic analysis of human tumors. Molecular Carcinogenesis 2000; 27: 158-165.
9. Voss T, Ahorn H, Haberl P, et al. (2000, December 11). Correlation of clinical data with proteomics profiles in 24 patients with B-cell chronic lymphocytic leukemia. International Journal of Cancer 2000 [Online]. Available: http://www3.interscience.wiley.com.
10. Nelson PS, Clegg N, Eroglu B, et al. The prostate expression database (PEDB): status and enhancements in 2000. Nucleic Acids Research 2000; 28: 212-213.