Jennifer Li

MB&B 452a

15 December 2000

 

Gene Expression Data: Breaking New Ground in Human Cancer Research

 

The advent of microarrays and whole-genome expression experiments has changed the face of bioinformatics.  Bioinformatics has always sought to integrate molecular biology and computer science so as to produce and analyze information on a large scale [1].  This effort has been aided by the availability of complete genome sequences and, now, by whole-genome expression experiments.  Three primary technologies are currently used for genome-wide expression experiments: cDNA microarrays, high-density oligonucleotide arrays, and SAGE (serial analysis of gene expression).  These experiments generate data estimating the level of mRNA in cell populations systematically and on an unprecedented scale [1].  They have changed the way biological discovery can be approached, with implications not only for biological research but also for aspects of medicine such as disease diagnosis and drug development.  In human cancer research, the development of microarray technology is of particular interest.  Whole-genome expression experiments could be used as a diagnostic and prognostic tool for tumorigenesis and tumor progression.  They may pave the way for new possibilities in treating what is currently one of the most common causes of death in the U.S.

 

Most current research linking gene expression experiments to human cancer concerns tumor classification.  Until now, tumor classification has depended primarily on the histological appearance of tumors.  These distinctions are not always clear: morphologically similar tumors often have radically different disease progressions and responses to therapy [2].  Researchers are attempting to develop a systematic and unbiased approach to cancer classification and taxonomy, based on whole-genome expression experiments, that will clearly and accurately differentiate tumors.  This, in turn, will allow for the development of therapies that specifically target distinct tumor types.

 

The first important step in this process is to define tumor subtypes.  T.R. Golub et al. have presented one approach to this problem, using the distinction between human acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) as their experimental model.  They divide cancer classification into two tasks: class prediction and class discovery.  Class prediction places tumor samples into previously defined subtypes.  Class discovery, on the other hand, defines previously unrecognized subtypes.  For class prediction, an “idealized expression pattern” was created based on the Affymetrix expression patterns of the known samples.  The 50 genes that most clearly distinguished AML from ALL were used as the predictor of cancer class, and new samples were compared to this predictor.  This proved to be an extremely accurate way to determine the class of new samples.  Predictors built from between 10 and 200 genes were 100% accurate in distinguishing AML from ALL and made strong predictions (median prediction strength, PS = 0.77) for over 80% of the samples [2].  This sort of class distinction method could presumably be applied to any tumor subcategories that have measurable differentiating factors.  It would aid tumor diagnosis and prognosis, and help to determine a tumor’s origin, stage, or grade [2].
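
The class prediction scheme described above can be illustrated with a minimal sketch.  It assumes the weighted-voting form usually attributed to [2]: each gene receives a signal-to-noise weight, votes according to how far the new sample lies from the midpoint of the two class means, and the prediction strength (PS) is the margin of victory divided by the total vote.  The function names, the selection of genes by absolute weight, and the tiny expression table are illustrative assumptions, not the authors' implementation or data.

```python
from statistics import mean, stdev


def train_predictor(samples, labels, n_genes):
    """Select the n_genes most class-correlated genes and store vote parameters."""
    class_a, class_b = sorted(set(labels))
    scored = []
    for g in range(len(samples[0])):
        a = [s[g] for s, lab in zip(samples, labels) if lab == class_a]
        b = [s[g] for s, lab in zip(samples, labels) if lab == class_b]
        # Signal-to-noise weight: positive when the gene is higher in class_a.
        spread = (stdev(a) + stdev(b)) or 1e-9  # guard against zero spread
        weight = (mean(a) - mean(b)) / spread
        boundary = (mean(a) + mean(b)) / 2  # midpoint between the class means
        scored.append((g, weight, boundary))
    # Assumption: keep the genes with the largest absolute weight.
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return class_a, class_b, scored[:n_genes]


def predict(predictor, sample):
    """Return (predicted class, prediction strength) for one new sample."""
    class_a, class_b, genes = predictor
    votes_a = votes_b = 0.0
    for g, weight, boundary in genes:
        vote = weight * (sample[g] - boundary)
        if vote > 0:
            votes_a += vote
        else:
            votes_b -= vote
    total = votes_a + votes_b
    winner = class_a if votes_a >= votes_b else class_b
    strength = abs(votes_a - votes_b) / total if total else 0.0
    return winner, strength


# Invented toy expression values: rows are samples, columns are three genes.
train = [
    [9.0, 1.0, 5.0], [8.0, 2.0, 4.0], [7.5, 1.5, 6.0],  # ALL
    [2.0, 8.0, 5.5], [1.0, 9.0, 4.5], [2.5, 7.0, 5.0],  # AML
]
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]

predictor = train_predictor(train, labels, n_genes=2)
clear_label, clear_ps = predict(predictor, [8.5, 1.2, 5.0])
mixed_label, mixed_ps = predict(predictor, [8.5, 8.5, 5.0])
print(clear_label, round(clear_ps, 2))
print(mixed_label, round(mixed_ps, 2))
```

In this toy case the first sample agrees with the ALL pattern on both informative genes and receives the maximum prediction strength, while the second sample sends conflicting signals and receives a PS close to zero, the kind of call one would treat as uncertain.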

 

As for class discovery, Golub et al. use self-organizing maps (SOMs) to cluster the tumors.  This top-down clustering approach lets the user specify the number of clusters before clustering, and is thus well suited to tumor classification.  The technique allowed for automatic class discovery without prior biological knowledge; it not only differentiated between AML and ALL, but was also able to discern finer sub-classifications.  The method was approximately 90% effective (34 out of 38 samples) in clustering a group of samples into their respective classes.  This technique could potentially be used to distinguish between tumors with different mechanisms of tumorigenesis in addition to varying tissue types [2].  However, as of now, neither of these techniques would replace current diagnostic schemes based on morphological characteristics.  Work must still be done experimentally to avoid potential artifacts that would generate class distinctions that are experimentally reproducible but biologically insignificant.
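
To make the idea of class discovery with a SOM concrete, the following is a minimal sketch of a one-dimensional self-organizing map with a user-chosen number of nodes.  The linear decay of the learning rate and neighbourhood radius, the Gaussian neighbourhood, the function names, and the toy data are all assumptions made for illustration; they are not the settings used in [2].

```python
import math
import random


def nearest_node(nodes, sample):
    """Index of the node closest to the sample (squared Euclidean distance)."""
    return min(
        range(len(nodes)),
        key=lambda j: sum((n - x) ** 2 for n, x in zip(nodes[j], sample)),
    )


def train_som(samples, n_nodes, n_iter=400, seed=0):
    """Fit a 1-D SOM; the number of clusters (n_nodes) is fixed in advance."""
    rng = random.Random(seed)
    # Start each node at a randomly chosen sample.
    nodes = [list(rng.choice(samples)) for _ in range(n_nodes)]
    for t in range(n_iter):
        remaining = 1 - t / n_iter
        rate = 0.5 * remaining                       # learning rate decays
        radius = max(remaining * n_nodes / 2, 0.01)  # neighbourhood shrinks
        sample = rng.choice(samples)
        winner = nearest_node(nodes, sample)
        for j, node in enumerate(nodes):
            # Nodes near the winner on the grid are pulled along with it.
            influence = math.exp(-((j - winner) ** 2) / (2 * radius ** 2))
            for d, x in enumerate(sample):
                node[d] += rate * influence * (x - node[d])
    return nodes


# Invented toy data: two well-separated groups of three samples each.
samples = [
    [1.0, 1.2, 0.8], [0.9, 1.1, 1.0], [1.2, 0.8, 1.1],
    [9.0, 8.8, 9.1], [8.7, 9.2, 9.0], [9.1, 9.0, 8.9],
]
nodes = train_som(samples, n_nodes=2)
clusters = [nearest_node(nodes, s) for s in samples]
print(clusters)
```

No class labels are supplied at any point; the grouping emerges from the expression values alone, which is what distinguishes class discovery from class prediction.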

 

Numerous other studies using cDNA microarrays have supported the effectiveness of gene expression patterns for clustering cancerous tissues apart from each other and from normal tissues.  The challenge these analyses now face is determining which gene expression patterns in human tumors are physiologically relevant.  Comparison of the observed gene expression data often revealed significant biases in the classification schemes.  Clusters corresponded more strongly with the tissue of origin of the tumor, or with the individual from whom the sample was obtained, than with physiological properties [3, 4].
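
One simple way to check which annotation a clustering actually tracks is to compare the cluster assignments against each candidate annotation in turn.  The sketch below uses cluster purity, a generic measure chosen here for illustration and not taken from [3, 4]; the function name and the sample annotations are invented.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, annotation):
    """Fraction of samples carrying the majority annotation of their cluster."""
    members = defaultdict(list)
    for cluster, value in zip(cluster_ids, annotation):
        members[cluster].append(value)
    majority = sum(Counter(v).most_common(1)[0][1] for v in members.values())
    return majority / len(cluster_ids)


# Invented example: six samples in two clusters, with two annotations each.
cluster_ids = [0, 0, 0, 1, 1, 1]
tissue = ["breast", "breast", "breast", "colon", "colon", "colon"]
growth = ["fast", "slow", "fast", "slow", "fast", "slow"]

tissue_purity = purity(cluster_ids, tissue)
growth_purity = purity(cluster_ids, growth)
print(tissue_purity, round(growth_purity, 2))
```

Here the clusters line up perfectly with tissue of origin but only loosely with the physiological property, the kind of bias described above.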

 

However, as in the Golub class prediction scheme, once the genes are identified that distinguish tumor samples on the basis of specific features reflecting physiological variation, differentiating patterns of expression can be observed.  Gene expression patterns, and their possible use in molecular classification, have thus been observed for human breast cancers [4, 5], cutaneous malignant melanoma [6], diffuse large B-cell lymphoma [7], colon cancer [8], and ovarian carcinomas [9].  Most of these studies use a gene expression analysis scheme comparable to that used by Golub.  The tumors are placed in subtype categories that reflect genetic variation in tumor proliferation rate, host response, and tumor state, all properties with significant clinical implications.

 

Gene expression patterns will also provide insight into the mechanisms and stages of tumor progression and metastasis.  By comparing data from melanoma cells of high and low metastatic potential, patterns of gene expression have been defined that correlate with specific metastatic tumor phenotypes [10].  Moreover, the function of specific genes can be determined using gene expression patterns.  Particular genes, such as RhoC, were identified as enhancing metastasis, while inhibition of Rho activity was shown to reduce it [10].

 

Despite the potential benefits of these technologies, more experimental work must be done prior to clinical application.  The challenge remains to develop an experimental design that avoids artifacts.  Contamination by the tissues that surround the tumor may result in patterns that reflect not the tumor itself, but the contamination [2].  Computationally, more studies are under way [11], and more are needed, to develop algorithms and assays that will bring even higher resolution and accuracy to gene expression data and resolve the minute intricacies of the human genome.  Moreover, a database will need to be developed so that these data can be made accessible to physicians.  In addition to these challenges, the limitations of bioinformatics as a whole continue to be computational power and the speed at which we are able to analyze, and lend biological relevance to, these massive amounts of data.

 

The use of gene expression data will revolutionize the way we approach cancer diagnosis and treatment.  The research of the past few years has shown that this technology could be applied to classify human tumors more systematically and accurately.  Gene expression could be a valuable diagnostic tool, and even if it does not replace traditional morphological methods, it may serve to confirm or aid the diagnosis of unusual or difficult cases.  More importantly, these classification methods could be used to predict the progression and course of the disease and to give an accurate prognosis for the cancer.  As a result, cancer therapies could be specifically targeted so as to improve their efficacy and decrease their toxicity.  Finally, gene expression data are also helping to push new frontiers in pharmacology, which will contribute to the development of novel drugs for the treatment of cancer.

 

References

 

1.  Luscombe N, Greenbaum D, Gerstein M: What is bioinformatics?  An introduction and overview.  IMIA 2001 Yearbook.

2.  Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, et al: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.  Science 1999, 286:531-537.

3.  Ross DT, Scherf U, Eisen MB, Perou CM, Rees CA, Spellman P, et al: Systematic variation in gene expression patterns in human cancer cell lines.  Nature Genetics 2000, 24:227-235.

4.  Perou CM, Sørlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, et al: Molecular portraits of human breast tumours.  Nature 2000, 406:747-752.

5.  Perou CM, Jeffrey SS, van de Rijn M, Rees CA, Eisen MB, Ross DT, et al: Distinctive gene expression patterns in human mammary epithelial cells and breast cancers.  Proc. Natl. Acad. Sci. 1999, 96:9212-9217.

6.  Bittner M, Meltzer P, Chen Y, Jiang Y, Seftor E, Hendrix M, et al: Molecular classification of cutaneous malignant melanoma by gene expression profiling.  Nature 2000, 406:536-540.

7.  Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, et al: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling.  Nature 2000, 403:503-511.

8.  Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays.  Proc. Natl. Acad. Sci. 1999, 96:6745-6750.

9.  Wang K, Gan L, Jeffery E, Gayle M, Gown AM, Skelly M, et al: Monitoring gene expression profile changes in ovarian carcinomas using cDNA microarray.  Gene 1999, 229:101-108.

10.  Clark EA, Golub TR, Lander ES, Hynes RO: Genomic analysis of metastasis reveals an essential role for RhoC.  Nature 2000, 406:532-535.

11.  Pollack JR, Perou CM, Alizadeh AA, Eisen MB, Pergamenschikov A, Williams CF, et al: Genome-wide analysis of DNA copy-number changes using cDNA microarrays.  Nature Genetics 1999, 23:41-46.