MB&B 452a
15 December 2000

The advent of microarrays and whole-genome expression experiments has changed the face of bioinformatics. Bioinformatics has always sought to integrate molecular biology and computer science so as to produce and analyze information on a large scale [1]. This effort has been aided by the availability of complete genome sequences and, now, by whole-genome expression experiments. Three primary technologies are currently used for genome-wide expression experiments: cDNA microarrays, high-density oligonucleotide arrays, and SAGE (serial analysis of gene expression). These experiments generate data estimating the level of mRNA in cell populations systematically and on an unprecedented scale [1]. They have changed the way in which biological discovery can be approached, with implications not only for biological research but also for aspects of medicine such as disease diagnosis and drug development. In human cancer research, the development of microarray technology is of particular interest. Whole-genome expression experiments could be used as a diagnostic and prognostic tool for tumorigenesis and tumor progression. They may pave the way for new possibilities in treating what is currently one of the most common causes of death in the U.S.

Most of the research currently associating gene expression experiments with human cancer is in the realm of tumor classification. Until now, tumor classification has depended primarily on the histological appearance of the tumors. These distinctions are not always clear: morphologically similar tumors often have radically different disease progressions and responses to therapy [2]. An attempt is being made to develop a systematic and unbiased approach to cancer classification and taxonomy, based on whole-genome expression experiments, that will clearly and accurately differentiate tumors. This, in turn, would allow for the development of therapies that specifically target distinct tumor types.

The first important step in this process is to define tumor subtypes. Golub et al. have presented one approach to this problem, using the distinction between human acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) as their experimental model. They divide cancer classification into two tasks: class prediction and class discovery. Class prediction places tumor samples into previously defined subtypes. Class discovery, on the other hand, defines previously unrecognized subtypes. For class prediction, an "idealized expression pattern" was created based on the Affymetrix expression profiles of the known samples. The 50 genes that most clearly distinguished AML from ALL were used as the predictor for cancer class, and new samples were compared to this predictor. This proved to be an extremely accurate way to determine the class of new samples. Predictors built from between 10 and 200 genes were 100% accurate in distinguishing AML from ALL and made strong predictions (median prediction strength, PS = 0.77) for over 80% of the samples [2]. This sort of class distinction method could presumably be applied to any tumor subcategories that have measurable differentiating factors. It would aid tumor diagnosis and prognosis, and help to determine a tumor's origin, stage, or grade [2].
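
The class prediction procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the published implementation: the signal-to-noise score used to rank genes, the weighted vote, and the formula for prediction strength are assumptions modeled loosely on the approach in [2]; all function names are hypothetical; genes are ranked here simply by absolute score; and the expression values are invented toy numbers.

```python
from statistics import mean, stdev


def signal_to_noise(class_a, class_b):
    """Score one gene: difference of class means over summed standard deviations (assumed metric)."""
    spread = stdev(class_a) + stdev(class_b)
    return (mean(class_a) - mean(class_b)) / spread if spread else 0.0


def train_predictor(samples, labels, n_genes, class_a="ALL", class_b="AML"):
    """Return the n_genes most informative genes as (index, weight, decision boundary)."""
    genes = []
    for g in range(len(samples[0])):
        a = [s[g] for s, lab in zip(samples, labels) if lab == class_a]
        b = [s[g] for s, lab in zip(samples, labels) if lab == class_b]
        boundary = (mean(a) + mean(b)) / 2  # midpoint between the class means
        genes.append((g, signal_to_noise(a, b), boundary))
    genes.sort(key=lambda item: abs(item[1]), reverse=True)
    return genes[:n_genes]


def predict(predictor, sample, class_a="ALL", class_b="AML"):
    """Let each informative gene cast a weighted vote; return (winning class, strength)."""
    votes_a = votes_b = 0.0
    for g, weight, boundary in predictor:
        vote = weight * (sample[g] - boundary)
        if vote > 0:
            votes_a += vote
        else:
            votes_b -= vote
    total = votes_a + votes_b
    strength = abs(votes_a - votes_b) / total if total else 0.0
    return (class_a if votes_a >= votes_b else class_b), strength


# Toy data (invented): three genes measured in three ALL and three AML samples.
train = [
    [10.0, 1.0, 5.0], [12.0, 2.0, 5.5], [11.0, 1.5, 4.5],   # ALL
    [2.0, 9.0, 5.2], [3.0, 11.0, 4.8], [1.0, 10.0, 5.1],    # AML
]
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]

predictor = train_predictor(train, labels, n_genes=2)
print(predict(predictor, [11.0, 1.0, 5.0]))
```

In this sketch the third gene, which barely differs between the classes, is excluded from the predictor. With real data, a sample whose strength falls below some chosen cutoff could be reported as an uncertain call rather than forced into a class.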

As for class discovery, Golub et al. use self-organizing maps (SOMs) to cluster the tumors. This top-down clustering approach allows the user to specify the number of clusters before clustering, and is thus well suited for tumor classification. The technique allowed for automatic class discovery without prior biological knowledge; it not only differentiated between AML and ALL, but was also able to discern finer sub-classifications. The method placed 34 of 38 samples (approximately 90%) in their correct classes. This technique could potentially be used to distinguish between tumors with different tumorigenesis mechanisms in addition to varying tissue types [2]. However, as of now, neither technique would replace current diagnostic schemes based on morphological characteristics. Experimental work must still be done to avoid artifacts that would generate class distinctions that are experimentally reproducible but biologically insignificant.
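
The idea of clustering samples with a self-organizing map whose number of clusters is fixed in advance can be illustrated with the following minimal sketch. It is a simplified one-dimensional SOM written for this essay, not the software used in [2]: the learning-rate and neighborhood schedules, the deterministic initialization and sample order, and all names are assumptions, and the two-gene expression values are invented toy numbers.

```python
def nearest_node(nodes, x):
    """Index of the node closest to sample x (squared Euclidean distance)."""
    def sq_dist(node):
        return sum((a - b) ** 2 for a, b in zip(node, x))
    return min(range(len(nodes)), key=lambda j: sq_dist(nodes[j]))


def train_som(samples, n_nodes=2, n_epochs=20):
    """Fit a 1-D SOM with a user-chosen number of nodes (clusters)."""
    # Assumption: start each node at one of the first n_nodes samples.
    nodes = [list(s) for s in samples[:n_nodes]]
    total_steps = n_epochs * len(samples)
    step = 0
    for _ in range(n_epochs):
        for x in samples:
            frac = step / total_steps
            rate = 0.5 * (1.0 - frac)                # learning rate decays to zero
            radius = (n_nodes / 2.0) * (1.0 - frac)  # neighborhood shrinks over time
            winner = nearest_node(nodes, x)
            for j, node in enumerate(nodes):
                # Move the winning node, and any neighbor within the radius, toward x.
                if abs(j - winner) <= radius:
                    for d in range(len(node)):
                        node[d] += rate * (x[d] - node[d])
            step += 1
    return nodes


# Toy data (invented): two genes, samples alternating between two groups.
samples = [
    [0.0, 0.1], [10.0, 9.9], [0.2, 0.0],
    [9.8, 10.1], [0.1, 0.2], [10.1, 10.0],
]

nodes = train_som(samples, n_nodes=2)
clusters = [nearest_node(nodes, s) for s in samples]
print(clusters)
```

No class labels are supplied at any point; the samples are grouped solely by similarity of their expression values, which is what makes the approach suitable for class discovery. Raising n_nodes is the analogue of asking the map for finer sub-classifications.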

Numerous other studies using cDNA microarrays have supported the effectiveness of gene expression patterns for clustering cancerous tissues apart from each other and from normal tissues. The challenge for these analyses now is to determine which gene expression patterns in human tumors are physiologically relevant. Comparison of the observed gene expression data often revealed significant biases in the classification schemes. Clusters corresponded more strongly with the tissue of origin of the tumor, or with the individual from whom the sample was obtained, than with physiological properties [3, 4].

However, as in the Golub class prediction scheme, differentiating patterns of expression can be observed once the genes that best distinguish tumor samples on the basis of specific features, while allowing for physiological variation, have been identified. Gene expression patterns and their use for possible molecular classification have thus been observed for human breast cancers [4, 5], cutaneous malignant melanoma [6], diffuse large B-cell lymphoma [7], colon cancer [8], and ovarian carcinomas [9]. Most of these studies use a gene expression analysis scheme comparable to that used by Golub et al. The tumors are placed in subtype categories that reflect genetic variation in tumor proliferation rate, host response, and tumor state, all properties with significant clinical implications.

Gene expression patterns can also provide insight into the mechanisms and stages of tumor progression and metastasis. By comparing data from melanoma cells of high and low metastatic potential, patterns of gene expression have been defined that correlate with specific metastatic tumor phenotypes [10]. Moreover, the function of specific genes can be investigated using gene expression patterns. In that study, RhoC was identified as enhancing metastasis when overexpressed, whereas a dominant-negative Rho inhibited metastasis [10].

Despite all the potential benefits of these technologies, more experimental work must be done prior to clinical application. The challenge remains to develop an experimental design that avoids artifacts. Contamination by the tissues that surround the tumor may result in patterns that reflect not the tumor itself, but the contamination [2]. Computationally, studies are under way [11], and more are needed, to develop algorithms and assays that will bring higher resolution and accuracy to gene expression data and resolve the minute intricacies of the human genome. Moreover, a database will need to be developed so that these data can be made accessible to physicians. In addition to these challenges, bioinformatics as a whole continues to be limited by computational power and by the speed at which we are able to analyze, and lend biological relevance to, these massive amounts of data.

The use of gene expression data will revolutionize the way we approach cancer diagnosis and treatment. The research of the past few years has shown that this technology could be applied to classify human tumors more systematically and accurately. Gene expression could be a valuable diagnostic tool and, even if it does not replace traditional morphological methods, may serve to confirm or aid the diagnosis of unusual or difficult cases. More importantly, these classification methods could be used to predict the progression and course of the disease and to give an accurate prognosis for the cancer. As a result, cancer therapies could be specifically targeted so as to improve their efficacy and decrease their toxicity. Finally, gene expression data are also helping to push new frontiers in pharmacology, which should contribute to the development of novel drugs for the treatment of cancer.

1. Luscombe N, Greenbaum D, Gerstein M: What is bioinformatics? An introduction and overview. IMIA 2001 Yearbook.
2. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, et al: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999, 286:531-537.
3. Ross DT, Scherf U, Eisen MB, Perou CM, Rees CA, Spellman P, et al: Systematic variation in gene expression patterns in human cancer cell lines. Nature Genetics 2000, 24:227-235.
4. Perou CM, Sørlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, et al: Molecular portraits of human breast tumours. Nature 2000, 406:747-752.
5. Perou CM, Jeffrey SS, van de Rijn M, Rees CA, Eisen MB, Ross DT, et al: Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proc. Natl. Acad. Sci. 1999, 96:9212-9217.
6. Bittner M, Meltzer P, Chen Y, Jiang Y, Seftor E, Hendrix M, et al: Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 2000, 406:536-540.
7. Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, et al: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 2000, 403:503-511.
8. Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. 1999, 96:6745-6750.
9. Wang K, Gan L, Jeffery E, Gayle M, Gown AM, Skelly M, et al: Monitoring gene expression profile changes in ovarian carcinomas using cDNA microarray. Gene 1999, 229:101-108.
10. Clark EA, Golub TR, Lander ES, Hynes RO: Genomic analysis of metastasis reveals an essential role for RhoC. Nature 2000, 406:532-535.
11. Pollack JR, Perou CM, Alizadeh AA, Eisen MB, Pergamenschikov A, Williams CF, et al: Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nature Genetics 1999, 23:41-46.