Jeff Sabina
MBB 452a
December 15, 2000
DNA Microarrays: Comments and Applications
With the advent of DNA microarray technology and the availability of genome sequences for many model organisms, it has become possible to monitor the transcription levels of all genes in a genome simultaneously. Although microarrays are a relatively new technology, their use has spread to almost every branch of biochemistry and molecular biology. From drug target discovery (Giaever et al., 1999; Hughes et al., 2000) to the study of quorum sensing in Gram-positive bacteria, where a previously unknown regulon was identified and characterized (de Saizieu et al., 2000), microarrays are changing the way we study cellular processes. Beyond the microbial world, they have been used to characterize the expression “fingerprints” of human breast cancers (Perou et al., 1999). Classifying tumor types in this way could aid in detecting tumorigenesis before any visible signs are apparent. In short, DNA microarrays are being used to address an increasingly diverse set of problems, and there is no limit in sight.
There are several issues associated with the interpretation and use of the data generated from microarrays. One troubling issue is that, despite great effort, microarray experiments are on average not fully reproducible. The preparation of total cellular RNA or labeled cDNA still varies from run to run and from one person to the next. In a recent paper in PNAS, Lee et al. describe a statistical model they developed to estimate the probability that an mRNA present in the initial sample is detected as present in the final analysis (Lee et al., 2000). Their analysis indicated that a given mRNA had at most a 5% chance of being missed entirely (a false negative) and a 10% chance of being detected when no mRNA was present (a false positive). Because a single experiment shows a great deal of variability, replicating the experiment appears to be the only way to weed out false signals. The idea that an experiment may need to be replicated is not foreign to us, although given the high cost of these chips, replication is far more expensive than repeating a calculation or rerunning a gel.
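To make the replication argument concrete, the sketch below treats the per-array error rates quoted above (10% false positive, 5% false negative) as fixed and assumes that replicate arrays err independently of one another. That independence assumption is a simplification for illustration, not the statistical model of Lee et al. Under it, the number of arrays on which a gene is called "present" follows a binomial distribution, so the error rates of a rule such as "call a gene present only if it is detected on at least m of k arrays" can be computed directly. The function names are hypothetical.

```python
from math import comb


def prob_at_least(m: int, k: int, p: float) -> float:
    """P(X >= m) for X ~ Binomial(k, p)."""
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(m, k + 1))


def call_error_rates(m: int, k: int, p_false_pos: float = 0.10,
                     p_false_neg: float = 0.05) -> tuple[float, float]:
    """Error rates of the rule "present if detected on >= m of k replicate arrays".

    Assumes (hypothetically) that each array errs independently with the
    given per-array rates.
    """
    # Absent mRNA: each array falsely detects it with probability p_false_pos.
    false_pos = prob_at_least(m, k, p_false_pos)
    # Present mRNA: each array detects it with probability 1 - p_false_neg;
    # it is missed when fewer than m arrays detect it.
    false_neg = 1.0 - prob_at_least(m, k, 1.0 - p_false_neg)
    return false_pos, false_neg


if __name__ == "__main__":
    for k in (1, 2, 3):
        for m in range(1, k + 1):
            fp, fn = call_error_rates(m, k)
            print(f"{m} of {k} arrays: false positive {fp:.4f}, false negative {fn:.4f}")
```

Under these assumptions a single array gives 10% false positives and 5% false negatives. Requiring detection on all three of three arrays cuts false positives to 0.1% but raises false negatives to about 14.3%, while a 2-of-3 majority rule gives about 2.8% and 0.7%. Replication helps, but how the replicates are combined matters as well.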
Measurement of a gene’s activity in the cell is often expressed as induction or repression by some factor; for example, one might say that a gene is 2-fold induced in response to an environmental stimulus. Ideally, one would measure the absolute quantity of each mRNA present in the cell at a given time, but the technology has not yet reached that stage. Until it does, we will continue to talk about n-fold inductions. One of the simplest ways of interpreting microarray data is to look only at the induction factors of the genes in a particular experiment and ask, “Is gene X induced in response to stimulus Y?” or, conversely, “Which genes are induced when the cells are treated with compound Y?” Many papers address questions of this type. Jia et al. describe experiments in which yeast cultures were treated with sulfometuron methyl, an inhibitor of amino acid biosynthesis (Jia et al., 2000). Relatively simple comparisons between microarray experiments using treated and untreated cultures identified several genes that were differentially regulated in response to the drug. Though simple, experiments like these can be powerful tools for probing the relationships between processes inside the cell. In their paper, Jia et al. found that genes involved in sulfur uptake were significantly repressed in response to the drug, a result that was not at all expected. Many experiments of this type can be found in the literature, investigating topics ranging from sporulation in Bacillus subtilis (Fawcett et al., 2000) to metabolic pathways in E. coli (Oh and Liao, 2000).
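The simple treated-versus-untreated comparison described above can be sketched as follows. The example assumes background-corrected, normalized signal intensities for each gene in a treated and an untreated sample; the gene names, numbers, and function names are hypothetical. Working with log2 ratios makes induction and repression symmetric: a 2-fold induction is +1 and a 2-fold repression is -1. The default 2-fold cut-off mirrors the threshold commonly applied in this kind of analysis.

```python
import math


def log2_ratio(treated: float, untreated: float) -> float:
    """log2(treated / untreated): +1 means 2-fold induced, -1 means 2-fold repressed."""
    return math.log2(treated / untreated)


def classify(treated: dict[str, float], untreated: dict[str, float],
             fold_cutoff: float = 2.0) -> dict[str, str]:
    """Label each gene induced, repressed, or unchanged using an n-fold cut-off."""
    cutoff = math.log2(fold_cutoff)
    calls = {}
    for gene, t in treated.items():
        r = log2_ratio(t, untreated[gene])
        if r >= cutoff:
            calls[gene] = "induced"
        elif r <= -cutoff:
            calls[gene] = "repressed"
        else:
            calls[gene] = "unchanged"
    return calls


# Hypothetical normalized intensities (arbitrary units) for three genes.
treated = {"geneA": 800.0, "geneB": 120.0, "geneC": 510.0}
untreated = {"geneA": 200.0, "geneB": 480.0, "geneC": 400.0}

for gene, call in classify(treated, untreated).items():
    print(gene, call, round(log2_ratio(treated[gene], untreated[gene]), 2))
```

With these numbers geneA (4-fold up) is called induced, geneB (4-fold down) repressed, and geneC (about 1.3-fold up) unchanged. Lowering `fold_cutoff` admits more genes, but also more false positives.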
Analysis of microarray data is not limited to a single set of experimental conditions, as described above. By combining gene profiling data from several experimental conditions, one can build a compendium of responses suitable for analysis en masse. Cluster analysis of multiple sets of expression data is a useful tool for classifying genes into groups that are commonly regulated. One of the most popular methods for this type of analysis, described by Eisen et al. (1998), not only constructs a hierarchical tree of commonly regulated genes but also assigns each gene a color based on its degree of induction, resulting in a very eye-friendly display of the expression data. This approach has advantages over the simple method described above. In that method, a gene must typically be induced by a factor of two to be considered significant, primarily to reduce the occurrence of false positives by looking only at the most drastically changing genes. In cluster analysis, each gene is grouped with many others that behave similarly across many conditions, so the group acts as a kind of virtual replicate; false positives are less likely, which in effect allows a lower significance threshold and the inclusion of a broader spectrum of the data. The resulting clusters often bring out subtler covariations that could not be detected using the simple cut-off method.
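A minimal sketch of correlation-based hierarchical clustering in the spirit of Eisen et al. (1998) follows. It is not their implementation. It uses plain Pearson correlation as the similarity between expression profiles and average linkage (the distance between two groups is the mean of 1 - r over all cross-group gene pairs), and it records each merge so that a tree could be drawn from the merge order. Profiles are assumed to be log2 ratios across several conditions; the gene names, values, and function names are hypothetical, and a profile that is constant across all conditions would make the correlation undefined.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles: dict[str, list[float]], n_groups: int = 1):
    """Agglomerative clustering with average linkage on 1 - Pearson r.

    Returns (groups, merges): the groups left once n_groups remain, and the
    merges in order, each as (left genes, right genes, distance).
    """
    def group_distance(a: list[str], b: list[str]) -> float:
        total = sum(1.0 - pearson(profiles[i], profiles[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    groups = [[gene] for gene in profiles]
    merges = []
    while len(groups) > n_groups:
        # Find the closest pair of groups under average linkage.
        i, j = min(combinations(range(len(groups)), 2),
                   key=lambda pair: group_distance(groups[pair[0]], groups[pair[1]]))
        d = group_distance(groups[i], groups[j])
        merges.append((groups[i], groups[j], d))
        merged = groups[i] + groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return groups, merges


# Hypothetical log2 ratios for four genes across four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [3.9, 3.1, 1.8, 1.2],
}

groups, merges = average_linkage(profiles, n_groups=2)
for left, right, d in merges:
    print(left, "+", right, f"distance {d:.3f}")
print(groups)
```

On this hypothetical data geneA/geneB and geneC/geneD end up in separate groups, since each pair rises or falls together across the conditions. The colored display described above, in which each measurement is shaded by its degree of induction, is omitted here.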
The amount of data generated by each of these gene profiling experiments is enormous. Until now, our focus has been on ways to examine and interpret the data, but perhaps an even more important issue is storing that data for public use. As more laboratories begin experimenting with microarray technology and the equipment becomes less expensive, a way to store and organize this tremendous amount of data will be needed. The NCBI is currently beginning to implement a microarray data repository called the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/), but keeping records of these data is not as simple as recording a gene sequence. The high variability and lack of standards add to the difficulty of storing the data in a format readily usable by an outside party. Such a database will certainly be a useful tool once many of these problems are overcome. Imagine planning an expensive microarray experiment and, after consulting the database, finding that the same experiment has already been performed and its data made available to the public. The time and cost saved would be considerable. The storage and public availability of these data will therefore remain a matter of discussion for some time.
Just as the availability of whole genome sequences has influenced the direction of science, so will the great gene expression profile databases of the near future. Though widespread use of microarray technology has come about only recently, it has already found applications in fields ranging from cancer profiling to the study of yeast metabolism to drug target identification. As the technology behind the DNA microarray is refined, the methods for interpreting and analyzing the data it produces will also grow more precise and informative. It is important to note that generating these data is not an endpoint; it is the interpretation and application of this information to living systems that will prove most useful.
References
de Saizieu, A., Gardes, C., Flint, N., Wagner, C., Kamber, M., Mitchell, T. J., Keck, W., Amrein, K. E., and Lange, R. (2000). “Microarray-based identification of a novel Streptococcus pneumoniae regulon controlled by an autoinduced peptide.” J Bacteriol, 182(17): 4696-703.
Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D. (1998). “Cluster analysis and display of genome-wide expression patterns.” Proc Natl Acad Sci U S A, 95(25): 14863-8.
Fawcett, P., Eichenberger, P., Losick, R., and Youngman, P. (2000). “The transcriptional profile of early to middle sporulation in Bacillus subtilis.” Proc Natl Acad Sci U S A, 97(14): 8063-8.
Giaever, G., Shoemaker, D. D., Jones, T. W., Liang, H., Winzeler, E. A., Astromoff, A., and Davis, R. W. (1999). “Genomic profiling of drug sensitivities via induced haploinsufficiency.” Nat Genet, 21(3): 278-83.
Hughes, T. R., Marton, M. J., Jones, A. R., Roberts, C. J., Stoughton, R., Armour, C. D., Bennett, H. A., Coffey, E., Dai, H., He, Y. D., Kidd, M. J., King, A. M., Meyer, M. R., Slade, D., Lum, P. Y., Stepaniants, S. B., Shoemaker, D. D., Gachotte, D., Chakraburtty, K., Simon, J., Bard, M., and Friend, S. H. (2000). “Functional discovery via a compendium of expression profiles.” Cell, 102(1): 109-26.
Jia, M. H., Larossa, R. A., Lee, J. M., Rafalski, A., Derose, E., Gonye, G., and Xue, Z. (2000). “Global expression profiling of yeast treated with an inhibitor of amino acid biosynthesis, sulfometuron methyl.” Physiol Genomics, 3(2): 83-92.
Lee, M. L., Kuo, F. C., Whitmore, G. A., and Sklar, J. (2000). “Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations.” Proc Natl Acad Sci U S A, 97(18): 9834-9.
Oh, M. K., and Liao, J. C. (2000). “Gene expression profiling by DNA microarrays and metabolic fluxes in Escherichia coli.” Biotechnol Prog, 16(2): 278-86.
Perou, C. M., Jeffrey, S. S., van de Rijn, M., Rees, C. A., Eisen, M. B., Ross, D. T., Pergamenschikov, A., Williams, C. F., Zhu, S. X., Lee, J. C., Lashkari, D., Shalon, D., Brown, P. O., and Botstein, D. (1999). “Distinctive gene expression patterns in human mammary epithelial cells and breast cancers.” Proc Natl Acad Sci U S A, 96(16): 9212-7.