Jeff Sabina

MBB 452a

December 15, 2000

DNA Microarrays: Comments and Applications

With the advent of DNA microarray technology and the availability of genome sequences for many model organisms, it has become possible to monitor transcription levels for all genes in a genome simultaneously. Although the technology is relatively new, its use has spread to almost all branches of biochemistry and molecular biology. From drug target discovery (Giaever et al., 1999; Hughes et al., 2000) to the study of quorum sensing in gram-positive bacteria, where the regulation of an entire previously unknown regulon was characterized (de Saizieu et al., 2000), microarrays are changing the way we study cellular processes. Beyond the microbial world, they have been used to characterize the expression “fingerprints” of different types of cancer (Perou et al., 1999). Classifying tumor types in this way could aid in detecting tumorigenesis before any visible signs are apparent. In short, DNA microarrays are being applied to an increasingly diverse set of problems, with no limit in sight.

There are several issues associated with the interpretation and use of the data generated from microarrays. One very troubling issue is that, despite considerable effort, microarray experiments are not fully reproducible. The preparation of total cellular RNA or labeled cDNA still varies from run to run and from one experimenter to the next. In a recent paper in PNAS, Lee et al. (2000) describe a statistical model they developed for the probability that an mRNA in the initial sample is detected as present in the final analysis. They showed that a given mRNA had at most a 5% chance of being overlooked entirely by the method (a false negative) and a 10% chance of being detected when no mRNA was present (a false positive). Because a single experiment is subject to this much variability, replication appears to be the only way to weed out false signals. The need to replicate an experiment is nothing foreign to us, but given the high cost of these chips, it is far more expensive than repeating a calculation or rerunning a gel.
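The benefit of replication can be made concrete with a short calculation. The sketch below is not the model of Lee et al.; it assumes, as a simplification, that replicate hybridizations err independently of one another, and it treats the 10% and 5% figures quoted above as per-experiment error rates. The function names and the "detected in at least k of n replicates" rule are illustrative assumptions.

```python
from math import comb

FALSE_POSITIVE = 0.10  # per-experiment rate quoted in the text
FALSE_NEGATIVE = 0.05  # per-experiment upper bound quoted in the text


def prob_at_least(k, n, p):
    """Binomial tail: probability that an event with per-replicate
    probability p occurs in at least k of n independent replicates."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def replicate_error_rates(n, k):
    """Error rates when an mRNA is called present only if it is detected
    in at least k of n replicates (independence is an assumption)."""
    # An absent mRNA is wrongly called present if it is detected k or more times.
    fp = prob_at_least(k, n, FALSE_POSITIVE)
    # A present mRNA is missed if it is detected fewer than k times.
    fn = 1 - prob_at_least(k, n, 1 - FALSE_NEGATIVE)
    return fp, fn


if __name__ == "__main__":
    for n, k in [(1, 1), (2, 2), (3, 2), (3, 3)]:
        fp, fn = replicate_error_rates(n, k)
        print(f"{k} of {n} replicates: false positive {fp:.4f}, false negative {fn:.4f}")
```

Under these assumptions, requiring agreement in two of three replicates lowers both error rates at once, whereas requiring agreement in every replicate trades fewer false positives for more false negatives.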

A gene’s activity in the cell is often described as being induced or repressed by some factor. One might say that a gene is 2-fold induced in response to some environmental stimulus. Ideally, one would like to measure the absolute quantity of mRNA present in the cell at a given time, but the technology has yet to reach this stage. Until then, we will continue to talk about n-fold inductions, that is, the ratio of a gene’s signal under the test condition to its signal under the reference condition. One of the simplest ways of interpreting the data from a microarray experiment is to look only at the induction factors of the genes and ask, “Is gene X induced in response to stimulus Y?” or the converse, “What genes are induced when the cells are treated with compound Y?” Many papers try to answer these types of questions. Jia et al. describe experiments in yeast in which cell cultures were treated with an inhibitor of amino acid biosynthesis (Jia et al., 2000). Relatively simple comparisons between microarray experiments on treated and untreated cultures identified several genes that were differentially regulated in response to the drug. Experiments like these, though relatively simple, can be powerful tools for probing the relationships between processes inside the cell. In their paper, Jia et al. found that genes involved in sulfur uptake were significantly repressed in response to the drug, something not at all expected. Many experiments of this type can be found in the literature, investigating topics from sporulation in Bacillus subtilis (Fawcett et al., 2000) to metabolic pathways in E. coli (Oh and Liao, 2000).
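A treated-versus-untreated comparison of this kind reduces to computing one ratio per gene and applying a cutoff. The following is a minimal sketch, not the procedure used in any of the cited papers: the gene names and intensity values are invented for illustration, the two-fold cutoff is a common convention assumed here, and the `floor` parameter is a hypothetical guard against dividing by signals near zero.

```python
def fold_changes(treated, untreated, floor=1.0):
    """Return {gene: treated / untreated} for genes measured in both samples.
    Intensities below `floor` are raised to it so that weak signals do not
    produce enormous or undefined ratios."""
    ratios = {}
    for gene in treated.keys() & untreated.keys():
        t = max(treated[gene], floor)
        u = max(untreated[gene], floor)
        ratios[gene] = t / u
    return ratios


def call_genes(treated, untreated, cutoff=2.0):
    """Label each gene as induced, repressed, or unchanged using a
    symmetric n-fold cutoff (2-fold by default)."""
    calls = {}
    for gene, ratio in fold_changes(treated, untreated).items():
        if ratio >= cutoff:
            calls[gene] = "induced"
        elif ratio <= 1 / cutoff:
            calls[gene] = "repressed"
        else:
            calls[gene] = "unchanged"
    return calls


# Invented intensities in arbitrary units, for illustration only.
UNTREATED = {"geneA": 100.0, "geneB": 400.0, "geneC": 250.0}
TREATED = {"geneA": 450.0, "geneB": 90.0, "geneC": 300.0}

if __name__ == "__main__":
    for gene, call in sorted(call_genes(TREATED, UNTREATED).items()):
        print(gene, call)
```

Note that the cutoff is applied symmetrically: a gene counts as repressed when its ratio falls to one half or below, mirroring the two-fold threshold for induction.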

Analysis of microarray data is not limited to a single set of experimental conditions as described above. By combining gene profiling data from several experimental conditions, one can generate a compendium of responses suitable for analysis en masse. Cluster analysis of multiple sets of expression data is a useful tool for classifying genes into groups that are commonly regulated. One of the most popular methods for this type of analysis was described by Eisen et al. (1998); it not only constructs a hierarchical tree of commonly regulated genes but also assigns each gene a color based on its degree of induction. This results in a very eye-friendly display of the expression data. Cluster analysis has several advantages over the simple comparison described above. In a simple comparison, a gene must typically be induced by a factor of two in order to be considered significant. This is done primarily to reduce the occurrence of false positives by looking only at the most drastically changing genes. In cluster analysis, each gene is in effect measured repeatedly across many conditions, so false positives are less likely, and the significance threshold can be lowered to include a broader spectrum of the data. The resulting clusters often bring out some of the subtler covariations that could not be detected using the simple cut-off method.
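The idea of building a hierarchical tree of co-regulated genes can be sketched in a few lines. The code below is one simple variant of agglomerative clustering (pairwise average linkage on the Pearson correlation between expression profiles) and is not claimed to reproduce the software of Eisen et al.; the gene names and log-ratio values are invented for illustration, and the color display is omitted.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def average_linkage(profiles):
    """Cluster {gene: [log ratios]} into a binary tree of nested tuples.
    At each step the two clusters with the highest mean pairwise
    correlation between their member genes are joined."""
    # Each cluster is (subtree, list of member gene names).
    clusters = [(gene, [gene]) for gene in profiles]

    def similarity(a, b):
        pairs = [(i, j) for i in a[1] for j in b[1]]
        return sum(pearson(profiles[i], profiles[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda pair: similarity(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(((a[0], b[0]), a[1] + b[1]))
    return clusters[0][0]


# Invented log2 induction ratios across four conditions, for illustration only.
EXAMPLE = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [1.1, 2.1, 2.9, 4.2],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [3.9, 3.2, 1.8, 1.1],
}

if __name__ == "__main__":
    print(average_linkage(EXAMPLE))
```

In this toy data set, g1 and g2 rise together across the conditions while g3 and g4 fall together, so the tree separates the two pairs even though no fold-change cutoff was applied to any individual measurement.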

The amount of data generated by each of these gene profiling experiments is enormous. Until now, our focus has been on ways to examine and interpret the data, but perhaps an even more important issue is the storage of that data for public use. As more laboratories begin experimenting with microarray technology and the equipment becomes less expensive, the need will arise for a way to store and organize this tremendous amount of data. Currently, the NCBI is beginning to implement a microarray data storehouse called the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/), but keeping records of these data is not as simple as recording a gene sequence. The high variability of the measurements and the lack of standards add to the problem of storing the data in a format readily usable by an outside party. Such a database will certainly be a useful tool once many of these problems are overcome. Imagine planning an expensive microarray experiment and, after consulting the database, finding that the data you need have already been made available to the public. The time and cost saved would be considerable. Storage and public availability of these data will therefore remain a matter of discussion for some time to come.

Just as the availability of whole genome sequences has shaped the direction of science, so will the coming of the great gene expression profile databases of the near future. Though widespread use of microarray technology has come about only recently, it has already found applications in fields ranging from cancer profiling to the study of yeast metabolism to drug target identification. As the technology behind the DNA microarray becomes more fine-tuned, the methods for interpreting and analyzing the data it produces will also grow more precise and informative. It is important to note that generating these data is not an endpoint; it is the interpretation and application of this information to living systems that will prove most useful.

References

de Saizieu, A., Gardes, C., Flint, N., Wagner, C., Kamber, M., Mitchell, T. J., Keck, W., Amrein, K. E., and Lange, R. (2000). “Microarray-based identification of a novel Streptococcus pneumoniae regulon controlled by an autoinduced peptide.” J Bacteriol, 182(17): 4696-703.

Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D. (1998). “Cluster analysis and display of genome-wide expression patterns.” Proc Natl Acad Sci U S A, 95(25): 14863-8.

Fawcett, P., Eichenberger, P., Losick, R., and Youngman, P. (2000). “The transcriptional profile of early to middle sporulation in Bacillus subtilis.” Proc Natl Acad Sci U S A, 97(14): 8063-8.

Giaever, G., Shoemaker, D. D., Jones, T. W., Liang, H., Winzeler, E. A., Astromoff, A., and Davis, R. W. (1999). “Genomic profiling of drug sensitivities via induced haploinsufficiency.” Nat Genet, 21(3): 278-83.

Hughes, T. R., Marton, M. J., Jones, A. R., Roberts, C. J., Stoughton, R., Armour, C. D., Bennett, H. A., Coffey, E., Dai, H., He, Y. D., Kidd, M. J., King, A. M., Meyer, M. R., Slade, D., Lum, P. Y., Stepaniants, S. B., Shoemaker, D. D., Gachotte, D., Chakraburtty, K., Simon, J., Bard, M., and Friend, S. H. (2000). “Functional discovery via a compendium of expression profiles.” Cell, 102(1): 109-26.

Jia, M. H., Larossa, R. A., Lee, J. M., Rafalski, A., Derose, E., Gonye, G., and Xue, Z. (2000). “Global expression profiling of yeast treated with an inhibitor of amino acid biosynthesis, sulfometuron methyl.” Physiol Genomics, 3(2): 83-92.

Lee, M. L., Kuo, F. C., Whitmore, G. A., and Sklar, J. (2000). “Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations.” Proc Natl Acad Sci U S A, 97(18): 9834-9.

Oh, M. K., and Liao, J. C. (2000). “Gene expression profiling by DNA microarrays and metabolic fluxes in Escherichia coli.” Biotechnol Prog, 16(2): 278-86.

Perou, C. M., Jeffrey, S. S., van de Rijn, M., Rees, C. A., Eisen, M. B., Ross, D. T., Pergamenschikov, A., Williams, C. F., Zhu, S. X., Lee, J. C., Lashkari, D., Shalon, D., Brown, P. O., and Botstein, D. (1999). “Distinctive gene expression patterns in human mammary epithelial cells and breast cancers.” Proc Natl Acad Sci U S A, 96(16): 9212-7.