A bioinformatic analysis of upstream regulatory sequences of glycolysis/gluconeogenesis pathway genes of yeast

Anandasankar Ray

Recent progress in DNA sequencing has made the sequences of a large number of genomes available to researchers. Most bioinformatic analysis since then has focussed on coding sequences. An important aspect of protein function, however, is regulated through transcriptional control of gene expression, which is mediated by transcription factors acting on the non-coding cis-regulatory DNA sequences of a gene. The question I would like to address is how to extract motifs that are shared by upstream sequences and that may therefore be responsible for coregulation of a group of genes, in this study the metabolic enzymes of yeast.

Typically, researchers search for known regulatory patterns (e.g., the consensus sequence for a transcription factor) in the upstream regions of genes and look for a correlation with the expression profile. Such methods are heuristic and are limited to known transcription factor binding sites and to candidate factors already suspected of exerting such control. Typical programs are MATINSPECTOR and PATTERN SEARCH, which use databases such as TRANSFAC to match known binding sites to the non-coding DNA upstream of a gene.

In recent years, however, computer programs have been developed that can identify potential regulatory sequences within a group of co-regulated genes. The two programs I have used for my analysis are discussed in more detail below.

The aim of this study is to detect unknown regulatory sites in a set of potentially co-regulated metabolic pathway genes.

The crucial issue for the success of any pattern discovery program is the selection of gene families that are likely to be co-regulated by common sets of transcription factors. Enzymes that belong to the same metabolic pathway can be expected to be present at the same time and in the same cellular regions, so that they can carry out their functions sequentially and maintain the flux of product formation. Their expression may therefore be regulated in a similar manner, with their cis-regulatory DNA carrying similar transcription factor binding sites. In my analysis I have used the 800 bp upstream of the genes of the glycolysis and gluconeogenesis metabolic pathways of S. cerevisiae (see Figure 1).
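The extraction of a fixed-length upstream region can be sketched as follows. This is only an illustration under stated assumptions: the function names, the 0-based half-open coordinate convention, and the toy chromosome are hypothetical, and the sketch ignores details such as clipping at neighbouring genes.

```python
# Hypothetical helper names; coordinates are assumed 0-based, half-open.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom: str, start: int, end: int, strand: str,
                    length: int = 800) -> str:
    """Return up to `length` bp immediately upstream of a coding region.

    `start` and `end` delimit the coding region on the forward strand of
    `chrom`; `strand` is "+" or "-". The result is always reported in the
    orientation of the gene, and is shorter than `length` when the gene
    lies close to the end of the chromosome.
    """
    if strand == "+":
        # Upstream lies to the left of the start codon.
        return chrom[max(0, start - length):start]
    if strand == "-":
        # Upstream lies to the right on the forward strand, so it is
        # reverse-complemented to read in the gene's own direction.
        return reverse_complement(chrom[end:end + length])
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    toy_chrom = "TTGACAATGAAATAGCC"  # made-up sequence for illustration
    print(upstream_region(toy_chrom, 6, 15, "+", length=4))
```

The default of 800 bp matches the region length used in this analysis; everything else about the interface is an assumption made for the example.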

I have used two programs: the word detector OLIGO-ANALYSIS and the DYAD-DETECTOR (van Helden et al., 1998; van Helden et al., 2000a; reviewed in van Helden et al., 2000b). The OLIGO-ANALYSIS method detects over-represented oligonucleotides: the statistical significance of each oligonucleotide is assessed by comparing its number of occurrences in the upstream regions with the frequency observed in all non-coding sequences of the genome. In contrast to the heuristic methods listed above, this analysis is rigorous and exhaustive. The method has been optimized for use with the yeast genomic sequence database. The efficacy of the program has been demonstrated in S. cerevisiae, where differential expression data obtained from microarray analysis under different conditions of nitrogen metabolism were used to predict potential regulatory motifs, and most of the known transcription factors involved in the process could be mapped onto the differentially regulated genes (van Helden et al., 1998).
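The idea of scoring over-represented oligonucleotides against a background frequency can be illustrated with a minimal sketch. It is not the OLIGO-ANALYSIS implementation: it assumes a binomial model, counts overlapping occurrences on a single strand, applies no multiple-testing correction, and takes a hypothetical `background_freq` dictionary (oligonucleotide to frequency in non-coding sequences) that the caller must supply.

```python
from collections import Counter
from math import exp, lgamma, log, log1p


def count_oligos(seqs, k=6):
    """Count every overlapping k-mer (single strand) across sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log(p) + (n - i) * log1p(-p))
        term = exp(log_term)
        total += term
        # Past the mean the terms only shrink, so stop once negligible.
        if i > n * p and term <= total * 1e-15:
            break
    return min(total, 1.0)


def over_represented(upstream_seqs, background_freq, k=6, max_pvalue=1e-3):
    """Return (oligo, observed, expected, p-value) sorted by p-value."""
    counts = count_oligos(upstream_seqs, k)
    n_positions = sum(max(len(s) - k + 1, 0) for s in upstream_seqs)
    hits = []
    for oligo, observed in counts.items():
        p = background_freq.get(oligo)
        if not p or p >= 1.0:
            continue  # no usable background estimate for this oligo
        pval = binomial_tail(n_positions, observed, p)
        if pval <= max_pvalue:
            hits.append((oligo, observed, n_positions * p, pval))
    return sorted(hits, key=lambda h: h[3])
```

The `max_pvalue` threshold is an arbitrary illustrative value; with 4096 possible hexanucleotides tested, a real analysis would need to account for the number of tests.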

DYAD-ANALYSIS is designed to identify motifs that consist of a pair of trinucleotides separated by a non-conserved region of variable width. This method is efficient for the detection of binding sites of zinc finger proteins and other transcription factors. It has been shown to detect regulatory patterns in gene clusters from DNA chip experiments for cell-cycle fluctuating genes in yeast (van Helden et al., 2000a).
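The counting step behind a spaced-dyad search can be sketched as below. This is a simplified illustration, not the published program: it counts on a single strand only, the spacing range of 0 to 20 is an assumed default, and the function names are hypothetical. Significance would then be assessed against expected dyad frequencies, which is not shown here.

```python
from collections import Counter

_BASES = set("ACGT")


def count_dyads(seqs, min_spacing=0, max_spacing=20):
    """Count spaced dyads (trinucleotide, spacer width, trinucleotide).

    Keys are tuples (left, spacing, right); the spacer content is ignored,
    which is what makes the middle region 'non-conserved'.
    """
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for spacing in range(min_spacing, max_spacing + 1):
            span = 3 + spacing + 3
            for i in range(len(seq) - span + 1):
                left = seq[i:i + 3]
                right = seq[i + 3 + spacing:i + span]
                # Skip dyads containing ambiguous bases such as N.
                if set(left + right) <= _BASES:
                    counts[(left, spacing, right)] += 1
    return counts


def dyad_label(left, spacing, right):
    """Format a dyad in the notation of Table 1, e.g. GGAN{3}CAC."""
    return f"{left}N{{{spacing}}}{right}"


if __name__ == "__main__":
    toy = ["GGATTTCACAAGGACCCCAC"]  # made-up sequence for illustration
    for (left, spacing, right), n in count_dyads(toy).most_common(3):
        print(dyad_label(left, spacing, right), n)
```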

Both programs are available through a web-based interface and can be used either alone or sequentially with a feature-map program that automatically generates a visual representation of the positions at which patterns were found (http://copan.cifn.unam.mx/~jvanheld/rsa-tools/).

RESULTS

A total of 15 potential regulatory sites were found in the upstream regions of the 22 genes analyzed. Ten of these were hexanucleotide motifs detected by the OLIGO-ANALYSIS program that may be recognized by different yeast transcription factors, while the five others were spaced dyads detected by the DYAD-DETECTOR program that could be binding sites for zinc finger transcription factors. The distribution of the potential regulatory elements is shown in Figures 2 and 3.

Given below is a summary of the results obtained. This analysis shows that genes whose products take part in the common metabolic pathway of glycolysis/gluconeogenesis have very similar potential regulatory regions within 800 bp upstream of the coding regions. Identifying such potential regulatory sites can help to identify not only the cis-elements important for transcriptional regulation of the group of genes, but also the transcription factors that may play a role in that regulation. Experimental evidence will ultimately be needed to prove that some or all of these sites are indeed involved in regulation.

TABLE 1. Frequency of occurrence of potential regulatory sites.

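| Gene Name | CAGGGG | ACGTGG | AGGGGG | CCCTGA | ACAGGG | CCCACG | CCCCAC | AGGAAG | CACGTG | GGAN{3}CAC | CAGN{6}GGG | GGAN{1}GCC |
|-----------|--------|--------|--------|--------|--------|--------|--------|--------|--------|------------|------------|------------|
| PGM1      | 1      | 1      | 1      | 1      | 1      | 0      | 0      | 0      | 0      | 0          | 0          | 0          |
| PGM2      | 3      | 1      | 2      | 1      | 2      | 2      | 1      | 0      | 0      | 1          | 1          | 0          |
| PGI1      | 0      | 1      | 0      | 0      | 0      | 0      | 0      | 2      | 0      | 1          | 0          | 1          |
| HXK1      | 2      | 0      | 0      | 1      | 0      | 1      | 1      | 1      | 1      | 0          | 2          | 0          |
| HXK2      | 0      | 0      | 0      | 1      | 0      | 0      | 0      | 0      | 0      | 2          | 0          | 0          |
| GLK1      | 1      | 0      | 2      | 1      | 1      | 0      | 0      | 0      | 2      | 1          | 0          | 0          |
| GAL10     | 0      | 0      | 0      | 0      | 0      | 0      | 1      | 0      | 0      | 0          | 0          | 0          |
| FBP1      | 0      | 0      | 0      | 0      | 1      | 0      | 1      | 1      | 0      | 0          | 1          | 0          |
| PFK1      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 1      | 0      | 0          | 0          | 0          |
| PFK2      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 1      | 0      | 1          | 0          | 2          |
| FBA1      | 0      | 2      | 0      | 0      | 0      | 1      | 0      | 1      | 0      | 1          | 0          | 1          |
| TDH1      | 1      | 2      | 2      | 1      | 0      | 1      | 1      | 2      | 1      | 0          | 1          | 0          |
| TDH2      | 0      | 1      | 0      | 0      | 0      | 0      | 0      | 1      | 1      | 1          | 0          | 1          |
| TDH3      | 1      | 0      | 0      | 1      | 1      | 1      | 0      | 0      | 0      | 0          | 1          | 3          |
| PGK1      | 0      | 1      | 0      | 1      | 0      | 0      | 0      | 4      | 1      | 1          | 0          | 0          |
| GPM1      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 3      | 0      | 1          | 1          | 0          |
| GPM2      | 0      | 1      | 0      | 0      | 0      | 1      | 1      | 0      | 0      | 1          | 0          | 2          |
| GPM3      | 0      | 0      | 0      | 1      | 2      | 0      | 1      | 0      | 0      | 0          | 0          | 0          |
| END1      | 1      | 0      | 0      | 1      | 0      | 0      | 0      | 1      | 0      | 1          | 2          | 0          |
| END2      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 1      | 0      | 1          | 0          | 1          |
| CDC19     | 0      | 2      | 0      | 0      | 0      | 1      | 0      | 1      | 1      | 2          | 0          | 1          |
| PYK2      | 0      | 1      | 1      | 1      | 1      | 1      | 1      | 1      | 1      | 1          | 1          | 1          |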
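Per-gene occurrence counts of the kind tabulated above could be produced along the following lines. This is a hedged sketch only: the source does not state whether one or both strands were counted or how overlapping sites were handled, so `both_strands` is left as a parameter, and the function names are hypothetical. Motifs use the notation of Table 1, where N{k} stands for k unspecified bases.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Column headers of Table 1.
MOTIFS = ["CAGGGG", "ACGTGG", "AGGGGG", "CCCTGA", "ACAGGG", "CCCACG",
          "CCCCAC", "AGGAAG", "CACGTG", "GGAN{3}CAC", "CAGN{6}GGG",
          "GGAN{1}GCC"]


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_to_regex(motif):
    """Compile e.g. 'GGAN{3}CAC' so that N matches any base.

    The lookahead lets overlapping occurrences be reported separately.
    """
    return re.compile("(?=(" + motif.upper().replace("N", "[ACGT]") + "))")


def count_sites(seq, motif, both_strands=True):
    """Count distinct sites of `motif` in `seq`.

    Sites are stored as forward-strand (start, end) spans, so a
    reverse-palindromic motif such as CACGTG is counted once per site.
    """
    seq = seq.upper()
    pattern = motif_to_regex(motif)
    spans = {(m.start(1), m.end(1)) for m in pattern.finditer(seq)}
    if both_strands:
        n = len(seq)
        for m in pattern.finditer(reverse_complement(seq)):
            # Map the match back to forward-strand coordinates.
            spans.add((n - m.end(1), n - m.start(1)))
    return len(spans)


def table_row(upstream_seq, motifs=MOTIFS, both_strands=True):
    """Return one row of counts, in the column order of `motifs`."""
    return [count_sites(upstream_seq, m, both_strands) for m in motifs]
```

Calling `table_row` on each 800 bp upstream sequence would give one row per gene; whether such counts reproduce Table 1 depends on the strand and overlap conventions of the original programs, which are not specified here.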