Protein Folds in the Worm Genome

M Gerstein, J Lin & H Hegyi (2000). Pac. Symp. Biocomp. (in press). [preprint]

This site presents an analysis of protein folds in the worm genome. The methods we used include pairwise and multiple-sequence comparison (FASTA and PSI-BLAST, respectively). Overall, we find that ~250 folds match ~8000 domains in ~4500 ORFs: about 32 matches per fold, involving about a quarter of all worm ORFs. We compare the folds in the worm genome to those in other model organisms, in particular yeast and E. coli, and find that the worm shares more folds with the phylogenetically closer yeast than with E. coli. There appear to be 36 folds unique to the worm relative to these two model organisms, and many of these are clearly implicated in aspects of multicellularity. The most common fold in the worm genome is the immunoglobulin fold, and many of the common folds are repeated in various combinations and permutations in multidomain proteins. In addition, we present an approach for identifying “sure” and “marginal” membrane proteins. Applied to the worm genome, it reveals a much greater relative prevalence of proteins with seven transmembrane helices than in the other completely sequenced genomes, none of which are from metazoans. Combining these analyses with some other simple filters allows one to identify ORFs that potentially code for soluble proteins of unknown fold, which may be promising targets for experimental investigation in structural genomics.
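As a rough illustration of the bookkeeping behind these figures, the following Python sketch recomputes the quoted ratios from the approximate counts in the abstract and shows how "unique" folds could be obtained as a set difference. It is not the pipeline used in the paper: the function name unique_folds and the fold labels "A" to "D" are hypothetical placeholders, and the total ORF count is only what "a quarter" implies, not a reported number.

```python
# Approximate counts quoted in the abstract (all are "~" values).
n_folds = 250
n_domains = 8000
n_orfs_matched = 4500

# Average number of domain matches per fold: 8000 / 250 = 32.
matches_per_fold = n_domains / n_folds

# Average number of matched domains per matched ORF (a derived value).
domains_per_matched_orf = n_domains / n_orfs_matched

# If ~4500 ORFs are "a quarter" of the genome, the implied total is ~18000.
# This is an inference from the wording, not a figure given in the text.
implied_total_orfs = n_orfs_matched / 0.25


def unique_folds(target, *others):
    """Return the folds present in `target` but in none of `others`."""
    return set(target) - set().union(*others)


# Toy fold sets with placeholder labels (not real fold assignments).
worm = {"A", "B", "C", "D"}
yeast = {"A", "B"}
ecoli = {"A", "C"}

shared_with_yeast = worm & yeast   # folds found in both worm and yeast
shared_with_ecoli = worm & ecoli   # folds found in both worm and E. coli
worm_only = unique_folds(worm, yeast, ecoli)
```

With real fold assignments in place of the placeholder sets, the size of worm_only would correspond to the count of worm-specific folds, and comparing the sizes of the two shared sets would correspond to the worm-yeast versus worm-E. coli comparison described above.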
