Comprehensive statistical analysis of thermophilic and mesophilic genomes

This page describes an analysis of thermophilic and mesophilic genomes that sought to identify protein structural features contributing to the thermal stability of proteins in thermophiles. We found that salt-bridge interactions are more prevalent in thermophilic proteins than in their mesophilic counterparts and can play an important role in protein thermostability. We also studied the effect of other factors, such as protein length and deamidation.

Table name: Organisms
Links: data, head
Fields: Organism, Genome id, No of proteins, Physiological condition
Description: Lists all the organisms included in the calculation.

Table name: Raw Sequences
Links: [Seq.fa]; AA, AF, MJ, MT, OT, EC, HI, HP, MG, MP, SC, SS
Fields: AA, AF, MJ, MT, OT, EC, HI, HP, MG, MP, SC, SS
Description: Contains the sequence files for 11 genomes.

Table name: Secondary Structure
Links: [Gorss.fa]; AA, AF, MJ, MT, OT, EC, HI, HP, MG, MP, SC, SS
Fields: AA, AF, MJ, MT, OT, EC, HI, HP, MG, MP, SC, SS
Description: Contains the predicted secondary structure sequence files for 11 genomes.

Table name: Amino acid composition in entire genome
Links: data
Fields: Thermophiles, mesophiles, AA, AF, MJ, MT, OT, EC, HI, HP, MG, MP, SC, SS, A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y
Description: Amino acid composition.

Table name: Amino acid composition in helix
Links: data, head
Fields: Thermophiles, mesophiles, AA, AF, MJ, MT, OT, EC, HI, HP, MG, MP, SC, SS, A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y
Description: Amino acid composition.

Table name: Plot of helix amino acid composition
Links: plot
Fields: Total content of all amino acids sums to 100%
Description: Plot shows that the K, R and E, D content in helices increases from mesophilic to thermophilic genomes.

Table name: Comprehensive LOD value table for helix (data1) and genome (data2)
Links: data1, data2, aa, sep
Fields: AA, AF, MJ, MT, OT, EC, HI, HP, MG, SC, SS, A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y
Description: Tables include amino acid composition, raw counts, expected counts, odds values and LOD values for all 400 pairs at different spacings.

Table name: LOD values of salt-bridge pairs
Links: data
Fields: Spacing, pair, AA, AF, MJ, MT, OT, EC, HI, HP, MG, SC, SS
Description: Table contains LOD values for EK, ER and DR at spacings of 3 and 4 for helix and genome.

Table name: Diagram showing LOD values for EK(3) and EK(4)
Links: plot
Fields: -
Description: -

Table name: Rank statistics of salt-bridge pairs
Links: data, head
Fields: SEP, PAIR, AA, AF, MJ, MT, OT, EC, HI, HP, MG, SC, SS
Description: Table lists only those ranks that fall within the first 20 (for both helix and genome).

Table name: List of 52 COGs
Links: data
Fields: COGcat, COGid, COGrib_pro, COGtrna_syn, COGother
Description: Table lists the 52 COGs selected for this study.

Table name: List of COG proteins having PDB structure
Links: data, head
Fields: cogid, class, cat, pthermophileid (pdbid for the corresponding COG sequences)
Description: "cat" means category. This is a list of COG proteins found in all eleven genomes. "class" defines the functional classification as described by NCBI.

Table name: Statistics for COG salt bridges
Links: data, head
Fields: cat, pdbid, arc_avg, mes_avg, arc-mes (the difference between arc_avg and mes_avg)
Description: arc_avg and mes_avg are the average salt-bridge counts in the archaeal COG and the mesophilic COG, respectively.

Table name: Length data of all eleven genomes
Links: data, head
Fields: binnam_, ibin, AA, AF, MJ, MT, OT, EC, HI, HP, MG, SC, SS
Description: Table contains the distribution of proteins at different lengths.

Table name: Plot of length distribution
Links: plot
Fields: %frequency, protein length
Description: Distribution of length data for all 11 genomes.

Table name: Plot of length distribution in terms of percentage composition
Links: plot
Fields: protein content, length
Description: Distribution of length composition data for all 11 genomes.

Table name: Distribution of random LOD values
Links: plot
Fields: -
Description: Plot of EK(3) LOD values for randomly generated thermophilic and mesophilic genomes.

Table name: Plot of fit curves for length distribution
Links: plot
Fields: -
Description: Plot of fit curves against the length distribution data of the overall genome sequences and of the 52 COG sequences, for all genomes.
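
The LOD tables listed above report raw counts, expected counts, odds values and LOD values for residue pairs at a given spacing. As a rough illustration of how such values could be derived, the Python sketch below counts how often one residue is followed by another a fixed number of positions later, estimates the expected count from single-residue frequencies, and takes the logarithm of the observed/expected ratio. The function name pair_lod, the independence-based expected count and the use of the natural logarithm are assumptions made for this example; the procedure actually used to build the tables may differ. For the helix tables, the same counting would presumably be restricted to predicted helical segments.

```python
import math
from collections import Counter


def pair_lod(sequences, first, second, spacing):
    """Return (observed, expected, odds, lod) for residue `first` followed
    by residue `second` exactly `spacing` positions later, pooled over all
    sequences. Hypothetical helper, not taken from the original analysis."""
    residue_counts = Counter()
    total_positions = 0  # number of (i, i + spacing) position pairs examined
    observed = 0         # raw count of the requested pair

    for seq in sequences:
        residue_counts.update(seq)
        for i in range(len(seq) - spacing):
            total_positions += 1
            if seq[i] == first and seq[i + spacing] == second:
                observed += 1

    total_residues = sum(residue_counts.values())
    if total_residues == 0 or total_positions == 0:
        raise ValueError("sequences are too short for this spacing")

    # Assumption: the expected count treats the two positions as independent
    # draws from the overall amino acid composition.
    freq_first = residue_counts[first] / total_residues
    freq_second = residue_counts[second] / total_residues
    expected = freq_first * freq_second * total_positions

    if expected == 0:
        # One of the residues never occurs, so the odds are undefined.
        return observed, expected, float("nan"), float("nan")

    odds = observed / expected
    # Assumption: natural logarithm; a never-observed pair gets -infinity.
    lod = math.log(odds) if odds > 0 else float("-inf")
    return observed, expected, odds, lod


# Toy usage: E followed by K four positions later, i.e. EK(4).
toy_sequences = ["EAAAK", "AEAAAKA", "GGGGGGG"]
print(pair_lod(toy_sequences, "E", "K", 4))
```

A positive LOD value means the pair occurs more often at that spacing than the composition alone would predict, which is the kind of signal the EK(3) and EK(4) entries above are meant to capture.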
