Tables Comparing the Known Folds in Various Genomes

Table Name Size (kb), Format Links Fields (keys bold) Description
minscop summary 54 k, tab delim. data, head EC, SC, SS, HI, HP, MJ, MP, MG, |, class, SF, type, count

The table summarizes the patterns of fold usage in minscop_report 
(which, in turn, is derived from merging descrip_did and the many
minscop_occurrence).
This is derived from an analysis of the genomes EC SC SS HI HP MJ MP MG.
For all fields, * is the wildcard and matches all types.
class describes the fold class
SF indicates whether or not this applies to superfolds
type is as follows:
pattern_exist -- the pattern of presence or absence across all genomes
pattern_exist_unordered -- the same for all genomes, considering only the number of genomes
exist_in_a_genome -- whether or not a fold exists in a genome
total_in_a_genome -- accumulates the count of folds in a particular genome
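
As a rough illustration of how the four type values could be tallied, the sketch below starts from a hypothetical mapping of fold name to per-genome counts. The fold names, the counts, the function name summarize, and the use of 1 and _ as pattern characters (borrowed from the crosstab summary conventions) are assumptions for illustration only; this is not the code that built the table.

```python
from collections import Counter

# Genome order as listed in the minscop summary header.
GENOMES = ["EC", "SC", "SS", "HI", "HP", "MJ", "MP", "MG"]

def summarize(fold_counts):
    """Tally the four summary types from {fold: [count per genome]}."""
    pattern_exist = Counter()            # exact presence/absence pattern -> number of folds
    pattern_exist_unordered = Counter()  # number of genomes holding a fold -> number of folds
    exist_in_a_genome = Counter()        # genome -> number of distinct folds present
    total_in_a_genome = Counter()        # genome -> summed occurrence count
    for counts in fold_counts.values():
        present = [c > 0 for c in counts]
        pattern_exist["".join("1" if p else "_" for p in present)] += 1
        pattern_exist_unordered[sum(present)] += 1
        for genome, c in zip(GENOMES, counts):
            if c > 0:
                exist_in_a_genome[genome] += 1
            total_in_a_genome[genome] += c
    return pattern_exist, pattern_exist_unordered, exist_in_a_genome, total_in_a_genome

# Made-up counts for three hypothetical folds.
example = {
    "fold_a": [5, 3, 0, 1, 0, 0, 0, 0],
    "fold_b": [2, 0, 0, 1, 0, 0, 0, 0],
    "fold_c": [1, 4, 0, 2, 0, 0, 0, 0],
}
ordered, unordered, exists, totals = summarize(example)
```

Here fold_a and fold_c share the same ordered pattern, while all three folds contribute to the per-genome totals.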


minscop report 140 k, tab delim. data, head obj_id_, class, Fold, EC, SC, SS, HI, HP, MJ, MP, MG, total, Fam., PDB, Rep., Struc., Name, totexist, sortidx, SF, nclass, class2, did, fids, longid, EC, SC, SS, HI, HP, MJ, MP, MG

Detailed report on the fold usage in the genomes
EC SC SS HI HP MJ MP MG
This large joined table is derived from merging the 
following tables: minscop,
descrip_did, and many minscop_occurrence.
It contains the name of each fold, a best representative scop domain
id (did), with associated pdb id and residue selection, the number of
times the fold appears in scop and minscop.
Some of the most important fields are described below. 
did      = a best representative (scop domain id)
Fam.     = the number in minscop (number of seq. families)
PDB      = the number of these domains in the PDB, according to scop 1.35
Name     = the name for this fold object
total    = total number of a given fold in all the genomes
totexist = how many genomes a given fold exists in
sortidx  = totexist + total / 1000
SF       = whether or not the fold is a superfold
class    = a representation for the fold's class
Fold#    = scop fold number corresponding to the domain
The final columns simply indicate whether or not the
fold exists in a given genome.
Here are the actual db storing lines (for reference):
$fold_report->store($obj_id_,$csym2{$class},$foldnum,
			@tuple,$totfolds,$N_minsp,
			$N_scop,$pdbsel{$did},$name,$totexist,$sortidx,$superfold,
			$class,$csym{$class},$did,$fids,$a,\@tuple_exist);
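
The sortidx formula above can be sketched as follows. This is a minimal illustration under assumptions: the function name sort_index and the example counts are hypothetical, and total / 1000 is read with ordinary operator precedence (the division happens before the addition).

```python
def sort_index(per_genome_counts):
    """Compute sortidx = totexist + total / 1000 from one fold's per-genome counts."""
    totexist = sum(1 for c in per_genome_counts if c > 0)  # genomes containing the fold
    total = sum(per_genome_counts)                         # occurrences over all genomes
    return totexist + total / 1000

# Made-up counts for three hypothetical folds, one value per genome.
rows = {
    "fold_a": [5, 3, 0, 1, 0, 0, 0, 0],
    "fold_b": [2, 0, 0, 1, 0, 0, 0, 0],
    "fold_c": [1, 4, 0, 2, 0, 0, 0, 0],
}

# Highest sortidx first: most widely shared folds, then most abundant.
ranked = sorted(rows, key=lambda name: sort_index(rows[name]), reverse=True)
```

Sorting on this single number orders folds by totexist first and uses total only to break ties, which holds as long as total stays below 1000.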


merged summary 77 k, tab delim. data, head EC, SC, HI, SS, HP, MJ, MP, MG, |, class, SF, type, fold_n, sfam_n, fam_n

This table summarizes the patterns of fold, superfamily, and family
usage in the genomes (EC SC HI SS HP MJ MP MG).
For all fields, * is the wildcard and matches all types.
class describes the fold class
SF indicates whether or not this applies to superfolds
type is as follows:
pattern_exist -- the pattern of presence or absence across all genomes
pattern_exist_unordered -- the same for all genomes, considering only the number of genomes
exist_in_a_genome -- whether or not a fold exists in a genome
total_in_a_genome -- accumulates the count of folds in a particular genome


fold summary 52 k, tab delim. data, head EC, SC, HI, SS, HP, MJ, MP, MG, |, class, SF, type, count

The table summarizes the patterns of fold usage in fold_report 
(which, in turn, is derived from merging descrip_fold and the many
fold_occurrence).
This is derived from an analysis of the genomes EC SC HI SS HP MJ MP MG.
For all fields, * is the wildcard and matches all types.
class describes the fold class
SF indicates whether or not this applies to superfolds
type is as follows:
pattern_exist -- the pattern of presence or absence across all genomes
pattern_exist_unordered -- the same for all genomes, considering only the number of genomes
exist_in_a_genome -- whether or not a fold exists in a genome
total_in_a_genome -- accumulates the count of folds in a particular genome


fold report 48 k, tab delim. data, head obj_id_, class, Fold, EC, SC, HI, SS, HP, MJ, MP, MG, total, Fam., PDB, Rep., Struc., Name, totexist, sortidx, SF, nclass, class2, did, fids, longid, EC, SC, HI, SS, HP, MJ, MP, MG

Detailed report on the fold usage in the genomes
EC SC HI SS HP MJ MP MG
This large joined table is derived from merging the 
following tables: minscop,
descrip_fold, and many fold_occurrence.
It contains the name of each fold, a best representative scop domain
id (did), with associated pdb id and residue selection, the number of
times the fold appears in scop and minscop.
Some of the most important fields are described below. 
did      = a best representative (scop domain id)
Fam.     = the number in minscop (number of seq. families)
PDB      = the number of these domains in the PDB, according to scop 1.35
Name     = the name for this fold object
total    = total number of a given fold in all the genomes
totexist = how many genomes a given fold exists in
sortidx  = totexist + total / 1000
SF       = whether or not the fold is a superfold
class    = a representation for the fold's class
Fold#    = scop fold number corresponding to the domain
The final columns simply indicate whether or not the
fold exists in a given genome.
Here are the actual db storing lines (for reference):
$fold_report->store($obj_id_,$csym2{$class},$foldnum,
			@tuple,$totfolds,$N_minsp,
			$N_scop,$pdbsel{$did},$name,$totexist,$sortidx,$superfold,
			$class,$csym{$class},$did,$fids,$a,\@tuple_exist);


fold dist ratio 1 k, phylip dist. matrix data, head key_, val1, val2

Distance between each genome in terms of the number of shared folds. 
This phylip formatted matrix contains the ratio of non-shared folds
to shared ones.
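
A minimal sketch of how such a ratio matrix could be computed and laid out is given below. The fold identifiers, the genome fold sets, and the function names are hypothetical; the layout follows the general PHYLIP square-matrix convention (taxon count on the first line, names padded to 10 characters) and may not match the stored file byte for byte. Returning infinity when two genomes share no folds is also an assumption.

```python
def ratio(a, b):
    """Non-shared folds divided by shared folds for two sets of fold ids."""
    shared = len(a & b)
    non_shared = len(a ^ b)  # folds found in exactly one of the two genomes
    return float("inf") if shared == 0 else non_shared / shared

def phylip_square(names, folds):
    """Render a square PHYLIP-style matrix: taxon count, then one row per genome."""
    lines = [f"{len(names):5d}"]
    for n1 in names:
        cells = " ".join(f"{ratio(folds[n1], folds[n2]):8.4f}" for n2 in names)
        lines.append(f"{n1:<10}{cells}")
    return "\n".join(lines)

# Made-up fold sets for three of the genomes.
folds = {
    "EC": {"f1", "f2", "f3", "f4"},
    "HI": {"f1", "f2", "f3"},
    "MG": {"f1", "f5"},
}
print(phylip_square(["EC", "HI", "MG"], folds))
```

A genome compared with itself has no non-shared folds, so the diagonal is zero, as expected for a distance matrix.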


crosstab summary 13 k, tab delim. data, head 9, EC, SC, HI, SS, HP, MJ, MP, MG, fold, sfam., fam., fold-A, fold-B, fold-SF, fold-AB, mA*, mB*, mN*, m*S, sA*, sB*, sN*, s*S

This table crosstabulates the fields in the merged_summary
table. 
For each pattern of occurrences in the genomes (EC SC HI SS HP MJ MP MG),
a number of different counts are given. Here are the main ones:
fold  = number of folds
sfam. = number of superfamilies 
fam.  = number of distinct minscop families
fold-A = number of all-alpha folds
fold-B =   ..   .. all-beta   ..
fold-AB =  ..   .. mixed      ..
fold-SF =  ..   .. superfolds
Patterns with 1 and _ are to be read literally.  Those with + and -
are unordered.
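
The two kinds of pattern can be illustrated with a small sketch. It assumes that a literal pattern has one character per genome in column order (1 for present, _ for absent) and that an unordered pattern writes one + per genome containing the fold followed by - for the rest; the exact layout of the unordered strings, the class labels A and B, and the function names are assumptions, not taken from the table itself.

```python
from collections import defaultdict

# Genome order as listed in the crosstab summary header.
GENOMES = ["EC", "SC", "HI", "SS", "HP", "MJ", "MP", "MG"]

def ordered_pattern(present_in):
    """Literal pattern: position i describes genome i."""
    return "".join("1" if g in present_in else "_" for g in GENOMES)

def unordered_pattern(present_in):
    """Unordered pattern: only the number of genomes matters."""
    k = sum(1 for g in GENOMES if g in present_in)
    return "+" * k + "-" * (len(GENOMES) - k)

def crosstab(folds):
    """Count folds, and folds per class, under each pattern."""
    table = defaultdict(lambda: defaultdict(int))
    for cls, present_in in folds:
        for pattern in (ordered_pattern(present_in), unordered_pattern(present_in)):
            table[pattern]["fold"] += 1
            table[pattern]["fold-" + cls] += 1
    return table

# Made-up folds: (class label, genomes in which the fold occurs).
table = crosstab([
    ("A", {"EC", "HI"}),
    ("B", {"EC", "HI"}),
    ("A", {"SC", "MG"}),
])
```

Two folds found in different pairs of genomes get different literal patterns but fall under the same unordered pattern.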


fold dist both have fold 1 k, phylip dist. matrix data, head key_, val1, val2

Distance between each genome in terms of the number of shared folds. 
This phylip formatted matrix contains the number of shared folds. 


fold dist neither has fold 1 k, phylip dist. matrix data, head key_, val1, val2

Distance between each genome in terms of the number of shared folds. 
This phylip formatted matrix contains the number of folds
that are in neither genome. 


fold dist one has fold 1 k, phylip dist. matrix data, head key_, val1, val2

Distance between each genome in terms of the number of shared folds. 
This phylip formatted matrix contains the number of folds
that are contained in one but not the other genome. 
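
The three count-based matrices (both have, neither has, one has) can be thought of as the cells of a presence/absence comparison for each pair of genomes. The sketch below assumes each genome is represented as a set of fold identifiers and that the full set of known folds is available; the identifiers and the function name pair_counts are hypothetical.

```python
def pair_counts(a, b, all_folds):
    """Return (both, one, neither) fold counts for two genomes.

    a, b      -- sets of fold ids present in each genome
    all_folds -- set of every known fold id (assumed to contain a and b)
    """
    both = len(a & b)                   # folds present in both genomes
    one = len(a ^ b)                    # folds present in exactly one genome
    neither = len(all_folds - (a | b))  # known folds absent from both
    return both, one, neither

# Made-up data: six known folds and two genomes.
all_folds = {"f1", "f2", "f3", "f4", "f5", "f6"}
ec = {"f1", "f2", "f3", "f4"}
mg = {"f1", "f5"}
both, one, neither = pair_counts(ec, mg, all_folds)
```

Under these assumptions the three counts always add up to the number of known folds, which gives a quick consistency check across the matrices.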


gen aa comp dist 1 k, phylip dist. matrix data, head



minscop dist both have fold 1 k, phylip dist. matrix data, head key_, val1, val2

Distance between each genome in terms of the number of shared folds. 
This phylip formatted matrix contains the number of shared folds. 


minscop dist neither has fold 1 k, phylip dist. matrix data, head key_, val1, val2

Distance between each genome in terms of the number of shared folds. 
This phylip formatted matrix contains the number of folds
that are in neither genome. 


minscop dist one has fold 1 k, phylip dist. matrix data, head key_, val1, val2

Distance between each genome in terms of the number of shared folds. 
This phylip formatted matrix contains the number of folds
that are contained in one but not the other genome. 


minscop dist ratio 1 k, phylip dist. matrix data, head key_, val1, val2

Distance between each genome in terms of the number of shared folds. 
This phylip formatted matrix contains the ratio of non-shared folds
to shared ones.


sfams dist both have fold 1 k, phylip dist. matrix data, head key_, val1, val2

Distance between each genome in terms of the number of shared folds. 
This phylip formatted matrix contains the number of shared folds. 


sfams dist neither has fold 1 k, phylip dist. matrix data, head key_, val1, val2

Distance between each genome in terms of the number of shared folds. 
This phylip formatted matrix contains the number of folds
that are in neither genome. 


sfams dist one has fold 1 k, phylip dist. matrix data, head key_, val1, val2

Distance between each genome in terms of the number of shared folds. 
This phylip formatted matrix contains the number of folds
that are contained in one but not the other genome. 


sfams dist ratio 1 k, phylip dist. matrix data, head key_, val1, val2

Distance between each genome in terms of the number of shared folds. 
This phylip formatted matrix contains the ratio of non-shared folds
to shared ones.


sfams report 71 k, tab delim. data, head obj_id_, class, Fold, EC, SC, HI, SS, HP, MJ, MP, MG, total, Fam., PDB, Rep., Struc., Name, totexist, sortidx, SF, nclass, class2, did, fids, longid, EC, SC, HI, SS, HP, MJ, MP, MG

Detailed report on the fold usage in the genomes
EC SC HI SS HP MJ MP MG
This large joined table is derived from merging the 
following tables: minscop,
descrip_sfam, and many sfam_occurrence.
It contains the name of each fold, a best representative scop domain
id (did), with associated pdb id and residue selection, the number of
times the fold appears in scop and minscop.
Some of the most important fields are described below. 
did      = a best representative (scop domain id)
Fam.     = the number in minscop (number of seq. families)
PDB      = the number of these domains in the PDB, according to scop 1.35
Name     = the name for this fold object
total    = total number of a given fold in all the genomes
totexist = how many genomes a given fold exists in
sortidx  = totexist + total / 1000
SF       = whether or not the fold is a superfold
class    = a representation for the fold's class
Fold#    = scop fold number corresponding to the domain
The final columns simply indicate whether or not the
fold exists in a given genome.
Here are the actual db storing lines (for reference):
$fold_report->store($obj_id_,$csym2{$class},$foldnum,
			@tuple,$totfolds,$N_minsp,
			$N_scop,$pdbsel{$did},$name,$totexist,$sortidx,$superfold,
			$class,$csym{$class},$did,$fids,$a,\@tuple_exist);


sfams summary 54 k, tab delim. data, head EC, SC, HI, SS, HP, MJ, MP, MG, |, class, SF, type, count

The table summarizes the patterns of fold usage in sfams_report 
(which, in turn, is derived from merging descrip_sfam and the many
sfam_occurrence).
This is derived from an analysis of the genomes EC SC HI SS HP MJ MP MG.
For all fields, * is the wildcard and matches all types.
class describes the fold class
SF indicates whether or not this applies to superfolds
type is as follows:
pattern_exist -- the pattern of presence or absence across all genomes
pattern_exist_unordered -- the same for all genomes, considering only the number of genomes
exist_in_a_genome -- whether or not a fold exists in a genome
total_in_a_genome -- accumulates the count of folds in a particular genome


unsorted fold dist both have fold 1 k, phylip dist. matrix data, head key_, val1, val2

Distance between each genome in terms of the number of shared folds. 
This phylip formatted matrix contains the number of shared folds. 


unsorted fold dist neither has fold 1 k, phylip dist. matrix data, head key_, val1, val2

Distance between each genome in terms of the number of shared folds. 
This phylip formatted matrix contains the number of folds
that are in neither genome. 


unsorted fold dist one has fold 1 k, phylip dist. matrix data, head key_, val1, val2

Distance between each genome in terms of the number of shared folds. 
This phylip formatted matrix contains the number of folds
that are contained in one but not the other genome. 


unsorted fold dist ratio 1 k, phylip dist. matrix data, head key_, val1, val2

Distance between each genome in terms of the number of shared folds. 
This phylip formatted matrix contains the ratio of non-shared folds
to shared ones.


unsorted fold report 48 k, tab delim. data, head obj_id_, class, Fold, MG, MP, MJ, HP, SS, HI, SC, EC, total, Fam., PDB, Rep., Struc., Name, totexist, sortidx, SF, nclass, class2, did, fids, longid, MG, MP, MJ, HP, SS, HI, SC, EC

Detailed report on the fold usage in the genomes
MG MP MJ HP SS HI SC EC
This large joined table is derived from merging the 
following tables: minscop,
descrip_fold, and many fold_occurrence.
It contains the name of each fold, a best representative scop domain
id (did), with associated pdb id and residue selection, the number of
times the fold appears in scop and minscop.
Some of the most important fields are described below. 
did      = a best representative (scop domain id)
Fam.     = the number in minscop (number of seq. families)
PDB      = the number of these domains in the PDB, according to scop 1.35
Name     = the name for this fold object
total    = total number of a given fold in all the genomes
totexist = how many genomes a given fold exists in
sortidx  = totexist + total / 1000
SF       = whether or not the fold is a superfold
class    = a representation for the fold's class
Fold#    = scop fold number corresponding to the domain
The final columns simply indicate whether or not the
fold exists in a given genome.
Here are the actual db storing lines (for reference):
$fold_report->store($obj_id_,$csym2{$class},$foldnum,
			@tuple,$totfolds,$N_minsp,
			$N_scop,$pdbsel{$did},$name,$totexist,$sortidx,$superfold,
			$class,$csym{$class},$did,$fids,$a,\@tuple_exist);


unsorted fold summary 52 k, tab delim. data, head MG, MP, MJ, HP, SS, HI, SC, EC, |, class, SF, type, count

The table summarizes the patterns of fold usage in unsorted_fold_report 
(which, in turn, is derived from merging descrip_fold and the many
fold_occurrence).
This is derived from an analysis of the genomes MG MP MJ HP SS HI SC EC.
For all fields, * is the wildcard and matches all types.
class describes the fold class
SF indicates whether or not this applies to superfolds
type is as follows:
pattern_exist -- the pattern of presence or absence across all genomes
pattern_exist_unordered -- the same for all genomes, considering only the number of genomes
exist_in_a_genome -- whether or not a fold exists in a genome
total_in_a_genome -- accumulates the count of folds in a particular genome


unsorted minscop dist both have fold 1 k, phylip dist. matrix data, head key_, val1, val2

Distance between each genome in terms of the number of shared folds. 
This phylip formatted matrix contains the number of shared folds. 


unsorted minscop dist neither has fold 1 k, phylip dist. matrix data, head key_, val1, val2

Distance between each genome in terms of the number of shared folds. 
This phylip formatted matrix contains the number of folds
that are in neither genome. 


unsorted minscop dist one has fold 1 k, phylip dist. matrix data, head key_, val1, val2

Distance between each genome in terms of the number of shared folds. 
This phylip formatted matrix contains the number of folds
that are contained in one but not the other genome. 


unsorted minscop dist ratio 1 k, phylip dist. matrix data, head key_, val1, val2

Distance between each genome in terms of the number of shared folds. 
This phylip formatted matrix contains the ratio of non-shared folds
to shared ones.


unsorted minscop report 140 k, tab delim. data, head obj_id_, class, Fold, MG, MP, MJ, HP, SS, HI, SC, EC, total, Fam., PDB, Rep., Struc., Name, totexist, sortidx, SF, nclass, class2, did, fids, longid, MG, MP, MJ, HP, SS, HI, SC, EC

Detailed report on the fold usage in the genomes
MG MP MJ HP SS HI SC EC
This large joined table is derived from merging the 
following tables: minscop,
descrip_did, and many minscop_occurrence.
It contains the name of each fold, a best representative scop domain
id (did), with associated pdb id and residue selection, the number of
times the fold appears in scop and minscop.
Some of the most important fields are described below. 
did      = a best representative (scop domain id)
Fam.     = the number in minscop (number of seq. families)
PDB      = the number of these domains in the PDB, according to scop 1.35
Name     = the name for this fold object
total    = total number of a given fold in all the genomes
totexist = how many genomes a given fold exists in
sortidx  = totexist + total / 1000
SF       = whether or not the fold is a superfold
class    = a representation for the fold's class
Fold#    = scop fold number corresponding to the domain
The final columns simply indicate whether or not the
fold exists in a given genome.
Here are the actual db storing lines (for reference):
$fold_report->store($obj_id_,$csym2{$class},$foldnum,
			@tuple,$totfolds,$N_minsp,
			$N_scop,$pdbsel{$did},$name,$totexist,$sortidx,$superfold,
			$class,$csym{$class},$did,$fids,$a,\@tuple_exist);


unsorted minscop summary 54 k, tab delim. data, head MG, MP, MJ, HP, SS, HI, SC, EC, |, class, SF, type, count

The table summarizes the patterns of fold usage in unsorted_minscop_report 
(which, in turn, is derived from merging descrip_did and the many
minscop_occurrence).
This is derived from an analysis of the genomes MG MP MJ HP SS HI SC EC.
For all fields, * is the wildcard and matches all types.
class describes the fold class
SF indicates whether or not this applies to superfolds
type is as follows:
pattern_exist -- the pattern of presence or absence across all genomes
pattern_exist_unordered -- the same for all genomes, considering only the number of genomes
exist_in_a_genome -- whether or not a fold exists in a genome
total_in_a_genome -- accumulates the count of folds in a particular genome


unsorted sfams dist both have fold 1 k, phylip dist. matrix data, head key_, val1, val2

Distance between each genome in terms of the number of shared folds. 
This phylip formatted matrix contains the number of shared folds. 


unsorted sfams dist neither has fold 1 k, phylip dist. matrix data, head key_, val1, val2

Distance between each genome in terms of the number of shared folds. 
This phylip formatted matrix contains the number of folds
that are in neither genome. 


unsorted sfams dist one has fold 1 k, phylip dist. matrix data, head key_, val1, val2

Distance between each genome in terms of the number of shared folds. 
This phylip formatted matrix contains the number of folds
that are contained in one but not the other genome. 


unsorted sfams dist ratio 1 k, phylip dist. matrix data, head key_, val1, val2

Distance between each genome in terms of the number of shared folds. 
This phylip formatted matrix contains the ratio of non-shared folds
to shared ones.


unsorted sfams report 71 k, tab delim. data, head obj_id_, class, Fold, MG, MP, MJ, HP, SS, HI, SC, EC, total, Fam., PDB, Rep., Struc., Name, totexist, sortidx, SF, nclass, class2, did, fids, longid, MG, MP, MJ, HP, SS, HI, SC, EC

Detailed report on the fold usage in the genomes
MG MP MJ HP SS HI SC EC
This large joined table is derived from merging the 
following tables: minscop,
descrip_sfam, and many sfam_occurrence.
It contains the name of each fold, a best representative scop domain
id (did), with associated pdb id and residue selection, the number of
times the fold appears in scop and minscop.
Some of the most important fields are described below. 
did      = a best representative (scop domain id)
Fam.     = the number in minscop (number of seq. families)
PDB      = the number of these domains in the PDB, according to scop 1.35
Name     = the name for this fold object
total    = total number of a given fold in all the genomes
totexist = how many genomes a given fold exists in
sortidx  = totexist + total / 1000
SF       = whether or not the fold is a superfold
class    = a representation for the fold's class
Fold#    = scop fold number corresponding to the domain
The final columns simply indicate whether or not the
fold exists in a given genome.
Here are the actual db storing lines (for reference):
$fold_report->store($obj_id_,$csym2{$class},$foldnum,
			@tuple,$totfolds,$N_minsp,
			$N_scop,$pdbsel{$did},$name,$totexist,$sortidx,$superfold,
			$class,$csym{$class},$did,$fids,$a,\@tuple_exist);


unsorted sfams summary 54 k, tab delim. data, head MG, MP, MJ, HP, SS, HI, SC, EC, |, class, SF, type, count

The table summarizes the patterns of fold usage in unsorted_sfams_report 
(which, in turn, is derived from merging descrip_sfam and the many
sfam_occurrence).
This is derived from an analysis of the genomes MG MP MJ HP SS HI SC EC.
For all fields, * is the wildcard and matches all types.
class describes the fold class
SF indicates whether or not this applies to superfolds
type is as follows:
pattern_exist -- the pattern of presence or absence across all genomes
pattern_exist_unordered -- the same for all genomes, considering only the number of genomes
exist_in_a_genome -- whether or not a fold exists in a genome
total_in_a_genome -- accumulates the count of folds in a particular genome

