Genome CE: Tables with Specific Analyses

Each entry gives the table name, its size in kB, its format, and the fields in its header row, followed by a description.
fold_occurrence: 5 kB, tab-delimited data; fields: fold_, count

Number of times each fold (represented by two SCOP fid numbers) occurs in genome CE.
This table should be sorted into a standard order.
Uses the no_overlap matches (minscop_soluble_matches_no_overlap) for CE, 1998-12-20.
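A minimal Python sketch of how such a count could be produced. It assumes each match row carries a fids string of dot-separated SCOP numbers with the class first, so a fold is identified by the first two numbers. The function names and sample fids values are hypothetical.

```python
from collections import Counter


def fold_key(fids: str, depth: int = 2) -> str:
    """Keep the first `depth` dot-separated numbers of a fids string.

    Assumption: fids looks like "1.2.3.4" (class.fold.superfamily.family).
    """
    return ".".join(fids.split(".")[:depth])


def fold_occurrence(match_fids):
    """Count how often each fold (first two fid numbers) occurs among the matches."""
    return Counter(fold_key(f) for f in match_fids)


# Hypothetical fids values, standing in for the fids column of
# minscop_soluble_matches_no_overlap.
counts = fold_occurrence(["1.1.1.1", "1.1.2.3", "3.2.1.1"])
print(counts["1.1"], counts["3.2"])  # 2 1
```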


fold_occurrence_ceonly: 5 kB, tab-delimited data; fields: fold_, count

Number of times each fold (represented by two SCOP fid numbers) occurs in genome CE.
Uses the no_overlap matches (minscop_soluble_matches_no_overlap) for CE, 1998-12-20. The table should be sorted into a standard order (the minscop table contains 990 entries). The current restriction is [_ceonly].


full_len_segs: 275 kB, tab-delimited data; fields: id_, start_I, stop_n

Full length segments.


genome_v_minscop: 707 kB, tab-delimited data; fields: did_, gid_, TargStart_I, TargStop_n, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f

Results of running genome CE against Ted's minscop database (SCOP 1.35).


id_ntm_nofilt: 239 kB, tab-delimited data; fields: id_, signalp, ntm_n

  This table contains data on whether there is a signal sequence
  and the number of transmembrane segments.
  (version 2, revised 971113).
  (Renamed table on 980101: id_ntm --> id_ntm_nofilt)


minscop_occurrence: 10 kB, tab-delimited data; fields: did_, count

Number of times each minscop domain id (did) occurs in genome CE.
This table should be sorted into a standard order and contain 990 entries.
Uses the no_overlap matches (minscop_soluble_matches_no_overlap) for CE, 1998-12-20.
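Producing exactly 990 rows implies every minscop domain is listed, including those with no hits. A minimal sketch under that assumption: iterate over the full list of domain ids and fill in zero for unmatched ones. The function name and the sample ids are hypothetical.

```python
from collections import Counter


def minscop_occurrence(all_dids, matched_dids):
    """Return (did, count) rows for every minscop domain, including zero counts.

    all_dids: every domain id in minscop (990 of them, per the description).
    matched_dids: the did column of minscop_soluble_matches_no_overlap.
    """
    counts = Counter(matched_dids)
    # Iterating over all_dids gives one row per domain in a fixed order;
    # a Counter returns 0 for ids that were never seen.
    return [(did, counts[did]) for did in all_dids]


rows = minscop_occurrence(["d1", "d2", "d3"], ["d2", "d2", "d3"])
print(rows)  # [('d1', 0), ('d2', 2), ('d3', 1)]
```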


minscop_occurrence_ceonly: 10 kB, tab-delimited data; fields: did_, count

Number of times each minscop domain id (did) occurs in genome CE.
Uses the no_overlap matches (minscop_soluble_matches_no_overlap) for CE, 1998-12-20. The table should be sorted into a standard order (the minscop table contains 990 entries). The current restriction is [_ceonly].


minscop_soluble_matches: 221 kB, tab-delimited data; fields: gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f

These are the good matches (e-value cutoff 0.01)
for the soluble proteins only, i.e. SCOP classes 1-5 and 7,
with a year cutoff of 97.
Generated by good_scop_matches_to_mask_w_yr() running on genome CE
with year cutoff 97 and table genome_v_minscop.
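The selection criteria above can be sketched as a single filter. This is not good_scop_matches_to_mask_w_yr() itself: the Match record, the dot-separated fids format (class first), the four-digit year attached to each match, and the inclusive cutoffs are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Match:
    gid: str    # genome sequence id (gid_ column)
    did: str    # minscop domain id (did column)
    fids: str   # SCOP numbers, assumed dot-separated with the class first
    ev: float   # e-value (ev_f column)
    year: int   # hypothetical: year associated with the matched domain


SOLUBLE_CLASSES = {1, 2, 3, 4, 5, 7}  # SCOP classes 1-5 and 7, per the description


def good_soluble_matches(matches, ev_cutoff=0.01, year_cutoff=1997):
    """Keep soluble-class matches with ev <= ev_cutoff and year <= year_cutoff.

    Whether the original cutoffs are inclusive is an assumption.
    """
    return [
        m for m in matches
        if m.ev <= ev_cutoff
        and int(m.fids.split(".")[0]) in SOLUBLE_CLASSES
        and m.year <= year_cutoff
    ]


matches = [
    Match("g1", "d1", "1.1.1.1", 1e-5, 1995),  # kept
    Match("g1", "d2", "6.1.1.1", 1e-5, 1995),  # class 6: dropped
    Match("g2", "d3", "2.1.1.1", 0.5, 1995),   # e-value too high: dropped
    Match("g3", "d4", "3.1.1.1", 1e-3, 1998),  # after the year cutoff: dropped
]
print([m.did for m in good_soluble_matches(matches)])  # ['d1']
```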


minscop_soluble_matches_no_overlap: 204 kB, tab-delimited data; fields: gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f

These are the good matches (e-value cutoff 0.01)
for the soluble proteins only, i.e. SCOP classes 1-5 and 7.
This table is the result of filtering out duplicate matches from
minscop_soluble_matches, i.e. matches that hit the same sequence on the genome as another match.
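The exact duplicate criterion is not recorded here. One plausible reading, sketched below as an assumption, is a greedy pass: take matches in order of increasing e-value and drop any match whose target range overlaps an already kept match on the same gid. The dropped matches would then correspond to the overlap table. Coordinates are assumed to be 1-based and inclusive; the tuple layout is hypothetical.

```python
def split_overlaps(matches):
    """Greedily split matches into (no_overlap, overlap).

    matches: tuples (gid, targ_start, targ_stop, ev), 1-based inclusive coords.
    Assumption: the match with the better (lower) e-value wins an overlap.
    """
    kept, dropped = [], []
    taken = {}  # gid -> list of (start, stop) ranges already kept
    for m in sorted(matches, key=lambda m: m[3]):  # best e-value first
        gid, start, stop, _ = m
        # Two closed ranges overlap iff each starts before the other ends.
        if any(start <= s2 and s1 <= stop for s1, s2 in taken.get(gid, [])):
            dropped.append(m)
        else:
            kept.append(m)
            taken.setdefault(gid, []).append((start, stop))
    return kept, dropped


matches = [
    ("g1", 1, 100, 1e-5),
    ("g1", 50, 150, 1e-3),   # overlaps the better g1 match: dropped
    ("g1", 200, 300, 1e-2),
    ("g2", 1, 100, 1e-4),    # different sequence: kept
]
kept, dropped = split_overlaps(matches)
print(len(kept), dropped)  # 3 [('g1', 50, 150, 0.001)]
```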


minscop_soluble_matches_overlap: 18 kB, tab-delimited data; fields: gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f

These are the good matches (e-value cutoff 0.01)
for the soluble proteins only, i.e. SCOP classes 1-5 and 7.
This table contains the matches from
minscop_soluble_matches that hit the same sequence on the genome,
that is, the duplicate matches, which should not be used.


seq: 8256 kB, FASTA data



seq_MBY_pdb: 8230 kB, FASTA data; fields: gid_, masked_seq

This FASTA file is the result of masking seq
with the mask minscop_soluble_matches.
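Masking here means overwriting the matched target ranges of each genome sequence. A minimal sketch, assuming 1-based inclusive TargStart/TargStop coordinates and "X" as the mask character (the actual character used for seq_MBY_pdb is not stated). The function name is hypothetical.

```python
def mask_sequence(seq, segments, mask_char="X"):
    """Replace residues covered by segments (1-based, inclusive) with mask_char."""
    residues = list(seq)
    for start, stop in segments:
        # Clip to the sequence so a stop past the end cannot raise IndexError.
        for i in range(max(start, 1) - 1, min(stop, len(residues))):
            residues[i] = mask_char
    return "".join(residues)


print(mask_sequence("MKTAYIAKQR", [(2, 4), (8, 9)]))  # MXXXYIAXXR
```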


seq_MBY_pdb_COMP: 1 kB, tab-delimited data; fields: aa_, count_n

This is the amino acid composition of seq_MBY_pdb,
the masked FASTA file generated by masking the FASTA file seq with
the mask minscop_soluble_matches.
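A composition table like this is a per-letter count over all masked sequences. The sketch below counts every character, including the mask character; whether the original table counts masked positions is not stated, so treat that as an assumption.

```python
from collections import Counter


def aa_composition(seqs):
    """Count each residue letter over all sequences, returned sorted by letter."""
    counts = Counter()
    for s in seqs:
        counts.update(s.upper())  # mask characters are counted like any residue
    return sorted(counts.items())


print(aa_composition(["MXXXYIAXXR", "MKA"]))
# [('A', 2), ('I', 1), ('K', 1), ('M', 2), ('R', 1), ('X', 5), ('Y', 1)]
```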


seq_MBY_pdb_STAT: 1 kB, tab-delimited data; fields: stat_, value

These are the statistics from masking the FASTA file seq with
the mask minscop_soluble_matches to generate the masked FASTA file seq_MBY_pdb.
MASKED_CHARS  = number of characters masked by applying this mask.
Masked_Seqs   = number of sequences masked by applying this mask.
Masking_Segs  = number of segments used in applying the mask.
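The three statistics can be computed directly from the sequences and the mask segments. A minimal sketch, assuming 1-based inclusive segments and that a residue covered by two overlapping segments counts once toward MASKED_CHARS; the function name and input layout are hypothetical.

```python
def masking_stats(seqs, mask):
    """Compute MASKED_CHARS, Masked_Seqs and Masking_Segs.

    seqs: gid -> sequence; mask: gid -> list of (start, stop), 1-based inclusive.
    """
    masked_chars = masked_seqs = masking_segs = 0
    for gid, segments in mask.items():
        seq = seqs.get(gid)
        if seq is None:
            continue  # mask entry for a sequence that is not in the FASTA file
        covered = set()
        for start, stop in segments:
            covered.update(range(max(start, 1), min(stop, len(seq)) + 1))
        masking_segs += len(segments)
        if covered:
            masked_seqs += 1
            masked_chars += len(covered)
    return {"MASKED_CHARS": masked_chars,
            "Masked_Seqs": masked_seqs,
            "Masking_Segs": masking_segs}


seqs = {"g1": "MKTAYIAKQR", "g2": "MKA", "g3": "AAAA"}
mask = {"g1": [(2, 4), (3, 6)], "g2": [(1, 1)]}
print(masking_stats(seqs, mask))
# {'MASKED_CHARS': 6, 'Masked_Seqs': 2, 'Masking_Segs': 3}
```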


sfam_occurrence: 8 kB, tab-delimited data; fields: sfam_, count

Number of times each superfamily (sfam, represented by three SCOP fid numbers) occurs in genome CE.
This table should be sorted into a standard order.
Uses the no_overlap matches (minscop_soluble_matches_no_overlap) for CE, 1998-12-20.
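What the "standard order" is is not defined here. One reasonable choice, shown below as an assumption, is numeric order of the fid components, which differs from a plain string sort once a number reaches two digits. The sketch also assumes dot-separated fids strings with the class first; names and sample values are hypothetical.

```python
from collections import Counter


def sfam_key(fids):
    """First three fid numbers (class.fold.superfamily) as an integer tuple."""
    return tuple(int(x) for x in fids.split(".")[:3])


def sfam_occurrence(match_fids):
    """Count superfamilies and return rows in numeric fid order."""
    counts = Counter(sfam_key(f) for f in match_fids)
    # Sorting integer tuples puts 1.2.1 before 1.10.1; a string sort would not.
    return [(".".join(map(str, k)), n) for k, n in sorted(counts.items())]


print(sfam_occurrence(["1.10.1.1", "1.2.1.3", "1.2.1.4"]))
# [('1.2.1', 2), ('1.10.1', 1)]
```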


sfam_occurrence_ceonly: 8 kB, tab-delimited data; fields: sfam_, count

Number of times each superfamily (sfam, represented by three SCOP fid numbers) occurs in genome CE.
Uses the no_overlap matches (minscop_soluble_matches_no_overlap) for CE, 1998-12-20. The table should be sorted into a standard order (the minscop table contains 990 entries). The current restriction is [_ceonly].


signal_segs: 50 kB, tab-delimited data; fields: id_, start_I, stop_n

  Signal sequences.


tm_scores: 655 kB, tab-delimited data; fields: id_, sumscr, sig, minhall, ntmproc, totaa, avg_en, minhseg

  This table contains scores indicating whether, and to what
  degree, each sequence is an integral membrane protein.
   sig = does it have a signal sequence?
   sumscor (sumscr column) = overall evaluation score (see below)
   minhall = minimum hydrophobicity of a 20-residue window slid over the whole protein
   totaa  = total number of residues under the -1 threshold
   ntmproc = total number of TM helices after processing
   avg_en = average hydrophobicity of all the TM segments (per residue)
   minhseg = minimum hydrophobicity of any TM segment (per residue)
   #
   # These parameters were refined on MG
   # see genomes/mg-analyze-maxh-981127.xls
   # 
   my $sumscor = ($minhall < -2    ? 4 :
                 ($tot_aa > 50     ? 3 :
                 ($minhall < -1.75 ? 2 :
                 ($tot_aa > 20     ? 1 : 0))));
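For readers working outside Perl, here is the same nested conditional as a Python function, plus a sliding-window minimum of the kind minhall describes. The hydrophobicity scale is not given here, so per-residue values are taken as input; treating minhall as a per-residue mean, and falling back to the whole-sequence mean for proteins shorter than 20 residues, are assumptions.

```python
def min_window_average(values, window=20):
    """Smallest mean of `values` over any run of `window` consecutive residues.

    Assumption: minhall is a per-residue mean; sequences no longer than the
    window fall back to the mean of the whole sequence.
    """
    if not values:
        raise ValueError("empty sequence")
    if len(values) <= window:
        return sum(values) / len(values)
    total = sum(values[:window])
    best = total
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide the window by one residue
        best = min(best, total)
    return best / window


def sumscor(minhall, totaa):
    """Translation of the nested Perl conditional for the overall score."""
    if minhall < -2:
        return 4
    if totaa > 50:
        return 3
    if minhall < -1.75:
        return 2
    if totaa > 20:
        return 1
    return 0


profile = [0] * 10 + [-3] * 20 + [0] * 10  # hypothetical per-residue values
print(min_window_average(profile), sumscor(min_window_average(profile), 0))  # -3.0 4
```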


tm_segs: 774 kB, tab-delimited data; fields: id_, start_I, stop_n, sumscor, energy_f

Transmembrane segments. 
(version 2, revised 971113)
(version 3, revised 981127, now sumscor based on calc_istm_score)
sumscor gives a confidence value for each TM helix based on an analysis
of the TM helices in the WHOLE protein.
   sumscor is assigned with the same formula shown above under tm_scores
   (parameters refined on MG; see genomes/mg-analyze-maxh-981127.xls).


tm_segs_best: 505 kB, tab-delimited data; fields: id_, start_I, stop_n, sumscor, energy_f

These are the segments from table tm_segs
that have sumscor = 4.
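Deriving this table amounts to a row filter on tm_segs. A minimal sketch using the standard csv module, assuming the tab-delimited file starts with a header row naming the fields listed above; the function name and sample rows are hypothetical.

```python
import csv
import io


def best_segments(tm_segs_tsv):
    """Return tm_segs rows whose sumscor equals 4.

    Assumes the first line is the header: id_, start_I, stop_n, sumscor, energy_f.
    """
    reader = csv.DictReader(io.StringIO(tm_segs_tsv), delimiter="\t")
    # float() accepts both "4" and "4.0" in case the score is written as a real.
    return [row for row in reader if float(row["sumscor"]) == 4]


sample = (
    "id_\tstart_I\tstop_n\tsumscor\tenergy_f\n"
    "P1\t10\t30\t4\t-2.3\n"
    "P2\t5\t25\t2\t-1.1\n"
)
print([row["id_"] for row in best_segments(sample)])  # ['P1']
```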


worm_only_p10: 132 kB, tab-delimited data


