For Genome `CE`, Tables with Specific Analysis

Table Name	Size (kb), Format	Links	Fields (keys bold)	Description
fold occurrence	5 k, tab delim.	data, head	fold_, count	Number of times each fold (represented by two scop fid numbers) occurs in genome CE. This table should be sorted into a standard order. Using no_overlap (minscop_soluble_matches_no_overlap) for CE, 1998,12.20
fold occurrence ceonly	5 k, tab delim.	data, head	fold_, count	Number of times each fold (represented by two scop fid numbers) occurs in genome CE. Using no_overlap (minscop_soluble_matches_no_overlap) for CE, 1998,12.20. The table should be sorted into a standard order, with the minscop one containing 990 entries. The current restriction is [_ceonly]
full len segs	275 k, tab delim.	data, head	id_, start_I, stop_n	Full length segments.
genome v minscop	707 k, tab delim.	data, head	did_, gid_, TargStart_I, TargStop_n, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f	Result of running genome CE against Ted's minscop (scop 1.35)
id ntm nofilt	239 k, tab delim.	data, head	id_, signalp, ntm_n	This table contains data on whether there is a signal sequence and the number of transmembrane segments. (version 2, revised 971113). (Renamed table on 980101: id_ntm --> id_ntm_nofilt)
minscop occurrence	10 k, tab delim.	data, head	did_, count	Number of times each minscop domain id (did) occurs in genome CE. This table should be sorted into a standard order and contain 990 entries. Using no_overlap (minscop_soluble_matches_no_overlap) for CE, 1998,12.20
minscop occurrence ceonly	10 k, tab delim.	data, head	did_, count	Number of times each minscop domain id (did) occurs in genome CE. Using no_overlap (minscop_soluble_matches_no_overlap) for CE, 1998,12.20. The table should be sorted into a standard order, with the minscop one containing 990 entries. The current restriction is [_ceonly]
minscop soluble matches	221 k, tab delim.	data, head	gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f	These are the good matches to an e-value cutoff of .01 for just the soluble proteins, scop classes 1-5,7. This is with a year cutoff of 97: good_scop_matches_to_mask_w_yr() running on genome CE... ...with year cutoff of 97 and table genome_v_minscop
minscop soluble matches no overlap	204 k, tab delim.	data, head	gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f	These are the good matches to an e-value cutoff of .01 for just the soluble proteins, scop classes 1-5,7. This table is the result of filtering out the matches from minscop_soluble_matches that hit the same sequence on the genome.
minscop soluble matches overlap	18 k, tab delim.	data, head	gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f	These are the good matches to an e-value cutoff of .01 for just the soluble proteins, scop classes 1-5,7. This table contains the matches from minscop_soluble_matches that hit the same sequence on the genome. That is, it contains duplicate matches that should not be used.
seq	8256 k, fasta	data, head
seq MBY pdb	8230 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq with the mask minscop_soluble_matches
seq MBY pdb COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq with the mask minscop_soluble_matches to generate the masked fasta file seq_MBY_pdb.
seq MBY pdb STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq with the mask minscop_soluble_matches to generate the masked fasta file seq_MBY_pdb. MASKED_CHARS = number of characters masked with the application of this mask. Masked_Seqs = number of sequences masked with the application of this mask. Masking_Segs = number of segments used in the application of the mask.
sfam occurrence	8 k, tab delim.	data, head	sfam_, count	Number of times each sfam (represented by three scop fid numbers) occurs in genome CE. This table should be sorted into a standard order. Using no_overlap (minscop_soluble_matches_no_overlap) for CE, 1998,12.20
sfam occurrence ceonly	8 k, tab delim.	data, head	sfam_, count	Number of times each sfam (represented by three scop fid numbers) occurs in genome CE. Using no_overlap (minscop_soluble_matches_no_overlap) for CE, 1998,12.20. The table should be sorted into a standard order, with the minscop one containing 990 entries. The current restriction is [_ceonly]
signal segs	50 k, tab delim.	data, head	id_, start_I, stop_n	Signal sequences.
tm scores	655 k, tab delim.	data, head	id_, sumscr, sig, minhall, ntmproc, totaa, avg_en, minhseg	This table contains scores indicating to what degree each sequence is an integral membrane protein. sig = does it have a signal sequence?; sumscor = overall evaluation score (see below); minhall = min hydrophobicity value for a 20-residue window moved over the whole protein; totaa = total number of aa under the -1 threshold; ntmproc = total number of TM helices after processing; avg_en = average hydrophobicity of all the TM segments (per residue); minhseg = min hydrophobicity of a TM segment (per residue). These parameters were refined on MG (see genomes/mg-analyze-maxh-981127.xls): my = (minhall<-2 ? 4 : (tot_aa > 50 ? 3 : ( minhall <-1.75 ? 2 : ( tot_aa > 20 ? 1 : 0))));
tm segs	774 k, tab delim.	data, head	id_, start_I, stop_n, sumscor, energy_f	Transmembrane segments. (version 2, revised 971113) (version 3, revised 981127, now sumscor based on calc_istm_score). sumscor gives a confidence value in the TM helix based on an analysis of the TM helices in the whole protein. These parameters were refined on MG (see genomes/mg-analyze-maxh-981127.xls): my = (minhall<-2 ? 4 : (tot_aa > 50 ? 3 : ( minhall <-1.75 ? 2 : ( tot_aa > 20 ? 1 : 0))));
tm segs best	505 k, tab delim.	data, head	id_, start_I, stop_n, sumscor, energy_f	These are the segments from table tm_segs that have sumscor = 4.
worm only p10	132 k, tab delim.	data, head
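
The nested conditional quoted in the tm scores and tm segs descriptions can be read as a simple threshold cascade. The sketch below restates it in Python for readability; it is not the original code. The function name `tm_sumscor` is hypothetical, and it is an assumption that the value computed by the quoted expression is the sumscor column (the variable name in the quoted expression is truncated) and that `tot_aa` there corresponds to the totaa field.

```python
def tm_sumscor(minhall: float, tot_aa: int) -> int:
    """Return a 0-4 score following the nested conditional quoted in the table.

    minhall: minimum hydrophobicity over a 20-residue window (lower = more hydrophobic).
    tot_aa:  total number of residues under the -1 threshold (assumed to be 'totaa').
    """
    if minhall < -2:
        return 4          # strongest hydrophobic window
    if tot_aa > 50:
        return 3          # many residues below the threshold
    if minhall < -1.75:
        return 2          # moderately hydrophobic window
    if tot_aa > 20:
        return 1          # some residues below the threshold
    return 0              # no evidence under these thresholds


if __name__ == "__main__":
    # Illustrative inputs only, not values taken from the CE tables.
    for minhall, tot_aa in [(-2.5, 0), (-1.0, 60), (-1.8, 10), (-1.0, 30), (-1.0, 5)]:
        print(minhall, tot_aa, tm_sumscor(minhall, tot_aa))
```

Under this reading, the segments in tm segs best (sumscor = 4) would be those whose protein has a window with minhall below -2.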
