Table Name
|
Size (kb),
Format
|
Links
|
Fields
(keys bold)
|
Description
|
fold occurrence
|
5 k, tab delim.
|
data,
head
|
fold_, count
|
Number of times each fold (represented by two scop fid numbers) occurs in genome CE
This table should be sorted into a standard order.
Using no_overlap (minscop_soluble_matches_no_overlap) for CE, 1998,12.20
|
fold occurrence ceonly
|
5 k, tab delim.
|
data,
head
|
fold_, count
|
Number of times each fold (represented by two scop fid numbers) occurs in genome CE
Using no_overlap (minscop_soluble_matches_no_overlap) for CE, 1998,12.20
The table should be sorted into a standard order, with the minscop one containing 990 entries.
The current restriction is [_ceonly]
|
full len segs
|
275 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
Full length segments.
|
genome v minscop
|
707 k, tab delim.
|
data,
head
|
did_, gid_, TargStart_I, TargStop_n, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f
|
Result of running genome CE against Ted's minscop (scop 1.35)
|
id ntm nofilt
|
239 k, tab delim.
|
data,
head
|
id_, signalp, ntm_n
|
This table contains data on whether there is a signal sequence
and the number of transmembrane segments.
(version 2, revised 971113).
(Renamed table on 980101: id_ntm --> id_ntm_nofilt)
|
minscop occurrence
|
10 k, tab delim.
|
data,
head
|
did_, count
|
Number of times each minscop domain id (did) occurs in genome CE
This table should be sorted into a standard order and contain 990 entries.
Using no_overlap (minscop_soluble_matches_no_overlap) for CE, 1998,12.20
|
minscop occurrence ceonly
|
10 k, tab delim.
|
data,
head
|
did_, count
|
Number of times each minscop domain id (did) occurs in genome CE
Using no_overlap (minscop_soluble_matches_no_overlap) for CE, 1998,12.20
The table should be sorted into a standard order, with the minscop one containing 990 entries.
The current restriction is [_ceonly]
|
minscop soluble matches
|
221 k, tab delim.
|
data,
head
|
gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f
|
These are the good matches to an e-value cutoff of .01
for just the soluble proteins, scop classes 1-5,7
This is with a year cutoff of 97
good_scop_matches_to_mask_w_yr() running on genome CE...
...with year cutoff of 97 and table genome_v_minscop
|
minscop soluble matches no overlap
|
204 k, tab delim.
|
data,
head
|
gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f
|
These are the good matches to an e-value cutoff of .01
for just the soluble proteins, scop classes 1-5,7
This table is the result of filtering out the matches from
minscop_soluble_matches that hit the same sequence on the genome.
|
minscop soluble matches overlap
|
18 k, tab delim.
|
data,
head
|
gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f
|
These are the good matches to an e-value cutoff of .01
for just the soluble proteins, scop classes 1-5,7
This table contains the matches from
minscop_soluble_matches that hit the same sequence on the genome.
That is, it contains duplicate matches that should not be used.
|
seq
|
8256 k, fasta
|
data,
head
|
|
|
seq MBY pdb
|
8230 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq
with the mask minscop_soluble_matches
|
seq MBY pdb COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the amino-acid (aa) composition of seq_MBY_pdb,
the masked fasta file generated by masking the fasta file seq with
the mask minscop_soluble_matches.
|
seq MBY pdb STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq with
the mask minscop_soluble_matches to generate the masked fasta file seq_MBY_pdb.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
sfam occurrence
|
8 k, tab delim.
|
data,
head
|
sfam_, count
|
Number of times each sfam (represented by three scop fid numbers) occurs in genome CE
This table should be sorted into a standard order.
Using no_overlap (minscop_soluble_matches_no_overlap) for CE, 1998,12.20
|
sfam occurrence ceonly
|
8 k, tab delim.
|
data,
head
|
sfam_, count
|
Number of times each sfam (represented by three scop fid numbers) occurs in genome CE
Using no_overlap (minscop_soluble_matches_no_overlap) for CE, 1998,12.20
The table should be sorted into a standard order, with the minscop one containing 990 entries.
The current restriction is [_ceonly]
|
signal segs
|
50 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
Signal sequences.
|
tm scores
|
655 k, tab delim.
|
data,
head
|
id_, sumscr, sig, minhall, ntmproc, totaa, avg_en, minhseg
|
This table contains scores indicating whether, and to what
degree, each sequence is an integral membrane protein.
sig = does it have a signal sequence?
sumscor = overall evaluation score (see below)
minhall = min hydrophobicity value for 20 res. window moved over whole prot.
totaa = total number of aa under -1 threshold
ntmproc = tot num of TM helices after processing
avg_en = average hydrophobicity of all the TM segments (per residue)
minhseg = min hydrophobicity of a TM segment (per residue)
#
# These parameters were refined on MG
# see genomes/mg-analyze-maxh-981127.xls
#
sumscor = (minhall < -2 ? 4 :
          (tot_aa > 50 ? 3 :
          (minhall < -1.75 ? 2 :
          (tot_aa > 20 ? 1 : 0))));
|
tm segs
|
774 k, tab delim.
|
data,
head
|
id_, start_I, stop_n, sumscor, energy_f
|
Transmembrane segments.
(version 2, revised 971113)
(version 3, revised 981127, now sumscor based on calc_istm_score)
sumscor gives a confidence value in the TM helix based on an analysis
of the TM helices in the WHOLE protein.
#
# These parameters were refined on MG
# see genomes/mg-analyze-maxh-981127.xls
#
sumscor = (minhall < -2 ? 4 :
          (tot_aa > 50 ? 3 :
          (minhall < -1.75 ? 2 :
          (tot_aa > 20 ? 1 : 0))));
|
tm segs best
|
505 k, tab delim.
|
data,
head
|
id_, start_I, stop_n, sumscor, energy_f
|
These are the segments from table tm_segs
that have sumscor = 4.
|
worm only p10
|
132 k, tab delim.
|
data,
head
|
|
|
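The nested conditional quoted in the tm scores and tm segs rows can be restated as a short Python sketch. It is not the original code. It assumes that the result of the conditional is the sumscor value, and that tot_aa in the formula is the same quantity as the totaa field. The function names `sumscor_from_features` and `best_segments` are hypothetical, and the example rows are made up for illustration.

```python
def sumscor_from_features(minhall: float, tot_aa: int) -> int:
    """Return the 0-4 score given by the nested conditional in the table.

    minhall: minimum hydrophobicity over a 20-residue window.
    tot_aa:  total number of residues under the -1 threshold
             (assumed to match the 'totaa' field).
    """
    if minhall < -2:
        return 4
    if tot_aa > 50:
        return 3
    if minhall < -1.75:
        return 2
    if tot_aa > 20:
        return 1
    return 0


def best_segments(rows):
    """Keep only the segment rows with sumscor equal to 4,
    mirroring the description of the tm segs best table."""
    return [row for row in rows if row["sumscor"] == 4]


# Made-up rows for illustration only; not taken from the real tables.
example_rows = [
    {"id_": "seqA", "start_I": 10, "stop_n": 30,
     "sumscor": sumscor_from_features(-2.3, 15)},
    {"id_": "seqB", "start_I": 5, "stop_n": 25,
     "sumscor": sumscor_from_features(-1.8, 30)},
]

kept = best_segments(example_rows)
```

The checks run in the same order as the original conditional, so a very hydrophobic window (minhall below -2) gives the top score regardless of tot_aa.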