Table Name
|
Size (kb),
Format
|
Links
|
Fields
(keys bold)
|
Description
|
tm segs
|
147 k, tab delim.
|
data,
head
|
id_, start_I, stop_n, energy_f
|
Transmembrane segments.
(version 2, revised 971113)
|
tm histo
|
1 k, tab delim.
|
data,
head
|
ntm_I, prots_n
|
Histogram of frequency of transmembrane segments.
|
signal segs
|
14 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
Signal sequences.
|
seq MBY pdb MBY lcl MBY tms MBY lnk STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms with
the mask linkers to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY pdb MBY lcl MBY tms MBY lnk COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms with
the mask linkers to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk.
|
seq MBY pdb MBY lcl MBY tms MBY lnk
|
1385 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb_MBY_lcl_MBY_tms
with the mask linkers
|
minscop soluble matches no overlap
|
57 k, tab delim.
|
data,
head
|
gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f
|
These are the good matches, to an e-value cutoff of 0.01,
for just the soluble proteins (scop classes 1-5 and 7).
This table is the result of filtering out the matches from
minscop_soluble_matches that hit the same sequence on the genome.
|
id ntm
|
47 k, tab delim.
|
data,
head
|
id_, signalp, ntm_n
|
This table contains the number of transmembrane segments for each ORF.
TM segments are counted as defined after filtering.
It also has signal sequence data, based on simple criteria.
|
fold occurrence
|
4 k, tab delim.
|
data,
head
|
fold_, count
|
Number of times each fold (represented by two scop fid numbers) occurs in genome EC
This table should be sorted into a standard order.
|
all masks
|
798 k, tab delim.
|
data,
head
|
gid_, start_I, stop_n, tool_, score
|
This file concatenates the results of
creating all the masks for genome EC.
|
aa freq histo
|
1 k, tab delim.
|
data,
head
|
aa_, freq_n
|
Histogram of frequency of the various amino acids
|
alla segs
|
19 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
all-a segments
|
allb segs
|
2 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
all-b segments
|
characterized domains
|
74 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
Already characterized domains (the borders between
linker regions).
|
full len segs
|
55 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
Full length segments.
|
genome v minscop
|
488 k, tab delim.
|
data,
head
|
did_, gid_, TargStart_I, TargStop_n, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f
|
Result of running genome EC against Ted's minscop (scop 1.35)
|
gorss
|
1383 k, fasta
|
data,
head
|
gid_, gorss
|
This fasta file is the result of running GOR secondary-structure prediction
on genome EC.
|
gorss MBY nul
|
1383 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking gorss
with the mask full_len_segs
|
gorss MBY nul COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file gorss with
the mask full_len_segs to generate the masked fasta file gorss_MBY_nul.
|
gorss MBY nul STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file gorss with
the mask full_len_segs to generate the masked fasta file gorss_MBY_nul.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
gorss MBY ucd
|
1383 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking gorss
with the mask unchar_domains
|
gorss MBY ucd COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file gorss with
the mask unchar_domains to generate the masked fasta file gorss_MBY_ucd.
|
gorss MBY ucd STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file gorss with
the mask unchar_domains to generate the masked fasta file gorss_MBY_ucd.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
id ntm nofilt
|
47 k, tab delim.
|
data,
head
|
id_, signalp, ntm_n
|
This table contains data on whether there is a signal sequence
and the number of transmembrane segments.
(version 2, revised 971113).
(Renamed table on 980101: id_ntm --> id_ntm_nofilt)
|
linkers
|
64 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
Linker regions between two other defined segments
that are less than 50 in length.
|
low complexity long
|
41 k, tab delim.
|
data,
head
|
id_, start_I, stop_n, cplxity_f
|
Low complexity regions generated with the
following seg command: seg tmp.fa 45 3.4 3.75 -l
|
low complexity short
|
74 k, tab delim.
|
data,
head
|
id_, start_I, stop_n, cplxity_f
|
Low complexity regions generated with the
following seg command: seg tmp.fa 25 3.0 3.3 -l
|
minscop occurrence
|
10 k, tab delim.
|
data,
head
|
did_, count
|
Number of times each minscop domain id (did) occurs in genome EC
This table should be sorted into a standard order and contain 990 entries.
|
minscop soluble matches
|
61 k, tab delim.
|
data,
head
|
gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f
|
These are the good matches, to an e-value cutoff of 0.01,
for just the soluble proteins (scop classes 1-5 and 7).
This is with a year cutoff of 97.
|
minscop soluble matches overlap
|
4 k, tab delim.
|
data,
head
|
gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f
|
These are the good matches, to an e-value cutoff of 0.01,
for just the soluble proteins (scop classes 1-5 and 7).
This table contains the matches from
minscop_soluble_matches that hit the same sequence on the genome.
That is, it contains duplicate matches that should not be used.
|
seq
|
1379 k, Hidden
|
data,
head
|
-
|
-
|
seq MBY cdo
|
1385 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq
with the mask characterized_domains
|
seq MBY cdo COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq with
the mask characterized_domains to generate the masked fasta file seq_MBY_cdo.
|
seq MBY cdo STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq with
the mask characterized_domains to generate the masked fasta file seq_MBY_cdo.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY lcl
|
1385 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq
with the mask low_complexity_long
|
seq MBY lcl COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq with
the mask low_complexity_long to generate the masked fasta file seq_MBY_lcl.
|
seq MBY lcl STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq with
the mask low_complexity_long to generate the masked fasta file seq_MBY_lcl.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY lnk
|
1385 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq
with the mask linkers
|
seq MBY lnk COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq with
the mask linkers to generate the masked fasta file seq_MBY_lnk.
|
seq MBY lnk STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq with
the mask linkers to generate the masked fasta file seq_MBY_lnk.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY nul
|
1385 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq
with the mask full_len_segs
|
seq MBY nul COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq with
the mask full_len_segs to generate the masked fasta file seq_MBY_nul.
|
seq MBY nul STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq with
the mask full_len_segs to generate the masked fasta file seq_MBY_nul.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY pdb
|
1385 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq
with the mask minscop_soluble_matches
|
seq MBY pdb COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq with
the mask minscop_soluble_matches to generate the masked fasta file seq_MBY_pdb.
|
seq MBY pdb MBY lcl
|
1385 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb
with the mask low_complexity_long
|
seq MBY pdb MBY lcl COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb with
the mask low_complexity_long to generate the masked fasta file seq_MBY_pdb_MBY_lcl.
|
seq MBY pdb MBY lcl MBY tms
|
1385 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb_MBY_lcl
with the mask tm_segs
|
seq MBY pdb MBY lcl MBY tms COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb_MBY_lcl with
the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms.
|
seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp
|
1385 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk
with the mask alla_segs
|
seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk with
the mask alla_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp.
|
seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp MBY bet
|
1385 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp
with the mask allb_segs
|
seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp MBY bet COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp with
the mask allb_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp_MBY_bet.
|
seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp MBY bet STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp with
the mask allb_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp_MBY_bet.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk with
the mask alla_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY pdb MBY lcl MBY tms STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb_MBY_lcl with
the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY pdb MBY lcl STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb with
the mask low_complexity_long to generate the masked fasta file seq_MBY_pdb_MBY_lcl.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY pdb STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq with
the mask minscop_soluble_matches to generate the masked fasta file seq_MBY_pdb.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY tms
|
1385 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq
with the mask tm_segs
|
seq MBY tms COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq with
the mask tm_segs to generate the masked fasta file seq_MBY_tms.
|
seq MBY tms STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq with
the mask tm_segs to generate the masked fasta file seq_MBY_tms.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY ucd
|
1385 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq
with the mask unchar_domains
|
seq MBY ucd COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq with
the mask unchar_domains to generate the masked fasta file seq_MBY_ucd.
|
seq MBY ucd STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq with
the mask unchar_domains to generate the masked fasta file seq_MBY_ucd.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq lengths
|
46 k, tab delim.
|
data,
head
|
gid_, length_n
|
Length of each sequence in genome.
|
sfam occurrence
|
8 k, tab delim.
|
data,
head
|
sfam_, count
|
Number of times each sfam (represented by three scop fid numbers) occurs in genome EC
This table should be sorted into a standard order.
|
ss freq histo
|
1 k, tab delim.
|
data,
head
|
aa_, freq_n
|
Histogram of frequency of the various amino acids
|
tm segs filtered
|
72 k, tab delim.
|
data,
head
|
id_, start_I, stop_n, energy_f
|
Transmembrane segment definitions after removing pdb matches and (most
importantly) low-complexity regions. The tm_segs table is just
the raw data.
This is based on looking at the masked file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk for the TM
segments (annotated with a 3).
|
unchar domains
|
56 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
Linker regions between two other defined segments
that are greater than 50 in length.
That is, these are uncharacterized protein domains.
|
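The masking step that produces each MBY file, together with its STAT and COMP companions, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the pipeline that generated the tables above: the function name apply_mask, the default mask character "X", and the reading of start/stop as 1-based inclusive positions are all hypothetical. The real masks may write a different character per mask (the tm_segs_filtered entry mentions TM segments "annotated with a 3"), and the exact counting rules behind MASKED_CHARS, Masked_Seqs and Masking_Segs are guessed from their one-line definitions.

```python
from collections import Counter


def apply_mask(seqs, segments, mask_char="X"):
    """Mask segments in sequences and collect STAT- and COMP-style summaries.

    seqs     : dict mapping sequence id -> sequence string
    segments : iterable of (id, start, stop); assumed 1-based, inclusive
    mask_char: character written over masked positions (hypothetical default)
    """
    masked = {sid: list(seq) for sid, seq in seqs.items()}
    masked_chars = 0        # characters newly overwritten by this mask
    touched = set()         # ids of sequences that had at least one change
    used_segs = 0           # segments that landed on a known sequence

    for sid, start, stop in segments:
        chars = masked.get(sid)
        if chars is None:
            continue        # segment refers to a sequence we do not have
        lo = max(start, 1) - 1          # convert to 0-based, clamp to range
        hi = min(stop, len(chars))
        if lo >= hi:
            continue        # empty or out-of-range segment
        used_segs += 1
        for i in range(lo, hi):
            if chars[i] != mask_char:   # do not double-count overlaps
                chars[i] = mask_char
                masked_chars += 1
                touched.add(sid)

    out = {sid: "".join(chars) for sid, chars in masked.items()}
    stats = {
        "MASKED_CHARS": masked_chars,
        "Masked_Seqs": len(touched),
        "Masking_Segs": used_segs,
    }
    # Composition of the masked file: count of every character, mask included.
    comp = Counter("".join(out.values()))
    return out, stats, comp
```

Chained masks such as seq_MBY_pdb_MBY_lcl would then correspond to calling apply_mask again on the output of the previous call with the next segment table, one call per MBY step.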