Table Name
|
Size (kb),
Format
|
Links
|
Fields
(keys bold)
|
Description
|
tm segs
|
31 k, tab delim.
|
data,
head
|
id_, start_I, stop_n, energy_f
|
Transmembrane segments.
(version 2, revised 971113)
|
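A minimal sketch of reading one of the tab-delimited tables listed here, using tm_segs as the example. It assumes the data file carries no header line (the field names would come from the separate head link) and that start and stop are integers and energy is a float. The helper name parse_table and the sample rows are hypothetical, not part of the original pipeline.

```python
import csv
import io

def parse_table(text, fields, converters=None):
    """Parse tab-delimited text into a list of dicts keyed by field name.

    `fields` is supplied by the caller (assumed to match the table's
    "Fields" column); `converters` maps a field name to a callable such
    as int or float. Fields without a converter stay as strings.
    """
    converters = converters or {}
    rows = []
    for record in csv.reader(io.StringIO(text), delimiter="\t"):
        if not record:
            continue  # skip blank lines
        row = dict(zip(fields, record))
        for name, convert in converters.items():
            if name in row:
                row[name] = convert(row[name])
        rows.append(row)
    return rows

# Made-up rows in the shape of tm_segs (id_, start_I, stop_n, energy_f).
sample = "MJ0001\t12\t32\t-1.5\nMJ0002\t101\t121\t-2.25\n"
tm_segs = parse_table(
    sample,
    ["id_", "start_I", "stop_n", "energy_f"],
    {"start_I": int, "stop_n": int, "energy_f": float},
)
print(tm_segs[0])
```

The same helper should work for the other tab-delimited tables if the field list and converters are changed to match.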
tm histo
|
1 k, tab delim.
|
data,
head
|
ntm_I, prots_n
|
Histogram of frequency of transmembrane segments.
|
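A histogram like this one could be derived from a per-ORF table of TM-segment counts. The sketch below assumes the input is a mapping from ORF id to number of TM segments; the function name and the example values are illustrative only.

```python
from collections import Counter

def tm_histogram(ntm_by_id):
    """Return sorted (ntm, prots) pairs: how many proteins have each TM-segment count."""
    counts = Counter(ntm_by_id.values())
    return sorted(counts.items())

# Made-up per-ORF counts, in the spirit of the id_ntm table.
example = {"orf1": 0, "orf2": 0, "orf3": 2, "orf4": 7, "orf5": 2}
for ntm, prots in tm_histogram(example):
    print(f"{ntm}\t{prots}")
```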
signal segs
|
3 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
Signal sequences.
|
seq MBY pdb MBY lcl MBY tms MBY lnk STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms with
the mask linkers to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
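The three statistics defined above can be illustrated with a small masking sketch. It assumes mask segments are 1-based and inclusive, that masking overwrites residues with a single mask character, and that a character is counted only the first time it is masked. None of these details are confirmed by the table descriptions, and apply_mask is a hypothetical helper.

```python
def apply_mask(seqs, segments, mask_char="X"):
    """Mask segments in sequences and report the three STAT values.

    seqs: dict of sequence id -> sequence string.
    segments: iterable of (id, start, stop), assumed 1-based inclusive.
    Returns (masked_seqs, stats).
    """
    masked = {sid: list(seq) for sid, seq in seqs.items()}
    masked_chars = 0
    touched = set()
    n_segments = 0
    for sid, start, stop in segments:
        n_segments += 1  # assumption: every segment in the mask is counted
        chars = masked.get(sid)
        if chars is None:
            continue  # segment refers to a sequence we do not have
        for i in range(max(start, 1) - 1, min(stop, len(chars))):
            # Positions already equal to mask_char are not counted again.
            if chars[i] != mask_char:
                chars[i] = mask_char
                masked_chars += 1
                touched.add(sid)
    stats = {
        "MASKED_CHARS": masked_chars,
        "Masked_Seqs": len(touched),
        "Masking_Segs": n_segments,
    }
    return {sid: "".join(c) for sid, c in masked.items()}, stats

# Made-up sequences and two overlapping segments on the first one.
seqs = {"a": "MKTAYIAKQR", "b": "GSHMLEDP"}
masked, stats = apply_mask(seqs, [("a", 2, 4), ("a", 4, 6)])
print(masked["a"], stats)
```

Chained table names such as seq_MBY_pdb_MBY_lcl would then correspond to calling such a function repeatedly, feeding each masked result into the next mask.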
seq MBY pdb MBY lcl MBY tms MBY lnk COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms with
the mask linkers to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk.
|
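A composition table of this kind amounts to a character count over the masked file. The sketch below assumes mask symbols are counted alongside ordinary residues, which the description does not state; the function name and inputs are illustrative.

```python
from collections import Counter

def composition(masked_seqs):
    """Count every character across all sequences, mask symbols included."""
    counts = Counter()
    for seq in masked_seqs:
        counts.update(seq)
    return sorted(counts.items())

# Made-up masked sequences, with X standing in for a mask symbol.
rows = composition(["MXXXAK", "GAXK"])
for aa, count in rows:
    print(f"{aa}\t{count}")
```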
seq MBY pdb MBY lcl MBY tms MBY lnk
|
509 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb_MBY_lcl_MBY_tms
with the mask linkers
|
minscop soluble matches no overlap
|
16 k, tab delim.
|
data,
head
|
gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f
|
These are the good matches at an e-value cutoff of .01
for just the soluble proteins (scop classes 1-5 and 7).
This table is the result of filtering out the matches from
minscop_soluble_matches that hit the same sequence on the genome.
|
id ntm
|
19 k, tab delim.
|
data,
head
|
id_, signalp, ntm_n
|
This table contains the number of transmembrane segments for each ORF.
TM segments are counted after filtering.
It also has signal sequence data, based on simple criteria.
|
fold occurrence
|
4 k, tab delim.
|
data,
head
|
fold_, count
|
Number of times each fold (represented by two scop fid numbers) occurs in genome MJ.
This table should be sorted into a standard order.
|
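Counting folds by their first two scop fid numbers could look like the sketch below. It assumes each fid is a dot-separated string of numbers (for example "3.32.1"), which is a guess at the format. The numeric sort is only one possible "standard order", and the fid values are made up.

```python
from collections import Counter

def count_by_fid_prefix(fids, depth):
    """Count occurrences of each classification prefix made of `depth` numbers."""
    counts = Counter(".".join(fid.split(".")[:depth]) for fid in fids)
    # Sort by the numeric value of each component so that 3.4 precedes 3.32.
    return sorted(counts.items(), key=lambda kv: [int(p) for p in kv[0].split(".")])

# Made-up fid strings; depth=2 groups them by fold.
fids = ["3.32.1", "1.4.2", "3.32.5", "3.4.1"]
folds = count_by_fid_prefix(fids, depth=2)
print(folds)
```

With depth=3 the same function would group by three fid numbers instead of two.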
all masks
|
249 k, tab delim.
|
data,
head
|
gid_, start_I, stop_n, tool_, score
|
This file concatenates the results of
creating all the masks for genome MJ.
|
aafreq histo
|
1 k, tab delim.
|
data,
head
|
aa_, freq_n
|
Histogram of frequency of the various amino acids
|
alla segs
|
5 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
all-a segments
|
allb segs
|
2 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
all-b segments
|
characterized domains
|
28 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
Already characterized domains (the borders between
linker regions).
|
fid12 count
|
3 k, tab delim.
|
data,
head
|
fid12_, count_n, foldname
|
top-10 counts
|
full len segs
|
22 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
Full length segments.
|
genome v minscop
|
456 k, tab delim.
|
data,
head
|
did_, gid_, TargStart_I, TargStop_n, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f
|
Result of running genome MJ against Ted's minscop (scop 1.35)
|
genome v pdb40 132
|
7 k, tab delim.
|
data,
head
|
did_, gid_, TargStart_i, TargStop_i, QryStart_i, QryStop_i, ev_r, swsc_f
|
Result of running genome MJ against pdb40d-1.32.
The swsc_f column is really ident.
This uses an e-value cutoff of .001.
|
genome v pdb40d135
|
560 k, tab delim.
|
data,
head
|
did_, gid_, TargStart_I, TargStop_n, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f
|
Result of running genome MJ against pdb40d-1.35
|
gorss
|
509 k, fasta
|
data,
head
|
|
|
gorss MBY nul
|
509 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking gorss
with the mask full_len_segs
|
gorss MBY nul COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file gorss with
the mask full_len_segs to generate the masked fasta file gorss_MBY_nul.
|
gorss MBY nul STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file gorss with
the mask full_len_segs to generate the masked fasta file gorss_MBY_nul.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
gorss MBY ucd
|
509 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking gorss
with the mask unchar_domains
|
gorss MBY ucd COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file gorss with
the mask unchar_domains to generate the masked fasta file gorss_MBY_ucd.
|
gorss MBY ucd STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file gorss with
the mask unchar_domains to generate the masked fasta file gorss_MBY_ucd.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
id ntm nofilt
|
19 k, tab delim.
|
data,
head
|
id_, signalp, ntm_n
|
This table contains data on whether there is a signal sequence
and the number of transmembrane segments.
(version 2, revised 971113).
(Renamed table on 980101: id_ntm --> id_ntm_nofilt)
|
linkers
|
24 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
Linker regions between two other defined segments
that are less than 50 in length.
|
low complexity long
|
28 k, tab delim.
|
data,
head
|
id_, start_I, stop_n, cplxity_f
|
Low complexity regions generated with the
following seg command: seg/seg tmp.fa 45 3.4 3.75 -l
|
low complexity short
|
44 k, tab delim.
|
data,
head
|
id_, start_I, stop_n, cplxity_f
|
Low complexity regions generated with the
following seg command: seg/seg tmp.fa 25 3.0 3.3 -l
|
minscop occurrence
|
10 k, tab delim.
|
data,
head
|
did_, count
|
Number of times each minscop domain id (did) occurs in genome MJ.
This table should be sorted into a standard order and contain 990 entries.
|
minscop soluble matches
|
17 k, tab delim.
|
data,
head
|
gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f
|
These are the good matches at an e-value cutoff of .01
for just the soluble proteins (scop classes 1-5 and 7).
This uses a year cutoff of 97.
|
minscop soluble matches overlap
|
2 k, tab delim.
|
data,
head
|
gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f
|
These are the good matches at an e-value cutoff of .01
for just the soluble proteins (scop classes 1-5 and 7).
This table holds the matches from
minscop_soluble_matches that hit the same sequence on the genome.
That is, it contains duplicate matches that should not be used.
|
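One way to split a match table into a no-overlap part and an overlap part is sketched below. The rule used here is an assumption: keep the best e-value match first, and set aside any later match whose target range overlaps a kept match on the same gid. The descriptions only say that the set-aside matches "hit the same sequence on the genome", so the actual criterion may differ. Field names are shortened and the rows are made up.

```python
def split_overlaps(matches):
    """Split matches into (kept, overlapping).

    matches: list of dicts with gid, start, stop (inclusive) and ev.
    Matches are visited best-first by e-value; a match that overlaps an
    already kept match on the same gid goes to the overlapping list.
    """
    kept, overlapping = [], []
    for m in sorted(matches, key=lambda m: m["ev"]):
        clash = any(
            k["gid"] == m["gid"] and m["start"] <= k["stop"] and k["start"] <= m["stop"]
            for k in kept
        )
        (overlapping if clash else kept).append(m)
    return kept, overlapping

# Made-up matches: the second one overlaps the first on the same gid.
matches = [
    {"gid": "g1", "start": 10, "stop": 120, "ev": 1e-20},
    {"gid": "g1", "start": 100, "stop": 200, "ev": 1e-5},
    {"gid": "g1", "start": 300, "stop": 400, "ev": 1e-3},
    {"gid": "g2", "start": 10, "stop": 120, "ev": 1e-8},
]
kept, overlapping = split_overlaps(matches)
print(len(kept), len(overlapping))
```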
null mask
|
1 k, tab delim.
|
data,
head
|
|
|
pdb40d135 soluble matches
|
35 k, tab delim.
|
data,
head
|
gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f
|
These are the good matches at an e-value cutoff of .01
for just the soluble proteins (scop classes 1-5 and 7).
|
seq
|
541 k, Hidden
|
data,
head
|
-
|
-
|
seq MBY cdo
|
509 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq
with the mask characterized_domains
|
seq MBY cdo COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq with
the mask characterized_domains to generate the masked fasta file seq_MBY_cdo.
|
seq MBY cdo STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq with
the mask characterized_domains to generate the masked fasta file seq_MBY_cdo.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY lcl
|
509 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq
with the mask low_complexity_long
|
seq MBY lcl COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq with
the mask low_complexity_long to generate the masked fasta file seq_MBY_lcl.
|
seq MBY lcl STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq with
the mask low_complexity_long to generate the masked fasta file seq_MBY_lcl.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY lcs
|
4 k, Bad!
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq
with the mask low_complexity_short
|
seq MBY lcs COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq with
the mask low_complexity_short to generate the masked fasta file seq_MBY_lcs.
|
seq MBY lcs STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq with
the mask low_complexity_short to generate the masked fasta file seq_MBY_lcs.
|
seq MBY lnk
|
509 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq
with the mask linkers
|
seq MBY lnk COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq with
the mask linkers to generate the masked fasta file seq_MBY_lnk.
|
seq MBY lnk STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq with
the mask linkers to generate the masked fasta file seq_MBY_lnk.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY nul
|
509 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq
with the mask full_len_segs
|
seq MBY nul COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq with
the mask full_len_segs to generate the masked fasta file seq_MBY_nul.
|
seq MBY nul STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq with
the mask full_len_segs to generate the masked fasta file seq_MBY_nul.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY pdb
|
509 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq
with the mask minscop_soluble_matches
|
seq MBY pdb COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq with
the mask minscop_soluble_matches to generate the masked fasta file seq_MBY_pdb.
|
seq MBY pdb MBY lcl
|
509 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb
with the mask low_complexity_long
|
seq MBY pdb MBY lcl COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb with
the mask low_complexity_long to generate the masked fasta file seq_MBY_pdb_MBY_lcl.
|
seq MBY pdb MBY lcl MBY lcs
|
6 k, Bad!
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb_MBY_lcl
with the mask low_complexity_long
|
seq MBY pdb MBY lcl MBY lcs COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb_MBY_lcl with
the mask low_complexity_long to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_lcs.
|
seq MBY pdb MBY lcl MBY lcs MBY tms
|
4 k, Bad!
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb_MBY_lcl_MBY_lcs
with the mask tm_segs
|
seq MBY pdb MBY lcl MBY lcs MBY tms COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_lcs with
the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_lcs_MBY_tms.
|
seq MBY pdb MBY lcl MBY lcs MBY tms STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_lcs with
the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_lcs_MBY_tms.
|
seq MBY pdb MBY lcl MBY lcs STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb_MBY_lcl with
the mask low_complexity_long to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_lcs.
|
seq MBY pdb MBY lcl MBY tms
|
509 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb_MBY_lcl
with the mask tm_segs
|
seq MBY pdb MBY lcl MBY tms COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb_MBY_lcl with
the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms.
|
seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp
|
509 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk
with the mask alla_segs
|
seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk with
the mask alla_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp.
|
seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp MBY bet
|
509 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp
with the mask allb_segs
|
seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp MBY bet COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp with
the mask allb_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp_MBY_bet.
|
seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp MBY bet STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp with
the mask allb_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp_MBY_bet.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk with
the mask alla_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY pdb MBY lcl MBY tms STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb_MBY_lcl with
the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY pdb MBY lcl STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb with
the mask low_complexity_long to generate the masked fasta file seq_MBY_pdb_MBY_lcl.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY pdb MBY tms
|
1 k, Bad!
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb
with the mask tm_segs
|
seq MBY pdb MBY tms COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb with
the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_tms.
|
seq MBY pdb MBY tms STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb with
the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_tms.
|
seq MBY pdb STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq with
the mask minscop_soluble_matches to generate the masked fasta file seq_MBY_pdb.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY tms
|
509 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq
with the mask tm_segs
|
seq MBY tms COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq with
the mask tm_segs to generate the masked fasta file seq_MBY_tms.
|
seq MBY tms STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq with
the mask tm_segs to generate the masked fasta file seq_MBY_tms.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY ucd
|
509 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq
with the mask unchar_domains
|
seq MBY ucd COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq with
the mask unchar_domains to generate the masked fasta file seq_MBY_ucd.
|
seq MBY ucd STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq with
the mask unchar_domains to generate the masked fasta file seq_MBY_ucd.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq lengths
|
19 k, tab delim.
|
data,
head
|
gid_, length_n
|
Length of each sequence in genome.
|
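Sequence lengths can be computed directly from a fasta file. The sketch below uses a small hand-written parser and assumes the identifier is the first word after ">" on each header line; the function names and the sample records are illustrative.

```python
def read_fasta(text):
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Assumption: the identifier is the first word after '>'.
            words = line[1:].split()
            name, chunks = (words[0] if words else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def seq_lengths(text):
    """Return (gid, length) rows, one per sequence, in file order."""
    return [(gid, len(seq)) for gid, seq in read_fasta(text)]

# Made-up two-record fasta text.
fasta_text = ">g1 some description\nMKT\nAY\n>g2\nGSHM\n"
lengths = seq_lengths(fasta_text)
print(lengths)
```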
sfam occurrence
|
8 k, tab delim.
|
data,
head
|
sfam_, count
|
Number of times each sfam (represented by three scop fid numbers) occurs in genome MJ.
This table should be sorted into a standard order.
|
tm segs filtered
|
11 k, tab delim.
|
data,
head
|
id_, start_I, stop_n, energy_f
|
Transmembrane segment definitions after removing pdb matches and (most
importantly) low-complexity regions. The tm_segs table is just
the raw data.
This is based on looking at the masked file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk for the TM
segments (annotated with a 3).
|
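Recovering segments from a masked sequence by looking for runs of an annotation symbol could be sketched as follows. It assumes each TM segment appears as one contiguous run of the character "3" and that coordinates are reported 1-based and inclusive. Two segments that touch would merge into a single run under this approach. The function name and example string are made up.

```python
import re

def annotated_runs(masked_seq, symbol="3"):
    """Return 1-based inclusive (start, stop) for each run of `symbol`."""
    pattern = re.escape(symbol) + "+"
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, masked_seq)]

# Made-up masked sequence with two runs of the annotation symbol.
runs = annotated_runs("MK333AAX33G")
print(runs)
```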
tmp genome v pdb40 132
|
4 k, tab delim.
|
data,
head
|
did_, gid_, fid1, fid2
|
temp table
|
unchar domains
|
21 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
Linker regions between two other defined segments
that are greater than 50 in length.
That is, these are uncharacterized protein domains.
|
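The split between linkers (less than 50 in length) and uncharacterized domains (greater than 50) can be sketched as a gap computation over the defined segments of one sequence. The sketch assumes 1-based inclusive coordinates and considers only gaps that lie between two segments. How a gap of exactly 50 is classified is not stated in the descriptions, so it is left unclassified here. All names and values are illustrative.

```python
def interior_gaps(segments):
    """Return 1-based inclusive gaps lying between consecutive segments."""
    ordered = sorted(segments)
    if not ordered:
        return []
    gaps = []
    cursor = ordered[0][1]  # furthest stop position seen so far
    for start, stop in ordered[1:]:
        if start > cursor + 1:
            gaps.append((cursor + 1, start - 1))
        cursor = max(cursor, stop)
    return gaps

def classify_gap(gap, threshold=50):
    """Label a gap by its length relative to the threshold."""
    length = gap[1] - gap[0] + 1
    if length < threshold:
        return "linker"
    if length > threshold:
        return "unchar_domain"
    return None  # exactly at the threshold: not specified in the source

# Made-up segments on one sequence; the third overlaps the second.
segments = [(1, 40), (61, 100), (200, 260), (90, 120)]
gaps = interior_gaps(segments)
labels = [classify_gap(g) for g in gaps]
print(list(zip(gaps, labels)))
```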