For Genome EC, Tables with Specific Analysis

Each entry: table name, size (kb), format, fields (keys were shown in bold); the description follows.
tm segs 147 k, tab delim. data, head id_, start_I, stop_n, energy_f

Transmembrane segments. 
(version 2, revised 971113)


tm histo 1 k, tab delim. data, head ntm_I, prots_n

Histogram of the number of transmembrane segments per protein.
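A minimal Python sketch of how such a histogram could be built, assuming the input is a per-ORF count of TM segments like the ntm_n column of id_ntm. The function name `tm_histogram` and the dict input are hypothetical, and whether ORFs with zero TM segments were included in the original table is not stated (here they are).

```python
from collections import Counter
from typing import Dict, List, Tuple


def tm_histogram(ntm_by_orf: Dict[str, int]) -> List[Tuple[int, int]]:
    """Return (ntm, prots) pairs: how many proteins have each TM-segment count."""
    counts = Counter(ntm_by_orf.values())
    # Sort by the number of TM segments so the histogram reads top to bottom.
    return sorted(counts.items())


print(tm_histogram({"orfA": 0, "orfB": 2, "orfC": 0, "orfD": 7}))
# [(0, 2), (2, 1), (7, 1)]
```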


signal segs 14 k, tab delim. data, head id_, start_I, stop_n

Signal sequences.


seq MBY pdb MBY lcl MBY tms MBY lnk STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms with
the mask linkers to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk.
MASKED_CHARS  = number of characters masked by this mask.
Masked_Seqs   = number of sequences masked by this mask.
Masking_Segs  = number of segments used in applying this mask.
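The three statistics can be reproduced from a mask table and the sequence lengths. This Python sketch assumes segments are 1-based, inclusive (start, stop) pairs and that a residue covered by two overlapping segments is counted once in MASKED_CHARS; both points are assumptions, and `mask_stats` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple


def mask_stats(lengths: Dict[str, int],
               segs: Dict[str, List[Tuple[int, int]]]) -> Dict[str, int]:
    """Compute MASKED_CHARS, Masked_Seqs and Masking_Segs for one mask.

    lengths maps each sequence id to its length; segs maps ids to 1-based,
    inclusive (start, stop) segments. Ids missing from lengths are ignored.
    """
    masked_chars = masked_seqs = masking_segs = 0
    for gid, seg_list in segs.items():
        if gid not in lengths or not seg_list:
            continue
        covered: Set[int] = set()
        for start, stop in seg_list:
            # Clip each segment to the sequence; overlapping residues count once.
            covered.update(range(max(start, 1), min(stop, lengths[gid]) + 1))
        masking_segs += len(seg_list)
        masked_seqs += 1
        masked_chars += len(covered)
    return {"MASKED_CHARS": masked_chars,
            "Masked_Seqs": masked_seqs,
            "Masking_Segs": masking_segs}


print(mask_stats({"a": 10, "b": 5, "c": 8},
                 {"a": [(1, 3), (2, 5)], "b": [(4, 9)]}))
# {'MASKED_CHARS': 7, 'Masked_Seqs': 2, 'Masking_Segs': 3}
```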


seq MBY pdb MBY lcl MBY tms MBY lnk COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms with
the mask linkers to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk.
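A short Python sketch of computing such a composition from a masked fasta file. It counts every symbol, so mask symbols appear as their own rows; whether the original COMP tables did the same is an assumption. `read_fasta` and `composition` are hypothetical helpers.

```python
from collections import Counter
from typing import Dict, Optional


def read_fasta(text: str) -> Dict[str, str]:
    """Parse fasta text into {identifier: sequence}, using the first word of each '>' line."""
    seqs: Dict[str, str] = {}
    gid: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            gid = words[0] if words else ""
            seqs[gid] = ""
        elif gid is not None:
            # Sequence lines may be wrapped; join them.
            seqs[gid] += line
    return seqs


def composition(seqs: Dict[str, str]) -> Counter:
    """Count each symbol (residues and mask symbols alike) over all sequences."""
    counts: Counter = Counter()
    for seq in seqs.values():
        counts.update(seq)
    return counts


counts = composition(read_fasta(">g1\nAC33\nHK\n>g2\nAA\n"))
for aa, n in sorted(counts.items()):
    print(aa, n)
```

Dividing each count by `sum(counts.values())` would give relative frequencies instead of raw counts.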


seq MBY pdb MBY lcl MBY tms MBY lnk 1385 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking seq_MBY_pdb_MBY_lcl_MBY_tms 
with the mask linkers
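One way to picture the masking step: every residue that falls inside a segment of the mask table is overwritten with a marker symbol. The Python sketch below assumes 1-based, inclusive (start_I, stop_n) coordinates and a caller-chosen symbol; the tm_segs_filtered entry mentions TM segments "annotated with a 3", but the symbols used by the other masks are not documented in this listing. `apply_mask` is a hypothetical name, and it overwrites whatever is already at a position, which the original tool may not do.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # (start, stop), assumed 1-based and inclusive


def apply_mask(seqs: Dict[str, str], segs: Dict[str, List[Segment]],
               symbol: str) -> Dict[str, str]:
    """Return a copy of seqs with every residue inside a mask segment replaced by symbol."""
    masked: Dict[str, str] = {}
    for gid, seq in seqs.items():
        chars = list(seq)
        for start, stop in segs.get(gid, []):
            # Convert to 0-based indices and clip to the sequence length.
            for i in range(max(start, 1) - 1, min(stop, len(chars))):
                chars[i] = symbol
        masked[gid] = "".join(chars)
    return masked


print(apply_mask({"g1": "ACDEFGHIKLMN"}, {"g1": [(3, 6)]}, "3"))
# {'g1': 'AC3333HIKLMN'}
```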


minscop soluble matches no overlap 57 k, tab delim. data, head gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f

These are the good matches at an e-value cutoff of 0.01
for just the soluble proteins (scop classes 1-5 and 7).
This table is the result of filtering out the matches from
minscop_soluble_matches that hit the same sequence on the genome,
i.e. the rows listed in minscop_soluble_matches_overlap.
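The exact rule for "hitting the same sequence" is not spelled out here. A plausible reading, sketched below in Python, treats two matches as duplicates when they fall on the same gid and their target ranges (TargStart_I..TargStop_n) overlap, and keeps the one with the better (smaller) e-value. `Match`, `overlaps` and `remove_overlaps` are hypothetical names, and only the fields the rule needs are modeled.

```python
from typing import List, NamedTuple, Tuple


class Match(NamedTuple):
    gid: str
    targ_start: int  # TargStart_I
    targ_stop: int   # TargStop_n
    did: str
    ev: float        # ev_f


def overlaps(a: Match, b: Match) -> bool:
    """True if both matches lie on the same genome sequence and their target ranges intersect."""
    return a.gid == b.gid and a.targ_start <= b.targ_stop and b.targ_start <= a.targ_stop


def remove_overlaps(matches: List[Match]) -> Tuple[List[Match], List[Match]]:
    """Greedy split into (kept, dropped), visiting matches from best to worst e-value."""
    kept: List[Match] = []
    dropped: List[Match] = []
    for m in sorted(matches, key=lambda m: m.ev):
        if any(overlaps(m, k) for k in kept):
            dropped.append(m)
        else:
            kept.append(m)
    return kept, dropped
```

Under this reading the dropped rows would make up the complementary overlap table; the listed sizes (57 k here and 4 k for minscop_soluble_matches_overlap, against 61 k for minscop_soluble_matches) are at least consistent with such a split.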


id ntm 47 k, tab delim. data, head id_, signalp, ntm_n

This table contains the number of transmembrane segments for each ORF.
Its TM segments are counted after filtering.
It also has signal sequence data, based on simple criteria. 
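Counting TM segments per ORF is a group-and-count over a segment table with id_, start_I, stop_n columns. This Python sketch assumes the segments have already been filtered and that ORFs without any segment should still appear with a count of 0; the signalp column is not modeled, and `count_tms` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_tms(orf_ids: Iterable[str],
              tm_segs: Iterable[Tuple[str, int, int]]) -> Dict[str, int]:
    """Map each ORF id to its number of TM segments (0 if it has none)."""
    per_orf = Counter(orf_id for orf_id, _start, _stop in tm_segs)
    return {orf_id: per_orf[orf_id] for orf_id in orf_ids}


print(count_tms(["orfA", "orfB", "orfC"],
                [("orfA", 12, 32), ("orfA", 50, 70), ("orfC", 5, 25)]))
# {'orfA': 2, 'orfB': 0, 'orfC': 1}
```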


fold occurrence 4 k, tab delim. data, head fold_, count

Number of times each fold (represented by two scop fid numbers) occurs in genome EC.
This table should be sorted into a standard order.
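A Python sketch of the counting, assuming each match carries its scop fid numbers as a dot-separated string such as "1.2.3.4" (the exact layout of the fids column is an assumption). Truncating to the first two numbers groups by fold; truncating to three would give the superfamily counts of sfam_occurrence. Numeric sorting is one possible "standard order", not necessarily the original one, and `occurrence` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def occurrence(fids: Iterable[str], depth: int = 2) -> List[Tuple[str, int]]:
    """Count matches per scop level: depth=2 groups by fold, depth=3 by superfamily.

    Keys are the first `depth` dot-separated numbers; output is sorted numerically.
    """
    counts = Counter(".".join(f.split(".")[:depth]) for f in fids)
    return sorted(counts.items(),
                  key=lambda kv: [int(x) for x in kv[0].split(".")])


hits = ["1.2.3.4", "1.2.5.1", "1.10.1.1", "2.1.1.1"]
print(occurrence(hits))
# [('1.2', 2), ('1.10', 1), ('2.1', 1)]
```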


all masks 798 k, tab delim. data, head gid_, start_I, stop_n, tool_, score

This file concatenates the results of 
creating all the masks for genome EC. 


aa freq histo 1 k, tab delim. data, head aa_, freq_n

Histogram of frequency of the various amino acids


alla segs 19 k, tab delim. data, head id_, start_I, stop_n

All-alpha segments.


allb segs 2 k, tab delim. data, head id_, start_I, stop_n

All-beta segments.


characterized domains 74 k, tab delim. data, head id_, start_I, stop_n

Already characterized domains (the borders between
linker regions).


full len segs 55 k, tab delim. data, head id_, start_I, stop_n

Full length segments.


genome v minscop 488 k, tab delim. data, head did_, gid_, TargStart_I, TargStop_n, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f

Result of running genome EC against Ted's minscop (scop 1.35)


gorss 1383 k, fasta data, head gid_, gorss

This fasta file is the result of running GOR secondary-structure prediction
on genome EC.


gorss MBY nul 1383 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking gorss 
with the mask full_len_segs


gorss MBY nul COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file gorss with
the mask full_len_segs to generate the masked fasta file gorss_MBY_nul.


gorss MBY nul STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file gorss with
the mask full_len_segs to generate the masked fasta file gorss_MBY_nul.
MASKED_CHARS  = number of characters masked by this mask.
Masked_Seqs   = number of sequences masked by this mask.
Masking_Segs  = number of segments used in applying this mask.


gorss MBY ucd 1383 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking gorss 
with the mask unchar_domains


gorss MBY ucd COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file gorss with
the mask unchar_domains to generate the masked fasta file gorss_MBY_ucd.


gorss MBY ucd STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file gorss with
the mask unchar_domains to generate the masked fasta file gorss_MBY_ucd.
MASKED_CHARS  = number of characters masked by this mask.
Masked_Seqs   = number of sequences masked by this mask.
Masking_Segs  = number of segments used in applying this mask.


id ntm nofilt 47 k, tab delim. data, head id_, signalp, ntm_n

This table contains data on whether there is a signal sequence
and the number of transmembrane segments.
(version 2, revised 971113).
(Renamed table on 980101: id_ntm --> id_ntm_nofilt)


linkers 64 k, tab delim. data, head id_, start_I, stop_n

Linker regions, shorter than 50 residues, between two other
defined segments.


low complexity long 41 k, tab delim. data, head id_, start_I, stop_n, cplxity_f

Low complexity regions generated with the
following seg command: seg tmp.fa 45 3.4 3.75 -l
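For readers without the seg program, the Python sketch below approximates what its three numeric arguments control: a window length (45 here), a trigger threshold (locut, 3.4) and an extension threshold (hicut, 3.75), treated here as Shannon entropy in bits of each window's residue composition. It is a simplification, not a reimplementation: real seg also optimizes region boundaries and has its own output format. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def entropy(window: str) -> float:
    """Shannon entropy, in bits, of the residue composition of a window."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


def low_complexity(seq: str, window: int, locut: float,
                   hicut: float) -> List[Tuple[int, int]]:
    """Simplified seg-like scan returning 1-based inclusive (start, stop) regions.

    A window with entropy <= locut triggers a region, which is extended over
    neighbouring windows whose entropy stays <= hicut. Real seg adds a
    boundary-optimisation step that is omitted here.
    """
    n_win = len(seq) - window + 1
    if n_win <= 0:
        return []
    h = [entropy(seq[i:i + window]) for i in range(n_win)]
    regions: List[Tuple[int, int]] = []
    i = 0
    while i < n_win:
        if h[i] > locut:
            i += 1
            continue
        left, right = i, i
        while left > 0 and h[left - 1] <= hicut:
            left -= 1
        while right + 1 < n_win and h[right + 1] <= hicut:
            right += 1
        start, stop = left + 1, right + window  # residue span of the windows
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], stop)  # merge overlapping regions
        else:
            regions.append((start, stop))
        i = right + 1
    return regions


seq = "A" * 30 + "ACDEFGHIKLMNPQRSTVWY" * 3
print(low_complexity(seq, 45, 3.4, 3.75))  # "long" parameter set
print(low_complexity(seq, 25, 3.0, 3.3))   # "short" parameter set
```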


low complexity short 74 k, tab delim. data, head id_, start_I, stop_n, cplxity_f

Low complexity regions generated with the
following seg command: seg tmp.fa 25 3.0 3.3 -l


minscop occurrence 10 k, tab delim. data, head did_, count

Number of times each minscop domain id (did) occurs in genome EC.
This table should be sorted into a standard order and contain 990 entries. 


minscop soluble matches 61 k, tab delim. data, head gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f

These are the good matches at an e-value cutoff of 0.01
for just the soluble proteins (scop classes 1-5 and 7).
This uses a year cutoff of 97.
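The selection can be sketched in Python as a filter on e-value and scop class. This assumes the scop class is the first number in the fids field and treats the 0.01 cutoff as inclusive; the year cutoff of 97 is not modeled because the listing does not say which field it applies to. `soluble_hits` is a hypothetical name.

```python
from typing import Iterable, List, Tuple

SOLUBLE_CLASSES = {1, 2, 3, 4, 5, 7}  # scop classes 1-5 and 7

Row = Tuple[str, str, float]  # (gid, fids, ev), a reduced view of the table


def soluble_hits(rows: Iterable[Row], cutoff: float = 0.01) -> List[Row]:
    """Keep rows at or below the e-value cutoff whose scop class is soluble.

    Assumes the scop class is the first dot-separated number in fids.
    """
    kept: List[Row] = []
    for gid, fids, ev in rows:
        scop_class = int(fids.split(".")[0])
        if ev <= cutoff and scop_class in SOLUBLE_CLASSES:
            kept.append((gid, fids, ev))
    return kept


rows = [("g1", "1.1.1.1", 1e-5), ("g2", "6.1.1.1", 1e-8),
        ("g3", "2.1.1.1", 0.5), ("g4", "7.3.1.1", 0.01)]
print([gid for gid, _, _ in soluble_hits(rows)])
# ['g1', 'g4']
```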


minscop soluble matches overlap 4 k, tab delim. data, head gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f

These are the good matches at an e-value cutoff of 0.01
for just the soluble proteins (scop classes 1-5 and 7).
This table contains the matches from
minscop_soluble_matches that hit the same sequence on the genome,
that is, duplicate matches that should not be used.


seq 1379 k, Hidden data, head -

-

seq MBY cdo 1385 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking seq 
with the mask characterized_domains


seq MBY cdo COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq with
the mask characterized_domains to generate the masked fasta file seq_MBY_cdo.


seq MBY cdo STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq with
the mask characterized_domains to generate the masked fasta file seq_MBY_cdo.
MASKED_CHARS  = number of characters masked by this mask.
Masked_Seqs   = number of sequences masked by this mask.
Masking_Segs  = number of segments used in applying this mask.


seq MBY lcl 1385 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking seq 
with the mask low_complexity_long


seq MBY lcl COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq with
the mask low_complexity_long to generate the masked fasta file seq_MBY_lcl.


seq MBY lcl STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq with
the mask low_complexity_long to generate the masked fasta file seq_MBY_lcl.
MASKED_CHARS  = number of characters masked by this mask.
Masked_Seqs   = number of sequences masked by this mask.
Masking_Segs  = number of segments used in applying this mask.


seq MBY lnk 1385 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking seq 
with the mask linkers


seq MBY lnk COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq with
the mask linkers to generate the masked fasta file seq_MBY_lnk.


seq MBY lnk STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq with
the mask linkers to generate the masked fasta file seq_MBY_lnk.
MASKED_CHARS  = number of characters masked by this mask.
Masked_Seqs   = number of sequences masked by this mask.
Masking_Segs  = number of segments used in applying this mask.


seq MBY nul 1385 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking seq 
with the mask full_len_segs


seq MBY nul COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq with
the mask full_len_segs to generate the masked fasta file seq_MBY_nul.


seq MBY nul STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq with
the mask full_len_segs to generate the masked fasta file seq_MBY_nul.
MASKED_CHARS  = number of characters masked by this mask.
Masked_Seqs   = number of sequences masked by this mask.
Masking_Segs  = number of segments used in applying this mask.


seq MBY pdb 1385 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking seq 
with the mask minscop_soluble_matches


seq MBY pdb COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq with
the mask minscop_soluble_matches to generate the masked fasta file seq_MBY_pdb.


seq MBY pdb MBY lcl 1385 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking seq_MBY_pdb 
with the mask low_complexity_long


seq MBY pdb MBY lcl COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq_MBY_pdb with
the mask low_complexity_long to generate the masked fasta file seq_MBY_pdb_MBY_lcl.


seq MBY pdb MBY lcl MBY tms 1385 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking seq_MBY_pdb_MBY_lcl 
with the mask tm_segs


seq MBY pdb MBY lcl MBY tms COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq_MBY_pdb_MBY_lcl with
the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms.


seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp 1385 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk 
with the mask alla_segs


seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk with
the mask alla_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp.


seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp MBY bet 1385 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp 
with the mask allb_segs
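The masked-file names in this listing follow a fixed pattern: each masking step appends `_MBY_` and a three-letter code to the input name. The Python sketch below encodes the code-to-mask mapping as read off the descriptions in this listing and uses it to build or decode a name; the helper names are hypothetical.

```python
from typing import List

# Three-letter codes and the mask tables they stand for, per this listing.
MASK_CODES = {
    "pdb": "minscop_soluble_matches",
    "lcl": "low_complexity_long",
    "tms": "tm_segs",
    "lnk": "linkers",
    "alp": "alla_segs",
    "bet": "allb_segs",
    "cdo": "characterized_domains",
    "ucd": "unchar_domains",
    "nul": "full_len_segs",
}


def masked_name(base: str, codes: List[str]) -> str:
    """Name of the fasta file produced by applying the masks in the given order."""
    for code in codes:
        if code not in MASK_CODES:
            raise KeyError(f"unknown mask code: {code}")
        base = f"{base}_MBY_{code}"
    return base


def masks_applied(name: str) -> List[str]:
    """Recover the ordered list of mask tables from a masked file name."""
    return [MASK_CODES[code] for code in name.split("_MBY_")[1:]]


print(masks_applied("seq_MBY_pdb_MBY_lcl_MBY_tms"))
# ['minscop_soluble_matches', 'low_complexity_long', 'tm_segs']
```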


seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp MBY bet COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp with
the mask allb_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp_MBY_bet.


seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp MBY bet STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp with
the mask allb_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp_MBY_bet.
MASKED_CHARS  = number of characters masked by this mask.
Masked_Seqs   = number of sequences masked by this mask.
Masking_Segs  = number of segments used in applying this mask.


seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk with
the mask alla_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp.
MASKED_CHARS  = number of characters masked by this mask.
Masked_Seqs   = number of sequences masked by this mask.
Masking_Segs  = number of segments used in applying this mask.


seq MBY pdb MBY lcl MBY tms STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq_MBY_pdb_MBY_lcl with
the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms.
MASKED_CHARS  = number of characters masked by this mask.
Masked_Seqs   = number of sequences masked by this mask.
Masking_Segs  = number of segments used in applying this mask.


seq MBY pdb MBY lcl STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq_MBY_pdb with
the mask low_complexity_long to generate the masked fasta file seq_MBY_pdb_MBY_lcl.
MASKED_CHARS  = number of characters masked by this mask.
Masked_Seqs   = number of sequences masked by this mask.
Masking_Segs  = number of segments used in applying this mask.


seq MBY pdb STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq with
the mask minscop_soluble_matches to generate the masked fasta file seq_MBY_pdb.
MASKED_CHARS  = number of characters masked by this mask.
Masked_Seqs   = number of sequences masked by this mask.
Masking_Segs  = number of segments used in applying this mask.


seq MBY tms 1385 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking seq 
with the mask tm_segs


seq MBY tms COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq with
the mask tm_segs to generate the masked fasta file seq_MBY_tms.


seq MBY tms STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq with
the mask tm_segs to generate the masked fasta file seq_MBY_tms.
MASKED_CHARS  = number of characters masked by this mask.
Masked_Seqs   = number of sequences masked by this mask.
Masking_Segs  = number of segments used in applying this mask.


seq MBY ucd 1385 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking seq 
with the mask unchar_domains


seq MBY ucd COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq with
the mask unchar_domains to generate the masked fasta file seq_MBY_ucd.


seq MBY ucd STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq with
the mask unchar_domains to generate the masked fasta file seq_MBY_ucd.
MASKED_CHARS  = number of characters masked by this mask.
Masked_Seqs   = number of sequences masked by this mask.
Masking_Segs  = number of segments used in applying this mask.


seq lengths 46 k, tab delim. data, head gid_, length_n

Length of each sequence in genome EC.


sfam occurrence 8 k, tab delim. data, head sfam_, count

Number of times each sfam (represented by three scop fid numbers) occurs in genome EC.
This table should be sorted into a standard order.


ss freq histo 1 k, tab delim. data, head aa_, freq_n

Histogram of frequency of the various amino acids


tm segs filtered 72 k, tab delim. data, head id_, start_I, stop_n, energy_f

Transmembrane segment definitions after removing pdb matches and (most
importantly) low-complexity regions. The tm_segs table contains only
the raw data.
These segments were obtained by reading the TM segments (annotated with a 3)
from the masked file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk.
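Recovering segments from a masked file amounts to finding maximal runs of the TM symbol. The Python sketch below returns 1-based, inclusive coordinates for each run of "3". It assumes residues already masked by the earlier pdb and low-complexity steps were not overwritten with "3", which would explain why reading the later file filters them out; two TM segments that touch would merge into one run. `runs_of` is a hypothetical name.

```python
import re
from typing import List, Tuple


def runs_of(masked_seq: str, symbol: str = "3") -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, stop) for each maximal run of symbol."""
    pattern = f"(?:{re.escape(symbol)})+"
    # m.end() is 0-based exclusive, which equals the 1-based inclusive stop.
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, masked_seq)]


print(runs_of("AC333HK33L"))
# [(3, 5), (8, 9)]
```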


unchar domains 56 k, tab delim. data, head id_, start_I, stop_n

Linker regions, longer than 50 residues, between two other
defined segments. That is, these are uncharacterized protein domains.
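The linkers and unchar_domains tables can both be read as gaps between consecutive defined segments of one sequence, split by length. This Python sketch assumes lengths are in residues and segments are 1-based and inclusive; which segments count as "defined" is not stated here, so the input list is left to the caller. `split_gaps` is a hypothetical name. Following the strict wording ("less than 50" for linkers, "greater than 50" here), a gap of exactly 50 lands in neither table.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # 1-based inclusive (start, stop)


def split_gaps(defined: List[Segment],
               threshold: int = 50) -> Tuple[List[Segment], List[Segment]]:
    """Return (linkers, uncharacterized) gaps lying between two defined segments."""
    # Merge overlapping or touching segments so that every gap is non-empty.
    merged: List[Segment] = []
    for start, stop in sorted(defined):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            merged.append((start, stop))
    linkers: List[Segment] = []
    unchar: List[Segment] = []
    # Only interior gaps count; the termini are not "between two segments".
    for (_, prev_stop), (next_start, _) in zip(merged, merged[1:]):
        gap = (prev_stop + 1, next_start - 1)
        length = gap[1] - gap[0] + 1
        if length < threshold:
            linkers.append(gap)
        elif length > threshold:
            unchar.append(gap)
    return linkers, unchar


print(split_gaps([(1, 100), (120, 200), (300, 400)]))
# ([(101, 119)], [(201, 299)])
```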

