For Genome `EC`, Tables with Specific Analysis

Table Name	Size (kb), Format	Links	Fields (keys bold)	Description
tm segs	147 k, tab delim.	data, head	id_, start_I, stop_n, energy_f	Transmembrane segments. (version 2, revised 971113)
tm histo	1 k, tab delim.	data, head	ntm_I, prots_n	Histogram of frequency of transmembrane segments.
signal segs	14 k, tab delim.	data, head	id_, start_I, stop_n	Signal sequences.
seq MBY pdb MBY lcl MBY tms MBY lnk STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms with the mask linkers to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk. MASKED_CHARS = number of characters masked with the application of this mask. Masked_Seqs = number of sequences masked with the application of this mask. Masking_Segs = number of segments used in the application of the mask.
seq MBY pdb MBY lcl MBY tms MBY lnk COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms with the mask linkers to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk.
seq MBY pdb MBY lcl MBY tms MBY lnk	1385 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq_MBY_pdb_MBY_lcl_MBY_tms with the mask linkers
minscop soluble matches no overlap	57 k, tab delim.	data, head	gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f	These are the good matches at an e-value cutoff of 0.01 for just the soluble proteins (scop classes 1-5 and 7). This table is the result of removing from minscop_soluble_matches the matches that hit the same sequence on the genome.
id ntm	47 k, tab delim.	data, head	id_, signalp, ntm_n	This table contains the number of transmembrane segments for each ORF. Its definition of TM-segment is after filtering. It also has signal sequence data, based on simple criteria.
fold occurrence	4 k, tab delim.	data, head	fold_, count	Number of times each fold (represented by two scop fid numbers) occurs in genome EC. This table should be sorted into a standard order.
all masks	798 k, tab delim.	data, head	gid_, start_I, stop_n, tool_, score	This file concatenates the results of creating all the masks for genome EC.
aa freq histo	1 k, tab delim.	data, head	aa_, freq_n	Histogram of frequency of the various amino acids
alla segs	19 k, tab delim.	data, head	id_, start_I, stop_n	all-a segments
allb segs	2 k, tab delim.	data, head	id_, start_I, stop_n	all-b segments
characterized domains	74 k, tab delim.	data, head	id_, start_I, stop_n	Already characterized domains (the borders between linker regions).
full len segs	55 k, tab delim.	data, head	id_, start_I, stop_n	Full length segments.
genome v minscop	488 k, tab delim.	data, head	did_, gid_, TargStart_I, TargStop_n, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f	Result of running genome EC against Ted's minscop (scop 1.35)
gorss	1383 k, fasta	data, head	gid_, gorss	This fasta file is the result of running GOR sec. struc. prediction on the genome EC
gorss MBY nul	1383 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking gorss with the mask full_len_segs
gorss MBY nul COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file gorss with the mask full_len_segs to generate the masked fasta file gorss_MBY_nul.
gorss MBY nul STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file gorss with the mask full_len_segs to generate the masked fasta file gorss_MBY_nul. MASKED_CHARS = number of characters masked with the application of this mask. Masked_Seqs = number of sequences masked with the application of this mask. Masking_Segs = number of segments used in the application of the mask.
gorss MBY ucd	1383 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking gorss with the mask unchar_domains
gorss MBY ucd COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file gorss with the mask unchar_domains to generate the masked fasta file gorss_MBY_ucd.
gorss MBY ucd STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file gorss with the mask unchar_domains to generate the masked fasta file gorss_MBY_ucd. MASKED_CHARS = number of characters masked with the application of this mask. Masked_Seqs = number of sequences masked with the application of this mask. Masking_Segs = number of segments used in the application of the mask.
id ntm nofilt	47 k, tab delim.	data, head	id_, signalp, ntm_n	This table contains data on whether there is a signal sequence and the number of transmembrane segments. (version 2, revised 971113). (Renamed table on 980101: id_ntm --> id_ntm_nofilt)
linkers	64 k, tab delim.	data, head	id_, start_I, stop_n	Linker regions between two other defined segments that are less than 50 in length.
low complexity long	41 k, tab delim.	data, head	id_, start_I, stop_n, cplxity_f	Low complexity regions generated with the following seg command: seg tmp.fa 45 3.4 3.75 -l
low complexity short	74 k, tab delim.	data, head	id_, start_I, stop_n, cplxity_f	Low complexity regions generated with the following seg command: seg tmp.fa 25 3.0 3.3 -l
minscop occurrence	10 k, tab delim.	data, head	did_, count	Number of times each minscop domain id (did) occurs in genome EC. This table should be sorted into a standard order and contain 990 entries.
minscop soluble matches	61 k, tab delim.	data, head	gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f	These are the good matches at an e-value cutoff of 0.01 for just the soluble proteins (scop classes 1-5 and 7). This is with a year cutoff of 97.
minscop soluble matches overlap	4 k, tab delim.	data, head	gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f	These are the good matches at an e-value cutoff of 0.01 for just the soluble proteins (scop classes 1-5 and 7). This table holds the matches from minscop_soluble_matches that hit the same sequence on the genome. That is, it contains duplicate matches that should not be used.
seq	1379 k, Hidden	data, head	-	-
seq MBY cdo	1385 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq with the mask characterized_domains
seq MBY cdo COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq with the mask characterized_domains to generate the masked fasta file seq_MBY_cdo.
seq MBY cdo STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq with the mask characterized_domains to generate the masked fasta file seq_MBY_cdo. MASKED_CHARS = number of characters masked with the application of this mask. Masked_Seqs = number of sequences masked with the application of this mask. Masking_Segs = number of segments used in the application of the mask.
seq MBY lcl	1385 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq with the mask low_complexity_long
seq MBY lcl COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq with the mask low_complexity_long to generate the masked fasta file seq_MBY_lcl.
seq MBY lcl STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq with the mask low_complexity_long to generate the masked fasta file seq_MBY_lcl. MASKED_CHARS = number of characters masked with the application of this mask. Masked_Seqs = number of sequences masked with the application of this mask. Masking_Segs = number of segments used in the application of the mask.
seq MBY lnk	1385 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq with the mask linkers
seq MBY lnk COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq with the mask linkers to generate the masked fasta file seq_MBY_lnk.
seq MBY lnk STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq with the mask linkers to generate the masked fasta file seq_MBY_lnk. MASKED_CHARS = number of characters masked with the application of this mask. Masked_Seqs = number of sequences masked with the application of this mask. Masking_Segs = number of segments used in the application of the mask.
seq MBY nul	1385 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq with the mask full_len_segs
seq MBY nul COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq with the mask full_len_segs to generate the masked fasta file seq_MBY_nul.
seq MBY nul STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq with the mask full_len_segs to generate the masked fasta file seq_MBY_nul. MASKED_CHARS = number of characters masked with the application of this mask. Masked_Seqs = number of sequences masked with the application of this mask. Masking_Segs = number of segments used in the application of the mask.
seq MBY pdb	1385 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq with the mask minscop_soluble_matches
seq MBY pdb COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq with the mask minscop_soluble_matches to generate the masked fasta file seq_MBY_pdb.
seq MBY pdb MBY lcl	1385 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq_MBY_pdb with the mask low_complexity_long
seq MBY pdb MBY lcl COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq_MBY_pdb with the mask low_complexity_long to generate the masked fasta file seq_MBY_pdb_MBY_lcl.
seq MBY pdb MBY lcl MBY tms	1385 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq_MBY_pdb_MBY_lcl with the mask tm_segs
seq MBY pdb MBY lcl MBY tms COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq_MBY_pdb_MBY_lcl with the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms.
seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp	1385 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk with the mask alla_segs
seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk with the mask alla_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp.
seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp MBY bet	1385 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp with the mask allb_segs
seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp MBY bet COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp with the mask allb_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp_MBY_bet.
seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp MBY bet STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp with the mask allb_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp_MBY_bet. MASKED_CHARS = number of characters masked with the application of this mask. Masked_Seqs = number of sequences masked with the application of this mask. Masking_Segs = number of segments used in the application of the mask.
seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk with the mask alla_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp. MASKED_CHARS = number of characters masked with the application of this mask. Masked_Seqs = number of sequences masked with the application of this mask. Masking_Segs = number of segments used in the application of the mask.
seq MBY pdb MBY lcl MBY tms STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq_MBY_pdb_MBY_lcl with the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms. MASKED_CHARS = number of characters masked with the application of this mask. Masked_Seqs = number of sequences masked with the application of this mask. Masking_Segs = number of segments used in the application of the mask.
seq MBY pdb MBY lcl STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq_MBY_pdb with the mask low_complexity_long to generate the masked fasta file seq_MBY_pdb_MBY_lcl. MASKED_CHARS = number of characters masked with the application of this mask. Masked_Seqs = number of sequences masked with the application of this mask. Masking_Segs = number of segments used in the application of the mask.
seq MBY pdb STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq with the mask minscop_soluble_matches to generate the masked fasta file seq_MBY_pdb. MASKED_CHARS = number of characters masked with the application of this mask. Masked_Seqs = number of sequences masked with the application of this mask. Masking_Segs = number of segments used in the application of the mask.
seq MBY tms	1385 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq with the mask tm_segs
seq MBY tms COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq with the mask tm_segs to generate the masked fasta file seq_MBY_tms.
seq MBY tms STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq with the mask tm_segs to generate the masked fasta file seq_MBY_tms. MASKED_CHARS = number of characters masked with the application of this mask. Masked_Seqs = number of sequences masked with the application of this mask. Masking_Segs = number of segments used in the application of the mask.
seq MBY ucd	1385 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq with the mask unchar_domains
seq MBY ucd COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq with the mask unchar_domains to generate the masked fasta file seq_MBY_ucd.
seq MBY ucd STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq with the mask unchar_domains to generate the masked fasta file seq_MBY_ucd. MASKED_CHARS = number of characters masked with the application of this mask. Masked_Seqs = number of sequences masked with the application of this mask. Masking_Segs = number of segments used in the application of the mask.
seq lengths	46 k, tab delim.	data, head	gid_, length_n	Length of each sequence in genome.
sfam occurrence	8 k, tab delim.	data, head	sfam_, count	Number of times each sfam (represented by three scop fid numbers) occurs in genome EC. This table should be sorted into a standard order.
ss freq histo	1 k, tab delim.	data, head	aa_, freq_n	Histogram of frequency of the various amino acids
tm segs filtered	72 k, tab delim.	data, head	id_, start_I, stop_n, energy_f	Transmembrane segment definitions after removing pdb matches and (most importantly) low-complexity regions. The tm_segs table is just the raw data. This is based on looking at the masked file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk for the TM segments (annotated with a 3).
unchar domains	56 k, tab delim.	data, head	id_, start_I, stop_n	Linker regions between two other defined segments that are greater than 50 in length. That is, these are uncharacterized protein domains.
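
The masking step that many rows above describe (a fasta file, a segment table used as the mask, a masked fasta file, and the accompanying STAT and COMP tables) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code that produced these tables: the function names `apply_mask` and `composition`, the 1-based inclusive reading of `start`/`stop`, and the choice of mask character are all hypothetical. The only hint in the table is that TM segments are "annotated with a 3" in the masked file.

```python
from collections import Counter


def apply_mask(seqs, segments, mask_char):
    """Overwrite the residues covered by each segment with mask_char.

    seqs:      dict mapping sequence id -> sequence string
    segments:  iterable of (id, start, stop) rows, as in the *_segs tables
               (assumed 1-based and inclusive; the source does not say)
    mask_char: symbol written over masked positions (hypothetical choice)

    Returns (masked_seqs, stats), where stats mirrors the three STAT
    fields described in the table.
    """
    masked = {gid: list(seq) for gid, seq in seqs.items()}
    masked_chars = 0       # MASKED_CHARS
    touched = set()        # ids counted for Masked_Seqs
    masking_segs = 0       # Masking_Segs

    for gid, start, stop in segments:
        if gid not in masked:
            continue  # segment refers to a sequence we do not have
        residues = masked[gid]
        lo = max(start, 1) - 1          # convert to 0-based index
        hi = min(stop, len(residues))   # clamp to sequence length
        if lo >= hi:
            continue  # empty or out-of-range segment
        masking_segs += 1
        for i in range(lo, hi):
            if residues[i] != mask_char:
                residues[i] = mask_char
                masked_chars += 1
                touched.add(gid)

    stats = {
        "MASKED_CHARS": masked_chars,
        "Masked_Seqs": len(touched),
        "Masking_Segs": masking_segs,
    }
    return {gid: "".join(r) for gid, r in masked.items()}, stats


def composition(seqs):
    """Count every character across all sequences (a COMP-style table)."""
    return Counter(ch for seq in seqs.values() for ch in seq)


if __name__ == "__main__":
    # Toy data, not taken from genome EC.
    seqs = {"g1": "MKTAYIAKQR", "g2": "MSLLV"}
    tm_like = [("g1", 2, 4), ("g1", 4, 6), ("g3", 1, 3)]
    masked, stats = apply_mask(seqs, tm_like, mask_char="3")
    print(masked)
    print(stats)
    print(composition(masked))
```

Chained tables such as seq_MBY_pdb_MBY_lcl would, under the same assumptions, correspond to calling `apply_mask` again on the output of the previous call with the next segment table and a different mask character.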
