For Genome HP, Tables with Specific Analysis

Table Name Size (kb), Format Links Fields (keys bold) Description
tm segs 37 k, tab delim. data, head id_, start_I, stop_n, energy_f

Transmembrane segments. 
(version 2, revised 971113)


tm histo 1 k, tab delim. data, head ntm_I, prots_n

Histogram of frequency of transmembrane segments. 
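
As an illustration of how such a histogram could be derived from per-ORF counts (for instance the ntm_n column of an id_ntm-style table), here is a minimal sketch. The function name and the in-memory dictionary input are assumptions made for the example, not part of the original pipeline.

```python
from collections import Counter
from typing import Dict, List, Tuple


def tm_histogram(ntm_by_id: Dict[str, int]) -> List[Tuple[int, int]]:
    """Return (ntm, number_of_proteins) pairs, sorted by ntm.

    ntm_by_id maps a hypothetical ORF id to its number of TM segments.
    """
    counts = Counter(ntm_by_id.values())
    return sorted(counts.items())
```

Each returned pair corresponds to one row with the fields ntm_I and prots_n.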


signal segs 5 k, tab delim. data, head id_, start_I, stop_n

Signal sequences.


seq MBY pdb MBY lcl MBY tms MBY lnk STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms with
the mask linkers to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk.
MASKED_CHARS  = number of characters masked with the application of this mask.
Masked_Seqs   = number of sequences masked with the application of this mask.
Masking_Segs  = number of segments used in the application of the mask
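
The three counters above can be sketched as follows. This is only an illustration under stated assumptions: segment coordinates are taken to be 1-based and inclusive, the mask character "X" is a placeholder, segments that refer to an unknown sequence are not counted, and the function name apply_mask is hypothetical. The actual masking tool may differ on every one of these points.

```python
from typing import Dict, List, Tuple

Segment = Tuple[str, int, int]  # (id, start, stop), assumed 1-based inclusive


def apply_mask(seqs: Dict[str, str], segments: List[Segment], mask_char: str = "X"):
    """Mask every segment in its sequence and report the three STAT counters."""
    masked = {gid: list(seq) for gid, seq in seqs.items()}
    masked_chars = 0      # characters newly replaced by mask_char
    touched = set()       # ids of sequences with at least one masked character
    used_segments = 0     # segments that referred to a known sequence
    for gid, start, stop in segments:
        if gid not in masked:
            continue  # assumption: segments without a sequence are ignored
        chars = masked[gid]
        used_segments += 1
        # Clamp the segment to the sequence and convert to 0-based indices.
        for i in range(max(start, 1) - 1, min(stop, len(chars))):
            if chars[i] != mask_char:
                chars[i] = mask_char
                masked_chars += 1
                touched.add(gid)
    stats = {
        "MASKED_CHARS": masked_chars,
        "Masked_Seqs": len(touched),
        "Masking_Segs": used_segments,
    }
    return {gid: "".join(chars) for gid, chars in masked.items()}, stats
```

Chained masks such as seq_MBY_pdb_MBY_lcl could then be modelled by feeding the masked output of one call into the next call with a different segment table.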


seq MBY pdb MBY lcl MBY tms MBY lnk COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms with
the mask linkers to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk.
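
A composition table of this kind (fields aa_, count_n) could be computed along the following lines. The sketch assumes a plain fasta layout with ">" header lines; the helper names read_fasta and composition are hypothetical, and whether the real table counts the mask characters themselves is not stated in the source.

```python
from collections import Counter
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse fasta text into {id: sequence}; the id is the first header word."""
    seqs: Dict[str, str] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def composition(seqs: Dict[str, str]) -> Counter:
    """Count every character over all sequences (mask characters included)."""
    total: Counter = Counter()
    for seq in seqs.values():
        total.update(seq)
    return total
```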


seq MBY pdb MBY lcl MBY tms MBY lnk 511 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking seq_MBY_pdb_MBY_lcl_MBY_tms 
with the mask linkers


minscop soluble matches no overlap 17 k, tab delim. data, head gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f

These are the good matches, at an e-value cutoff of 0.01,
for just the soluble proteins (scop classes 1-5 and 7).
This table is the result of filtering out the matches from 
minscop_soluble_matches that hit the same sequence on the genome. 


id ntm 17 k, tab delim. data, head id_, signalp, ntm_n

This table contains the number of transmembrane segments for each ORF.
TM segments are counted after filtering.
It also has signal sequence data, based on simple criteria.


fold occurrence 4 k, tab delim. data, head fold_, count

Number of times each fold (represented by two scop fid numbers) occurs in genome HP.
This table should be sorted into a standard order.
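
One way to read "represented by two scop fid numbers" is that a fold is the first two numbers of a dotted fid. The sketch below counts folds under that assumption and sorts them numerically. Both the dotted fid format (e.g. "1.2.3") and the choice of numeric sorting as the "standard order" are assumptions for illustration, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def fold_of(fid: str) -> str:
    """Keep the first two dot-separated numbers of a fid, e.g. '3.24.1' -> '3.24'."""
    return ".".join(fid.split(".")[:2])


def fold_occurrence(fids: Iterable[str]) -> List[Tuple[str, int]]:
    """Count how often each fold occurs and return rows sorted numerically."""
    counts = Counter(fold_of(f) for f in fids)
    # Numeric sort so that '1.10' comes after '1.2' (plain string sort would not).
    return sorted(counts.items(), key=lambda kv: tuple(int(p) for p in kv[0].split(".")))
```

Taking the first three numbers instead of two would give the analogous superfamily (sfam) count.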


all masks 258 k, tab delim. data, head gid_, start_I, stop_n, tool_, score

This file concatenates the results of 
creating all the masks for genome HP. 


aafreq histo 1 k, tab delim. data, head aa_, freq_n

Histogram of frequency of the various amino acids


alla segs 11 k, tab delim. data, head id_, start_I, stop_n

all-a segments


allb segs 2 k, tab delim. data, head id_, start_I, stop_n

all-b segments


characterized domains 26 k, tab delim. data, head id_, start_I, stop_n

Already characterized domains (the borders between
linker regions).


full len segs 20 k, tab delim. data, head id_, start_I, stop_n

Full length segments.


genome v minscop 481 k, tab delim. data, head did_, gid_, TargStart_I, TargStop_n, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f

Result of running genome HP against Ted's minscop (scop 1.35)


genome v pdb40d135 595 k, tab delim. data, head did_, gid_, TargStart_I, TargStop_n, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f

Result of running genome HP against pdb40d-1.35


gorss 502 k, fasta data, head gid_, gorss

This fasta file is the result of running GOR secondary structure prediction
on the genome HP.


gorss MBY nul 502 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking gorss 
with the mask full_len_segs


gorss MBY nul COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file gorss with
the mask full_len_segs to generate the masked fasta file gorss_MBY_nul.


gorss MBY nul STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file gorss with
the mask full_len_segs to generate the masked fasta file gorss_MBY_nul.
MASKED_CHARS  = number of characters masked with the application of this mask.
Masked_Seqs   = number of sequences masked with the application of this mask.
Masking_Segs  = number of segments used in the application of the mask


gorss MBY ucd 502 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking gorss 
with the mask unchar_domains


gorss MBY ucd COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file gorss with
the mask unchar_domains to generate the masked fasta file gorss_MBY_ucd.


gorss MBY ucd STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file gorss with
the mask unchar_domains to generate the masked fasta file gorss_MBY_ucd.
MASKED_CHARS  = number of characters masked with the application of this mask.
Masked_Seqs   = number of sequences masked with the application of this mask.
Masking_Segs  = number of segments used in the application of the mask


id ntm nofilt 18 k, tab delim. data, head id_, signalp, ntm_n

This table contains data on whether there is a signal sequence
and the number of transmembrane segments.
(version 2, revised 971113).
(Renamed table on 980101: id_ntm --> id_ntm_nofilt)


linkers 21 k, tab delim. data, head id_, start_I, stop_n

Linker regions between two other defined segments
that are less than 50 in length.
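
A minimal sketch of how such linker regions could be derived for a single sequence: find the uncovered gaps that lie between consecutive defined segments and keep those under the length cutoff. Coordinates are assumed to be 1-based and inclusive, and the function names are hypothetical. Gaps above the cutoff would correspond to the unchar_domains table; how a gap of exactly 50 is treated is not stated in the source, so this sketch simply excludes it from the linkers.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, stop), assumed 1-based inclusive


def gaps_between(segments: List[Span]) -> List[Span]:
    """Return the uncovered regions lying between consecutive defined segments."""
    gaps: List[Span] = []
    covered_to = None  # highest position covered so far
    for start, stop in sorted(segments):
        if covered_to is not None and start > covered_to + 1:
            gaps.append((covered_to + 1, start - 1))
        covered_to = stop if covered_to is None else max(covered_to, stop)
    return gaps


def linkers(segments: List[Span], cutoff: int = 50) -> List[Span]:
    """Keep only the gaps strictly shorter than the cutoff."""
    return [(a, b) for a, b in gaps_between(segments) if b - a + 1 < cutoff]
```

Regions before the first segment or after the last one are not reported, since they are not between two defined segments.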


low complexity long 19 k, tab delim. data, head id_, start_I, stop_n, cplxity_f

Low complexity regions generated with the
following seg command: seg/seg tmp.fa 45 3.4 3.75 -l


low complexity short 33 k, tab delim. data, head id_, start_I, stop_n, cplxity_f

Low complexity regions generated with the
following seg command: seg/seg tmp.fa 25 3.0 3.3 -l


minscop occurrence 10 k, tab delim. data, head did_, count

Number of times each minscop domain id (did) occurs in genome HP.
This table should be sorted into a standard order and contain 990 entries.


minscop soluble matches 18 k, tab delim. data, head gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f

These are the good matches, at an e-value cutoff of 0.01,
for just the soluble proteins (scop classes 1-5 and 7).
This is with a year cutoff of 97.


minscop soluble matches overlap 1 k, tab delim. data, head gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f

These are the good matches, at an e-value cutoff of 0.01,
for just the soluble proteins (scop classes 1-5 and 7).
This table contains the matches from
minscop_soluble_matches that hit the same sequence on the genome.
That is, it contains duplicate matches that should not be used.


null mask 1 k, tab delim. data, head



pdb40d135 mem matches 1 k, tab delim. data, head gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f

These are the good matches, at an e-value cutoff of 0.01,
for just the membrane proteins (scop class 6).


pdb40d135 soluble matches 37 k, tab delim. data, head gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f

These are the good matches, at an e-value cutoff of 0.01,
for just the soluble proteins (scop classes 1-5 and 7).


seq 581 k, Hidden data, head -

-

seq MBY cdo 511 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking seq 
with the mask characterized_domains


seq MBY cdo COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq with
the mask characterized_domains to generate the masked fasta file seq_MBY_cdo.


seq MBY cdo STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq with
the mask characterized_domains to generate the masked fasta file seq_MBY_cdo.
MASKED_CHARS  = number of characters masked with the application of this mask.
Masked_Seqs   = number of sequences masked with the application of this mask.
Masking_Segs  = number of segments used in the application of the mask


seq MBY lcl 511 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking seq 
with the mask low_complexity_long


seq MBY lcl COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq with
the mask low_complexity_long to generate the masked fasta file seq_MBY_lcl.


seq MBY lcl STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq with
the mask low_complexity_long to generate the masked fasta file seq_MBY_lcl.
MASKED_CHARS  = number of characters masked with the application of this mask.
Masked_Seqs   = number of sequences masked with the application of this mask.
Masking_Segs  = number of segments used in the application of the mask


seq MBY lcs 6 k, Bad! data, head gid_, masked_seq

This fasta file is the result of masking seq 
with the mask low_complexity_short


seq MBY lcs COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq with
the mask low_complexity_short to generate the masked fasta file seq_MBY_lcs.


seq MBY lcs STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq with
the mask low_complexity_short to generate the masked fasta file seq_MBY_lcs.


seq MBY lnk 511 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking seq 
with the mask linkers


seq MBY lnk COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq with
the mask linkers to generate the masked fasta file seq_MBY_lnk.


seq MBY lnk STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq with
the mask linkers to generate the masked fasta file seq_MBY_lnk.
MASKED_CHARS  = number of characters masked with the application of this mask.
Masked_Seqs   = number of sequences masked with the application of this mask.
Masking_Segs  = number of segments used in the application of the mask


seq MBY nul 511 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking seq 
with the mask full_len_segs


seq MBY nul COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq with
the mask full_len_segs to generate the masked fasta file seq_MBY_nul.


seq MBY nul STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq with
the mask full_len_segs to generate the masked fasta file seq_MBY_nul.
MASKED_CHARS  = number of characters masked with the application of this mask.
Masked_Seqs   = number of sequences masked with the application of this mask.
Masking_Segs  = number of segments used in the application of the mask


seq MBY pdb 511 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking seq 
with the mask minscop_soluble_matches


seq MBY pdb COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq with
the mask minscop_soluble_matches to generate the masked fasta file seq_MBY_pdb.


seq MBY pdb MBY lcl 511 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking seq_MBY_pdb 
with the mask low_complexity_long


seq MBY pdb MBY lcl COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq_MBY_pdb with
the mask low_complexity_long to generate the masked fasta file seq_MBY_pdb_MBY_lcl.


seq MBY pdb MBY lcl MBY lcs 5 k, Bad! data, head gid_, masked_seq

This fasta file is the result of masking seq_MBY_pdb_MBY_lcl 
with the mask low_complexity_long


seq MBY pdb MBY lcl MBY lcs COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq_MBY_pdb_MBY_lcl with
the mask low_complexity_long to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_lcs.


seq MBY pdb MBY lcl MBY lcs MBY tms 3 k, Bad! data, head gid_, masked_seq

This fasta file is the result of masking seq_MBY_pdb_MBY_lcl_MBY_lcs 
with the mask tm_segs


seq MBY pdb MBY lcl MBY lcs MBY tms COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_lcs with
the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_lcs_MBY_tms.


seq MBY pdb MBY lcl MBY lcs MBY tms STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_lcs with
the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_lcs_MBY_tms.


seq MBY pdb MBY lcl MBY lcs STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq_MBY_pdb_MBY_lcl with
the mask low_complexity_long to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_lcs.


seq MBY pdb MBY lcl MBY tms 511 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking seq_MBY_pdb_MBY_lcl 
with the mask tm_segs


seq MBY pdb MBY lcl MBY tms COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq_MBY_pdb_MBY_lcl with
the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms.


seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp 511 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk 
with the mask alla_segs


seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk with
the mask alla_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp.


seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp MBY bet 511 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp 
with the mask allb_segs


seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp MBY bet COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp with
the mask allb_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp_MBY_bet.


seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp MBY bet STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp with
the mask allb_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp_MBY_bet.
MASKED_CHARS  = number of characters masked with the application of this mask.
Masked_Seqs   = number of sequences masked with the application of this mask.
Masking_Segs  = number of segments used in the application of the mask


seq MBY pdb MBY lcl MBY tms MBY lnk MBY alp STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk with
the mask alla_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk_MBY_alp.
MASKED_CHARS  = number of characters masked with the application of this mask.
Masked_Seqs   = number of sequences masked with the application of this mask.
Masking_Segs  = number of segments used in the application of the mask


seq MBY pdb MBY lcl MBY tms STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq_MBY_pdb_MBY_lcl with
the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_lcl_MBY_tms.
MASKED_CHARS  = number of characters masked with the application of this mask.
Masked_Seqs   = number of sequences masked with the application of this mask.
Masking_Segs  = number of segments used in the application of the mask


seq MBY pdb MBY lcl STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq_MBY_pdb with
the mask low_complexity_long to generate the masked fasta file seq_MBY_pdb_MBY_lcl.
MASKED_CHARS  = number of characters masked with the application of this mask.
Masked_Seqs   = number of sequences masked with the application of this mask.
Masking_Segs  = number of segments used in the application of the mask


seq MBY pdb MBY tms 2 k, Bad! data, head gid_, masked_seq

This fasta file is the result of masking seq_MBY_pdb 
with the mask tm_segs


seq MBY pdb MBY tms COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq_MBY_pdb with
the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_tms.


seq MBY pdb MBY tms STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq_MBY_pdb with
the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_tms.


seq MBY pdb STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq with
the mask minscop_soluble_matches to generate the masked fasta file seq_MBY_pdb.
MASKED_CHARS  = number of characters masked with the application of this mask.
Masked_Seqs   = number of sequences masked with the application of this mask.
Masking_Segs  = number of segments used in the application of the mask


seq MBY tms 511 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking seq 
with the mask tm_segs


seq MBY tms COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq with
the mask tm_segs to generate the masked fasta file seq_MBY_tms.


seq MBY tms STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq with
the mask tm_segs to generate the masked fasta file seq_MBY_tms.
MASKED_CHARS  = number of characters masked with the application of this mask.
Masked_Seqs   = number of sequences masked with the application of this mask.
Masking_Segs  = number of segments used in the application of the mask


seq MBY ucd 511 k, fasta data, head gid_, masked_seq

This fasta file is the result of masking seq 
with the mask unchar_domains


seq MBY ucd COMP 1 k, tab delim. data, head aa_, count_n

This is the aa composition of the 
masked file from masking the fasta file seq with
the mask unchar_domains to generate the masked fasta file seq_MBY_ucd.


seq MBY ucd STAT 1 k, tab delim. data, head stat_, value

These are the statistics from masking the fasta file seq with
the mask unchar_domains to generate the masked fasta file seq_MBY_ucd.
MASKED_CHARS  = number of characters masked with the application of this mask.
Masked_Seqs   = number of sequences masked with the application of this mask.
Masking_Segs  = number of segments used in the application of the mask


seq lengths 17 k, tab delim. data, head gid_, length_n

Length of each sequence in genome.


sfam occurrence 8 k, tab delim. data, head sfam_, count

Number of times each sfam (represented by three scop fid numbers) occurs in genome HP.
This table should be sorted into a standard order.


ss freq histo 1 k, tab delim. data, head aa_, freq_n

Histogram of frequency of the various amino acids


tm segs filtered 18 k, tab delim. data, head id_, start_I, stop_n, energy_f

Transmembrane segment definitions after removing pdb matches and (most
importantly) low-complexity regions. The tm_segs table is just
the raw data.
This is based on looking at the masked file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk for the TM
segments (annotated with a 3).
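
Recovering segments from a masked sequence can be sketched as finding maximal runs of the annotation character. The example below assumes that each masked position carries a single character (here "3" for TM segments, as stated above) and reports 1-based inclusive coordinates; the function name runs_of is hypothetical.

```python
import re
from typing import List, Tuple


def runs_of(masked_seq: str, symbol: str = "3") -> List[Tuple[int, int]]:
    """Return (start, stop) for each maximal run of symbol, 1-based inclusive."""
    pattern = re.escape(symbol) + "+"
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, masked_seq)]
```

Positions already overwritten by an earlier mask in the chain carry a different character, so TM runs that overlapped those regions come out shortened or split, which is one way the filtering described above could arise.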


unchar domains 22 k, tab delim. data, head id_, start_I, stop_n

Linker regions between two other defined segments
that are greater than 50 in length.
That is, these are uncharacterized protein domains.

