Table Name
|
Size (kb),
Format
|
Links
|
Fields
(keys bold)
|
Description
|
tm segs
|
12 k, tab delim.
|
data,
head
|
id_, start_I, stop_n, sumscor, energy_f
|
Transmembrane segments.
(version 2, revised 971113)
(version 3, revised 981127, now sumscor based on calc_istm_score)
sumscor gives a confidence value for the TM helix, based on an analysis
of all the TM helices in the WHOLE protein.
#
# These parameters were refined on MG
# see genomes/mg-analyze-maxh-981127.xls
#
my = (minhall<-2 ? 4 :
(tot_aa > 50 ? 3 :
( minhall <-1.75 ? 2 :
( tot_aa > 20 ? 1 : 0))));
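The nested conditional above assigns a tiered score from 0 to 4. The sketch below restates it in Python. It assumes `minhall` and `tot_aa` are plain numbers computed elsewhere; their exact definitions are not given here. The function name `tm_confidence` is hypothetical and chosen for illustration.

```python
def tm_confidence(minhall: float, tot_aa: int) -> int:
    """Tiered score mirroring the nested conditional (hypothetical name).

    Tests are applied in the same order as the original expression,
    so the first condition that holds decides the score.
    """
    if minhall < -2:
        return 4
    if tot_aa > 50:
        return 3
    if minhall < -1.75:
        return 2
    if tot_aa > 20:
        return 1
    return 0
```

The order of the tests matters. For example, a value of `minhall` of exactly -2 fails the first test but passes the third, so it scores 2 unless `tot_aa` exceeds 50.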
|
signal segs
|
2 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
Signal sequences.
|
sat mg strucs
|
12 k, tab delim.
|
data,
head
|
|
Reformatted version of
SAT's http://www.mrc-lmb.cam.ac.uk/genomes/MG_strucs.html
by MBG.
Here is the readme for the original. "_" or "?" was used for
unidentifiable fields.
Explanation of format:
(MG sequence number)-(sequence length) (MG sequence region) (scop sequence name)-(sequence length) (sequence region) (expectation value with which found)
If there is an M at the end of the line, the relationship was only found starting the search from the MG sequence.
If the information 'via sequence name' is given, the relationship is not found with PSI-BLAST, but only via another
sequence in the GEANFAMMER sequence family of that sequence.
If there is a * at the end of the line, the expectation value is below that considered significant, but the match is accepted for other reasons.
MG001-267 149-266 pdb_d2pola3-122 4-119 8e-25
MG002-310 3-64 pdb_d1xbl__-75 4-68 5e-16
MG003-650 412-640 pdb_d1bgw__-680 1-243 4e-79
MG003-650 96-215 pdb_d1ah6__-213 89-212 8e-20
MG004-836 24-498 pdb_d1bgw__-680 206-677 1e-128
MG005-417 5-110 pdb_d1seta1-110 4-110 4e-25
MG005-417 114-413 pdb_d1seta2-311 3-303 1e-109
MG006-210 1-197 pdb_3adk__-194 5-186 8e-34 M
.
.
.
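The line format explained above can be parsed with a regular expression. The sketch below is a minimal example and has these limits:
- It handles only the basic form plus an optional trailing M or *. Lines carrying 'via sequence name' are not handled.
- The names `parse_sat_line` and `SatMatch` are hypothetical.
- Mapping the two regions onto table columns such as TargStart_I or QryStart_n is left to the caller.

```python
import re
from dataclasses import dataclass
from typing import Optional

# MGnnn-len  start-stop  scopname-len  start-stop  evalue  [M|*]
LINE_RE = re.compile(
    r"^(MG\d+)-(\d+)\s+(\d+)-(\d+)\s+(\S+)-(\d+)\s+(\d+)-(\d+)\s+(\S+)(?:\s+([M*]))?\s*$"
)

@dataclass
class SatMatch:
    gid: str              # MG sequence number, e.g. "MG003"
    gid_len: int          # MG sequence length
    mg_start: int         # MG sequence region
    mg_stop: int
    scop_name: str        # scop sequence name, e.g. "pdb_d1bgw__"
    scop_len: int         # scop sequence length
    scop_start: int       # scop sequence region
    scop_stop: int
    evalue: float         # expectation value with which found
    flag: Optional[str]   # "M", "*", or None

def parse_sat_line(line: str) -> Optional[SatMatch]:
    """Parse one match line; return None for lines that do not fit the basic form."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    g = m.groups()
    return SatMatch(g[0], int(g[1]), int(g[2]), int(g[3]), g[4], int(g[5]),
                    int(g[6]), int(g[7]), float(g[8]), g[9])
```

The greedy `(\S+)-(\d+)` backtracks to the last hyphen before whitespace. This means scop names containing underscores, such as `pdb_d1xbl__`, are split from their length correctly.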
|
minscop soluble matches no overlap
|
9 k, tab delim.
|
data,
head
|
gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f
|
These are the good matches at an e-value cutoff of 0.01,
for the soluble proteins only (SCOP classes 1-5 and 7).
This table is the result of filtering out the matches from
minscop_soluble_matches that hit the same sequence on the genome.
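One way to separate non-overlapping matches from duplicates is sketched below. The rule used to choose which match survives is not stated here, so this sketch assumes a greedy choice: within each genome sequence, keep matches in order of increasing e-value and drop any that overlap a match already kept. The names and inclusive coordinates are also assumptions.

```python
from collections import defaultdict
from typing import NamedTuple

class Match(NamedTuple):
    gid: str          # genome sequence id
    targ_start: int   # start of the match on the genome sequence
    targ_stop: int    # stop (assumed inclusive)
    did: str          # domain id
    ev: float         # e-value

def overlaps(a: Match, b: Match) -> bool:
    """True if two matches share at least one residue (inclusive ranges)."""
    return a.targ_start <= b.targ_stop and b.targ_start <= a.targ_stop

def split_overlaps(matches):
    """Return (kept, dropped), choosing greedily by e-value per genome sequence."""
    by_gid = defaultdict(list)
    for m in matches:
        by_gid[m.gid].append(m)
    kept, dropped = [], []
    for gid in sorted(by_gid):
        chosen = []
        for m in sorted(by_gid[gid], key=lambda x: x.ev):
            if any(overlaps(m, c) for c in chosen):
                dropped.append(m)   # would go to the "overlap" table
            else:
                chosen.append(m)    # would go to the "no overlap" table
        kept.extend(chosen)
    return kept, dropped
```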
|
id ntm
|
5 k, tab delim.
|
data,
head
|
id_, signalp, ntm_n
|
This table contains the number of transmembrane segments for each ORF,
counted after filtering.
It also has signal-sequence data, based on simple criteria.
|
genome v minscop
|
14 k, tab delim.
|
data,
head
|
did_, gid_, TargStart_I, TargStop_n, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f, comment
|
This is a custom-made file based on the values in the table
sat_mg_strucs_nov98.txt. It was constructed by MBG on 981127. It has
all the original fields, plus a comment. The original matches are
from the web page at the MRC LMB. The beginning of this page is
reproduced below.
--
SCOP Domain Sequences in the MG Genome
(Additional information for "Structural assignments to the Mycoplasma genitalium proteins show extensive gene duplications and domain
rearrangements" (1998) by Sarah A. Teichmann, Jong Park and Cyrus Chothia, Proc. Natl. Acad. Sci. USA, 95, 14658-14663)
Explanation of format:
(MG sequence number)-(sequence length) (MG sequence region) (scop sequence name)-(sequence length) (sequence region) (expectation value with which found)
If there is an M at the end of the line, the relationship was only found starting the search from the MG sequence.
If the information 'via sequence name' is given, the relationship is not found with PSI-BLAST, but only via another
sequence in the GEANFAMMER sequence family of that sequence.
If there is a * at the end of the line, the expectation value is below that considered significant, but the match is accepted for other reasons.
MG001-267 149-266 pdb_d2pola3-122 4-119 8e-25
MG002-310 3-64 pdb_d1xbl__-75 4-68 5e-16
MG002-310 122-211 pdb_d1tbd__-134 7-91 5e-7
MG003-650 412-640 pdb_d1bgw__-680 1-243 4e-79
MG003-650 229-410 pdb_ds043_1-172 1-172 4e-54
MG003-650 96-215 pdb_d1ah6__-213 89-212 8e-20
MG004-836 24-498 pdb_d1bgw__-680 206-677 1e-128
MG005-417 5-110 pdb_d1seta1-110 4-110 4e-25
MG005-417 114-413 pdb_d1seta2-311 3-303 1e-109
|
fold occurrence
|
4 k, tab delim.
|
data,
head
|
fold_, count
|
Number of times each fold (represented by two scop fid numbers) occurs in genome MG.
This table should be sorted into a standard order.
|
all masks
|
89 k, tab delim.
|
data,
head
|
gid_, start_I, stop_n, tool_, score
|
This file concatenates the results of
creating all the masks for genome MG.
|
aafreq histo
|
1 k, tab delim.
|
data,
head
|
aa_, freq_n
|
Histogram of the frequencies of the amino acids.
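The sketch below computes such a histogram from a set of sequences. It rests on these assumptions:
- freq_n holds a fraction rather than a raw count.
- Only the 20 standard one-letter codes are counted.
- The function name `aa_frequencies` is hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_frequencies(seqs):
    """Fraction of each standard amino acid over all residues in seqs.

    Non-standard letters (e.g. X) are ignored in both numerator and denominator.
    """
    counts = Counter()
    for s in seqs:
        counts.update(c for c in s.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}
```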
|
alla segs
|
2 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
All-alpha segments.
|
allb segs
|
2 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
All-beta segments.
|
characterized domains 1
|
8 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
Already characterized domains (the borders between
linker regions).
This is done at phase _1.
generate_linkers() running on MG/seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv with tag _1
|
characterized domains 2
|
7 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
Already characterized domains (the borders between
linker regions).
This is done at phase _2.
generate_linkers() running on MG/seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm with tag _2
|
comp report
|
13 k, tab delim.
|
data,
head
|
selection, genome, sum, total_seqs, masked_seqs, total_chars, masked_chars, total_segs, masking_segs, mask_chars_per_seg, mask_chars_per_seq, frac_masked_chars, frac_masked_seqs, masking_segs_per_seq, dav_rms, dps_rms, dav_A, dav_C, dav_D, dav_E, dav_F, dav_G, dav_H, dav_I, dav_K, dav_L, dav_M, dav_N, dav_P, dav_Q, dav_R, dav_S, dav_T, dav_V, dav_W, dav_Y, dps_A, dps_C, dps_D, dps_E, dps_F, dps_G, dps_H, dps_I, dps_K, dps_L, dps_M, dps_N, dps_P, dps_Q, dps_R, dps_S, dps_T, dps_V, dps_W, dps_Y, pct_A, pct_C, pct_D, pct_E, pct_F, pct_G, pct_H, pct_I, pct_K, pct_L, pct_M, pct_N, pct_P, pct_Q, pct_R, pct_S, pct_T, pct_V, pct_W, pct_Y, A_n, C_n, D_n, E_n, F_n, G_n, H_n, I_n, K_n, L_n, M_n, N_n, P_n, Q_n, R_n, S_n, T_n, V_n, W_n, Y_n, 0_n, 1_n, 2_n, 3_n, 4_n, 5_n, 6_n, 7_n, 8_n, 9_n, selections_long, sort
|
Report on the compositions in genome MG,
based on the following amino-acid selections:
seq pdb tmb sig lcv lnk tms lcm ln2 fun nof ucd uc2 fasta__ sat9811 psib1wy
(version 3)
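Several comp report columns look like ratios of the raw counts next to them. The formulas below are inferred only from the column names (masked_chars, total_chars, masked_seqs, total_seqs, masking_segs). They are assumptions, not confirmed definitions, and the function name is hypothetical.

```python
def derived_fractions(total_seqs, masked_seqs, total_chars, masked_chars, masking_segs):
    """Guessed definitions of three comp report ratio columns."""
    def ratio(num, den):
        # Avoid division by zero for empty selections.
        return num / den if den else 0.0
    return {
        "frac_masked_chars": ratio(masked_chars, total_chars),   # assumed
        "frac_masked_seqs": ratio(masked_seqs, total_seqs),      # assumed
        "mask_chars_per_seg": ratio(masked_chars, masking_segs), # assumed
    }
```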
|
full len segs
|
6 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
Full length segments.
|
genome v minscop fasta
|
7 k, tab delim.
|
data,
head
|
gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f
|
These are the good matches at an e-value cutoff of 0.01,
for the soluble proteins only (SCOP classes 1-5 and 7).
These are the good matches generated by FASTA against scop 1.35.
These are no longer being used in the analysis but are here for
comparative purposes.
|
genome v minscop psib1way
|
10 k, tab delim.
|
data,
head
|
|
|
genome v minscop sat9811
|
14 k, tab delim.
|
data,
head
|
did_, gid_, TargStart_I, TargStop_n, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f, comment
|
This is a custom-made file based on the values in the table
sat_mg_strucs_nov98.txt. It was constructed by MBG on 981127. It has
all the original fields, plus a comment. The original matches are
from the web page at the MRC LMB. The beginning of this page is
reproduced below.
1998.12.14:
d1dts__ MG080 707 847 5 173 2.00E-07 _ _ (mbg fixed)
--
SCOP Domain Sequences in the MG Genome
(Additional information for "Structural assignments to the Mycoplasma genitalium proteins show extensive gene duplications and domain
rearrangements" (1998) by Sarah A. Teichmann, Jong Park and Cyrus Chothia, Proc. Natl. Acad. Sci. USA, 95, 14658-14663)
Explanation of format:
(MG sequence number)-(sequence length) (MG sequence region) (scop sequence name)-(sequence length) (sequence region) (expectation value with which found)
If there is an M at the end of the line, the relationship was only found starting the search from the MG sequence.
If the information 'via sequence name' is given, the relationship is not found with PSI-BLAST, but only via another
sequence in the GEANFAMMER sequence family of that sequence.
If there is a * at the end of the line, the expectation value is below that considered significant, but the match is accepted for other reasons.
MG001-267 149-266 pdb_d2pola3-122 4-119 8e-25
MG002-310 3-64 pdb_d1xbl__-75 4-68 5e-16
MG002-310 122-211 pdb_d1tbd__-134 7-91 5e-7
MG003-650 412-640 pdb_d1bgw__-680 1-243 4e-79
MG003-650 229-410 pdb_ds043_1-172 1-172 4e-54
MG003-650 96-215 pdb_d1ah6__-213 89-212 8e-20
MG004-836 24-498 pdb_d1bgw__-680 206-677 1e-128
MG005-417 5-110 pdb_d1seta1-110 4-110 4e-25
MG005-417 114-413 pdb_d1seta2-311 3-303 1e-109
|
genome v minscop sep98
|
12 k, tab delim.
|
data,
head
|
did_, gid_, TargStart_I, TargStop_n, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f, comment
|
This is a custom-made file based on the values
in the table sat_mg_strucs.txt.
It was constructed by MBG on 980907.
It has all the original fields, plus a comment.
|
gorss
|
173 k, fasta
|
data,
head
|
gid_, gorss
|
This fasta file is the result of running GOR secondary-structure prediction
on genome MG.
|
gorss MBY nul
|
173 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking gorss
with the mask full_len_segs
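Masking a sequence with a segment table overwrites the residues inside each segment. The sketch below is minimal and rests on these assumptions:
- The mask character is 'X'. The listing does not say which character the masking writes.
- Coordinates are 1-based and inclusive.
- The function name `apply_mask` is hypothetical.
- Reading "MBY" in the file names as "masked by" is an inference from the names.

```python
def apply_mask(seq, segments, mask_char="X"):
    """Replace residues covered by (start, stop) segments with mask_char.

    Coordinates are assumed 1-based and inclusive; segments running past
    either end of the sequence are clipped.
    """
    chars = list(seq)
    for start, stop in segments:
        lo = max(start, 1) - 1          # convert to 0-based
        hi = min(stop, len(chars))      # exclusive upper bound after conversion
        for i in range(lo, hi):
            chars[i] = mask_char
    return "".join(chars)
```

For example, `apply_mask("ABCDEFG", [(2, 3), (6, 10)])` masks B, C, F and G. The second segment is clipped at the end of the sequence.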
|
gorss MBY nul COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file gorss with
the mask full_len_segs to generate the masked fasta file gorss_MBY_nul.
|
gorss MBY nul STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file gorss with
the mask full_len_segs to generate the masked fasta file gorss_MBY_nul.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
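The three statistics defined above could be computed from a sequence set and a segment table as sketched below. The sketch rests on these assumptions:
- Residues covered by overlapping segments are counted once.
- Segments for unknown ids, or lying entirely outside a sequence, are not counted as used.
- Residues already masked by an earlier mask in a chain are not treated specially.
- The function name `mask_stats` is hypothetical.

```python
def mask_stats(seqs, segments):
    """seqs: {gid: sequence}; segments: list of (gid, start, stop), 1-based inclusive.

    Returns MASKED_CHARS, Masked_Seqs and Masking_Segs for one mask application.
    """
    covered = {}     # gid -> set of masked 1-based positions
    used_segs = 0
    for gid, start, stop in segments:
        if gid not in seqs:
            continue
        lo, hi = max(start, 1), min(stop, len(seqs[gid]))
        if lo > hi:
            continue
        used_segs += 1
        covered.setdefault(gid, set()).update(range(lo, hi + 1))
    return {
        "MASKED_CHARS": sum(len(p) for p in covered.values()),
        "Masked_Seqs": len(covered),
        "Masking_Segs": used_segs,
    }
```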
|
gorss MBY ucd
|
173 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking gorss
with the mask unchar_domains
|
gorss MBY ucd COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file gorss with
the mask unchar_domains to generate the masked fasta file gorss_MBY_ucd.
|
gorss MBY ucd STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file gorss with
the mask unchar_domains to generate the masked fasta file gorss_MBY_ucd.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
hlx aa pair freq
|
20 k, tab delim.
|
data,
head
|
aa_, offset_, count
|
Counts of all amino-acid pairs, by offset.
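A table keyed by residue and offset suggests counting pairs (a_i, a_(i+k)) for a range of offsets k. The sketch below is a hypothetical version of that count. It assumes the input segments, presumably helices given the "hlx" prefix, are plain strings. The cut-off `max_offset=4` is illustrative only; the offsets actually tabulated are not stated.

```python
from collections import Counter

def pair_counts(segments, max_offset=4):
    """Count residue pairs (a_i, a_{i+k}, k) for k = 1..max_offset within each segment.

    Pairs never span two segments.
    """
    counts = Counter()
    for seg in segments:
        for k in range(1, max_offset + 1):
            for i in range(len(seg) - k):
                counts[(seg[i], seg[i + k], k)] += 1
    return counts
```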
|
id ntm nofilt
|
5 k, tab delim.
|
data,
head
|
id_, signalp, ntm_n
|
This table contains data on whether there is a signal sequence
and the number of transmembrane segments.
(version 2, revised 971113).
(Renamed table on 980101: id_ntm --> id_ntm_nofilt)
|
lcm segs
|
6 k, tab delim.
|
data,
head
|
gid_, start_, stop, score
|
This routine splits the low_complexity_long (LCL) regions into the LCV
(low-complexity very-long) and LCM (low-complexity medium). The
criterion is simply length. LCVs are LCLs longer than 150 aa. LCMs are
shorter.
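The split described above is a single length test. The sketch below applies the 150 aa threshold stated in the description. The tuple layout follows the gid_, start_, stop, score fields, and inclusive coordinates are assumed. The function name `split_lcl` is hypothetical.

```python
LCV_MIN_LEN = 150  # from the description: LCVs are LCLs longer than 150 aa

def split_lcl(segments):
    """segments: list of (gid, start, stop, score). Returns (lcv, lcm)."""
    lcv, lcm = [], []
    for gid, start, stop, score in segments:
        length = stop - start + 1  # assumes inclusive coordinates
        target = lcv if length > LCV_MIN_LEN else lcm
        target.append((gid, start, stop, score))
    return lcv, lcm
```

A segment of exactly 150 aa goes to LCM, because the stated rule is "longer than 150".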
|
lcm segs.txt
|
5 k, tab delim.
|
data,
head
|
gid_, start_, stop, score
|
This routine splits the low_complexity_long (lcl) regions into the lcv
(low-complexity very long) and lcm (low-complexity medium). The
criterion is simply length. LCVs are LCLs longer than 150 aa. LCMs are
shorter.
|
lcv segs
|
2 k, tab delim.
|
data,
head
|
gid_, start_, stop, score
|
This routine splits the low_complexity_long (LCL) regions into the LCV
(low-complexity very-long) and LCM (low-complexity medium). The
criterion is simply length. LCVs are LCLs longer than 150 aa. LCMs are
shorter.
|
lcv segs.txt
|
1 k, tab delim.
|
data,
head
|
gid_, start_, stop, score
|
This routine splits the low_complexity_long (lcl) regions into the lcv
(low-complexity very long) and lcm (low-complexity medium). The
criterion is simply length. LCVs are LCLs longer than 150 aa. LCMs are
shorter.
|
linkers 1
|
8 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
Linker regions shorter than 50 in length, lying between
two other defined segments.
This is done at phase _1.
generate_linkers() running on MG/seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv with tag _1
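The internals of generate_linkers() are not shown here. The sketch below is a hypothetical reconstruction of the rule as described: report each gap strictly between consecutive defined segments on one sequence when the gap is shorter than 50. It assumes inclusive coordinates and tracks the furthest segment end seen so far, so that overlapping segments do not produce spurious gaps. `find_linkers` is an illustrative name, not the original function.

```python
MAX_LINKER_LEN = 50  # "less in length than 50"

def find_linkers(segments, max_len=MAX_LINKER_LEN):
    """segments: (start, stop) pairs already defined on one sequence (inclusive).

    Returns (start, stop) of each gap between segments that is shorter than max_len.
    """
    ordered = sorted(segments)
    linkers = []
    if not ordered:
        return linkers
    reach = ordered[0][1]                  # furthest residue covered so far
    for start, stop in ordered[1:]:
        gap_start, gap_stop = reach + 1, start - 1
        gap_len = gap_stop - gap_start + 1
        if 0 < gap_len < max_len:
            linkers.append((gap_start, gap_stop))
        reach = max(reach, stop)
    return linkers
```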
|
linkers 2
|
2 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
Linker regions shorter than 50 in length, lying between
two other defined segments.
This is done at phase _2.
generate_linkers() running on MG/seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm with tag _2
|
low complexity long
|
7 k, tab delim.
|
data,
head
|
id_, start_I, stop_n, cplxity_f
|
Low complexity regions generated with the
following seg command: seg/seg tmp.fa 45 3.4 3.75 -l
|
low complexity short
|
13 k, tab delim.
|
data,
head
|
id_, start_I, stop_n, cplxity_f
|
Low complexity regions generated with the
following seg command: seg/seg tmp.fa 25 3.0 3.3 -l
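In the two seg commands above, the three numbers are the window length, the trigger (low) complexity cut-off and the extension (high) cut-off. seg measures complexity as the Shannon entropy, in bits, of the residue composition in a window. The sketch below is a simplified stand-in, not seg itself. It only flags windows whose entropy falls below the trigger cut-off and omits seg's extension and merging steps. The function names are hypothetical.

```python
import math
from collections import Counter

def window_entropy(window):
    """Shannon entropy (bits) of the residue composition of a window."""
    n = len(window)
    counts = Counter(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def low_complexity_windows(seq, window=45, locut=3.4):
    """1-based inclusive (start, stop) of every window with entropy below locut.

    Simplified: seg additionally extends triggered windows using the higher
    cut-off and merges them into segments; that step is omitted here.
    """
    hits = []
    for i in range(len(seq) - window + 1):
        if window_entropy(seq[i:i + window]) < locut:
            hits.append((i + 1, i + window))
    return hits
```

The defaults match the parameters of the long-region command. The short-region command would correspond to `window=25, locut=3.0`.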
|
minscop occurrence
|
10 k, tab delim.
|
data,
head
|
did_, count
|
Number of times each minscop domain id (did) occurs in genome MG.
This table should be sorted into a standard order and contain 990 entries.
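The table should contain every domain, including those that never occur, so the counts must be taken over the full domain list rather than only the matched ids. The sketch below does this. It assumes that sorting by did is the "standard order", which is not stated, and the function name `domain_occurrence` is hypothetical.

```python
from collections import Counter

def domain_occurrence(matched_dids, all_dids):
    """Return (did, count) for every did in all_dids, zero counts included, sorted by did."""
    counts = Counter(matched_dids)
    return [(did, counts[did]) for did in sorted(all_dids)]
```

With the full 990-domain list passed as `all_dids`, the output has 990 rows whatever the matches are.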
|
minscop soluble matches
|
18 k, tab delim.
|
data,
head
|
gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f
|
These are the good matches at an e-value cutoff of 0.01,
for the soluble proteins only (SCOP classes 1-5 and 7).
This uses a year cutoff of 97.
good_scop_matches_to_mask_w_yr() running on genome MG...
...with year cutoff of 97 and table genome_v_minscop
|
minscop soluble matches overlap
|
1 k, tab delim.
|
data,
head
|
gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f
|
These are the good matches at an e-value cutoff of 0.01,
for the soluble proteins only (SCOP classes 1-5 and 7).
This table contains the matches from
minscop_soluble_matches that hit the same sequence on the genome.
That is, it contains duplicate matches that should not be used.
|
null mask
|
1 k, tab delim.
|
data,
head
|
|
|
pdb40d135 soluble matches
|
31 k, tab delim.
|
data,
head
|
gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f
|
These are the good matches at an e-value cutoff of 0.01,
for the soluble proteins only (SCOP classes 1-5 and 7).
|
seq
|
193 k, fasta
|
data,
head
|
|
|
seq MBY cdo
|
173 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq
with the mask characterized_domains
|
seq MBY cdo COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq with
the mask characterized_domains to generate the masked fasta file seq_MBY_cdo.
|
seq MBY cdo STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq with
the mask characterized_domains to generate the masked fasta file seq_MBY_cdo.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY lcs
|
4 k, Bad!
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq
with the mask low_complexity_short
|
seq MBY lcs COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq with
the mask low_complexity_short to generate the masked fasta file seq_MBY_lcs.
|
seq MBY lcs STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq with
the mask low_complexity_short to generate the masked fasta file seq_MBY_lcs.
|
seq MBY lnk
|
173 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq
with the mask linkers
|
seq MBY lnk COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq with
the mask linkers to generate the masked fasta file seq_MBY_lnk.
|
seq MBY lnk STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq with
the mask linkers to generate the masked fasta file seq_MBY_lnk.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY nul
|
173 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq
with the mask full_len_segs
|
seq MBY nul COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq with
the mask full_len_segs to generate the masked fasta file seq_MBY_nul.
|
seq MBY nul STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq with
the mask full_len_segs to generate the masked fasta file seq_MBY_nul.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY pdb
|
173 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq
with the mask minscop_soluble_matches
|
seq MBY pdb COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq with
the mask minscop_soluble_matches to generate the masked fasta file seq_MBY_pdb.
|
seq MBY pdb MBY tmb
|
173 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb
with the mask tm_segs_best
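The file names in this listing encode a masking chain. Each `_MBY_<abbrev>` suffix records one more mask applied to the previous file; reading MBY as "masked by" is an inference from the names. The sketch below generates the (input file, mask, output file) triples for such a chain. The abbreviation-to-mask pairs are taken from the phase-2 entries in this listing, and the function name `chain_steps` is hypothetical.

```python
def chain_steps(base, steps):
    """steps: (abbrev, mask_table) pairs applied in order.

    Yields (input_file, mask_table, output_file), where each output name is
    the input name plus "_MBY_" plus the abbreviation.
    """
    current = base
    for abbrev, mask in steps:
        output = f"{current}_MBY_{abbrev}"
        yield current, mask, output
        current = output

# Phase-2 chain as described by the entries in this listing.
PHASE2 = [
    ("pdb", "minscop_soluble_matches"),
    ("tmb", "tm_segs_best"),
    ("sig", "signal_segs"),
    ("lcv", "lcv_segs"),
    ("lnk", "linkers_1"),
    ("tms", "tm_segs"),
    ("lcm", "lcm_segs"),
    ("ln2", "linkers_2"),
]
```

Iterating over `chain_steps("seq", PHASE2)` reproduces the "result of masking X with the mask Y" sentences of the corresponding entries, one step per file.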
|
seq MBY pdb MBY tmb COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb with
the mask tm_segs_best to generate the masked fasta file seq_MBY_pdb_MBY_tmb.
|
seq MBY pdb MBY tmb MBY sig
|
173 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb_MBY_tmb
with the mask signal_segs
|
seq MBY pdb MBY tmb MBY sig COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb_MBY_tmb with
the mask signal_segs to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig.
|
seq MBY pdb MBY tmb MBY sig MBY lcl
|
173 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb_MBY_tmb_MBY_sig
with the mask low_complexity_long
|
seq MBY pdb MBY tmb MBY sig MBY lcl COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig with
the mask low_complexity_long to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcl.
|
seq MBY pdb MBY tmb MBY sig MBY lcl MBY tms
|
173 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcl
with the mask tm_segs
|
seq MBY pdb MBY tmb MBY sig MBY lcl MBY tms COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcl with
the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcl_MBY_tms.
|
seq MBY pdb MBY tmb MBY sig MBY lcl MBY tms MBY lnk
|
173 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcl_MBY_tms
with the mask linkers
|
seq MBY pdb MBY tmb MBY sig MBY lcl MBY tms MBY lnk COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcl_MBY_tms with
the mask linkers to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcl_MBY_tms_MBY_lnk.
|
seq MBY pdb MBY tmb MBY sig MBY lcl MBY tms MBY lnk STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcl_MBY_tms with
the mask linkers to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcl_MBY_tms_MBY_lnk.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY pdb MBY tmb MBY sig MBY lcl MBY tms STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcl with
the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcl_MBY_tms.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY pdb MBY tmb MBY sig MBY lcl STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig with
the mask low_complexity_long to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcl.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY pdb MBY tmb MBY sig MBY lcv
|
173 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb_MBY_tmb_MBY_sig
with the mask lcv_segs
|
seq MBY pdb MBY tmb MBY sig MBY lcv COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig with
the mask lcv_segs to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv.
|
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk
|
173 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv
with the mask linkers_1
|
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv with
the mask linkers_1 to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk.
|
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms
|
173 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk
with the mask tm_segs
|
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk with
the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms.
|
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms MBY lcm
|
173 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms
with the mask lcm_segs
|
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms MBY lcm COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms with
the mask lcm_segs to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm.
|
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms MBY lcm MBY ln2
|
173 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm
with the mask linkers_2
|
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms MBY lcm MBY ln2 COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm with
the mask linkers_2 to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm_MBY_ln2.
|
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms MBY lcm MBY ln2 MBY fun
|
173 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm_MBY_ln2
with the mask unchar_domains_2_have_func
|
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms MBY lcm MBY ln2 MBY fun COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm_MBY_ln2 with
the mask unchar_domains_2_have_func to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm_MBY_ln2_MBY_fun.
|
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms MBY lcm MBY ln2 MBY fun MBY nof
|
173 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm_MBY_ln2_MBY_fun
with the mask unchar_domains_2_no_func
|
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms MBY lcm MBY ln2 MBY fun MBY nof COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm_MBY_ln2_MBY_fun with
the mask unchar_domains_2_no_func to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm_MBY_ln2_MBY_fun_MBY_nof.
|
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms MBY lcm MBY ln2 MBY fun MBY nof STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm_MBY_ln2_MBY_fun with
the mask unchar_domains_2_no_func to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm_MBY_ln2_MBY_fun_MBY_nof.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms MBY lcm MBY ln2 MBY fun STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm_MBY_ln2 with
the mask unchar_domains_2_have_func to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm_MBY_ln2_MBY_fun.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms MBY lcm MBY ln2 STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm with
the mask linkers_2 to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm_MBY_ln2.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms MBY lcm STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms with
the mask lcm_segs to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk with
the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv with
the mask linkers_1 to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY pdb MBY tmb MBY sig MBY lcv STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig with
the mask lcv_segs to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY pdb MBY tmb MBY sig STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb_MBY_tmb with
the mask signal_segs to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY pdb MBY tmb STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb with
the mask tm_segs_best to generate the masked fasta file seq_MBY_pdb_MBY_tmb.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY pdb MBY tms
|
1 k, Bad!
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq_MBY_pdb
with the mask tm_segs
|
seq MBY pdb MBY tms COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the aa composition of the
masked file from masking the fasta file seq_MBY_pdb with
the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_tms.
|
seq MBY pdb MBY tms STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq_MBY_pdb with
the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_tms.
|
seq MBY pdb STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq with
the mask minscop_soluble_matches to generate the masked fasta file seq_MBY_pdb.
MASKED_CHARS = number of characters masked with the application of this mask.
Masked_Seqs = number of sequences masked with the application of this mask.
Masking_Segs = number of segments used in the application of the mask
|
seq MBY pdb fasta COMP
|
1 k, tab delim.
|
data,
head
|
|
|
seq MBY pdb fasta STAT
|
1 k, tab delim.
|
data,
head
|
|
|
seq MBY pdb psib1way COMP
|
1 k, tab delim.
|
data,
head
|
|
|
seq MBY pdb psib1way STAT
|
1 k, tab delim.
|
data,
head
|
|
|
seq MBY pdb sat9811 COMP
|
1 k, tab delim.
|
data,
head
|
|
|
seq MBY pdb sat9811 STAT
|
1 k, tab delim.
|
data,
head
|
|
|
seq MBY uc2
|
173 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq
with the mask unchar_domains_2.
|
seq MBY uc2 COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the amino-acid composition of the masked fasta file
seq_MBY_uc2, generated by masking the fasta file seq with
the mask unchar_domains_2.
|
seq MBY uc2 STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq with
the mask unchar_domains_2 to generate the masked fasta file seq_MBY_uc2.
MASKED_CHARS = number of characters masked by applying this mask.
Masked_Seqs = number of sequences masked by applying this mask.
Masking_Segs = number of segments used when applying this mask.
|
seq MBY ucd
|
173 k, fasta
|
data,
head
|
gid_, masked_seq
|
This fasta file is the result of masking seq
with the mask unchar_domains_1.
|
seq MBY ucd COMP
|
1 k, tab delim.
|
data,
head
|
aa_, count_n
|
This is the amino-acid composition of the masked fasta file
seq_MBY_ucd, generated by masking the fasta file seq with
the mask unchar_domains_1.
|
seq MBY ucd STAT
|
1 k, tab delim.
|
data,
head
|
stat_, value
|
These are the statistics from masking the fasta file seq with
the mask unchar_domains_1 to generate the masked fasta file seq_MBY_ucd.
MASKED_CHARS = number of characters masked by applying this mask.
Masked_Seqs = number of sequences masked by applying this mask.
Masking_Segs = number of segments used when applying this mask.
|
seq lengths
|
5 k, tab delim.
|
data,
head
|
gid_, length_n
|
Length of each sequence in the genome.
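A sketch of deriving the gid_ / length_n pairs from a fasta file, assuming gid_ is the first whitespace-delimited token of each header line and that sequences may wrap over several lines. fasta_lengths() is a hypothetical name.

```python
from typing import Dict, Iterable, Optional


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each fasta record's identifier to the length of its sequence."""
    lengths: Dict[str, int] = {}
    gid: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Assumption: gid_ is the first whitespace-delimited header token.
            gid = line[1:].split()[0]
            lengths[gid] = 0
        elif gid is not None:
            lengths[gid] += len(line)  # add each wrapped sequence line
    return lengths
```

Passing an open file object works because iterating a file yields its lines.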
|
tigr annote 9812 9710 diffsonly
|
5 k, tab delim.
|
data,
head
|
|
|
tigr annote 9812 9710 merge
|
51 k, tab delim.
|
data,
head
|
|
|
tigr seq dec98
|
204 k, fasta
|
data,
head
|
|
|
tigr seq dec98 lengths
|
5 k, tab delim.
|
data,
head
|
gid_, length_n
|
Length of each sequence in the genome.
my $txdb = txdb->new(name=>tigr_seq_dec98, io=>INPUT_SLURP_FASTA, ext=>fa);
|
tm scores
|
14 k, tab delim.
|
data,
head
|
id_, sumscr, sig, minhall, ntmproc, totaa, avg_en, minhseg
|
This table contains scores indicating whether, and to what
degree, each sequence is an integral membrane protein.
sig = does it have a signal sequence?
sumscor = overall evaluation score (see below)
minhall = minimum hydrophobicity value of a 20-residue window moved over the whole protein
totaa = total number of amino acids under the -1 threshold
ntmproc = total number of TM helices after processing
avg_en = average hydrophobicity of all the TM segments (per residue)
minhseg = minimum hydrophobicity among the TM segments (per residue)
#
# These parameters were refined on MG
# see genomes/mg-analyze-maxh-981127.xls
#
my $sumscor = ($minhall < -2    ? 4 :
              ($tot_aa > 50     ? 3 :
              ($minhall < -1.75 ? 2 :
              ($tot_aa > 20     ? 1 : 0))));
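The same cascade written in Python, as a hypothetical port rather than the pipeline's own code, assuming tot_aa is the value stored in the totaa column. The conditions are tested in order, so a window with minhall below -2 scores 4 regardless of tot_aa, and a lower minhall always yields a score at least as high.

```python
def sumscor(minhall: float, tot_aa: int) -> int:
    """Return the 0-4 membrane-protein score from minhall and tot_aa."""
    if minhall < -2:      # strongest evidence: very low minimum window value
        return 4
    if tot_aa > 50:       # many residues under the -1 threshold
        return 3
    if minhall < -1.75:   # moderately low minimum window value
        return 2
    if tot_aa > 20:       # some residues under the -1 threshold
        return 1
    return 0
```

The comparisons are strict, so minhall = -2 falls through to the -1.75 test and scores 2.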
|
tm segs best
|
10 k, tab delim.
|
data,
head
|
id_, start_I, stop_n, sumscor, energy_f
|
These are the segments from table tm_segs
that have sumscor = 4.
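A sketch of the selection, assuming the tab-delimited data file has no header line (the header is linked separately as "head") and that its columns follow the Fields order above. best_segments() and FIELDS are illustrative names.

```python
import csv
import io
from typing import Dict, List

# Column order copied from the Fields column of tm_segs (assumed header-less data).
FIELDS = ["id_", "start_I", "stop_n", "sumscor", "energy_f"]


def best_segments(tm_segs_tsv: str) -> List[Dict[str, str]]:
    """Return the tm_segs rows whose sumscor equals 4."""
    reader = csv.DictReader(io.StringIO(tm_segs_tsv), fieldnames=FIELDS, delimiter="\t")
    # float() accepts the score written either as "4" or "4.0".
    return [row for row in reader if float(row["sumscor"]) == 4]
```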
|
tm segs filtered
|
5 k, tab delim.
|
data,
head
|
id_, start_I, stop_n, energy_f
|
Transmembrane segment definitions after removing pdb matches and (most
importantly) low-complexity regions. The tm_segs table contains only
the raw data.
These segments were obtained by scanning the masked file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk for the TM
segments (annotated with a 3).
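A sketch of reading segments back out of a masked sequence, assuming TM positions appear as runs of the character 3 and reporting each maximal run as a 1-based inclusive start/stop pair. It does not compute energy_f, which would need the hydrophobicity scale. segments_from_mask() is a hypothetical name.

```python
import re
from typing import Dict, List, Tuple


def segments_from_mask(masked_seqs: Dict[str, str],
                       code: str = "3") -> List[Tuple[str, int, int]]:
    """Return (gid, start, stop) for every maximal run of `code`, 1-based inclusive."""
    pattern = re.compile(re.escape(code) + "+")
    segs: List[Tuple[str, int, int]] = []
    for gid, seq in masked_seqs.items():
        for m in pattern.finditer(seq):
            # m.start() is 0-based inclusive, m.end() 0-based exclusive.
            segs.append((gid, m.start() + 1, m.end()))
    return segs
```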
|
unchar domains 1
|
5 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
Linker regions between two other defined segments that are longer than
50 residues; that is, these are uncharacterized protein domains.
This is done at phase _1, by generate_linkers() running on
MG/seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv with tag _1.
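A sketch of the linker rule, not generate_linkers() itself, assuming 1-based inclusive coordinates and reading "longer than 50" as strictly more than 50 residues. Whether the sequence ends count as boundaries is not stated; the FL (full-length) scores in unchar_domains_2_annotated suggest they may, so the hypothetical include_termini switch makes that choice explicit.

```python
from typing import List, Tuple


def linkers(segs: List[Tuple[int, int]], seq_len: int, min_len: int = 50,
            include_termini: bool = False) -> List[Tuple[int, int]]:
    """Gaps between defined segments (1-based inclusive) longer than min_len."""
    # Merge overlapping or adjacent defined segments first.
    merged: List[List[int]] = []
    for start, stop in sorted(segs):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    bounds = [(s, e) for s, e in merged]
    if include_termini:
        # Pseudo-segments just outside each end of the sequence.
        bounds = [(0, 0)] + bounds + [(seq_len + 1, seq_len + 1)]
    gaps: List[Tuple[int, int]] = []
    for (_, prev_stop), (next_start, _) in zip(bounds, bounds[1:]):
        start, stop = prev_stop + 1, next_start - 1
        if stop - start + 1 > min_len:
            gaps.append((start, stop))
    return gaps
```

With include_termini=True and no defined segments, the whole sequence is returned as one region.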
|
unchar domains 2
|
5 k, tab delim.
|
data,
head
|
id_, start_I, stop_n
|
Linker regions between two other defined segments that are longer than
50 residues; that is, these are uncharacterized protein domains.
This is done at phase _2, by generate_linkers() running on
MG/seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm with tag _2.
|
unchar domains 2 annotated
|
45 k, tab delim.
|
data,
head
|
gid_, start_, stop, score, status, amtanot, len9812, len9710, lendif, homolog, tigr9812_annotation
|
This table is derived from the uncharacterized domains (UCDs) in unchar_domains_2;
it merges the TIGR annotation with the UCD regions.
gid_ = TIGR Genome Identifier
status = 0 if the ORF is the same in both ORF files; 1 or 2 if it is dotted in the 9812 file and missing from the 9710 file; -1 if it is missing from the 9812 file but present in the 9710 file
amtanot = level of annotation (0 for hypothetical protein, 1 for putative, and 2 if there seems to be a clear assignment)
len9812 = length of the ORF in the 9812 file ('_' if missing)
len9710 = length of the ORF in the 9710 file ('_' if missing)
lendif = absolute difference in lengths (9999 if an ORF is not present)
homolog = homolog in the 9812 ORF file annotations (MP = M. pneumoniae, MG = M. genitalium, EC = E. coli, etc.)
tigr9812_annotation = annotation from the 9812 ORF file, without the homologs
score has the form A-BB-CC, where
A = X if bad, F if functionally annotated, U if hypothetical or putative annotation
BB = FL if the uncharacterized region spans the whole ORF
CC = MG if the uncharacterized region has a paralog in MG
A sample score is:
U-FL-== (completely uncharacterized, full-length UCD without paralogs in MG)
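A sketch of decoding the score string, assuming the three fields are always separated by single hyphens and that any value other than FL or MG (such as the == in the sample) means "no". UcdScore and parse_score() are hypothetical names.

```python
from typing import NamedTuple


class UcdScore(NamedTuple):
    annotation: str       # A: "X" bad, "F" functionally annotated, "U" hypothetical/putative
    full_length: bool     # True when BB == "FL"
    paralog_in_mg: bool   # True when CC == "MG"


def parse_score(score: str) -> UcdScore:
    """Split an A-BB-CC score into its three parts."""
    a, bb, cc = score.split("-")  # raises ValueError if not exactly three parts
    if a not in {"X", "F", "U"}:
        raise ValueError(f"unexpected annotation code {a!r} in {score!r}")
    return UcdScore(annotation=a, full_length=(bb == "FL"), paralog_in_mg=(cc == "MG"))
```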
|
unchar domains 2 have func
|
5 k, tab delim.
|
data,
head
|
gid_, start_, stop, score
|
This table is the part of unchar_domains_2_annotated that corresponds
to uncharacterized regions that have a well-characterized function.
gid_ = TIGR Genome Identifier
score has the form A-BB-CC, where
A = X if bad, F if functionally annotated, U if hypothetical or putative annotation
BB = FL if the uncharacterized region spans the whole ORF
CC = MG if the uncharacterized region has a paralog in MG
A sample score is:
U-FL-== (completely uncharacterized, full-length UCD without paralogs in MG)
|
unchar domains 2 no func
|
4 k, tab delim.
|
data,
head
|
gid_, start_, stop, score
|
This table is the part of unchar_domains_2_annotated that corresponds
to uncharacterized regions that DO NOT have a well-characterized function.
gid_ = TIGR Genome Identifier
score has the form A-BB-CC, where
A = X if bad, F if functionally annotated, U if hypothetical or putative annotation
BB = FL if the uncharacterized region spans the whole ORF
CC = MG if the uncharacterized region has a paralog in MG
A sample score is:
U-FL-== (completely uncharacterized, full-length UCD without paralogs in MG)
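A sketch of how rows of unchar_domains_2_annotated might be divided between the "have func" and "no func" tables. The mapping used here (A = F goes to have func, A = U to no func, X rows dropped) is an assumption, not stated above; only the four columns listed in Fields are kept. split_by_function() and KEEP are hypothetical names.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]
# Columns of the derived tables, taken from their Fields column.
KEEP = ("gid_", "start_", "stop", "score")


def split_by_function(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Split annotated UCD rows on the A code of their score (assumed mapping)."""
    have_func: List[Row] = []
    no_func: List[Row] = []
    for row in rows:
        slim = {k: row[k] for k in KEEP}
        code = row["score"].split("-", 1)[0]
        if code == "F":      # functionally annotated -> have func (assumption)
            have_func.append(slim)
        elif code == "U":    # hypothetical or putative -> no func (assumption)
            no_func.append(slim)
        # Rows coded "X" (bad) go to neither table in this sketch.
    return have_func, no_func
```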
|
when all struc report
|
9 k, tab delim.
|
data,
head
|
year, stat, ucd-MG, pdb-MG, pdb_MBY_tmb-MG, pdb_MBY_tmb_MBY_sig-MG, pdb_MBY_tmb_MBY_sig_MBY_lcl-MG, pdb_MBY_tmb_MBY_sig_MBY_lcl_MBY_tms-MG, pdb_MBY_tmb_MBY_sig_MBY_lcl_MBY_tms_MBY_lnk-MG, value
|
Report on what fraction of the genome remains
structurally uncharacterized, based
on the following genomes:
MG
and the following selections:
ucd pdb pdb_MBY_tmb pdb_MBY_tmb_MBY_sig pdb_MBY_tmb_MBY_sig_MBY_lcl pdb_MBY_tmb_MBY_sig_MBY_lcl_MBY_tms pdb_MBY_tmb_MBY_sig_MBY_lcl_MBY_tms_MBY_lnk
(Modified on 981128 to accommodate only MG.)
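A sketch of one per-selection value, assuming "uncharacterized" means residues left unmasked in that selection's masked fasta file and that the caller supplies the set of mask symbols used by the masks applied so far. fraction_uncharacterized() is a hypothetical name; the report's actual stat definitions are not given here.

```python
from typing import Dict, Iterable


def fraction_uncharacterized(masked_seqs: Dict[str, str],
                             mask_symbols: Iterable[str]) -> float:
    """Fraction of all residues in the genome not covered by any mask."""
    symbols = set(mask_symbols)
    total = sum(len(s) for s in masked_seqs.values())
    if total == 0:
        return 0.0  # empty genome: nothing to characterize
    unmasked = sum(1 for s in masked_seqs.values() for ch in s if ch not in symbols)
    return unmasked / total
```

Calling it once per masked file in the selection list would give one value per column of the report.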
|