For Genome `MG`, Tables with Specific Analysis

Table Name	Size (kb), Format	Links	Fields (keys bold)	Description
tm segs	12 k, tab delim.	data, head	id_, start_I, stop_n, sumscor, energy_f	Transmembrane segments (version 2, revised 971113; version 3, revised 981127, in which sumscor is now computed by calc_istm_score). sumscor gives a confidence value for each TM helix based on an analysis of all the TM helices in the WHOLE protein. These parameters were refined on MG (see genomes/mg-analyze-maxh-981127.xls). The score is assigned as: (minhall < -2 ? 4 : (tot_aa > 50 ? 3 : (minhall < -1.75 ? 2 : (tot_aa > 20 ? 1 : 0)))).
signal segs	2 k, tab delim.	data, head	id_, start_I, stop_n	Signal sequences.
sat mg strucs	12 k, tab delim.	data, head		Reformatted by MBG from SAT's http://www.mrc-lmb.cam.ac.uk/genomes/MG_strucs.html. The readme for the original follows. "_" or "?" was used for unidentifiable fields. Explanation of format: (MG sequence number)-(sequence length) (MG sequence region) (scop sequence name)-(sequence length) (sequence region) (expectation value with which found). If there is an M at the end of the line, the relationship was only found starting the search from the MG sequence. If the information 'via sequence name' is given, the relationship is not found with PSI-BLAST, but only via another sequence in the GEANFAMMER sequence family of that sequence. If there is a * at the end of the line, the expectation value is below that considered significant, but the match is accepted for other reasons. MG001-267 149-266 pdb_d2pola3-122 4-119 8e-25 MG002-310 3-64 pdb_d1xbl__-75 4-68 5e-16 MG003-650 412-640 pdb_d1bgw__-680 1-243 4e-79 MG003-650 96-215 pdb_d1ah6__-213 89-212 8e-20 MG004-836 24-498 pdb_d1bgw__-680 206-677 1e-128 MG005-417 5-110 pdb_d1seta1-110 4-110 4e-25 MG005-417 114-413 pdb_d1seta2-311 3-303 1e-109 MG006-210 1-197 pdb_3adk__-194 5-186 8e-34 M . . .
minscop soluble matches no overlap	9 k, tab delim.	data, head	gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f	Good matches (e-value cutoff 0.01) for soluble proteins only (scop classes 1-5 and 7). This table is minscop_soluble_matches with the matches that hit the same sequence on the genome filtered out.
id ntm	5 k, tab delim.	data, head	id_, signalp, ntm_n	This table contains the number of transmembrane segments for each ORF, counted after filtering. It also has signal-sequence data, based on simple criteria.
genome v minscop	14 k, tab delim.	data, head	did_, gid_, TargStart_I, TargStop_n, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f, comment	This is a custom file based on the values in the table sat_mg_strucs_nov98.txt, constructed by MBG on 981127. It has all the original fields, plus a comment. The original matches are from the web page at the MRC LMB, the beginning of which is reproduced below. -- SCOP Domain Sequences in the MG Genome (Additional information for "Structural assignments to the Mycoplasma genitalium proteins show extensive gene duplications and domain rearrangements" (1998) by Sarah A. Teichmann, Jong Park and Cyrus Chothia, Proc. Natl. Acad. Sci. USA, 95, 14658-14663). Explanation of format: (MG sequence number)-(sequence length) (MG sequence region) (scop sequence name)-(sequence length) (sequence region) (expectation value with which found). If there is an M at the end of the line, the relationship was only found starting the search from the MG sequence. If the information 'via sequence name' is given, the relationship is not found with PSI-BLAST, but only via another sequence in the GEANFAMMER sequence family of that sequence. If there is a * at the end of the line, the expectation value is below that considered significant, but the match is accepted for other reasons. MG001-267 149-266 pdb_d2pola3-122 4-119 8e-25 MG002-310 3-64 pdb_d1xbl__-75 4-68 5e-16 MG002-310 122-211 pdb_d1tbd__-134 7-91 5e-7 MG003-650 412-640 pdb_d1bgw__-680 1-243 4e-79 MG003-650 229-410 pdb_ds043_1-172 1-172 4e-54 MG003-650 96-215 pdb_d1ah6__-213 89-212 8e-20 MG004-836 24-498 pdb_d1bgw__-680 206-677 1e-128 MG005-417 5-110 pdb_d1seta1-110 4-110 4e-25 MG005-417 114-413 pdb_d1seta2-311 3-303 1e-109
fold occurrence	4 k, tab delim.	data, head	fold_, count	Number of times each fold (represented by two scop fid numbers) occurs in genome MG. This table should be sorted into a standard order.
all masks	89 k, tab delim.	data, head	gid_, start_I, stop_n, tool_, score	This file concatenates the results of creating all the masks for genome MG.
aafreq histo	1 k, tab delim.	data, head	aa_, freq_n	Histogram of frequency of the various amino acids
alla segs	2 k, tab delim.	data, head	id_, start_I, stop_n	all-a segments
allb segs	2 k, tab delim.	data, head	id_, start_I, stop_n	all-b segments
characterized domains 1	8 k, tab delim.	data, head	id_, start_I, stop_n	Already characterized domains (the borders between linker regions), generated in phase _1 by generate_linkers() running on MG/seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv with tag _1.
characterized domains 2	7 k, tab delim.	data, head	id_, start_I, stop_n	Already characterized domains (the borders between linker regions), generated in phase _2 by generate_linkers() running on MG/seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm with tag _2.
comp report	13 k, tab delim.	data, head	selection, genome, sum, total_seqs, masked_seqs, total_chars, masked_chars, total_segs, masking_segs, mask_chars_per_seg, mask_chars_per_seq, frac_masked_chars, frac_masked_seqs, masking_segs_per_seq, dav_rms, dps_rms, dav_A, dav_C, dav_D, dav_E, dav_F, dav_G, dav_H, dav_I, dav_K, dav_L, dav_M, dav_N, dav_P, dav_Q, dav_R, dav_S, dav_T, dav_V, dav_W, dav_Y, dps_A, dps_C, dps_D, dps_E, dps_F, dps_G, dps_H, dps_I, dps_K, dps_L, dps_M, dps_N, dps_P, dps_Q, dps_R, dps_S, dps_T, dps_V, dps_W, dps_Y, pct_A, pct_C, pct_D, pct_E, pct_F, pct_G, pct_H, pct_I, pct_K, pct_L, pct_M, pct_N, pct_P, pct_Q, pct_R, pct_S, pct_T, pct_V, pct_W, pct_Y, A_n, C_n, D_n, E_n, F_n, G_n, H_n, I_n, K_n, L_n, M_n, N_n, P_n, Q_n, R_n, S_n, T_n, V_n, W_n, Y_n, 0_n, 1_n, 2_n, 3_n, 4_n, 5_n, 6_n, 7_n, 8_n, 9_n, selections_long, sort	Report on the compositions in genome MG based on the following a.a. selections: seq pdb tmb sig lcv lnk tms lcm ln2 fun nof ucd uc2 fasta__ sat9811 psib1wy (version 3).
full len segs	6 k, tab delim.	data, head	id_, start_I, stop_n	Full length segments.
genome v minscop fasta	7 k, tab delim.	data, head	gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f	Good matches (e-value cutoff 0.01) for soluble proteins only (scop classes 1-5 and 7), generated by FASTA against scop 1.35. They are no longer used in the analysis but are kept for comparison.
genome v minscop psib1way	10 k, tab delim.	data, head
genome v minscop sat9811	14 k, tab delim.	data, head	did_, gid_, TargStart_I, TargStop_n, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f, comment	This is a custom file based on the values in the table sat_mg_strucs_nov98.txt, constructed by MBG on 981127. It has all the original fields, plus a comment. The original matches are from the web page at the MRC LMB, the beginning of which is reproduced below. 1998.12.14: d1dts__ MG080 707 847 5 173 2.00E-07 _ _ (mbg fixed) -- SCOP Domain Sequences in the MG Genome (Additional information for "Structural assignments to the Mycoplasma genitalium proteins show extensive gene duplications and domain rearrangements" (1998) by Sarah A. Teichmann, Jong Park and Cyrus Chothia, Proc. Natl. Acad. Sci. USA, 95, 14658-14663). Explanation of format: (MG sequence number)-(sequence length) (MG sequence region) (scop sequence name)-(sequence length) (sequence region) (expectation value with which found). If there is an M at the end of the line, the relationship was only found starting the search from the MG sequence. If the information 'via sequence name' is given, the relationship is not found with PSI-BLAST, but only via another sequence in the GEANFAMMER sequence family of that sequence. If there is a * at the end of the line, the expectation value is below that considered significant, but the match is accepted for other reasons. MG001-267 149-266 pdb_d2pola3-122 4-119 8e-25 MG002-310 3-64 pdb_d1xbl__-75 4-68 5e-16 MG002-310 122-211 pdb_d1tbd__-134 7-91 5e-7 MG003-650 412-640 pdb_d1bgw__-680 1-243 4e-79 MG003-650 229-410 pdb_ds043_1-172 1-172 4e-54 MG003-650 96-215 pdb_d1ah6__-213 89-212 8e-20 MG004-836 24-498 pdb_d1bgw__-680 206-677 1e-128 MG005-417 5-110 pdb_d1seta1-110 4-110 4e-25 MG005-417 114-413 pdb_d1seta2-311 3-303 1e-109
genome v minscop sep98	12 k, tab delim.	data, head	did_, gid_, TargStart_I, TargStop_n, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f, comment	This is a custom file based on the values in the table sat_mg_strucs.txt, constructed by MBG on 980907. It has all the original fields, plus a comment.
gorss	173 k, fasta	data, head	gid_, gorss	This fasta file is the result of running GOR secondary-structure prediction on genome MG.
gorss MBY nul	173 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking gorss with the mask full_len_segs
gorss MBY nul COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file gorss with the mask full_len_segs to generate the masked fasta file gorss_MBY_nul.
gorss MBY nul STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file gorss with the mask full_len_segs to generate the masked fasta file gorss_MBY_nul. MASKED_CHARS = number of characters masked by applying this mask. Masked_Seqs = number of sequences masked by applying this mask. Masking_Segs = number of segments used in applying the mask.
gorss MBY ucd	173 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking gorss with the mask unchar_domains
gorss MBY ucd COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file gorss with the mask unchar_domains to generate the masked fasta file gorss_MBY_ucd.
gorss MBY ucd STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file gorss with the mask unchar_domains to generate the masked fasta file gorss_MBY_ucd. MASKED_CHARS = number of characters masked by applying this mask. Masked_Seqs = number of sequences masked by applying this mask. Masking_Segs = number of segments used in applying the mask.
hlx aa pair freq	20 k, tab delim.	data, head	aa_, offset_, count	results of all counts of pairs
id ntm nofilt	5 k, tab delim.	data, head	id_, signalp, ntm_n	This table contains data on whether there is a signal sequence and the number of transmembrane segments. (version 2, revised 971113). (Renamed table on 980101: id_ntm --> id_ntm_nofilt)
lcm segs	6 k, tab delim.	data, head	gid_, start_, stop, score	This routine splits the low_complexity_long (LCL) regions into the LCV (low-complexity very-long) and LCM (low-complexity medium). The criterion is simply length: LCVs are LCLs longer than 150 aa; LCMs are shorter.
lcm segs.txt	5 k, tab delim.	data, head	gid_, start_, stop, score	This routine splits the low_complexity_long (lcl) regions into the lcv (low-complexity very long) and lcm (low-complexity medium). The criterion is simply length: LCVs are LCLs longer than 150 aa; LCMs are shorter.
lcv segs	2 k, tab delim.	data, head	gid_, start_, stop, score	This routine splits the low_complexity_long (LCL) regions into the LCV (low-complexity very-long) and LCM (low-complexity medium). The criterion is simply length: LCVs are LCLs longer than 150 aa; LCMs are shorter.
lcv segs.txt	1 k, tab delim.	data, head	gid_, start_, stop, score	This routine splits the low_complexity_long (lcl) regions into the lcv (low-complexity very long) and lcm (low-complexity medium). The criterion is simply length: LCVs are LCLs longer than 150 aa; LCMs are shorter.
linkers 1	8 k, tab delim.	data, head	id_, start_I, stop_n	Linker regions shorter than 50 aa between two other defined segments, generated in phase _1 by generate_linkers() running on MG/seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv with tag _1.
linkers 2	2 k, tab delim.	data, head	id_, start_I, stop_n	Linker regions shorter than 50 aa between two other defined segments, generated in phase _2 by generate_linkers() running on MG/seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm with tag _2.
low complexity long	7 k, tab delim.	data, head	id_, start_I, stop_n, cplxity_f	Low complexity regions generated with the following seg command: seg/seg tmp.fa 45 3.4 3.75 -l
low complexity short	13 k, tab delim.	data, head	id_, start_I, stop_n, cplxity_f	Low complexity regions generated with the following seg command: seg/seg tmp.fa 25 3.0 3.3 -l
minscop occurrence	10 k, tab delim.	data, head	did_, count	Number of times each minscop domain id (did) occurs in genome MG. This table should be sorted into a standard order and contain 990 entries.
minscop soluble matches	18 k, tab delim.	data, head	gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f	Good matches (e-value cutoff 0.01) for soluble proteins only (scop classes 1-5 and 7), with a year cutoff of 97. Generated by good_scop_matches_to_mask_w_yr() running on genome MG with year cutoff 97 and table genome_v_minscop.
minscop soluble matches overlap	1 k, tab delim.	data, head	gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f	Good matches (e-value cutoff 0.01) for soluble proteins only (scop classes 1-5 and 7). This table holds the matches from minscop_soluble_matches that hit the same sequence on the genome, i.e. duplicate matches that should not be used.
null mask	1 k, tab delim.	data, head
pdb40d135 soluble matches	31 k, tab delim.	data, head	gid_, TargStart_I, TargStop_n, did, fids, QryStart_n, QryStop_n, ev_f, swsc_n, swid_f	Good matches (e-value cutoff 0.01) for soluble proteins only (scop classes 1-5 and 7).
seq	193 k, fasta	data, head
seq MBY cdo	173 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq with the mask characterized_domains
seq MBY cdo COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq with the mask characterized_domains to generate the masked fasta file seq_MBY_cdo.
seq MBY cdo STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq with the mask characterized_domains to generate the masked fasta file seq_MBY_cdo. MASKED_CHARS = number of characters masked by applying this mask. Masked_Seqs = number of sequences masked by applying this mask. Masking_Segs = number of segments used in applying the mask.
seq MBY lcs	4 k, Bad!	data, head	gid_, masked_seq	This fasta file is the result of masking seq with the mask low_complexity_short
seq MBY lcs COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq with the mask low_complexity_short to generate the masked fasta file seq_MBY_lcs.
seq MBY lcs STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq with the mask low_complexity_short to generate the masked fasta file seq_MBY_lcs.
seq MBY lnk	173 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq with the mask linkers
seq MBY lnk COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq with the mask linkers to generate the masked fasta file seq_MBY_lnk.
seq MBY lnk STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq with the mask linkers to generate the masked fasta file seq_MBY_lnk. MASKED_CHARS = number of characters masked by applying this mask. Masked_Seqs = number of sequences masked by applying this mask. Masking_Segs = number of segments used in applying the mask.
seq MBY nul	173 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq with the mask full_len_segs
seq MBY nul COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq with the mask full_len_segs to generate the masked fasta file seq_MBY_nul.
seq MBY nul STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq with the mask full_len_segs to generate the masked fasta file seq_MBY_nul. MASKED_CHARS = number of characters masked by applying this mask. Masked_Seqs = number of sequences masked by applying this mask. Masking_Segs = number of segments used in applying the mask.
seq MBY pdb	173 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq with the mask minscop_soluble_matches
seq MBY pdb COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq with the mask minscop_soluble_matches to generate the masked fasta file seq_MBY_pdb.
seq MBY pdb MBY tmb	173 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq_MBY_pdb with the mask tm_segs_best
seq MBY pdb MBY tmb COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq_MBY_pdb with the mask tm_segs_best to generate the masked fasta file seq_MBY_pdb_MBY_tmb.
seq MBY pdb MBY tmb MBY sig	173 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq_MBY_pdb_MBY_tmb with the mask signal_segs
seq MBY pdb MBY tmb MBY sig COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq_MBY_pdb_MBY_tmb with the mask signal_segs to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig.
seq MBY pdb MBY tmb MBY sig MBY lcl	173 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq_MBY_pdb_MBY_tmb_MBY_sig with the mask low_complexity_long
seq MBY pdb MBY tmb MBY sig MBY lcl COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig with the mask low_complexity_long to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcl.
seq MBY pdb MBY tmb MBY sig MBY lcl MBY tms	173 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcl with the mask tm_segs
seq MBY pdb MBY tmb MBY sig MBY lcl MBY tms COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcl with the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcl_MBY_tms.
seq MBY pdb MBY tmb MBY sig MBY lcl MBY tms MBY lnk	173 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcl_MBY_tms with the mask linkers
seq MBY pdb MBY tmb MBY sig MBY lcl MBY tms MBY lnk COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcl_MBY_tms with the mask linkers to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcl_MBY_tms_MBY_lnk.
seq MBY pdb MBY tmb MBY sig MBY lcl MBY tms MBY lnk STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcl_MBY_tms with the mask linkers to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcl_MBY_tms_MBY_lnk. MASKED_CHARS = number of characters masked by applying this mask. Masked_Seqs = number of sequences masked by applying this mask. Masking_Segs = number of segments used in applying the mask.
seq MBY pdb MBY tmb MBY sig MBY lcl MBY tms STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcl with the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcl_MBY_tms. MASKED_CHARS = number of characters masked by applying this mask. Masked_Seqs = number of sequences masked by applying this mask. Masking_Segs = number of segments used in applying the mask.
seq MBY pdb MBY tmb MBY sig MBY lcl STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig with the mask low_complexity_long to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcl. MASKED_CHARS = number of characters masked by applying this mask. Masked_Seqs = number of sequences masked by applying this mask. Masking_Segs = number of segments used in applying the mask.
seq MBY pdb MBY tmb MBY sig MBY lcv	173 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq_MBY_pdb_MBY_tmb_MBY_sig with the mask lcv_segs
seq MBY pdb MBY tmb MBY sig MBY lcv COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig with the mask lcv_segs to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv.
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk	173 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv with the mask linkers_1
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv with the mask linkers_1 to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk.
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms	173 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk with the mask tm_segs
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk with the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms.
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms MBY lcm	173 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms with the mask lcm_segs
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms MBY lcm COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms with the mask lcm_segs to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm.
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms MBY lcm MBY ln2	173 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm with the mask linkers_2
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms MBY lcm MBY ln2 COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm with the mask linkers_2 to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm_MBY_ln2.
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms MBY lcm MBY ln2 MBY fun	173 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm_MBY_ln2 with the mask unchar_domains_2_have_func
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms MBY lcm MBY ln2 MBY fun COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm_MBY_ln2 with the mask unchar_domains_2_have_func to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm_MBY_ln2_MBY_fun.
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms MBY lcm MBY ln2 MBY fun MBY nof	173 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm_MBY_ln2_MBY_fun with the mask unchar_domains_2_no_func
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms MBY lcm MBY ln2 MBY fun MBY nof COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm_MBY_ln2_MBY_fun with the mask unchar_domains_2_no_func to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm_MBY_ln2_MBY_fun_MBY_nof.
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms MBY lcm MBY ln2 MBY fun MBY nof STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm_MBY_ln2_MBY_fun with the mask unchar_domains_2_no_func to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm_MBY_ln2_MBY_fun_MBY_nof. MASKED_CHARS = number of characters masked by applying this mask. Masked_Seqs = number of sequences masked by applying this mask. Masking_Segs = number of segments used in applying the mask.
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms MBY lcm MBY ln2 MBY fun STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm_MBY_ln2 with the mask unchar_domains_2_have_func to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm_MBY_ln2_MBY_fun. MASKED_CHARS = number of characters masked by applying this mask. Masked_Seqs = number of sequences masked by applying this mask. Masking_Segs = number of segments used in applying the mask.
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms MBY lcm MBY ln2 STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm with the mask linkers_2 to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm_MBY_ln2. MASKED_CHARS = number of characters masked by applying this mask. Masked_Seqs = number of sequences masked by applying this mask. Masking_Segs = number of segments used in applying the mask.
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms MBY lcm STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms with the mask lcm_segs to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm. MASKED_CHARS = number of characters masked by applying this mask. Masked_Seqs = number of sequences masked by applying this mask. Masking_Segs = number of segments used in applying the mask.
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk MBY tms STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk with the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms. MASKED_CHARS = number of characters masked by applying this mask. Masked_Seqs = number of sequences masked by applying this mask. Masking_Segs = number of segments used in applying the mask.
seq MBY pdb MBY tmb MBY sig MBY lcv MBY lnk STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv with the mask linkers_1 to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk. MASKED_CHARS = number of characters masked by applying this mask. Masked_Seqs = number of sequences masked by applying this mask. Masking_Segs = number of segments used in applying the mask.
seq MBY pdb MBY tmb MBY sig MBY lcv STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq_MBY_pdb_MBY_tmb_MBY_sig with the mask lcv_segs to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv. MASKED_CHARS = number of characters masked by applying this mask. Masked_Seqs = number of sequences masked by applying this mask. Masking_Segs = number of segments used in applying the mask.
seq MBY pdb MBY tmb MBY sig STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq_MBY_pdb_MBY_tmb with the mask signal_segs to generate the masked fasta file seq_MBY_pdb_MBY_tmb_MBY_sig. MASKED_CHARS = number of characters masked by applying this mask. Masked_Seqs = number of sequences masked by applying this mask. Masking_Segs = number of segments used in applying the mask.
seq MBY pdb MBY tmb STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq_MBY_pdb with the mask tm_segs_best to generate the masked fasta file seq_MBY_pdb_MBY_tmb. MASKED_CHARS = number of characters masked by applying this mask. Masked_Seqs = number of sequences masked by applying this mask. Masking_Segs = number of segments used in applying the mask.
seq MBY pdb MBY tms	1 k, Bad!	data, head	gid_, masked_seq	This fasta file is the result of masking seq_MBY_pdb with the mask tm_segs
seq MBY pdb MBY tms COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq_MBY_pdb with the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_tms.
seq MBY pdb MBY tms STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq_MBY_pdb with the mask tm_segs to generate the masked fasta file seq_MBY_pdb_MBY_tms.
seq MBY pdb STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq with the mask minscop_soluble_matches to generate the masked fasta file seq_MBY_pdb. MASKED_CHARS = number of characters masked by applying this mask. Masked_Seqs = number of sequences masked by applying this mask. Masking_Segs = number of segments used in applying the mask.
seq MBY pdb fasta COMP	1 k, tab delim.	data, head
seq MBY pdb fasta STAT	1 k, tab delim.	data, head
seq MBY pdb psib1way COMP	1 k, tab delim.	data, head
seq MBY pdb psib1way STAT	1 k, tab delim.	data, head
seq MBY pdb sat9811 COMP	1 k, tab delim.	data, head
seq MBY pdb sat9811 STAT	1 k, tab delim.	data, head
seq MBY uc2	173 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq with the mask unchar_domains_2
seq MBY uc2 COMP	1 k, tab delim.	data, head	aa_, count_n	This is the aa composition of the masked file from masking the fasta file seq with the mask unchar_domains_2 to generate the masked fasta file seq_MBY_uc2.
seq MBY uc2 STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq with the mask unchar_domains_2 to generate the masked fasta file seq_MBY_uc2. MASKED_CHARS = number of characters masked by applying this mask. Masked_Seqs = number of sequences masked by applying this mask. Masking_Segs = number of segments used in applying the mask.
seq MBY ucd	173 k, fasta	data, head	gid_, masked_seq	This fasta file is the result of masking seq with the mask unchar_domains_1
seq MBY ucd COMP	1 k, tab delim.	data, head	aa_, count_n	Amino-acid composition of the masked fasta file seq_MBY_ucd, which was generated by masking the fasta file seq with the mask unchar_domains_1.
seq MBY ucd STAT	1 k, tab delim.	data, head	stat_, value	These are the statistics from masking the fasta file seq with the mask unchar_domains_1 to generate the masked fasta file seq_MBY_ucd. MASKED_CHARS = number of characters masked by applying this mask. Masked_Seqs = number of sequences masked by applying this mask. Masking_Segs = number of segments used in applying this mask.
seq lengths	5 k, tab delim.	data, head	gid_, length_n	Length of each sequence in genome.
tigr annote 9812 9710 diffsonly	5 k, tab delim.	data, head
tigr annote 9812 9710 merge	51 k, tab delim.	data, head
tigr seq dec98	204 k, fasta	data, head
tigr seq dec98 lengths	5 k, tab delim.	data, head	gid_, length_n	Length of each sequence in genome. my $txdb = txdb->new(name=>tigr_seq_dec98, io=>INPUT_SLURP_FASTA, ext=>fa);
tm scores	14 k, tab delim.	data, head	id_, sumscr, sig, minhall, ntmproc, totaa, avg_en, minhseg	This table contains scores indicating whether, and to what degree, each sequence is an integral membrane protein. sig = whether the protein has a signal sequence. sumscor = overall evaluation score (see below). minhall = minimum hydrophobicity of a 20-residue window moved over the whole protein. totaa = total number of amino acids under the -1 threshold. ntmproc = total number of TM helices after processing. avg_en = average hydrophobicity of all the TM segments (per residue). minhseg = minimum hydrophobicity of any TM segment (per residue). These parameters were refined on MG (see genomes/mg-analyze-maxh-981127.xls): sumscor = (minhall < -2 ? 4 : (tot_aa > 50 ? 3 : (minhall < -1.75 ? 2 : (tot_aa > 20 ? 1 : 0))))
tm segs best	10 k, tab delim.	data, head	id_, start_I, stop_n, sumscor, energy_f	These are the segments from table tm_segs that have sumscor = 4.
tm segs filtered	5 k, tab delim.	data, head	id_, start_I, stop_n, energy_f	Transmembrane segment definitions after removing PDB matches and (most importantly) low-complexity regions; the tm_segs table holds only the raw data. These definitions are based on the TM segments (annotated with a 3) in the masked file seq_MBY_pdb_MBY_lcl_MBY_tms_MBY_lnk.
unchar domains 1	5 k, tab delim.	data, head	id_, start_I, stop_n	Linker regions longer than 50 residues between two other defined segments; that is, uncharacterized protein domains. This is done at phase _1: generate_linkers() running on MG/seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv with tag _1.
unchar domains 2	5 k, tab delim.	data, head	id_, start_I, stop_n	Linker regions longer than 50 residues between two other defined segments; that is, uncharacterized protein domains. This is done at phase _2: generate_linkers() running on MG/seq_MBY_pdb_MBY_tmb_MBY_sig_MBY_lcv_MBY_lnk_MBY_tms_MBY_lcm with tag _2.
unchar domains 2 annotated	45 k, tab delim.	data, head	gid_, start_, stop, score, status, amtanot, len9812, len9710, lendif, homolog, tigr9812_annotation	This table is derived from the UCDs in unchar_domains_2 and merges the TIGR annotation with the UCD regions. gid_ = TIGR genome identifier. status = 0 if the ORF is the same in both ORF files, 1 or 2 if dotted in the 9812 file and missing in the 9710 file, -1 if missing in the 9812 file but present in the 9710 file. amtanot = level of annotation (0 for hypothetical protein, 1 for putative, 2 if there seems to be a clear assignment). len9812 = length of the ORF in the 9812 file ('_' if missing). len9710 = length of the ORF in the 9710 file ('_' if missing). lendif = absolute difference in lengths (9999 if an ORF is not present). homolog = homolog in the 9812 ORF file annotations (MP = M. pneumoniae, MG = M. genitalium, EC = E. coli, etc.). tigr9812_annotation = annotation from the 9812 ORF file, less homologs. score has the form A-BB-CC, where A = X if bad, F if functionally annotated, U if hypothetical or putative annotation; BB = FL if the uncharacterized region spans the whole ORF; CC = MG if the uncharacterized region has a paralog in MG. A sample score is U-FL-== (a completely uncharacterized, full-length UCD without paralogs in MG).
unchar domains 2 have func	5 k, tab delim.	data, head	gid_, start_, stop, score	The part of unchar_domains_2_annotated that corresponds to uncharacterized regions that have a well-characterized function. gid_ = TIGR genome identifier. score has the form A-BB-CC, where A = X if bad, F if functionally annotated, U if hypothetical or putative annotation; BB = FL if the uncharacterized region spans the whole ORF; CC = MG if the uncharacterized region has a paralog in MG. A sample score is U-FL-== (a completely uncharacterized, full-length UCD without paralogs in MG).
unchar domains 2 no func	4 k, tab delim.	data, head	gid_, start_, stop, score	The part of unchar_domains_2_annotated that corresponds to uncharacterized regions that do NOT have a well-characterized function. gid_ = TIGR genome identifier. score has the form A-BB-CC, where A = X if bad, F if functionally annotated, U if hypothetical or putative annotation; BB = FL if the uncharacterized region spans the whole ORF; CC = MG if the uncharacterized region has a paralog in MG. A sample score is U-FL-== (a completely uncharacterized, full-length UCD without paralogs in MG).
when all struc report	9 k, tab delim.	data, head	year, stat, ucd-MG, pdb-MG, pdb_MBY_tmb-MG, pdb_MBY_tmb_MBY_sig-MG, pdb_MBY_tmb_MBY_sig_MBY_lcl-MG, pdb_MBY_tmb_MBY_sig_MBY_lcl_MBY_tms-MG, pdb_MBY_tmb_MBY_sig_MBY_lcl_MBY_tms_MBY_lnk-MG, value	Report on what fraction of the genome remains structurally uncharacterized. Based on genome MG and the following selections: ucd, pdb, pdb_MBY_tmb, pdb_MBY_tmb_MBY_sig, pdb_MBY_tmb_MBY_sig_MBY_lcl, pdb_MBY_tmb_MBY_sig_MBY_lcl_MBY_tms, pdb_MBY_tmb_MBY_sig_MBY_lcl_MBY_tms_MBY_lnk. (Modified on 981128 to accommodate only MG.)
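
The *_STAT tables (for example seq MBY pdb STAT and seq MBY uc2 STAT) report MASKED_CHARS, Masked_Seqs and Masking_Segs. The Python sketch below shows one way such counts could be produced; it is not the census code. It assumes 1-based inclusive (start, stop) coordinates, a placeholder mask character "X" (the real masked files may use other symbols, such as the 3 used for TM segments), and that a residue covered by two overlapping segments is counted only once.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # (start, stop), assumed 1-based and inclusive


def apply_mask(
    seqs: Dict[str, str],
    mask: Dict[str, List[Segment]],
    mask_char: str = "X",  # placeholder symbol; an assumption, not the census convention
) -> Tuple[Dict[str, str], Dict[str, int]]:
    """Mask each listed segment and return (masked sequences, STAT counts)."""
    masked: Dict[str, str] = {}
    masked_chars = masked_seqs = masking_segs = 0
    for gid, seq in seqs.items():
        residues = list(seq)
        changed = 0
        for start, stop in mask.get(gid, []):
            masking_segs += 1
            # Clip the segment to the sequence and convert to 0-based indices.
            for i in range(max(start, 1) - 1, min(stop, len(residues))):
                # A residue already equal to mask_char is not counted again,
                # so overlapping segments do not inflate MASKED_CHARS.
                if residues[i] != mask_char:
                    residues[i] = mask_char
                    changed += 1
        masked[gid] = "".join(residues)
        masked_chars += changed
        if changed:
            masked_seqs += 1
    stats = {
        "MASKED_CHARS": masked_chars,
        "Masked_Seqs": masked_seqs,
        "Masking_Segs": masking_segs,
    }
    return masked, stats
```

Whether the original tool counted overlapping residues once or per segment is not stated; the overlap check is the place to change if it did the latter.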
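
The *_COMP tables list an amino-acid composition (aa_, count_n) for each masked fasta file. A minimal Python sketch, assuming the composition is a plain per-character count over all sequences. Whether mask characters were included in the original counts is not stated, so the hypothetical skip parameter makes that choice explicit.

```python
from collections import Counter
from typing import Dict, Iterable


def aa_composition(seqs: Iterable[str], skip: str = "") -> Dict[str, int]:
    """Count each character over all sequences, ignoring any character in skip."""
    counts: Counter = Counter()
    for seq in seqs:
        counts.update(ch for ch in seq if ch not in skip)
    return dict(sorted(counts.items()))  # aa_ -> count_n, in alphabetical order
```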
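
The seq lengths and tigr seq dec98 lengths tables pair each gid_ with a length_n. A minimal sketch that derives such pairs from FASTA text, assuming the identifier is the first whitespace-delimited token after ">" and that a sequence may wrap over several lines:

```python
from typing import Dict, Iterable, Optional


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA identifier (first token after '>') to its sequence length."""
    lengths: Dict[str, int] = {}
    gid: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            gid = tokens[0] if tokens else ""
            lengths[gid] = 0
        elif gid is not None:
            lengths[gid] += len(line)  # add each wrapped line of the sequence
    return lengths
```

To emit rows in the gid_/length_n layout, iterate over the returned dict and print each pair joined by a tab.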
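
The sumscor rule quoted in the tm scores entry is a nested conditional on minhall and tot_aa (the totaa field). Rewritten as a Python function for readability, assuming the thresholds are applied exactly as quoted:

```python
def tm_sumscor(minhall: float, tot_aa: int) -> int:
    """0-4 membrane-protein score, following the nested rule in tm_scores."""
    if minhall < -2:      # strongly hydrophobic 20-residue window
        return 4
    if tot_aa > 50:       # many residues under the -1 threshold
        return 3
    if minhall < -1.75:
        return 2
    if tot_aa > 20:
        return 1
    return 0
```

Lower (more negative) minhall values and larger tot_aa counts raise the score, and the rule checks minhall < -2 before anything else, so such a protein always scores 4 regardless of tot_aa.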
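
tm segs best keeps the tm_segs rows with sumscor = 4. A sketch of that filter over tab-delimited text, assuming the columns appear in the listed order (id_, start_I, stop_n, sumscor, energy_f) and that any line without a numeric fourth column, such as a header, should be skipped:

```python
import csv
from typing import Iterable, List


def best_tm_segments(lines: Iterable[str]) -> List[List[str]]:
    """Return tm_segs rows whose sumscor column equals 4."""
    best: List[List[str]] = []
    for row in csv.reader(lines, delimiter="\t"):
        # Assumed column order: id_, start_I, stop_n, sumscor, energy_f
        try:
            score = float(row[3])
        except (IndexError, ValueError):
            continue  # skip blank, short, or header lines
        if score == 4:
            best.append(row)
    return best
```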
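
The unchar domains tables describe linker regions longer than 50 between two other defined segments. The hypothetical generate_linkers_sketch below is not the census generate_linkers(); it assumes 1-based inclusive coordinates, merges overlapping or directly adjacent segments first, and reports only gaps that lie between two segments (not the stretches before the first or after the last segment).

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start, stop), assumed 1-based and inclusive


def generate_linkers_sketch(segments: List[Segment], min_len: int = 50) -> List[Segment]:
    """Return gaps longer than min_len that lie between two defined segments."""
    # Merge overlapping or directly adjacent segments.
    merged: List[List[int]] = []
    for start, stop in sorted(segments):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    # Each gap between consecutive merged segments is a candidate linker.
    linkers: List[Segment] = []
    for prev, nxt in zip(merged, merged[1:]):
        gap_start, gap_stop = prev[1] + 1, nxt[0] - 1
        if gap_stop - gap_start + 1 > min_len:
            linkers.append((gap_start, gap_stop))
    return linkers
```

Including the sequence termini as boundaries would be a one-line change if the real tables also count terminal regions.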
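
The score column of the unchar domains 2 tables has the form A-BB-CC. A small parser, assuming the three parts are always separated by "-" and that "==" (as in the sample U-FL-==) fills any BB or CC position that does not apply:

```python
from typing import NamedTuple


class UcdScore(NamedTuple):
    annotation: str    # A: "X" bad, "F" functionally annotated, "U" hypothetical/putative
    full_length: bool  # BB == "FL": the uncharacterized region spans the whole ORF
    mg_paralog: bool   # CC == "MG": the region has a paralog in MG


def parse_ucd_score(score: str) -> UcdScore:
    """Split an A-BB-CC score string; raises ValueError on an unexpected shape."""
    parts = score.split("-")
    if len(parts) != 3 or parts[0] not in ("X", "F", "U"):
        raise ValueError(f"unexpected score: {score!r}")
    a, bb, cc = parts
    return UcdScore(a, bb == "FL", cc == "MG")
```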
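
In unchar domains 2 annotated, lendif is the absolute difference between len9812 and len9710, with 9999 when an ORF is missing (marked '_'). A sketch of that rule; the helper names are hypothetical:

```python
from typing import Optional

MISSING_LENDIF = 9999  # documented value when an ORF is absent from one file


def parse_len(field: str) -> Optional[int]:
    """Read a len9812/len9710 field, where '_' marks a missing ORF."""
    field = field.strip()
    return None if field == "_" else int(field)


def lendif(len9812: Optional[int], len9710: Optional[int]) -> int:
    """Absolute length difference, or 9999 if either ORF is missing."""
    if len9812 is None or len9710 is None:
        return MISSING_LENDIF
    return abs(len9812 - len9710)
```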
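
when all struc report gives the fraction of the genome that remains structurally uncharacterized after each cumulative selection. The exact definition is not given here; one plausible reading, sketched below purely as an assumption, is the share of residues left unmasked in the masked fasta output of a selection:

```python
from typing import Iterable


def uncharacterized_fraction(masked_seqs: Iterable[str], mask_chars: str) -> float:
    """Share of residues not replaced by any of the given mask characters."""
    total = unmasked = 0
    for seq in masked_seqs:
        total += len(seq)
        unmasked += sum(1 for ch in seq if ch not in mask_chars)
    return unmasked / total if total else 0.0
```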

[census home]

For Genome MG, Tables with Specific Analysis

For Genome `MG`, Tables with Specific Analysis