D. Christendat, A. Yee, A. Dharamsi, Y. Kluger, A. Savchenko, J. R. Cort, V. Booth, C. D. Mackereth, V. Saridakis, I. Ekiel, G. Kozlov, K. L. Maxwell, N. Wu, L. P. McIntosh, K. Gehring, M. A. Kennedy, A. R. Davidson, E. F. Pai, M. Gerstein, A. M. Edwards & C. H. Arrowsmith.
"Structural Proteomics of an Archaeon," citation and paper
Crystallizability Tree
crystal.tree.pdf 30-May-2000 07:44 3k
crystal.tree.ps 30-May-2000 07:44 4k
Decision tree for crystallizability. The numbers E/T at a node denote the proportion of the training cases reaching that node that are wrongly classified by the node's label. The total number of instances T at a given node is the sum of the correctly classified instances C and the incorrectly classified instances (the errors E), such that T = C + E. YES = "protein could be crystallized"; NO = "protein could NOT be crystallized". At the top we have 63 cases = 24 YES + 39 NO. At the next level, the left node has 44 cases = 23 YES + 21 NO and the right node has 19 cases = 1 YES + 18 NO. At the third level, the leftmost two nodes are 7 cases = 7 NO and 37 cases = 23 YES + 14 NO.
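A minimal sketch (in Python, not the authors' code) of the E/T bookkeeping described in this caption, applied to the node counts quoted above; the function name node_stats is made up for illustration.

# E/T bookkeeping for a decision-tree node: T = C + E, where C is the
# majority-class count and E the minority-class count at that node.
def node_stats(yes, no):
    """Return (label, T, E) for a node with `yes` and `no` training cases."""
    total = yes + no                      # T = C + E
    label = "YES" if yes >= no else "NO"  # majority-class label
    error = min(yes, no)                  # E: cases wrongly classified by the label
    return label, total, error

# Node counts quoted in the caption: the root, its two children, and the
# two leftmost nodes on the third level.
root, left, right = (24, 39), (23, 21), (1, 18)
left_left, left_right = (0, 7), (23, 14)

for name, (yes, no) in [("root", root), ("left", left), ("right", right),
                        ("left-left", left_left), ("left-right", left_right)]:
    label, t, e = node_stats(yes, no)
    print(f"{name}: label={label}, E/T = {e}/{t}")

# Sanity check: the counts in each pair of daughter nodes sum to the parent's.
assert tuple(map(sum, zip(left, right))) == root
assert tuple(map(sum, zip(left_left, left_right))) == left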
Expression Tree
expression.tree.pdf 30-May-2000 07:44 13k
expression.tree.ps 30-May-2000 07:44 23k
Solubility Tree
solubility.GIF 30-May-2000 17:36 62k
(Figure 3 from the paper.) A decision tree for discriminating between soluble and insoluble proteins. The nodes of the tree are represented by ellipses (intermediate nodes) and rectangles (final nodes, or leaves). The numbers on the left of each node denote the number of insoluble proteins in the node and are proportional to the node's dark area; similarly, the numbers on the right denote the soluble proteins and are proportional to the white area. At each intermediate node, the decision tree algorithm calculates all possible splitting thresholds for each of 53 variables (hydrophobicity, amino acid composition, etc.) and picks the splitting variable and threshold for which at least one of the two daughter nodes is as homogeneous as possible. When a variable v is split, v …
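The split search described in this caption (scanning thresholds for each of the 53 variables and keeping the most homogeneous partition) can be sketched as below. This is an assumed illustration, not the authors' implementation: size-weighted Gini impurity stands in for the original homogeneity criterion, and the function names and toy data are invented.

# Sketch of a single decision-tree split search: for every candidate
# variable, try thresholds between consecutive observed values and keep
# the (variable, threshold) pair with the most homogeneous daughters.
from typing import List, Sequence, Tuple

def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a set of soluble/insoluble labels (0.0 = perfectly homogeneous)."""
    if not labels:
        return 0.0
    p = labels.count("soluble") / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(X: List[List[float]], y: List[str]) -> Tuple[int, float, float]:
    """Return (variable index, threshold, weighted impurity) of the best split.

    X holds one feature vector per protein (e.g. the 53 sequence-derived
    variables such as hydrophobicity and amino acid composition); y holds
    the soluble/insoluble class labels.
    """
    best = (-1, 0.0, float("inf"))
    n = len(y)
    for v in range(len(X[0])):
        values = sorted({row[v] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [lab for row, lab in zip(X, y) if row[v] <= thr]
            right = [lab for row, lab in zip(X, y) if row[v] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (v, thr, score)
    return best

# Toy example with two made-up variables (say, hydrophobicity and Gln content).
X = [[0.30, 0.02], [0.55, 0.04], [0.60, 0.05], [0.25, 0.01]]
y = ["soluble", "insoluble", "insoluble", "soluble"]
print(best_split(X, y))  # -> (0, 0.425, 0.0): split on variable 0 at ~0.43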