BIOINFORMATICS Databases
Contents: Databases
Relational Databases
Unstructured Data
Semi-Structured Data
Structured Data
Turn the Survey into a Table (I)
Turn the Survey into a Table (II)
Turn the Survey into a Table (III)
Statistics are only Possible on Standardized Values
SQL
matches table
matches table 2
structures table
folds table
Table Interpretation
Structure of a Table
What is a Key?
SQL Select on a Single Table
SQL Select on a Single Table, Example
SQL Select on a Single Table, Example 2
Joins
SQL Select on Multiple Tables
Foreign Key
Selection as Array Lookup
SQL Select on Multiple Tables
Cross Product A x B
ER-diagrams
Aggregate Functions: Statistics on Attributes
Joins
Join Gives Unnormalized Table
Normalization
Normalization Example
Normalized Tables
Query Optimization
Indexes Speed Access
Object Databases
Forms & Reports [User Views]
Aspects of Forms: Transactions and Security
Complex Data Example: Encoding Trees in RDBs
RDBs Everywhere: Internet Mail
RDBs Everywhere: File System
Quickie Trees and Clustering
Methods of Building Trees from the bottom up
Bootstrap to Test the Tree
Popular Tree Program Systems
Tree of Life
GenProtEC - Functional Classification
COGs - Orthologs
Example Report: Motions Database
Large-scale Example: Census DB
Major Application II: Overall Genome Characterization
The World of Structures is also Finite: A Fold Library
Cross-Reference: Folds ↔ Sequences ↔ Organisms
Venn Diagrams for Shared Folds
Patterns of Fold Usage in 8 Genomes
Cluster Trees Grouping Initial Genomes on Basis of Shared Folds
Whole Genome Trees
Top-10 Folds in a Genome
Characteristics of Common, Shared Folds: βαβ structure
What are the most common folds: Overall? In plants? In animals?
An Issue with Fold Counting: Biases in the Databanks
Using a Tree to Correct for Biases
Know All Folds in a Genome: How are we doing on MG?
Know All Folds in Genome: MG Optimistic ? Prediction
TM-helix “prediction”
Comparative Genomics of Membrane Proteins
2º Structure Prediction
Different Amino Acid Composition Should Give Different 2º Structure
Supersecondary structure words
Different Perspectives on Protein Thermostability
Thermostability: Analyzing a few Factors with Genome Comparison
Composition Analysis of the Proteome
1-4 Spacing of Charged Residues More than Expected in Thermophile Helices → Salt Bridges
Sequence Length Doesn’t Completely Relate to Thermostability
Controlling for Biases: Stratified Sample
Controls II: Known Structures, Random Genomes
How Representative are the Known Structures of the Proteins in a Complete Genome? The issue of Bias
Amino Acid Composition
Composition of Different Regions of Genomes
Biophysical Proteins
Adding Structure to Functional Genomics, Function to Structural Genomics
Fold-Function Combinations
Fold-Function Combinations
The Most Versatile Folds, Versatile Functions
Fold-Function Combinations: Cross-Tabulation Summary Diagram
Compare Classifications and Genomes
COGs vs SCOP: Different Structure Function Relationships for Most Conserved Proteins
Gene Expression Datasets: the Transcriptome
Composition of Genome vs. Transcriptome
Which Protein Folds are Highly Expressed?
Broad Categories Constant in Transcriptome over Timecourse, Not Specific Genes (or Folds)
Different Classes of Membrane Proteins Have Different Changes in Expression Level (esp. 12 TMs)
Correlate Expression Level with Functional Category
Results from Analysis of Correlation of Functional Class and Expression
Whole Genome Phenotype Profiles
Phenotype ORF Clustering