BIOINFORMATICS Databases

Contents: Databases

Relational Databases

Unstructured Data

Semi-Structured Data

Structured Data

Turn the Survey into a Table (I)

Turn the Survey into a Table (II)

Turn the Survey into a Table (III)

Statistics are only Possible on Standardized Values

SQL

matches table

matches table 2

structures table

folds table

Table Interpretation

Structure of a Table

What is a Key?

SQL Select on a Single Table

SQL Select on a Single Table, Example

SQL Select on a Single Table, Example 2

Joins

SQL Select on Multiple Tables

Foreign Key

Selection as Array Lookup

SQL Select on Multiple Tables

Cross Product A x B

ER-diagrams

Aggregate Functions: Statistics on Attributes

Joins

Join Gives Unnormalized Table

Normalization

Normalization Example

Normalized Tables

Query Optimization

Indexes Speed Access

Object Databases

Forms & Reports [User Views]

Aspects of Forms: Transactions and Security

Complex Data Example: Encoding Trees in RDBs

RDBs Everywhere: Internet Mail

RDBs Everywhere: File System

Quickie Trees and Clustering

Methods of Building Trees from the Bottom Up

Bootstrap to Test the Tree

Popular Tree Program Systems

Tree of Life

GenProtEC - Functional Classification

COGs - Orthologs

Example Report: Motions Database

Example Report: Motions Database

Example Report: Motions Database

Example Report: Motions Database

Example Report: Motions Database

Example Report: Motions Database

Large-scale Example: Census DB

Major Application II: Overall Genome Characterization

The World of Structures is also Finite: A Fold Library

Cross-Reference: Folds, Sequences, Organisms

Venn Diagrams for Shared Folds

Patterns of Folds Usage in 8 Genomes

Cluster Trees Grouping Initial Genomes on Basis of Shared Folds

Whole Genome Trees

Top-10 Folds in a Genome

Characteristics of Common, Shared Folds: βαβ structure

What are the most common folds: Overall? In plants? In animals?

An Issue with Fold Counting: Biases in the Databanks

Using a Tree to Correct for Biases

Know All Folds in a Genome: How are we doing on MG?

Know All Folds in Genome: MG Optimistic → Prediction

TM-helix “prediction”

Comparative Genomics of Membrane Proteins

2° Structure Prediction

Different Amino Acid Composition Should Give Different 2° Structure

Supersecondary structure words

Different Perspectives on Protein Thermostability

Thermostability: Analyzing a few Factors with Genome Comparison

Composition Analysis of the Proteome

1-4 Spacing of Charged Residues More than Expected in Thermophile Helices → Salt Bridges

Sequence Length Doesn’t Completely Relate to Thermostability

Controlling for Biases: Stratified Sample

Controls II: Known Structures, Random Genomes

How Representative are the Known Structures of the Proteins in a Complete Genome? The issue of Bias

Amino Acid Composition

Composition of Different Regions of Genomes

Biophysical Proteins

Adding Structure to Functional Genomics, Function to Structural Genomics

Fold-Function Combinations

Fold-Function Combinations

The Most Versatile Folds, Versatile Functions

Fold-Function Combinations Cross-Tabulation Summary Diagram

Compare Classifications and Genomes

COGs vs SCOP: Different Structure Function Relationships for Most Conserved Proteins

Gene Expression Datasets: the Transcriptome

Composition of Genome vs. Transcriptome

Which Protein Folds are Highly Expressed?

Broad Categories Constant in Transcriptome over Timecourse, Not Specific Genes (or Folds)

Different Classes of Membrane Proteins Have Different Changes in Expression Level (esp. 12 TMs)

Correlate Expression Level with Functional Category

Results from Analysis of Correlation of Functional Class and Expression

Whole Genome Phenotype Profiles

Phenotype ORF Clustering