Microarrays: Handle like a snake.


What do we want to learn?

The goal of the project is to understand how biofuel-producing bacteria "decide" which fermentation pathways to use under different growth conditions. To understand which pathways are key, we need to compare end-product composition with gene expression, focusing primarily on genes of the fermentation pathways.

To achieve this, we are using a three-pronged approach of transcriptomics (microarrays), proteomics and metabolomics to examine how differences in pools of metabolic intermediates, and in the transcription and translation of key proteins, influence the flow of carbon toward or away from biofuel production.

For example, C. thermocellum produces ethanol in exponential phase, but lactate in stationary phase. Is this due to changes in gene expression or to changes in the fluxes of intermediates? We will also compare gene expression in wild-type strains vs mutants that have unexpected patterns of end-product formation.

We will then compare the proteomic data with the transcriptomic data (that is, the abundance of each protein itself vs the level of its mRNA).

The comparisons will first be made within species varying substrates and growth conditions, then between species. This will allow us to study differences in regulation of gene expression under similar growth conditions, yielding different patterns of end-products.

The same principles apply for the bio-plastics producing bacteria.

The proteomic and transcriptomic data will guide further experiments to improve strains using molecular methods.

Experimental Design

List of microarray experiments and species used File:MGB Transcriptomics Priorities.v5.Oct2010.doc

Components of Variance in Microarray Experiments

Biological Replicates

Most critical thing: The number of biological replicates should be chosen to give results for which the observed differences between treatments are greater than the cumulative errors of the measurements.

Estimated sample size requirements for example data set [1]

Power    FDR = 0.10    FDR = 0.05    FDR = 0.01
0.5      3 / 3         3 / 3         5 / 5
0.6      3 / 3         3 / 4         7 / 6
0.7      3 / 4         5 / 5         10 / 9
0.8      4 / 6         9 / 8         20 / 14
0.9      13 / 11       30 / 16       75 / 27
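The table above can be turned into a small lookup for planning purposes. The sketch below is illustrative only: the values are transcribed from the table, but this page does not say what the two numbers in each cell stand for, so taking the larger of the two as a conservative estimate is an assumption, as are the function names.

```python
# Values transcribed from the table above: (power, FDR) -> the two numbers in each cell.
# What each of the two numbers represents is not stated on this page.
SAMPLE_SIZES = {
    (0.5, 0.10): (3, 3),   (0.5, 0.05): (3, 3),   (0.5, 0.01): (5, 5),
    (0.6, 0.10): (3, 3),   (0.6, 0.05): (3, 4),   (0.6, 0.01): (7, 6),
    (0.7, 0.10): (3, 4),   (0.7, 0.05): (5, 5),   (0.7, 0.01): (10, 9),
    (0.8, 0.10): (4, 6),   (0.8, 0.05): (9, 8),   (0.8, 0.01): (20, 14),
    (0.9, 0.10): (13, 11), (0.9, 0.05): (30, 16), (0.9, 0.01): (75, 27),
}


def conservative_sample_size(power, fdr):
    """Larger of the two tabulated values (assumption: use the worse case)."""
    return max(SAMPLE_SIZES[(power, fdr)])


def achievable(n_replicates):
    """All (power, FDR) settings whose conservative sample size fits n_replicates."""
    return sorted(key for key, pair in SAMPLE_SIZES.items()
                  if max(pair) <= n_replicates)


if __name__ == "__main__":
    print(conservative_sample_size(0.8, 0.05))
    print(achievable(3))
```

Read this way, the table suggests that three replicates only reach the lowest power settings, which is one way to see why the number of biological replicates matters so much.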


The plan is to do single-label experiments.

The assumption is that labeling reactions will be done by synthesizing labeled cDNA from mRNA populations. This means that the oligos must have the same sequence as the mRNAs, i.e. they must be the same strand as the mRNAs, so that the complementary cDNA can hybridize to them.
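The strand requirement can be checked with a few lines of Python. This is a minimal sketch, assuming exact Watson-Crick pairing and ignoring hybridization kinetics; the function names and the example sequence are made up for illustration.

```python
RNA_TO_CDNA = str.maketrans("ACGU", "TGCA")  # RNA base -> complementary DNA base
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def first_strand_cdna(mrna):
    """First-strand cDNA (written 5'->3') synthesized from an mRNA."""
    return mrna.upper().translate(RNA_TO_CDNA)[::-1]


def sense_oligo(mrna):
    """DNA oligo with the same sequence as the mRNA (U written as T)."""
    return mrna.upper().replace("U", "T")


def hybridizes(oligo, cdna):
    """True if the cDNA is the exact reverse complement of the oligo."""
    return oligo.translate(DNA_COMPLEMENT)[::-1] == cdna


if __name__ == "__main__":
    mrna = "AUGGCUUAA"                     # hypothetical example sequence
    probe = first_strand_cdna(mrna)        # labeled cDNA
    print(hybridizes(sense_oligo(mrna), probe))  # sense-strand oligo
    print(hybridizes(probe, probe))              # antisense oligo
```

An oligo designed from the antisense strand would have the same sequence as the labeled cDNA and so could not pair with it.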

Series of experiments

The SOW is for a total of 200 slides. They do not all have to be fabricated in a single run. Fabrication could be split over several runs, and the oligos could even differ between runs, so we can change our minds about species or oligos.

There may be merit in doing a few small microarray experiments first and using the results to design subsequent experiments.

Pilot Experiment with Clostridium thermocellum DSM1237, 2360 and 4150

Microarray Design

File:MicroarrayExperiments.xls - spreadsheet listing species, strains and conditions for all microarray experiments

Question: Is there a priority list for species, and a list of conditions for each species? We may add another column to the spreadsheet.

Question: On a given slide, do all 8 arrays have to have the same oligonucleotides synthesized in the same positions?

Answer: The constraint applies not just to one slide but to the whole pack of 10 slides. See the Array Production section.

Agilent Microarrays

Teleconference call with the Vancouver Microarray Facility on the design of microarray oligos: File:Teleconference.Sept.2010v1.pdf


Nomenclature (geneIDs, GO numbers)

One concern is that the list of genes is only as good as the genomes they came from. The more gaps there are in the genome, the more genes will be missed. There could be as many missed genes as there are contigs, because genes spanning two contigs are unlikely to be annotated.

If Agilent has internal controls and reference standards, then all we need to do is to design oligos for our genes.

For each species, we need to define a set of annotated CDS sequences from MAGPIE that will be used for oligo design. This number, ideally, would be less than 5000 per species, to allow for 3 spot replicates. Some MAGPIE-annotated genomes have > 5000 CDS sequences. We can probably reduce this number by eliminating redundant genes.
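The 5000 figure follows from the array capacity of 15,000 oligos given in the glossary, divided by 3 spot replicates. A minimal sketch of that arithmetic follows. The `control_spots` parameter is an assumption, since the number of positions taken up by Agilent controls is not known here, and the CDS counts in the usage lines are made up.

```python
def max_genes_per_array(capacity=15_000, spot_replicates=3, control_spots=0):
    """Number of distinct genes that fit on one array.

    capacity        -- oligo positions per array (15,000 per the glossary)
    spot_replicates -- copies of each gene's oligo on the array
    control_spots   -- positions reserved for controls (assumed value)
    """
    return (capacity - control_spots) // spot_replicates


def fits_on_array(n_cds, **kwargs):
    """True if n_cds genes fit with the requested spot replication."""
    return n_cds <= max_genes_per_array(**kwargs)


if __name__ == "__main__":
    print(max_genes_per_array())                      # 15,000 / 3
    print(fits_on_array(5400))                        # hypothetical CDS count
    print(fits_on_array(5400, spot_replicates=2))     # fewer replicates
```

If controls turn out to occupy array positions, the gene limit per species drops below 5000 accordingly.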

Internal controls

Internal controls include both positive and negative controls, as well as controls for quantitation of signal.

Agilent may have some positive and negative controls that we can include in the arrays. Find out more about that.

Reference standards

Reference standards allow for comparison of signal intensity between slides.

It seems likely that Agilent may have some commercial reference standards that we can include in our arrays. Find out about that.

Oligonucleotide Design

Design considerations:

Design procedure:


DEADLINE: A set of oligos for one species and set of experiments (I think it's Clostridium thermocellum, but check with Richard Sparling) must be ready before December 17.


Array Production

Agilent produces slides in sets of 10, that is, 10 identical slides. So regardless of the size of the experiment, you must purchase 10 or some multiple of 10 slides.

The problem with this constraint is that part of a pack of 10 slides may be wasted if there are not enough experiments. For example, the Thermotoga petrophila genome needs only 6 slides for 6 replicates, leaving 4 of the 10 slides unused.

A possible solution is to combine two closely related species into one set of oligos, so that both fit into one pack of 10 slides.
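The cost of the pack constraint is easy to tabulate. The sketch below assumes only the pack size of 10 stated above. The function name is illustrative, and the 4 slides for a second species are a made-up figure used to show the effect of sharing one design.

```python
import math


def slides_to_order(slides_needed, pack_size=10):
    """Return (slides ordered, slides left unused) when buying whole packs."""
    packs = math.ceil(slides_needed / pack_size)
    ordered = packs * pack_size
    return ordered, ordered - slides_needed


if __name__ == "__main__":
    # Thermotoga petrophila alone: 6 slides needed.
    print(slides_to_order(6))
    # Hypothetical: a related species needing 4 slides, ordered separately.
    print(slides_to_order(4))
    # Both species on one shared oligo design: 6 + 4 slides from one pack.
    print(slides_to_order(6 + 4))
```

With separate designs the two orders would total 20 slides, while a shared design would need only 10, provided the combined oligo set still fits on one array.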

Experimental protocols

Labeling and hybridization

The Agilent system typically uses a single-label design. Intensities are compared between slides by calculating, for each gene, the ratio of its signal to that of a reference standard. The reference standards are control oligos that are synthesized on the array and are also spiked into the labeling reaction.
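The ratio-to-reference idea can be sketched as follows. This is not Agilent's actual normalization procedure, only an illustration under the assumption that each array yields a raw intensity per gene plus raw intensities for the reference-standard spots. The gene IDs and intensity values are made up.

```python
from statistics import mean


def normalize_to_reference(gene_intensities, reference_intensities):
    """Express each gene's intensity as a ratio to the mean reference signal.

    gene_intensities      -- dict of gene ID -> raw intensity on one array
    reference_intensities -- raw intensities of the reference-standard spots
    """
    ref = mean(reference_intensities)
    if ref <= 0:
        raise ValueError("reference signal must be positive")
    return {gene: value / ref for gene, value in gene_intensities.items()}


if __name__ == "__main__":
    # Two hypothetical arrays: the second is scanned twice as bright overall.
    array_a = normalize_to_reference({"geneA": 200, "geneB": 50}, [100, 100])
    array_b = normalize_to_reference({"geneA": 400, "geneB": 100}, [200, 200])
    print(array_a)
    print(array_b)
```

In this toy case the raw intensities differ twofold between arrays, but the ratios agree, which is the property that makes slides comparable.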

One alternative would be to do a double-label experiment. For example, each unique mRNA population would be labeled with Cy3, and a mixed mRNA population pooled from all conditions would be labeled with Cy5. The mixed mRNA thus provides a consistent reference for calculating intensity.

We need to find out whether the Agilent system can accommodate double-label experiments and, if so, whether there is any advantage over the single-label design.

Biological experiments

In-vitro work


Processing the raw data

Analysis of Results

Data Mining


Agilent - 10 Pitfalls of Microarray Analysis

Tommy S. Jorstad, Mette Langaas, Atle M. Bones, Understanding sample size: what determines the required number of microarrays for an experiment?, Trends in Plant Science, Volume 12, Issue 2, February 2007, Pages 46-50, ISSN 1360-1385, DOI: 10.1016/j.tplants.2007.01.001.

Knapen D, Vergauwen L, Laukens K, Blust R (2009) Best practices for hybridization design in two-color microarray analysis Trends in Biotechnology 27:406-414

Simon, S. Myths & Truths About Microarray Expression Profiling


We need to have standardized terms to create a precise description of the microarray experiments.

array - a single cell on a slide, with a capacity of 15,000 oligos.

slide - an Agilent slide with 8 arrays, each of which can be hybridized with a different labeled probe

probe - labeled single-stranded cDNA population synthesized from an mRNA population

species - bacterial species

strain - a particular genotype of a bacterial species

condition - defined set of experimental parameters

biological replicate - a replicate of the species, strain, and experimental condition, generating a separate mRNA population

array replicate (technical replicate) - a repeat of a biological replicate on two separate arrays of two independent slides. We may not need to do array replicates, because biological replicates include all experimental variation contained within array replicates.

spot replicate - the same oligo is synthesized on two or more locations in one array. The reason for spot replicates is so that the intensity for that gene is calculated as the average of spot replicates for that gene, controlling for errors in reading any one spot.
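Averaging over spot replicates, as described in the last definition, can be sketched like this. The input format (a list of gene ID and intensity pairs, one per spot) and the gene IDs are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def average_spot_replicates(spots):
    """Average the intensities of all spots that carry the same gene's oligo.

    spots -- iterable of (gene_id, intensity) pairs read from one array
    """
    by_gene = defaultdict(list)
    for gene_id, intensity in spots:
        by_gene[gene_id].append(intensity)
    return {gene_id: mean(values) for gene_id, values in by_gene.items()}


if __name__ == "__main__":
    # Hypothetical readings: three spot replicates for geneA, one for geneB.
    readings = [("geneA", 100.0), ("geneA", 110.0), ("geneA", 90.0),
                ("geneB", 50.0)]
    print(average_spot_replicates(readings))
```

A plain mean is used here because that is what the definition describes. A median would be less sensitive to a single badly read spot, but that would be a change from the approach stated above.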
