From Bioinformatics.Org Wiki
Microarrays: Handle like a snake.
What do we want to learn?
The goal of the project is to understand how biofuel-producing bacteria "decide" which fermentation pathways to use under different growth conditions. To understand which pathways are key, we need to compare end-product composition with gene expression, focusing primarily on genes of the fermentation pathways.
To achieve this we are using a three-pronged approach of transcriptomics (microarrays), proteomics and metabolomics to look at how differences in pools of metabolic intermediates, and in transcription and translation of key proteins, influence the flow of carbon toward or away from biofuel production.
For example, C. thermocellum produces ethanol in exponential phase but lactate in stationary phase. Is this due to changes in gene expression or to changes in the fluxes of intermediates? We will also compare gene expression in wild-type strains vs. mutants that have unexpected patterns of end-product formation.
We will then compare the proteomic data with the transcriptomic data (the abundance of the protein itself vs. the level of its mRNA).
The comparisons will first be made within species, varying substrates and growth conditions, and then between species. This will allow us to study differences in the regulation of gene expression that yield different patterns of end-products under similar growth conditions.
The same principles apply to the bioplastic-producing bacteria.
The proteomic and transcriptomic data will guide further experiments to improve strains using molecular methods.
Experimental Design
List of microarray experiments and species used: File:MGB Transcriptomics Priorities.v5.Oct2010.doc
Components of Variance in Microarray Experiments
Biological Replicates
Most critical thing: The number of biological replicates should be chosen to give results for which the observed differences between treatments are greater than the cumulative errors of the measurements.
Estimated sample size requirements for example data set [1]
| FDR = 0.10 | FDR = 0.05 | FDR = 0.01 |
---|---|---|---|
Power = 0.5 | 3 / 3 | 3 / 3 | 5 / 5 |
Power = 0.6 | 3 / 3 | 3 / 4 | 7 / 6 |
Power = 0.7 | 3 / 4 | 5 / 5 | 10 / 9 |
Power = 0.8 | 4 / 6 | 9 / 8 | 20 / 14 |
Power = 0.9 | 13 / 11 | 30 / 16 | 75 / 27 |
- Power is the fraction of true positives detected. FDR is the false discovery rate, i.e. the expected fraction of false positives among the genes called significant. The numbers on either side of the slash are sample-size (i.e. biological replicate) estimates made using the sample-size estimation methods described in Ref. [8] and Ref. [10], respectively.
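To make the table easier to use when planning, here is a minimal sketch that looks up the number of biological replicates for a chosen power and FDR. The numbers are transcribed from the table above and apply to the example data set only. Taking the larger of the two estimates, and counting one array per biological replicate per condition (single-label design), are assumptions made for illustration. The function names are hypothetical.

```python
# Sample-size estimates transcribed from the table above.
# Each value is (estimate by first method, estimate by second method).
SAMPLE_SIZES = {
    0.5: {0.10: (3, 3), 0.05: (3, 3), 0.01: (5, 5)},
    0.6: {0.10: (3, 3), 0.05: (3, 4), 0.01: (7, 6)},
    0.7: {0.10: (3, 4), 0.05: (5, 5), 0.01: (10, 9)},
    0.8: {0.10: (4, 6), 0.05: (9, 8), 0.01: (20, 14)},
    0.9: {0.10: (13, 11), 0.05: (30, 16), 0.01: (75, 27)},
}


def replicates_needed(power, fdr):
    """Return the larger (more conservative) of the two tabulated estimates.

    Assumption: being conservative is preferred; the table itself does not
    say which of the two methods to trust.
    """
    return max(SAMPLE_SIZES[power][fdr])


def arrays_needed(power, fdr, n_conditions):
    """Arrays required if every condition gets the same number of replicates.

    Assumption: single-label design, one array per biological replicate.
    """
    return replicates_needed(power, fdr) * n_conditions


if __name__ == "__main__":
    print(replicates_needed(0.8, 0.05))
    print(arrays_needed(0.8, 0.10, n_conditions=4))
```

The jump between power 0.8 and 0.9 in the table is the main thing to notice: the last 10% of power costs several times more arrays than the first 80%.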
Labeling
The plan is to do single-label experiments.
The assumption is that labeling reactions will be done by synthesizing labeled cDNA from mRNA populations. This means that the oligos must be identical to the mRNAs, i.e. the same strand as the mRNAs, so that the complementary labeled cDNA can hybridize to them.
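The strand requirement can be checked with a few lines of code. The sketch below treats first-strand cDNA as the reverse complement of the mRNA and tests whether an oligo would find a perfectly complementary stretch in it. The sequences are made up for illustration, and real hybridization also depends on melting temperature and mismatches, which are ignored here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def first_strand_cdna(mrna):
    """First-strand cDNA: reverse complement of the mRNA, written as DNA."""
    return reverse_complement(mrna.upper().replace("U", "T"))


def hybridizes(oligo, cdna):
    """True if the cDNA contains the perfect complement of the oligo.

    Simplification: exact matching only, no thermodynamics.
    """
    return reverse_complement(oligo) in cdna.upper()


# Hypothetical example sequences, not taken from any real genome.
MRNA = "AUGGCUAAGGACUCAGAA"
SENSE_OLIGO = "GCTAAGGACTC"              # same strand as the mRNA
ANTISENSE_OLIGO = reverse_complement(SENSE_OLIGO)

if __name__ == "__main__":
    cdna = first_strand_cdna(MRNA)
    print(hybridizes(SENSE_OLIGO, cdna))      # sense oligo vs labeled cDNA
    print(hybridizes(ANTISENSE_OLIGO, cdna))  # antisense oligo vs labeled cDNA
```

Only the sense oligo pairs with the labeled cDNA, which is why the oligos must be designed from the coding strand.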
Series of experiments
The SOW covers a total of 200 slides. They don't all have to be fabricated in a single run. Fabrication could be split over several runs, and the oligos could even be different in each run, so you can change your mind about species or oligos.
There may be merit in doing a few small microarray experiments first, and using the results to design subsequent experiments.
Pilot Experiment with Clostridium thermocellum DSM1237, 2360 and 4150
Microarray Design
File:MicroarrayExperiments.xls - spreadsheet listing species, strains and conditions for all microarray experiments
Question: Is there a priority list for species, and a list of conditions for each species? We may add another column to the spreadsheet.
- Two main problems:
- Layout of each array, for each species
This is the physical order of oligos on the slide. It should be randomized, and include at least two spot replicates in different locations.
Each array can hold 15,000 oligos. If we have fewer than 5000 genes per species, there can be three spot replicates for each gene on the array. If there are more than 5000 genes (up to 7500), there will be two spot replicates.
- Layout of arrays by slide
This must be random. There should be no pattern of species, strains, conditions or replicates that are physically near each other. We should avoid having two biological replicates of the same experiment on any slide.
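Both layout problems can be sketched in code. The example below (a rough sketch, not a replacement for eArray) computes how many spot replicates fit on an array, shuffles the spots, and then places samples on slides at random while refusing to put two biological replicates of the same experiment on one slide. It assumes the full 15,000-oligo capacity is available for genes, which ignores any space taken by control spots. It also represents a sample as a hypothetical `(experiment, replicate)` tuple. The greedy placement can fail even when a valid arrangement exists; in that case try another seed.

```python
import random

ARRAY_CAPACITY = 15000   # oligos per array (from the text)
ARRAYS_PER_SLIDE = 8     # arrays per slide (from the glossary)


def spot_replicates(n_genes, capacity=ARRAY_CAPACITY):
    """Whole number of spots available per gene on one array."""
    return capacity // n_genes


def random_array_layout(gene_ids, seed=None):
    """Return gene IDs in randomized spot order, one entry per spot."""
    reps = spot_replicates(len(gene_ids))
    if reps < 2:
        raise ValueError("fewer than two spot replicates fit on the array")
    spots = [g for g in gene_ids for _ in range(reps)]
    random.Random(seed).shuffle(spots)
    return spots


def assign_to_slides(samples, n_slides, seed=None):
    """Randomly place (experiment, replicate) samples on slides.

    No slide receives two biological replicates of the same experiment.
    Greedy and randomized: raises ValueError if it paints itself into a corner.
    """
    rng = random.Random(seed)
    order = list(samples)
    rng.shuffle(order)
    slides = [[] for _ in range(n_slides)]
    for sample in order:
        experiment = sample[0]
        candidates = [
            s for s in slides
            if len(s) < ARRAYS_PER_SLIDE
            and all(other[0] != experiment for other in s)
        ]
        if not candidates:
            raise ValueError("no valid slide left for %r" % (sample,))
        rng.choice(candidates).append(sample)
    for s in slides:
        rng.shuffle(s)   # randomize array position within each slide
    return slides
```

Note that integer division gives three replicates at exactly 5000 genes, two from 5001 to 7500 genes, and only one above 7500, so genomes larger than 7500 CDS would not meet the two-replicate minimum without trimming the gene list.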
Question: On a given slide, do all 8 arrays have to have the same oligonucleotides synthesized in the same positions?
Answer: Yes, and not only within one slide but across the whole pack of 10 slides - see Array Production section
- Stephane Le Bihan slebihan@prostatecentre.com
- Robert Bell rbell@prostatecentre.com
- Anne Haegert ahaegert@prostatecentre.com
Teleconference call with the Vancouver Microarray Facility on the design of microarray oligos: File:Teleconference.Sept.2010v1.pdf
Genes
Nomenclature (geneIDs, GO numbers)
One concern is that the list of genes is only as good as the genomes they came from. The more gaps there are in the genome, the more genes will be missed. There could be as many missed genes as there are contigs, because genes spanning two contigs are unlikely to be annotated.
If Agilent has internal controls and reference standards, then all we need to do is to design oligos for our genes.
For each species, we need to define a set of annotated CDS sequences from MAGPIE that will be used for oligo design. This number, ideally, would be less than 5000 per species, to allow for 3 spot replicates. Some MAGPIE-annotated genomes have > 5000 CDS sequences. We can probably reduce this number by eliminating redundant genes.
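As a first pass at eliminating redundant genes, exact duplicate sequences can be removed from the CDS FASTA file before oligo design. The sketch below assumes "redundant" means identical nucleotide sequence, which is the simplest reading. Near-identical paralogs would need a clustering step that is not shown. The function names and the sample records are hypothetical and have nothing to do with MAGPIE's own export format.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_exact_duplicates(records):
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique


# Hypothetical records for illustration only.
EXAMPLE_FASTA = """>cds_0001
ATGGCTAAG
GACTCAGAA
>cds_0002
ATGAAACCC
>cds_0003 copy of cds_0001
atggctaaggactcagaa
"""

if __name__ == "__main__":
    records = list(parse_fasta(EXAMPLE_FASTA))
    kept = drop_exact_duplicates(records)
    print(len(records), "records,", len(kept), "after removing duplicates")
```

Counting the records before and after shows quickly whether a genome with more than 5000 CDS can be brought under the limit this way, or whether genes will have to be dropped by other criteria.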
Internal controls
Internal controls include both positive and negative controls, as well as controls for quantitation of signal.
Agilent may have some positive and negative controls that we can include in the arrays. Find out more about that.
Reference standards
Reference standards allow for comparison of signal intensity between slides.
It seems likely that Agilent may have some commercial reference standards that we can include in our arrays. Find out about that.
Oligonucleotide Design
Design considerations:
- Are there cases in which two closely-related species could use the same set of oligos?
- The Scion group is willing to design the oligos for their 2 organisms in such a way that both genomes fit on one array with duplicate spots.
- If the number of packs (slides) is a problem, we will see whether other pairs of species can be combined into one set of oligos.
- Is it possible to create a list of genes of interest? It should have very broad coverage, so that no gene of interest is missed, but should also reduce the total number of genes in each oligo design, for the following purposes.
- Can we get rid of some genes if required? For example, if the number of genes in one genome is too large, or in order to combine 2 closely related species into one oligo set.
- Can we divide the genes into 2 groups, the genes of interest and the rest? We could give each gene of interest one more spot than the others in a single array.
Design procedure:
- Input -
- Set of design parameters
- FASTA file containing coding sequences for genes
- Output -
Software:
- eArray - Agilent Microarray manufacturer's software
- eArray - web-based oligo design at Agilent website
- log in eArray
- Agilent technical support information for microarray
- Email: genomics@agilent.com
- Phone: 800-227-9770-5599 (Scanner group) direct line 859-3736406
- Name: Lesley
- progress of oligo design with eArray
- Osprey pipeline at COE
- OligoWiz - Freeware for oligo design
DEADLINE: A set of oligos for one species and set of experiments (I think it's Clostridium thermocellum, but check with Richard Sparling) must be ready before December 17.
Schedule:
- By Wed. Nov. 24 - Trial run of oligos using C. thermocellum
- By Tue. Nov. 30 - Good first draft of oligo set for C. thermocellum, and rough drafts of other 5 species
- Before Dec. 17 - Final drafts of oligo sets for all 6 species.
Array Production
Agilent produces slides in sets of 10, that is, 10 identical slides. So regardless of the size of the experiment, you must purchase 10 or some multiple of 10 slides.
The problem with this constraint is that part of a pack of 10 slides may be wasted if there are not enough experiments. For example, the pack of 10 slides for the Thermotoga petrophila genome would use only 6 slides, for 6 replicates.
A possible solution is to look into combining 2 closely related species into one set of oligos, so that both fit into one pack of 10 slides.
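The pack arithmetic is simple enough to script when comparing options. The sketch below counts packs and unused slides for a given number of slides, and compares ordering two species separately against sharing one oligo set. The 6-slide figure is the Thermotoga petrophila example from the text; the 4-slide partner species is a made-up number for illustration.

```python
import math

SLIDES_PER_PACK = 10   # Agilent produces identical slides in sets of 10


def packs_needed(n_slides_used):
    """Number of 10-slide packs that must be purchased."""
    return math.ceil(n_slides_used / SLIDES_PER_PACK)


def wasted_slides(n_slides_used):
    """Slides purchased but not used."""
    return packs_needed(n_slides_used) * SLIDES_PER_PACK - n_slides_used


def packs_saved_by_combining(slides_a, slides_b):
    """Packs saved if two species share one oligo set (and so one pack type)."""
    separate = packs_needed(slides_a) + packs_needed(slides_b)
    combined = packs_needed(slides_a + slides_b)
    return separate - combined


if __name__ == "__main__":
    print(packs_needed(6), wasted_slides(6))
    # Hypothetical partner species needing 4 slides.
    print(packs_saved_by_combining(6, 4))
```

Combining only saves a pack when the two slide counts together fit into fewer packs than they would separately. Two species needing 6 slides each still need two packs either way.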
Experimental protocols
Labeling and hybridization
The Agilent system typically uses a single-label design. Comparison of intensity between slides is done by calculating a ratio between a gene and a reference standard. The reference standards are control oligos also synthesized on the array, and are also spiked into the labeling reaction.
One alternative would be to do a double-label experiment. For example, each unique mRNA population would be labeled using Cy3, and a mixed mRNA population from all conditions would be labeled using Cy5. The mixed mRNA thus provides a consistent baseline for calculating intensity.
We need to find out if the Agilent system can accommodate double-label experiments, and if so, whether there is any advantage over the single-label design.
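The two ways of calculating relative intensity can be written down as follows. This is only a sketch of the ratios described above. Using the mean of the reference-standard spots and reporting the ratio on a log2 scale are assumptions (common practice, but not something Agilent has confirmed for this project), and the function names are hypothetical.

```python
import math


def reference_log_ratio(gene_intensity, reference_intensities):
    """Single-label: log2 ratio of a gene to the mean reference-standard signal.

    Assumption: the reference standards on one array are summarized by their
    arithmetic mean.
    """
    if gene_intensity <= 0 or min(reference_intensities) <= 0:
        raise ValueError("intensities must be positive")
    ref = sum(reference_intensities) / len(reference_intensities)
    return math.log2(gene_intensity / ref)


def two_colour_log_ratio(cy3, cy5):
    """Double-label: log2 ratio of the sample (Cy3) to the mixed pool (Cy5)."""
    if cy3 <= 0 or cy5 <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(cy3 / cy5)


if __name__ == "__main__":
    print(reference_log_ratio(800.0, [100.0, 100.0]))
    print(two_colour_log_ratio(500.0, 2000.0))
```

In both cases a value of 0 means the gene matches the reference, and each unit is a two-fold change, which makes values from different slides directly comparable.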
Biological experiments
In-vitro work
Results
- Agilent files - Get an example of an Agilent file so that we can experiment with it on our own system using TMEV or other software.
- TIFF images of slides - We will ask Agilent to supply us with TIFF files of each slide, just to check for experimental artifacts that might not show up in statistical analysis. For example, a fingerprint on a slide.
Processing the raw data
Analysis of Results
Data Mining
References
Agilent - 10 Pitfalls of Microarray Analysis
Simon, S. Myths & Truths About Microarray Expression Profiling
Glossary
We need to have standardized terms to create a precise description of the microarray experiments.
array - a single cell on a slide, with a capacity of 15,000 oligos.
slide - an Agilent slide with 8 arrays, each of which can be hybridized with a different labeled probe
probe - labeled single-stranded cDNA population synthesized from an mRNA population
species - bacterial species
strain - a particular genotype of a bacterial species
condition - defined set of experimental parameters
biological replicate - a replicate of the species, strain, and experimental condition, generating a separate mRNA population
array replicate (technical replicate) - a repeat of a biological replicate, on two separate arrays of two independent slides. We may not need to do array replicates, because biological replicates include all experimental variation contained within array replicates.
spot replicate - the same oligo is synthesized on two or more locations in one array. The reason for spot replicates is so that the intensity for that gene is calculated as the average of spot replicates for that gene, controlling for errors in reading any one spot.
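The averaging of spot replicates described above can be sketched as follows. The input is assumed to be a list of `(gene_id, intensity)` pairs read from one array, which is a hypothetical format and not the layout of an Agilent results file. The gene IDs are placeholders.

```python
from collections import defaultdict
from statistics import mean


def average_spot_replicates(spots):
    """Return {gene_id: mean intensity over that gene's spot replicates}.

    spots: iterable of (gene_id, intensity) pairs from a single array.
    """
    by_gene = defaultdict(list)
    for gene_id, intensity in spots:
        by_gene[gene_id].append(intensity)
    return {gene_id: mean(values) for gene_id, values in by_gene.items()}


# Hypothetical readings: three spots for geneA, two for geneB.
EXAMPLE_SPOTS = [
    ("geneA", 100.0),
    ("geneB", 40.0),
    ("geneA", 120.0),
    ("geneB", 60.0),
    ("geneA", 110.0),
]

if __name__ == "__main__":
    print(average_spot_replicates(EXAMPLE_SPOTS))
```

With only two or three spots per gene a single bad spot still shifts the mean noticeably, so it may be worth flagging genes whose spot replicates disagree strongly before averaging.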