[Bioclusters] distributed blasting of genomes and WASHU blast
Chris Dwan (CCGB)
bioclusters@bioinformatics.org
Thu, 13 Feb 2003 09:44:35 -0600 (CST)
> ...
> 2) If I have two large genomes that need a lengthy blast, how can I
> split that up?
> ...
> Even a valid hit can have some repeat in it ...
> ...
> However, I'm after a generalized solution that doesn't require special
> knowledge of the sequences.
> ...
Disclaimer first: I don't know if this comment applies to your
particular situation.
So much for apologies.
I've had several mid-sized script-n-hack projects start with exactly
this question: "How do I BLAST one genome against another?" When we
got to the root of it, the biological questions of interest demanded a
variety of approaches. Here are two examples:
1) Find me putative orthologs between these two chromosomes.
----------------------------------------------------------
- This broke down into two steps:
    a) Find the genes.
    b) Find the orthologs.
- In this case it makes a lot of sense to filter out
low-complexity sequence up front, then hit each chromosome
with a suite of gene-finders, including a blastx against
a well-annotated protein dataset like Swiss-Prot. From
that, we get a set of possible genes in each chromosome.
Now the problem is more recognizable as a job that BLAST
might be good at.
2) Show me the large-scale genomic events that provide evidence
for an evolutionary relationship between these two specific chromosomes.
-----------------------------------------------------------------
- Here, we do NOT want to get rid of low-complexity or repetitive
elements. A straight-ahead "overlapping chunks -> blastn ->
dot-plot" approach gives what is wanted.
3) Show me the paralogs (duplicated genes within a single genome)
and...
You get the idea.
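For what it's worth, the "overlapping chunks" step in example 2 could be sketched roughly as below. This is only an illustration, not the script we actually used: the function names, chunk size, and overlap are all made up for the example. Each chunk carries its offset so that hit coordinates reported within a chunk can be shifted back onto the full sequence before plotting.

```python
def overlapping_chunks(seq, size, overlap):
    """Yield (offset, chunk) pairs that cover seq.

    Consecutive chunks share `overlap` bases, so a hit that straddles
    a chunk boundary still falls entirely inside at least one chunk
    (as long as the hit is no longer than `overlap`).
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not seq:
        return
    step = size - overlap
    # Stop once a chunk reaches the end of the sequence; the last
    # chunk may be shorter than `size`.
    for start in range(0, max(len(seq) - overlap, 1), step):
        yield start, seq[start:start + size]


def to_genome_coords(offset, hit_start, hit_end):
    """Shift 1-based hit coordinates within a chunk onto the full sequence."""
    return offset + hit_start, offset + hit_end


if __name__ == "__main__":
    # Toy sequence; real chunk sizes would be far larger.
    for offset, chunk in overlapping_chunks("ACGTACGTAC", size=4, overlap=2):
        print(offset, chunk)
```

Each chunk would then go out as an independent blastn job, which is what makes the problem easy to spread across a cluster. The overlap needs to be at least as long as the longest alignment you care about seeing in one piece.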
-Chris Dwan
Center for Computational Genomics and Bioinformatics
University of Minnesota