[Bioclusters] distributed blasting of genomes and WASHU blast

Chris Dwan (CCGB) bioclusters@bioinformatics.org
Thu, 13 Feb 2003 09:44:35 -0600 (CST)


> ...
> 2)  If I have two large genomes that need a lengthy blast, how can I
>     split that up?
> ...
> Even a valid hit can have some repeat in it ...
> ... 
> However, I'm after a generalized solution that doesn't require special
> knowledge of the sequences. 
> ...

Disclaimer first:  I don't know if this comment applies to your
particular situation.  

So much for apologies.

I've had several mid-sized script-n-hack projects start with exactly
this question:  "How do I BLAST one genome against another?"  When we
got to the root of it, the biological questions of interest demanded a
variety of approaches.  Here are a few examples:

1) Find me putative orthologs between these two chromosomes.
   ----------------------------------------------------------
   - This broke down into two steps:
     a) Find the genes.
     b) Find the orthologs.

   - In this case it makes a lot of sense to filter out 
     low complexity sequence up front, hit each chromosome
     with a suite of gene-finders, including a blastx vs.
     a well-annotated protein dataset like Swiss-Prot.  From
     that, we get a set of possible genes in each chromosome.
     Now the problem is more recognizable as a job that BLAST
     might be good at.

2) Show me the large scale genomic events that provide evidence 
   for evolutionary relation between these two specific chromosomes.
   -----------------------------------------------------------------
   - Here, we do NOT want to get rid of low complexity or repetitive
     elements.  A straight-ahead "overlapping chunks -> blastn -> 
     dot-plot" approach gives what is wanted.

3) Show me the paralogs (duplicated genes within a single genome)
   and...

You get the idea.
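
For what it's worth, the "overlapping chunks" step in example 2 might
be sketched roughly as below.  This is only a minimal illustration, not
a tested pipeline:  the function names, the chunk size and the overlap
are all made-up assumptions, and sensible values depend on the genomes
and on how long a hit you need to catch across a chunk boundary.  The
second helper shows the bookkeeping needed to put chunk-relative hit
positions back onto whole-chromosome coordinates for the dot-plot.

```python
def overlapping_chunks(seq, chunk_size=100_000, overlap=10_000):
    """Yield (start, chunk) pairs that cover seq with overlapping windows.

    start is the 0-based offset of the chunk in the full sequence.
    chunk_size and overlap are illustrative defaults, not recommendations.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    for start in range(0, len(seq), step):
        yield start, seq[start:start + chunk_size]
        # Stop once a chunk has reached the end of the sequence, so we
        # do not emit a trailing chunk made only of overlap.
        if start + chunk_size >= len(seq):
            break


def to_genome_coords(chunk_start, hit_start, hit_end):
    """Map 1-based hit coordinates within a chunk to 1-based
    coordinates on the full sequence (chunk_start is 0-based)."""
    return chunk_start + hit_start, chunk_start + hit_end


if __name__ == "__main__":
    # Toy sequence; a real run would read each chromosome from FASTA
    # and write every chunk out as its own query record.
    for start, chunk in overlapping_chunks("ACGTACGTAC", chunk_size=4, overlap=2):
        print(start, chunk)
```

Each chunk can then be run as an independent blastn query, which is
what makes the job easy to spread across a cluster.  Hits that fall in
an overlap region will show up twice and need to be merged or
de-duplicated before plotting.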

-Chris Dwan
 Center for Computational Genomics and Bioinformatics
 University of Minnesota