[Biodevelopers] Re: BLAST asymmetrical
Christopher Dwan
cdwan at bioteam.net
Fri Jan 19 09:59:18 EST 2007
> I want all homologies.
>
>> Or even Smith-Waterman which will take a while to run.
>
> Do you know of a program that can calculate SW on a pair of genomes?
This may be a semantic confusion on my part, but here's my answer to
that specific question:
If you really want the single best end-to-end alignment between two
multi-megabase sequences, yes, full dynamic programming is the way to
go, and yes, it will take a really long time. (Strictly speaking, SW
finds the best *local* alignment; Needleman-Wunsch is its *global*
counterpart. Both do work proportional to the product of the two
sequence lengths.) On the other hand, I've never met anyone
who really, seriously cares about monolithic, global alignments of
chromosomes. Go down that road, and the next question will be "why
can't we just run clustalw on whole chromosomes?" Yes, of course you
could ... but it'll be really slow and not very useful.
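
To make the "really long time" concrete, here is a minimal sketch of the SW scoring recurrence. It assumes a simple linear gap penalty and made-up scoring values (match=2, mismatch=-1, gap=-2); the function name is hypothetical, and real tools use substitution matrices and affine gaps. The point is the doubly nested loop: one cell update per pair of positions, so two 5-megabase sequences would need about 2.5 x 10^13 updates.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Illustrative only: linear gap penalty, assumed scoring values,
    and no traceback (only the score is kept, using two rows of memory).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor at 0 is what makes the alignment local
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The work grows with len(a) * len(b), whatever the sequences contain.
print(smith_waterman_score("TTTACGTTT", "GGACGGG"))
```

This keeps only the score. Recovering the alignment itself needs the full matrix or a divide-and-conquer variant, which makes whole-chromosome runs even less attractive.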
Note: This is not an invitation to the accelerator people in the
audience to offer me a *faster* clustalw or SW. I'm trying to steer
people toward *better* uses of the tools. You might as well work on
multi-gigabyte cut-and-paste buffers so that I can stuff whole
genomes into the NCBI web interface.
On the other hand, if you want the best gene-sized (a few kilobases)
matches from within that pair of megabase sequences, it's a different
question. You're going to wind up chopping each sequence into
overlapping chunks and running an all-against-all search of some
sort. The chunk size will be determined by how large you think the
introns and exons in your genes are. An even more clever approach
might involve doing preliminary gene calls with a gene-finding
program like Glimmer, and then starting the all-against-all search
from those hits.
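
The chopping step can be sketched as below. This is only an illustration under stated assumptions: the function name, the chunk size, and the overlap are all hypothetical, and the toy strings stand in for megabase sequences. In practice the chunk size would come from your estimate of gene length, and the overlap should be large enough that a gene straddling a chunk boundary still falls whole inside some chunk.

```python
from itertools import product


def overlapping_chunks(seq, chunk_size, overlap):
    """Yield (start, chunk) pairs that tile seq.

    Consecutive chunks share `overlap` characters; the last chunk
    may be shorter than chunk_size.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    start = 0
    while start < len(seq):
        yield start, seq[start:start + chunk_size]
        if start + chunk_size >= len(seq):
            break  # this chunk already reaches the end of seq
        start += step


# Toy stand-ins for two genomes (assumed data, for illustration only).
genome_a = "ACGTACGTAC"
genome_b = "TTGACCA"

chunks_a = list(overlapping_chunks(genome_a, chunk_size=4, overlap=2))
chunks_b = list(overlapping_chunks(genome_b, chunk_size=4, overlap=2))

# All-against-all: every chunk of A is paired with every chunk of B.
# Each pair would then be handed to the search tool of your choice.
for (start_a, chunk_a), (start_b, chunk_b) in product(chunks_a, chunks_b):
    print(start_a, chunk_a, start_b, chunk_b)
```

Keeping the start offset with each chunk lets you map a hit back to chromosome coordinates afterwards.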
Chromosome vs. chromosome BLAST answers the question "is there a
decent hit to any part of this chromosome in that other one?" The
answer, broadly speaking, will be "yes, there is a statistically
significant match there."
If you want homologous genes, you're going to have to do a bit more
work than just running a single program to get The Answers.
-Chris Dwan