[Bioclusters] Parallel blast
chris dagdigian
bioclusters@bioinformatics.org
Fri, 07 Jun 2002 07:56:17 -0400
Hi Wim,
This will be a quickie response...
With newer versions of ncbi-blast, two things have made it far easier to
split up the target database so that your query can be multiplexed
across multiple searches and machines:
o The "-z" option switch (used to be undocumented, I think) lets you
tell the blastall binary the effective size of the database, overriding
the size it would otherwise work out from the slice. If you feed the
original (large) value to the blastall binary while searching against
the small slice, you will at least get back the correct scores and
statistics. This is a huge time and accuracy saver, as trying to parse
and adjust these values after the fact is a giant, error-prone exercise
in pain.
o XML output of results
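As a rough sketch of how the two switches combine, here is how one
blastall command line per database slice might be put together. The
flags are the blastall ones (-p program, -d database, -i query, -o
output, -z effective database length, -m 7 for XML output); the slice
names, the query file name and the database length are made-up
placeholders, not values from a real setup.

```python
def blastall_slice_command(program, slice_db, query_path, out_path, full_db_length):
    """Build the argument list for one blastall run against one database slice."""
    return [
        "blastall",
        "-p", program,              # e.g. blastn or blastp
        "-d", slice_db,             # the small slice actually being searched
        "-i", query_path,
        "-o", out_path,
        "-z", str(full_db_length),  # effective length of the WHOLE database
        "-m", "7",                  # XML output
    ]


# Hypothetical slice names and total length, purely for illustration.
slices = ["nt.00", "nt.01", "nt.02"]
full_length = 1500000000

commands = [
    blastall_slice_command("blastn", s, "query.fa", s + ".xml", full_length)
    for s in slices
]

for cmd in commands:
    print(" ".join(cmd))
```

Each list can be handed to subprocess or written into whatever job
script your scheduler expects. The point is only that every slice gets
the same -z value.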
Having the scores and statistics correct while getting the results back
in a way that is far easier to parse than the human readable version is
95% of the battle. Everything else is fairly simple.
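To illustrate the "everything else" part: once every slice has been
searched with the full database size and XML output, putting the pieces
back together is mostly a matter of pooling the hits and re-sorting
them. A minimal sketch, assuming one query per run and the element names
used in NCBI BLAST XML output (Hit, Hit_id, Hsp, Hsp_evalue,
Hsp_bit-score). The two tiny XML strings and their hit ids are invented
stand-ins for real per-slice output files.

```python
import xml.etree.ElementTree as ET


def best_hits(xml_text):
    """Return (evalue, bit_score, hit_id) for the best HSP of each hit."""
    root = ET.fromstring(xml_text)
    rows = []
    for hit in root.iter("Hit"):
        hit_id = hit.findtext("Hit_id")
        hsps = [
            (float(h.findtext("Hsp_evalue")), float(h.findtext("Hsp_bit-score")))
            for h in hit.iter("Hsp")
        ]
        if hsps:
            # Lowest e-value wins; ties go to the higher bit score.
            evalue, bits = min(hsps, key=lambda p: (p[0], -p[1]))
            rows.append((evalue, bits, hit_id))
    return rows


def merge_slices(xml_texts):
    """Pool the hits from all slices and sort them as one result list."""
    merged = []
    for text in xml_texts:
        merged.extend(best_hits(text))
    merged.sort(key=lambda r: (r[0], -r[1]))
    return merged


def fake_slice(hit_id, bits, evalue):
    """Build a stripped-down stand-in for one slice's XML output."""
    return (
        "<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>"
        "<Hit><Hit_id>%s</Hit_id><Hit_hsps><Hsp>"
        "<Hsp_bit-score>%s</Hsp_bit-score><Hsp_evalue>%s</Hsp_evalue>"
        "</Hsp></Hit_hsps></Hit>"
        "</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>"
        % (hit_id, bits, evalue)
    )


slice_a = fake_slice("seqA", "50.1", "1e-5")
slice_b = fake_slice("seqB", "120.3", "2e-30")

merged = merge_slices([slice_a, slice_b])
for evalue, bits, hit_id in merged:
    print(hit_id, bits, evalue)
```

Because every slice was run with the same -z value, the e-values are on
the same footing and can be compared directly. Without that, a plain
sort across slices would not be meaningful.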
-Chris
Wim Glassee wrote:
><snip>
>
>I've noticed some people cut their databases and query sequences to
>smaller pieces, with or without overlap, and perform separate blasts.
>But how do you put them back together again? And are the results the
>same?
>
>Wim
--
Chris Dagdigian, <dag@sonsorol.org>
Life Science IT & Research Computing Consultant
Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193
Work: http://bioteam.net PGP KeyID: 83D4310E Yahoo IM: craffi