[Bioclusters] blast query

Michael Cariaso bioclusters@bioinformatics.org
Wed, 15 Sep 2004 01:22:49 -0400


Che, Anney (NIH/NIAID) wrote:
> Does anyone know how to set a filter in blast to omitting the replicated
> hits?
> 
> Thanks, Anney

Hope this helps. I keep a text file, 'gilist.txt', in this format:
GI#1 <tab> a description of the sequence
GI#2 <tab> a description of the sequence
GI#3 <tab> a description of the sequence

When I want a filter that will only see 'mouse' sequences, I run this 
command:
grep mouse gilist.txt | cut -f 1 > filter.txt
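
As a minimal sketch of that step, here it is run against a made-up 
gilist.txt. The GI numbers and descriptions are invented purely for 
illustration, and the -i flag is my own addition so that 'Mouse' and 
'mouse' both match:

```shell
# Work in a scratch directory so no real files get overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample list: GI number, a tab, then a description.
printf '111\tmouse hemoglobin alpha\n' >  gilist.txt
printf '222\thuman hemoglobin alpha\n' >> gilist.txt
printf '333\tMouse actin beta\n'       >> gilist.txt

# Keep only lines that mention mouse (case-insensitive),
# then keep just the first tab-separated column, the GI number.
grep -i mouse gilist.txt | cut -f 1 > filter.txt

# Show the resulting filter: one GI per line.
cat filter.txt
```

With this sample data, filter.txt ends up holding 111 and 333, one per 
line.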

Then run blast as you normally would, but with an extra -l switch:

blastall -p blastn -d database -i query -l filter.txt
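
Since the file given to -l should hold nothing but GI numbers, one per 
line, it may be worth checking it before a long run. The following is 
only a sketch: check_gilist is a hypothetical helper name of my own, not 
part of blast, and it assumes GIs are plain non-negative integers.

```shell
# check_gilist FILE (hypothetical helper, not part of blast):
# succeeds only if FILE is non-empty and every line is a bare integer.
check_gilist() {
    # -s is true when the file exists and has a size greater than zero.
    [ -s "$1" ] || return 1
    # grep -v selects lines that are NOT all digits; -q only sets the
    # exit status. A match therefore means a bad line was found.
    if grep -qv '^[0-9][0-9]*$' "$1"; then
        return 1
    fi
    return 0
}

# Small demonstration on an invented two-line list.
demo=$(mktemp)
printf '111\n333\n' > "$demo"
if check_gilist "$demo"; then
    echo "GI list looks usable"
else
    echo "GI list has blank or non-numeric lines"
fi
rm -f "$demo"
```

You could then guard the search with something like 
'check_gilist filter.txt && blastall -p blastn -d database -i query -l filter.txt'.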

Optionally, if you'll be using the filter several times, you may want to 
make a binary version, which allows blast to run faster. You can use 
this file in place of 'filter.txt' for a bit of extra speed.

formatdb -F filter.txt -B filter.bin

followed by

blastall -p blastn -d database -i query -l filter.bin


Since this is bioclusters, I'll mention that this also works very well 
with mpiblast.


Michael Cariaso
Bioinformatics Developer
Bethesda, MD
cariaso at yahoo dot com