[Bioclusters] RE: split database with blast

Tue Nov 28 12:17:18 EST 2006

Is top-posting allowed here?  

I believe NCBI does this as well.  I recall talking to them about this when
I visited their facilities once.  Have you tried emailing ncbi-helpdesk?
Someone there should be able to tell you what parameters you need to use to
calculate the correct e-value.  bl2seq allows you to set a theoretical db
size, so I'm sure blast will allow you to as well.
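
As a rough illustration of why supplying the full database size corrects
the e-value, here is a minimal Python sketch. It assumes the textbook
Karlin-Altschul relation E = m * n * 2^(-S'), where S' is the bit score,
m the effective query length and n the effective database length, so E
scales linearly with n. The function names and all lengths are made up for
illustration, and the sketch ignores the length adjustment BLAST applies to
m and n, so it will not reproduce BLAST's numbers exactly.

```python
def evalue(bit_score, query_len, db_len):
    """E-value from a bit score, assuming E = m * n * 2^(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


def rescale_evalue(evalue_part, db_len_part, db_len_full):
    """Rescale an e-value computed against one database fragment so that it
    refers to the full database. Assumes E is linear in database length and
    that the effective query length is the same in both searches."""
    return evalue_part * (db_len_full / db_len_part)


if __name__ == "__main__":
    # Hypothetical lengths: a fragment holding half of the full database.
    full_len = 1e9
    part_len = 5e8
    e_part = evalue(bit_score=250, query_len=300, db_len=part_len)
    e_full = evalue(bit_score=250, query_len=300, db_len=full_len)
    print(e_part, e_full, rescale_evalue(e_part, part_len, full_len))
```

Under these assumptions, a hit found in a fragment that holds half of the
database gets half the e-value it would get against the whole database,
and multiplying by the length ratio recovers the full-database value.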

Ryan

> Hi Daniel,
> It is non-trivial to get correct effective database sizes 
> with NCBI BLAST, as it involves processing both query 
> sequences and database sequences. Your best bet is to use a 
> package that can split the databases and return correct 
> e-values. mpiBLAST is one, but dBlast is another if for some 
> unfathomable reason you don't like mpiBLAST. 
> However, depending on what you're doing, e-value differences 
> may not matter. In my personal opinion, there is no 
> difference between e-36 and e-40, so the differences you are 
> talking about are negligible. -Lucas
> 
> On Tuesday, November 21, 2006 at 15:41 -0800, Daniel Xavier 
> de Sousa wrote:
> > 
> > 
> > Hi all,
> > 
> > I need some help with parallel BLAST. I would be happy if anyone could
> > help me.
> > I have been working with parallel BLAST using a split database.
> > 
> > I don't have a problem running on part of the database and getting
> > correct statistics when I use WUBLAST, because I use the DBRECMAX and
> > DBRECMIN parameters and run BLAST on a virtual split of the database:
> > each run gets just a piece of the whole database, and the e-values
> > come out right.
> > 
> > But I really want to make everything work in NCBI_BLAST. I know about
> > the mpiBLAST solution and the GI-number list file solution, but these
> > aren't so good: the first because the BLAST source has to be changed,
> > and the second because it requires that you use GI numbers in the
> > FASTA identifiers.
> > 
> > So, my question is:
> > 
> > 1) Does anybody know another way to run BLAST on a split database
> > without changing the e-values relative to a run against the whole
> > database?
> > 
> > If not, is the difference between the e-values from the whole database
> > and from part of the database (using the -z and -Y parameters of
> > ncbi_blast) very important?
> > 
> > For example, I searched one query sequence against the whole database
> > and against just part of the database using the -z parameter. The
> > results were:
> > 
> > Query      Subject    E-value (NR)    E-value (NR/2, using -z)
> > SeqQuery   Seq1DB     4e-66           6e-66
> > SeqQuery   Seq2DB     4e-38           5e-38
> > 
> > Is this difference relevant?
> > Thanks,
> > 
> > Daniel Xavier - PUC - Rio de Janeiro - Brazil