[Bioclusters] RE: split database with blast
Ryan Golhar
golharam at umdnj.edu
Tue Nov 28 12:17:18 EST 2006
Is top-posting allowed here?
I believe NCBI does this as well. I recall talking to them about this when
I visited their facilities once. Have you tried emailing ncbi-helpdesk?
Someone there should be able to tell you what parameters you need to use to
calculate the correct e-value. bl2seq allows you to set a theoretical db
size, so I'm sure blast will allow you to as well.
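To see why a fixed theoretical db size keeps the e-values consistent, here is a minimal sketch in Python (not taken from NCBI's code). It assumes the standard Karlin-Altschul relation E = m * n * 2^(-S'), where m is the query length, n the database length and S' the bit score. It ignores the effective-length (edge) corrections that real BLAST applies, so actual values will differ slightly. The function names and the example lengths are made up for illustration.

```python
def evalue(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring at least bit_score.

    Simplified model: E = m * n * 2**(-S'), using raw lengths instead of
    the effective lengths that BLAST computes internally (assumption).
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def rescale_evalue(e_part: float, part_db_len: int, full_db_len: int) -> float:
    """Rescale an e-value from a database piece to the full database size.

    In the simplified model E is linear in database length, so the
    correction is just the ratio of the two sizes.
    """
    return e_part * full_db_len / part_db_len


# Hypothetical numbers: a 300-residue query, a database of 1e9 residues,
# and the same hit (bit score 100) found in one half of that database.
full_len = 1_000_000_000
half_len = full_len // 2

e_full = evalue(100.0, 300, full_len)
e_half = evalue(100.0, 300, half_len)

print("full database :", e_full)
print("half database :", e_half)
print("half, rescaled:", rescale_evalue(e_half, half_len, full_len))
```

In this simplified model the same hit gets half the e-value when only half of the database is searched, which is why each piece has to be told the size of the whole database.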
Ryan
> Hi Daniel,
> It is non-trivial to get correct effective database sizes
> with NCBI BLAST, as it involves processing both query
> sequences and database sequences. Your best bet is to use a
> package that can split the databases and return correct
> e-values. mpiBLAST is one, but dBlast is another if for some
> unfathomable reason you don't like mpiBLAST.
> However, depending on what you're doing, e-value differences
> may not matter. In my personal opinion, there is no
> difference between e-36 and e-40, so the differences you are
> talking about are negligible. -Lucas
>
> On Tuesday, November 21, 2006 at 15:41 -0800, Daniel Xavier
> de Sousa wrote:
> >
> >
> > Hi all,
> >
> > I need some help with parallel BLAST. I would be happy if anyone
> > could help me.
> > I have been working with parallel BLAST using a split database.
> >
> > I have no problem running against part of the database and getting
> > correct statistics with WU-BLAST, because I use the DBRECMAX and
> > DBRECMIN parameters to run BLAST on a virtually split database: each
> > job searches just one piece of the whole database, and the e-values
> > come out right.
> >
> > But I really want to make everything work with NCBI BLAST. I know
> > about the mpiBLAST solution and the GI number list file. But these
> > solutions aren't very good: the first because the BLAST source has
> > to be changed, and the second because it requires GI numbers in
> > the FASTA identifiers.
> >
> > So, my question is:
> >
> > 1) Does anybody know another solution for running BLAST on a split
> > database that does not change the e-values relative to a run against
> > the whole database?
> >
> > If not, is the difference in e-value between the whole database and
> > part of the database (using the -z and -Y parameters of NCBI BLAST)
> > very important?
> >
> > For example, I searched one sequence against the whole database and
> > against just part of the database using the -z parameter. The results
> > were:
> >
> >                     E-value (NR)    E-value (NR/2, using -z)
> > SeqQuery  Seq1DB    4e-66           6e-66
> > SeqQuery  Seq2DB    4e-38           5e-38
> >
> > Is this difference relevant?
> > Thanks,
> >
> > Daniel Xavier - PUC - Rio de Janeiro - Brazil
>
>