[Biodevelopers] getting all db sequences from a PSI-BLAST run

Osnat Dafni dafniosn at post.tau.ac.il
Tue Mar 6 02:19:52 EST 2007


I would like to point out that the E-value is NOT a probability and can exceed
1 for insignificant alignments. Here's a quote from NCBI's BLAST pages:

"The Expect value (E) is a parameter that describes the number of hits one can
"expect" to see just by chance when searching a database of a particular size.
It decreases exponentially with the Score (S) that is assigned to a match
between two sequences. Essentially, the E value describes the random background
noise that exists for matches between sequences. For example, an E value of 1
assigned to a hit can be interpreted as meaning that in a database of the
current size one might expect to see 1 match with a similar score simply by
chance."
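To make the distinction concrete, here is a minimal Python sketch of the Karlin-Altschul formula Martin quotes below (E = K m n exp(-lambda S)), together with the standard conversion from an E-value to a P-value, P = 1 - exp(-E). The values of K, lambda and the query and database lengths are illustrative assumptions, not numbers from any run in this thread. Real BLAST also uses edge-corrected effective lengths rather than the raw m and n, and this sketch ignores that correction.

```python
import math


def expect_value(score: float, m: int, n: int, K: float, lam: float) -> float:
    """Karlin-Altschul expected number of chance hits: E = K * m * n * exp(-lambda * S)."""
    return K * m * n * math.exp(-lam * score)


def p_value(e: float) -> float:
    """Probability of at least one chance hit (Poisson): P = 1 - exp(-E).
    expm1 avoids cancellation when E is tiny."""
    return -math.expm1(-e)


# Illustrative parameters only (assumed, not taken from a real BLAST run).
K, lam = 0.041, 0.267
m, n = 300, 50_000_000  # query length, total database length in residues

for score in (20, 60, 100):
    e = expect_value(score, m, n, K, lam)
    print(f"S={score:3d}  E={e:.3g}  P={p_value(e):.3g}")
```

With these numbers a score of 20 gives an E-value in the millions and a score of 60 gives one well above 1, while the corresponding P-values never exceed 1. For small E-values (the score of 100 here) E and P are nearly equal, which is why the two are often confused.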


Regards,
Os



Quoting Martin Heusel <mheusel at gmail.com>:

> Hi Noel,
>
> Yep, I meant '-e', sorry. As for the number of sequences returned, I never
> get all sequences of a database. Even for a small database with 500
> sequences, only around 300 are returned. The e-value was set to
> 1000000. What I recently learned is that blastpgp only approximates
> the e-values, for speed. It computes only the first term of a
> Taylor approximation of exp() in
>
> E = K m n exp(-lambda score)
>
> which only makes sense for scores that are not too small. So my assumption
> is that for scores and lambdas that are too small, the approximation gives
> far too high e-values, exceeding the -e threshold. By definition an e-value
> is a probability and should not go beyond 1.
>
> Regards
>
>   Martin
>
> On 2/25/07, Noel Faux <Noel.Faux at med.monash.edu.au> wrote:
> > Hi Martin,
> >
> > This was probably a typo, but I think you need '-e', not '-E', to set the
> > e-value cutoff for the returned results.  When I wanted all results I
> > set -b to the size of the subject database and -e 100000.  The e-value
> > never reached that, so PSI-BLAST returned all results.
> >
> > Cheers
> > Noel
> >
> > Martin Heusel wrote:
> > > Hi,
> > >
> > > I'm wondering if it's possible to get all sequences of a large
> > > database ranked by E-value or score from PSI-BLAST with a query.
> > > Normally PSI-BLAST stops outputting after a couple of sequences, even
> > > if one sets the output parameters -b or -E to very high values. Is it
> > > possible in general, or are there computational limits (time etc.) in
> > > computing the right scores or E-values for the many sequences with
> > > very low identity? Thanks for any advice.
> > >
> > > Martin
> > >
> >
> > _______________________________________________
> > Biodevelopers mailing list
> > Biodevelopers at bioinformatics.org
> > https://bioinformatics.org/mailman/listinfo/biodevelopers
> >
>
>
> --
> + Neural Processing Group Technical University Berlin
> + http://ni.cs.tu-berlin.de
> + Institute of Bioinformatics Johannes Kepler University Linz
> + http://www.bioinf.jku.at/
>
> + In the beginning was the WORD, and the WORD was UNSIGNED,
> + and the main(){} was without form and void
>
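The settings Noel describes above can be written down as a command line. The following Python sketch only builds and prints the argument list for the legacy blastpgp program; the file names, the helper name `blastpgp_all_hits_cmd` and the choice of three iterations are hypothetical. Besides `-b` (number of alignments shown), it also raises `-v`, which sets how many one-line descriptions are listed. That extra flag is an assumption beyond what the thread states. Whether this actually returns every database sequence still depends on the E-values blastpgp computes, which is exactly the open question here.

```python
import shlex


def blastpgp_all_hits_cmd(query: str, db: str, db_size: int,
                          evalue: float = 100000, iterations: int = 3) -> list[str]:
    """Build (but do not run) a legacy blastpgp command line that asks for
    as many hits as the database holds. Names and defaults are illustrative."""
    return [
        "blastpgp",
        "-i", query,               # query sequence file
        "-d", db,                  # formatted subject database
        "-j", str(iterations),     # number of PSI-BLAST iterations (assumed value)
        "-e", f"{evalue:g}",       # E-value cutoff: lowercase -e, not -E
        "-b", str(db_size),        # alignments to show: the whole database
        "-v", str(db_size),        # one-line descriptions to show (assumption)
    ]


cmd = blastpgp_all_hits_cmd("query.fasta", "mydb", db_size=500)
print(shlex.join(cmd))
# prints: blastpgp -i query.fasta -d mydb -j 3 -e 100000 -b 500 -v 500
# To actually run it (requires blastpgp on PATH):
# import subprocess; subprocess.run(cmd, check=True)
```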






