[Bioclusters] Re: Help on BLAST

Wim Glassee bioclusters@bioinformatics.org
Tue, 27 Aug 2002 11:18:18 +0200


Hi Chris,

> 
> 
> Wim,
> 
> If you assume that the two blast target sets are non-overlapping, the
> only scores which need to be recalculated are the e- and p- values.
> Score and Bit Score are based solely on the alignment, substitution
> matrix, and gap costs, plus the K and lambda parameters.  Those don't
> change with target set size.
> 
>   e-value = { m n 2^(bit_score) }

Shouldn't this be:

	e-value = { m n 2^(-bit_score) }

I think you forgot a minus. This is the equation found in the blast
tutorial.
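
For reference, a minimal Python sketch of the equation with the minus
sign in place. The function name and the sample sizes are made up for
illustration; m and n are the residue counts as described in the quoted
text below.

```python
def evalue_from_bit_score(m, n, bit_score):
    """E = m * n * 2^(-bit_score)."""
    return m * n * 2.0 ** (-bit_score)


# Illustrative numbers only, not taken from a real BLAST run.
m = 1000        # residues on one side of the comparison
n = 1000000     # residues on the other side

weak_hit = evalue_from_bit_score(m, n, 40.0)
strong_hit = evalue_from_bit_score(m, n, 50.0)

# With the minus sign, a higher bit score gives a smaller e-value,
# which is the behaviour you expect from an expectation value.
print(weak_hit, strong_hit)
```

Without the minus sign the e-value would grow with the bit score, so
better alignments would look less significant.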

> 
> m and n are the number of residues in the target and query set.  To
> recompute an e-value, given n-old (the original target set size) and
> n-new (the new TOTAL target set size):
> 
>   new-e-value = n-new * (old-e-value / n-old)
> 
> Adding this line to whatever code you're using is left as an exercise
> for the reader.
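
A minimal sketch of that rescaling line, assuming the old e-value and
both target set sizes are already available as numbers. The function
name and the example sizes are hypothetical.

```python
def rescale_evalue(old_evalue, n_old, n_new):
    """new-e-value = n-new * (old-e-value / n-old)."""
    if n_old <= 0:
        raise ValueError("n_old must be positive")
    return n_new * (old_evalue / n_old)


# Hypothetical case: a hit reported against a 2,000,000-residue piece of
# a target set whose total size is 10,000,000 residues.
new_e = rescale_evalue(1e-5, 2000000, 10000000)
print(new_e)
```

The e-value scales linearly with the target set size, so a five times
larger target set gives a five times larger e-value.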

Recalculating the e-values like this does introduce significant rounding
errors, mostly because of the limited number of significant digits in
the blast output.
If you can get K (kappa) and lambda yourself (I'll have to check the
code again), you could calculate the bit_score from the raw score, and
then the first equation would be the best to use.
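
A rough sketch of both routes, to make the difference concrete. It uses
the standard relation bit_score = (lambda * score - ln K) / ln 2. The
values of lam, K, the sizes and the raw score are placeholders, and the
one-significant-digit rounding is only an assumption about how the
e-value appears in the report.

```python
import math


def bit_score(raw_score, lam, K):
    """bit_score = (lambda * raw_score - ln K) / ln 2."""
    return (lam * raw_score - math.log(K)) / math.log(2)


def evalue(m, n, bits):
    """E = m * n * 2^(-bit_score)."""
    return m * n * 2.0 ** (-bits)


# Placeholder values for illustration only.
lam, K = 0.318, 0.134
m, n_old, n_new = 500, 2000000, 10000000
raw = 60

# Route 1: recompute from the raw score with the first equation.
bits = bit_score(raw, lam, K)
exact_new = evalue(m, n_new, bits)

# Route 2: take the old e-value as printed (assumed here to carry one
# significant digit) and rescale it by the ratio of target set sizes.
printed_old = float(f"{evalue(m, n_old, bits):.0e}")
rescaled_new = n_new * (printed_old / n_old)

# Relative error introduced by going through the rounded report value.
rel_error = abs(rescaled_new - exact_new) / exact_new
print(exact_new, rescaled_new, rel_error)
```

Route 1 needs the raw score plus K and lambda, but it never touches the
rounded e-value, so the rounding in the report does not carry over.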

Have you tried splitting and merging yourself? In my experience the
actual results (not the statistics) are not always consistent. I believe
this to be a bigger problem.

Does anybody have any experience with this?

Wim

> 
> -C
> 
> Wim Glassee writes:
> > Hi,
> >
> > I had a quick look at the sources for seqsplit and blastunsplit, and
> > there doesn't seem to be any statistics recalculation of any kind in
> > there. If you blast smaller pieces of a query sequence against a db, the
> > statistics will not be the same as for the original blast, so when
> > merging the output files, you won't end up with the same results. In a
> > lot of cases even the number of hits and/or hsps will NOT be the same.
> >
> > Wim
> >
> >
> >
> > > -----Original Message-----
> > > From: bioclusters-admin@bioinformatics.org
> > > [mailto:bioclusters-admin@bioinformatics.org] On Behalf Of Mario Belluardo
> > > Sent: Monday 26 August 2002 15:04
> > > To: bioclusters@bioinformatics.org
> > > Subject: [Bioclusters] Re: Help on BLAST
> > >
> > > Hi Sylvain,
> > > I've found and am testing seqsplit (and blastunsplit), which you can
> > > download from here:
> > >
> > > ftp://ftp.cgr.ki.se/pub/prog/MSPcrunch+Blixem/
> > >
> > > Here is the web documentation:
> > > http://www.cgr.ki.se/cgr/groups/sonnhammer/MSPcrunch.html
> > >
> > > Unfortunately it seems to work with only a single sequence at a time,
> > > which means that you cannot submit multi-sequence queries, but you can
> > > modify the source code yourself. I would like to do that, so if you
> > > modify it before me, let me know!
> > >
> > > Mario
> > >
> > >
> > >
> > > > Message: 2
> > > > Date: Fri, 23 Aug 2002 14:51:14 -0400
> > > > From: Sylvain Foisy <sylvain.foisy@bioneq.qc.ca>
> > > > To: bioclusters@bioinformatics.org
> > > > Subject: [Bioclusters] Re: Help on BLAST
> > > > Reply-To: bioclusters@bioinformatics.org
> > > >
> > > > Hi
> > > >
> > > > > On Friday, August 23, 2002, at 12:01 PM,
> > > > > bioclusters-request@bioinformatics.org wrote:
> > > >
> > > > > > I read your posts saying "splitting the query sequence into small
> > > > > > fragments and BLASTing each of those fragments against the (entire)
> > > > > > database is super-easy to implement." Could you please tell me how
> > > > > > to combine the results? A link to the solution would also be very
> > > > > > helpful.
> > > >
> > > > > Add me to the list of parties interested in that subject. I would
> > > > > like to know how to write an app that would do these three steps:
> > > > >
> > > > > -Split a sequence in multiples of, let's say, 100 nucleotides;
> > > > > -Send each of them to a node for BLASTing;
> > > > > -Reassemble the different results into a single report for the users.
> > > >
> > > > Any web links that would help us in our quest?
> > > >
> > > > Cordially
> > > >
> > > > Sylvain
> > > >
> > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > Sylvain Foisy, Ph. D.
> > > > Directeur-Operations / Project Manager
> > > > BioNEQ - Le Reseau quebecois de bioinformatique
> > > > Genome-Quebec
> > > > Tel.: (514) 878-9911
> > > > E-mail: sylvain.foisy@bioneq.qc.ca
> > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >
> > >
> > >
> > > --
> > >
> > > Dr. Mario Belluardo
> > > Institute for Cancer Research and Treatment
> > > http://www.ircc.it
> > > _______________________________________________
> > > Bioclusters maillist  -  Bioclusters@bioinformatics.org
> > > https://bioinformatics.org/mailman/listinfo/bioclusters