[Bioclusters] Re: Help on BLAST

Chris Dwan (CCGB) bioclusters@bioinformatics.org
Mon, 26 Aug 2002 08:45:03 -0500 (CDT)


Wim,

If you assume that the two BLAST target sets are non-overlapping, the 
only scores which need to be recalculated are the e- and p-values.
Score and Bit Score are based solely on the alignment, substitution 
matrix, and gap costs, plus the K and lambda parameters.  Those don't
change with target set size.

  e-value = m * n * 2^(-bit_score)

m is the number of residues in the query and n is the number of
residues in the target set.  To recompute an e-value, given n-old (the
original target set size) and n-new (the new TOTAL target set size):

  new-e-value = n-new * (old-e-value / n-old)

Adding this line to whatever code you're using is left as an exercise
for the reader.
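
For anyone who wants a starting point, here is a minimal sketch of that
rescaling in Python.  The function names and the (n_old, hits) input
layout are hypothetical, made up for illustration; they are not part of
BLAST, seqsplit, or blastunsplit.  It assumes non-overlapping target
sets and uses raw residue counts, so it ignores the effective-length
corrections that BLAST applies and should be read as an approximation.

```python
def evalue_from_bit_score(bit_score, m, n):
    """E = m * n * 2^(-bit_score); m = query residues, n = target residues."""
    return m * n * 2.0 ** (-bit_score)


def rescale_evalue(old_evalue, n_old, n_new):
    """Rescale an e-value from target set size n_old to total size n_new."""
    if n_old <= 0 or n_new <= 0:
        raise ValueError("target set sizes must be positive")
    return n_new * (old_evalue / n_old)


def merge_hits(shards):
    """Merge hits from searches against non-overlapping target sets.

    shards is a list of (n_old, hits) pairs (a hypothetical layout), where
    n_old is the residue count of that target set and hits is a list of
    (hit_id, evalue) pairs reported against it.
    """
    # The new TOTAL target set size is the sum of the pieces.
    n_new = sum(n_old for n_old, _ in shards)
    merged = [
        (hit_id, rescale_evalue(evalue, n_old, n_new))
        for n_old, hits in shards
        for hit_id, evalue in hits
    ]
    # Best (smallest) e-value first, as in a normal BLAST report.
    merged.sort(key=lambda hit: hit[1])
    return merged
```

For example, merge_hits([(1000, [("a", 1e-3)]), (3000, [("b", 1e-4)])])
rescales both hits to a total target size of 4000 residues and returns
them sorted by the new e-values.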

-C

Wim Glassee writes:
> Hi,
> 
> I had a quick look at the sources for seqsplit and blastunsplit, and
> there doesn't seem to be any statistics recalculation of any kind in
> there. If you blast smaller pieces of a query sequence against a db, the
> statistics will not be the same as for the original blast, so when
> merging the output files, you won't end up with the same results. In a
> lot of cases even the number of hits and/or HSPs will NOT be the same.
> 
> Wim
> 
> 
> 
> > -----Original Message-----
> > From: bioclusters-admin@bioinformatics.org [mailto:bioclusters-
> > admin@bioinformatics.org] On Behalf Of Mario Belluardo
> > Sent: Monday, 26 August 2002 15:04
> > To: bioclusters@bioinformatics.org
> > Subject: [Bioclusters] Re: Help on BLAST
> > 
> > Hi Sylvain,
> > I've found and am testing seqsplit (and blastunsplit), which you can
> > download from here
> > 
> > ftp://ftp.cgr.ki.se/pub/prog/MSPcrunch+Blixem/
> > 
> > Here is the web documentation:
> > http://www.cgr.ki.se/cgr/groups/sonnhammer/MSPcrunch.html
> > 
> > Unfortunately it seems to work with only a single sequence at a time,
> > which means that you cannot submit multi-sequence queries, but you can
> > modify the source code yourself. I would like to do it, so if you
> > modify it before me let me know!
> > 
> > Mario
> > 
> > 
> > 
> > > Message: 2
> > > Date: Fri, 23 Aug 2002 14:51:14 -0400
> > > From: Sylvain Foisy <sylvain.foisy@bioneq.qc.ca>
> > > To: bioclusters@bioinformatics.org
> > > Subject: [Bioclusters] Re: Help on BLAST
> > > Reply-To: bioclusters@bioinformatics.org
> > >
> > > Hi
> > >
> > > On Friday, August 23, 2002, at 12:01 PM, bioclusters-
> > > request@bioinformatics.org wrote:
> > >
> > > > I read your posts saying "splitting the query sequence into small
> > > > fragments and BLASTing each of those fragments against the (entire)
> > > > database is super-easy to implement." Could you please tell me how to
> > > > combine the results, or a link to the solution would be very helpful?
> > >
> > > Add me to the list of interested parties on that subject. I would like
> > > to know how to write an app that would do these three steps:
> > >
> > > -Split a sequence into multiples of, let's say, 100 nucleotides;
> > > -Send each of them to a node for BLASTing;
> > > -Reassemble the different results into a single report for the users.
> > >
> > > Any web links that would help us in our quest?
> > >
> > > Cordially
> > >
> > > Sylvain
> > >
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > Sylvain Foisy, Ph. D.
> > > Directeur-Operations / Project Manager
> > > BioNEQ - Le Reseau quebecois de bioinformatique
> > > Genome-Quebec
> > > Tel.: (514) 878-9911
> > > E-mail: sylvain.foisy@bioneq.qc.ca
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > 
> > 
> > 
> > --
> > 
> > Dr. Mario Belluardo
> > Institute for Cancer Research and Treatment
> > http://www.ircc.it
> > _______________________________________________
> > Bioclusters maillist  -  Bioclusters@bioinformatics.org
> > https://bioinformatics.org/mailman/listinfo/bioclusters
> 
> 