[BiO BB] BLAST problem: limiting # of HSPs

Kerr Wall pkerrwall at psu.edu
Fri Mar 26 17:18:42 EST 2004


On 3/26/04 12:01 PM, "Dan Bolser <dmb at mrc-dunn.cam.ac.uk>" wrote:

>> In the default blast output, there are summary statistics for the overall
>> hit.  Is there an option for the tab-delimited BLAST output that would give
>> us this overall hit statistic instead of one for each HSP?
> 
> 
> I think you can simply sum the e-values for each non overlapping HSP (I
> think they shouldn't overlap). Anybody know the correct formula?

I can handle non-overlapping HSPs, because I would only be parsing out the
best e-value from each hit.  I'm just trying to avoid that step if at all
possible.  I'm running a tblastx of ~1,000,000 cDNAs against themselves to
produce a similarity matrix.  Therefore, I'm more worried about the size of
the output files, and about losing similarities between more distantly
related genes that might get left out of the output when the maximum number
of hits is reached (for some of the larger gene families).  I need to make
sure the matrix is as symmetrical as possible.
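
For the archive, here is a rough sketch of the "parse out the best e-value
from each hit" step.  It assumes the 12-column tab-delimited layout produced
by 'blastall -m 8' (query id, subject id, ..., e-value in column 11, bit
score in column 12).  The function names and the sample rows are made up for
illustration, and taking the smaller of the two directional e-values is just
one possible way to force the matrix to be symmetrical.

```python
import csv
import io


def best_evalue_per_hit(handle):
    """Return {(query, subject): smallest e-value over all HSPs of that pair}."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the '#' comment lines that -m 9 would add.
        if not row or row[0].startswith("#"):
            continue
        # Assumed -m 8 layout: column 1 = query, 2 = subject, 11 = e-value.
        key = (row[0], row[1])
        evalue = float(row[10])
        if key not in best or evalue < best[key]:
            best[key] = evalue
    return best


def symmetrize(best):
    """Collapse (a, b) and (b, a) into one entry holding the smaller e-value."""
    sym = {}
    for (a, b), evalue in best.items():
        pair = tuple(sorted((a, b)))
        sym[pair] = min(evalue, sym.get(pair, evalue))
    return sym


# Invented sample rows: two HSPs for g1 vs g2, one HSP for g2 vs g1.
sample = (
    "g1\tg2\t80.0\t50\t10\t0\t1\t150\t1\t150\t1e-20\t95.0\n"
    "g1\tg2\t60.0\t30\t12\t0\t200\t290\t210\t300\t1e-05\t40.0\n"
    "g2\tg1\t80.0\t50\t10\t0\t1\t150\t1\t150\t3e-20\t94.0\n"
)
best = best_evalue_per_hit(io.StringIO(sample))
matrix = symmetrize(best)
```

This only fixes up pairs that made it into the output at all.  A pair that
was dropped in both directions because the hit limit was reached is still
missing, which is the part I am really worried about.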

>> If not, is there an option to limit the number of HSPs returned in the
>> tab-delimited output?
> 
> I am sure there is a way to do this, but I can't find any mention of this
> option in the 
> 
> ncbi/doc/blast.txt

Yes, I know.  They don't even discuss all of the options in that file.  You
would think that the documentation for blast would be complete considering
how long it has been around.

> Hmm.... Not sure if these have anything to do with it...
> 
> -K N (blastall, blastcl3, blastpgp)
>      Number  of  best  hits from a region to keep (off by default, if
>      used a value of 100 is recommended)
> 
> -P N (blastall, blastpgp, rpsblast)
>      Set to  1  for  single-hit  mode  or  0  for  multiple-hit
>      mode (default)
> 
> -b N (blastall, blastcl3, blastpgp, impala, megablast, rpsblast, seedtop)
>      Number of database sequences to show alignments for (B) (default
>      is 250)

Thanks.  Those are the parameters I've been working with so far.  I did find
a paragraph in the documentation that might be on this same track.
Specifically #4 in the section "Notes for 2.0.6 release":


############################################################################
Notes for 2.0.6 release:

Enhancements:

...

4.) BLAST has been changed to reduce the number of redundant hits that a
user may see.  This is achieved by keeping track of the number of hits
completely contained in a certain region and eliminating those lower scoring
hits that are redundant with others.  This behavior may be controlled with
the -K and -L options:

  -K  Number of best hits from a region to keep [Integer]
    default = 50
  -L  Length of region used to judge hits [Integer]
    default = 20

Setting -K to zero turns off this feature.  This is the default only on
blastall.
############################################################################

Of course, when you list all the options with 'blastall -', the -L option
is labeled '-L  Location on query sequence [String]  Optional'.  I'm not
sure what to make of that.  I wonder if they have changed parameter names
between 2.0.6 and 2.2.8?

It looks as if setting -K 1 and using -L > 100 (or much larger) would help
me reduce the amount of output.  I think using -P 1, as you listed above,
would probably help the most.
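
To make that concrete, here is a small sketch that only assembles the
command line I have in mind and does not run anything.  The database, query
and output names are placeholders, and I have not yet confirmed that -K 1
and -P 1 actually cut the output the way I hope.  I left -L out on purpose,
since the 2.2.8 option listing describes it as a location on the query
rather than a region length.

```python
import shlex


def build_blastall_command(database, query, output,
                           best_hits_per_region=1, single_hit=True):
    """Assemble (but do not run) a blastall tblastx argument list."""
    return [
        "blastall",
        "-p", "tblastx",                    # program
        "-d", database,                     # placeholder database name
        "-i", query,                        # placeholder query FASTA file
        "-o", output,                       # placeholder output file
        "-m", "8",                          # tab-delimited output
        "-K", str(best_hits_per_region),    # best hits from a region to keep
        "-P", "1" if single_hit else "0",   # 1 = single-hit, 0 = multiple-hit
    ]


cmd = build_blastall_command("cdna_db", "cdna.fasta", "cdna_vs_cdna.tab")
# Print a copy-and-paste friendly version of the command.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the arguments in a list makes it easy to vary -K and -P on a small
test set and compare the output sizes before committing to the full run.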

> If you get an answer from blast-help at ncbi.nlm.nih.gov can you please post
> it up? (these emails get archived).

I will.  I sent them an email yesterday afternoon so I won't be expecting
anything back until sometime next week.  I usually have solved the problem
by the time they get back to me.

Thanks for the help,

Kerr


> Cheers,
> Dan.
> 
>> 
>> Thanks,
>> 
>> Kerr Wall
>> 
>> _______________________________________________
>> BiO_Bulletin_Board maillist  -  BiO_Bulletin_Board at bioinformatics.org
>> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
>> 