[BiO BB] Refined nucleotide BLAST matrix

Peter Rice pmr at ebi.ac.uk
Fri Feb 18 12:40:32 EST 2005


> Yannick Wurm wrote:
>> for a specific need in my lab, we are looking for an implementation of 
>> nucleotide sequence alignment program which would be more flexible 
>> than standard BLAST.
>>
>> To help identify these sequences, we need to be able to fine-tune the 
>> matrix used for scoring. Thus, for example when calculating the 
>> "score" of an alignment, C->A and C->T could be given different weights.
>>
>> To my surprise, BLAST does not have this liberty, despite the fact 
>> that different scoring matrices are used for proteins. I couldn't find 
>> anything on Google either.
>>
>> Would anyone on the list have a clue? Or do I need to get dirty 
>> messing with BLAST's source?

Avoid messing with the BLAST sources!

You can get around this. The same trick is also needed to handle nucleotide 
ambiguity codes (for example, to compare patent sequences which use codes other 
than 'N').

You have to cheat though.

1. Build your blast database as protein

2. Give your matrix a name that matches one of the blast protein matrix names (!)

3. Put in the matrix values you want

4. Remember that you are now using blastp (protein search) so you can only use 
a short wordsize - I am guessing you have short sequences anyway so this 
should not be a problem

5. Remember that BLAST does local alignment.

6. Remember that your scores will rest on some assumptions that are only right 
for proteins. You should still find the hits you are looking for.
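
Steps 2 and 3 could be sketched roughly as below. This is only an 
illustration, not a tested recipe: the function names, the default scores and 
the filler value are all made up for the example, and the layout (a header row 
of letters, then one row per letter) simply mirrors the protein matrix files 
shipped with BLAST. It relies on A, C, G and T also being valid protein 
letters. Overrides are applied symmetrically here, which is a choice of this 
sketch rather than a requirement.

```python
# Residue order used by the NCBI protein matrix files (e.g. BLOSUM62).
PROTEIN_ALPHABET = "ARNDCQEGHILKMFPSTWYVBZX*"
NUCLEOTIDES = "ACGT"


def build_matrix(match=5, mismatch=-4, overrides=None, filler=-10):
    """Return a dict mapping (row, col) letter pairs to integer scores.

    overrides holds per-substitution scores such as {("C", "A"): -1};
    each one is applied in both directions. Any pair involving a letter
    outside ACGT gets the (hypothetical) filler score.
    """
    overrides = overrides or {}
    scores = {}
    for row in PROTEIN_ALPHABET:
        for col in PROTEIN_ALPHABET:
            if row in NUCLEOTIDES and col in NUCLEOTIDES:
                scores[(row, col)] = match if row == col else mismatch
            else:
                scores[(row, col)] = filler
    for (a, b), value in overrides.items():
        scores[(a, b)] = value
        scores[(b, a)] = value
    return scores


def format_matrix(scores):
    """Render the scores as text: one comment line, a header row of
    column letters, then one row per letter."""
    lines = ["# Custom nucleotide scores in a protein-style matrix layout"]
    lines.append("  " + " ".join(f"{col:>3}" for col in PROTEIN_ALPHABET))
    for row in PROTEIN_ALPHABET:
        cells = " ".join(f"{scores[(row, col)]:>3}" for col in PROTEIN_ALPHABET)
        lines.append(f"{row} {cells}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Give C<->A a milder penalty than C<->T (values are arbitrary).
    matrix = build_matrix(overrides={("C", "A"): -1, ("C", "T"): -3})
    print(format_matrix(matrix), end="")
```

To follow step 2, the printed text would be saved under the name of one of the 
BLAST protein matrices (BLOSUM62, say) in a place where your BLAST installation 
looks for its matrices. Where that is depends on your setup.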

This cheat was published (by NCBI if I recall correctly) some time back. 
Sorry, I can't track down the reference.

Hope this helps,

Peter Rice



