[BiO BB] All-against-all protein sequence comparison

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Thu Dec 16 17:15:39 EST 2004


On Thu, 16 Dec 2004, Iddo Friedberg wrote:

>
>Use ncbi toolkit, write a script around bl2seq for the all-vs-all.

Does bl2seq use fastacmd or does it expect two sequences only?


>If the genomes are really large, I would try and cluster each genome 
>first at 90% Sequence ID, to remove redundancies, using CD-HIT.

Agreed. Run this over a combined database and you already have some
interesting data. Has anyone played with the new -L coverage cutoff
threshold in cd-hit?
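
A rough sketch of the clustering step, in case it helps. It only builds
the cd-hit command line for a combined FASTA file and does not run
anything. The file names and the helper functions are made up for
illustration; -i, -o and -c are the usual cd-hit input, output and
identity-threshold options as far as I know, but check them against
your installed version. The -L option is left out because I have not
tried it.

```python
def combine_fasta(paths, combined_path):
    """Concatenate several FASTA files into one combined database file."""
    with open(combined_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                text = handle.read()
            # Make sure each file ends with a newline so headers never fuse.
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)
    return combined_path


def cdhit_command(fasta_in, fasta_out, identity=0.9):
    """Build a cd-hit argument list; nothing is executed here."""
    if not 0.0 < identity <= 1.0:
        raise ValueError("identity must be in (0, 1]")
    # -i input FASTA, -o output file, -c sequence identity threshold
    return ["cd-hit", "-i", fasta_in, "-o", fasta_out, "-c", str(identity)]


if __name__ == "__main__":
    # Hypothetical file names for the combined database and its clustered output.
    print(" ".join(cdhit_command("combined.faa", "combined_nr90.faa")))
```

The resulting list can be handed to subprocess.run() once the paths
point at real files.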


>I wouldn't go with the strategy of having  one genome as a database, and 
>another as a query pool, because that would skew your BLAST statistics 
>to give you false-positive hits. I would go with the all-vs-all pairwise 
>BLAST.

I have never used bl2seq, but it might be useful to run formatdb on the
two databases anyway, if only because it lets you use fastacmd to pull
any sequence (or pair of sequences) out of the database very easily.
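
Something along these lines is what I have in mind: a minimal sketch
that lists every cross-genome pair and builds the formatdb, fastacmd
and bl2seq command lines for it, without running them. It assumes
bl2seq is given one sequence per file (-i and -j), that formatdb -o T
has indexed the identifiers so fastacmd -s can find them, and that
-D 1 gives tabular output. All function names, identifiers and file
names below are invented for the example, and the flags should be
checked against your copy of the toolkit.

```python
from itertools import product


def fasta_ids(lines):
    """Return the first whitespace-delimited token of every FASTA header."""
    return [
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    ]


def formatdb_command(fasta_path):
    # -p T: protein input; -o T: parse identifiers so fastacmd can look them up
    return ["formatdb", "-i", fasta_path, "-p", "T", "-o", "T"]


def fastacmd_command(db, seq_id, out_path):
    # -d database, -s identifier to fetch, -o output file
    return ["fastacmd", "-d", db, "-s", seq_id, "-o", out_path]


def bl2seq_command(query_fa, subject_fa, out_path):
    # -i first sequence, -j second sequence, -p program, -D 1 tabular output
    return ["bl2seq", "-i", query_fa, "-j", subject_fa,
            "-p", "blastp", "-D", "1", "-o", out_path]


def all_vs_all(ids_a, ids_b):
    """Every (a, b) pair with a from genome A and b from genome B."""
    return list(product(ids_a, ids_b))


if __name__ == "__main__":
    # Toy headers standing in for two genomes (hypothetical identifiers).
    genome_a = [">protA1 hypothetical protein", "MKV", ">protA2", "MST"]
    genome_b = [">protB1", "MKL"]
    for a, b in all_vs_all(fasta_ids(genome_a), fasta_ids(genome_b)):
        cmd = bl2seq_command(a + ".fa", b + ".fa", a + "_vs_" + b + ".tab")
        print(" ".join(cmd))
```

For two genomes of N and M proteins this is N*M bl2seq runs, so the
CD-HIT reduction beforehand matters a lot.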



>
>./I
>
>
>Dr. Christoph Gille wrote:
>
>>the ncbi toolkit works well.
>>I can loop over all proteins in one genome
>>and run blast against the other.
>>
>>
>>  
>>
>>>Hi, All
>>>
>>>
>>>I have been working on obtaining the BLAST E-values for all-against-all
>>>protein sequences of two genomes. Is there a tool or script for this?
>>>Any suggestions will be helpful.
>>>
>>>Thanks,
>>>
>>>
>>>Anne
>>>_______________________________________________
>>>BiO_Bulletin_Board maillist  -  BiO_Bulletin_Board at bioinformatics.org
>>>https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
>>>
>>>
>>>    
>>>
>>
>>
>
>
>



