[BiO BB] How to find the same proteins?

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Thu Mar 23 09:18:25 EST 2006

Semen Esilevsky wrote:
> Dear all,
> I'm a novice in bioinformatics and this question is
> probably stupid, but...
> I have a list of ~200 PDB IDs. For each of them I
> have to build a list of all entries in the PDB that
> represent the same protein (say, >99% sequence
> similarity and no large gaps). Could someone suggest
> the least painful way of doing this?
> As far as I understand, all I need is a database
> where all pairwise BLAST alignments of PDB chains are
> stored. I've found one as part of the PISCES server,
> but it is incomplete and contains some internal
> inconsistencies. Could someone suggest a better one,
> or is there a simpler way?

It is not a stupid question, but rather a common problem for the whole 
field! It would be useful if you could describe the problems you are 
having with PISCES, as that is a very popular and commonly used database.

The simplest approach I can think of is to combine your list of proteins
with a full FASTA database of the PDB (unless your proteins are already
in that FASTA file), and then run CD-HIT on the FASTA file with your
own choice of sequence identity clustering threshold (for example 0.99
for your >99% criterion).


'The same' proteins (defined here by sequence identity) will be found in 
the same CD-HIT clusters.
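
To make that concrete, here is a rough sketch rather than a tested
pipeline. It assumes you have already run something like
`cd-hit -i pdb_seqres.fasta -o pdb99 -c 0.99 -n 5` (the file names are
made up for illustration), which writes the clusters to `pdb99.clstr`.
It also assumes the sequence IDs in your FASTA file look like `1ABC_A`
(PDB ID, underscore, chain); adjust the ID splitting if yours differ.
The function names are my own, not part of CD-HIT.

```python
import re
from collections import defaultdict

# A member line in a .clstr file looks like:
#   "1\t214aa, >1ABC_A... at 99.53%"
# The sequence ID sits between ">" and "...".
MEMBER_RE = re.compile(r">(\S+?)\.\.\.")


def parse_clstr(lines):
    """Return a list of clusters, each a list of sequence IDs."""
    clusters = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            # Header line such as ">Cluster 0" opens a new cluster.
            clusters.append([])
        else:
            match = MEMBER_RE.search(line)
            if match and clusters:
                clusters[-1].append(match.group(1))
    return clusters


def same_protein_entries(clusters, query_pdb_ids):
    """Map each query PDB ID to the other PDB IDs that share a cluster
    with at least one of its chains (assumes IDs like '1ABC_A')."""
    queries = {q.upper() for q in query_pdb_ids}
    result = defaultdict(set)
    for members in clusters:
        pdb_ids = {m.split("_")[0].upper() for m in members}
        for q in queries & pdb_ids:
            result[q] |= pdb_ids - {q}
    return {q: sorted(result.get(q, set())) for q in sorted(queries)}


# Tiny hand-written example in the .clstr layout (not real PDB data).
example = [
    ">Cluster 0",
    "0\t214aa, >1ABC_A... *",
    "1\t214aa, >2XYZ_B... at 99.53%",
    ">Cluster 1",
    "0\t150aa, >3DEF_A... *",
]

clusters = parse_clstr(example)
print(same_protein_entries(clusters, ["1abc", "3def"]))
```

For a real run you would pass `open("pdb99.clstr")` to `parse_clstr`
instead of the example list. Note that an identity threshold alone says
nothing about your "no large gaps" requirement, so you may still want to
check the alignments or lengths within each cluster.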

Hmm... That reminds me...

> Best,
> Semen
