[BiO BB] How to find the same proteins?
Dan Bolser
dmb at mrc-dunn.cam.ac.uk
Thu Mar 23 09:18:25 EST 2006
Semen Esilevsky wrote:
> Dear all,
> I'm a novice in bioinformatics and this question is
> probably stupid, but...
> I have a list of ~200 PDB IDs. For each of them I
> have to build a list of all entries in PDB which
> represent the same protein (say, >99% sequence
> similarity and no large gaps). Could someone suggest
> the least painful way of doing this?
> As far as I understand, all I need is a database
> where all pairwise BLAST alignments of PDB chains are
> stored. I've found one as part of the PISCES server,
> but it is incomplete and contains some internal
> inconsistencies. Could someone suggest a better one,
> or is there a simpler way?
It is not a stupid question, but rather a common problem for the whole
field! It would be useful if you could describe the problems you are
having with PISCES, as that is a very popular and commonly used database.
The simplest approach I can think of is to combine your list of proteins
with a full FASTA database of the PDB (unless your proteins are already
in that FASTA file), and then run CD-HIT on the FASTA file with your
own choice of sequence identity clustering threshold (for example 0.99
for your >99% criterion)...
http://bioinformatics.org/cd-hit/
'The same' proteins (defined here by sequence identity) will be found in
the same CD-HIT clusters.
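
To pull those clusters out for your ~200 IDs, something like the rough
Python sketch below might do. It assumes CD-HIT was run along the lines of
`cd-hit -i pdb.fasta -o pdb99 -c 0.99` (the file names are made up), that
the resulting .clstr file has the usual ">Cluster N" header lines followed
by member lines such as "0  154aa, >1ABC_A... *", and that every sequence
name starts with the 4-character PDB ID. The function names are my own, and
you should check the name format against your actual FASTA file.

```python
import re
from collections import defaultdict

# Pulls the sequence name out of a .clstr member line,
# e.g. "0\t154aa, >1ABC_A... *" gives "1ABC_A".
MEMBER_RE = re.compile(r">(\S+?)\.\.\.")


def parse_clstr(lines):
    """Return a list of clusters, each a list of sequence names."""
    clusters = []
    for line in lines:
        if line.startswith(">Cluster"):
            clusters.append([])          # a header line starts a new cluster
        else:
            match = MEMBER_RE.search(line)
            if match and clusters:
                clusters[-1].append(match.group(1))
    return clusters


def pdb_id(name):
    # Assumption: the name begins with the 4-character PDB ID ("1ABC_A").
    return name[:4].lower()


def same_protein_entries(clusters, query_ids):
    """Map each query PDB ID to the other PDB IDs sharing a cluster with it."""
    queries = {q.lower() for q in query_ids}
    result = defaultdict(set)
    for members in clusters:
        ids = {pdb_id(name) for name in members}
        for query in ids & queries:
            result[query] |= ids - {query}
    return dict(result)
```

Usage would be roughly `same_protein_entries(parse_clstr(open("pdb99.clstr")), my_ids)`.
A query ID that never appears in the FASTA file is simply absent from the
result. Since one PDB entry can hold several different chains, an entry may
turn up in more than one cluster, which is why the results are merged per
query ID.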
Hmm... That reminds me...
> Best,
> Semen