[BiO BB] How to find the same proteins?
Mike Marchywka
mmarchywka at eyewonder.com
Thu Mar 23 09:47:58 EST 2006
If my earlier reply ever gets past the moderator, you will see that NLM
generally supports automated searches via E-utilities (eutils), but it appears
to support BLAST only via a special utility. The clustering your site adds
is a nice extra feature, but it is remarkably easy to download clustering
software from many sources and run it with scripts for any purpose. I used gene
expression array software to organize authors from a biotech message board.
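
For anyone who has not scripted eutils before, here is a minimal sketch of how a search request can be put together. The base URL and the db/term/retmax parameters are the public E-utilities ones as I understand them; the function name esearch_url and the example term are just made up for illustration, and the code only builds the URL rather than fetching anything.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service (assumed to be the current public endpoint).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="protein", retmax=20):
    """Build an ESearch request URL for a text query (hypothetical helper)."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"
```

Calling esearch_url("lysozyme") gives a URL you can fetch with any HTTP client; the reply is XML listing the matching record IDs. Check the NLM usage guidelines before sending requests in bulk.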
-----Original Message-----
From: bio_bulletin_board-bounces+mmarchywka=eyewonder.com at bioinformatics.org
[mailto:bio_bulletin_board-bounces+mmarchywka=eyewonder.com at bioinformatics.org]
On Behalf Of Dan Bolser
Sent: Thursday, March 23, 2006 09:18 AM
To: The general forum at Bioinformatics.Org
Subject: Re: [BiO BB] How to find the same proteins?
Semen Esilevsky wrote:
> Dear all,
> I'm a novice in bioinformatics and this question is
> probably stupid, but...
> I have a list of ~200 PDB IDs. For each of them I
> have to build a list of all entries in PDB which
> represent the same protein (say, >99% sequence
> similarity and no large gaps). Could someone suggest
> the least painful way of doing this?
> As far as I understand, all I need is the database
> where all pairwise BLAST alignments of PDB chains are
> stored. I've found one as a part of the PISCES server,
> but it is incomplete and contains some internal
> inconsistencies. Could someone suggest a better one,
> or is there a simpler way out?
It is not a stupid question, but rather a common problem for the whole
field! It would be useful if you could describe the problems you are
having with PISCES, as that is a very popular and commonly used database.
The simplest approach I can think of is to combine your list of proteins
with a full FASTA database of the PDB (unless your proteins are already
in that FASTA file), and then run CD-HIT on the FASTA file (with your
own choice of sequence identity clustering threshold)...
http://bioinformatics.org/cd-hit/
'The same' proteins (defined here by sequence identity) will be found in
the same CD-HIT clusters.
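
As a rough sketch of the last step, the script below reads a CD-HIT cluster file and collects, for each query PDB ID, every chain that shares a cluster with it. It assumes CD-HIT was run with something like `cd-hit -i pdb.fasta -o pdb99 -c 0.99 -n 5` (the file names are hypothetical), so that the clusters are in `pdb99.clstr`, and that sequence IDs look like `1ABC_A` (PDB ID, underscore, chain), which depends on how your FASTA file was built. The function names are my own.

```python
import re
from collections import defaultdict

# Matches the sequence ID on a member line of a .clstr file, e.g.
# "0\t129aa, >1ABC_A... *" or "1\t129aa, >2XYZ_B... at 99.22%".
MEMBER_RE = re.compile(r">(\S+?)\.\.\.")


def parse_clstr(lines):
    """Return {cluster_number: [sequence_id, ...]} from .clstr lines."""
    clusters = defaultdict(list)
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
        elif line and current is not None:
            match = MEMBER_RE.search(line)
            if match:
                clusters[current].append(match.group(1))
    return dict(clusters)


def same_protein_lists(clusters, query_pdb_ids):
    """Map each query PDB ID to all chain IDs that share a cluster with it."""
    queries = {q.upper() for q in query_pdb_ids}
    result = defaultdict(set)
    for members in clusters.values():
        # Assumed ID convention: "1ABC_A" = PDB ID + "_" + chain letter.
        hits = {m.split("_")[0].upper() for m in members} & queries
        for q in hits:
            result[q].update(members)
    return {q: sorted(ids) for q, ids in result.items()}
```

Typical use would be `same_protein_lists(parse_clstr(open("pdb99.clstr")), my_ids)`. Note that CD-HIT compares each member to the cluster representative, so two members of one cluster are each within the threshold of the representative but not necessarily of each other.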
Hmm... That reminds me...
> Best,
> Semen
> _______________________________________________
> Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board