[BiO BB] FW: finding sequence helix/sheet in pdb

Indraneel Majumdar indraneel at indialine.org
Sun Apr 22 19:45:28 EDT 2001


Hi,

The PDB mostly uses the DSSP program and sometimes sstruc. You might
try looking at the DSSP database (the output of the DSSP program) via
SRS, but it doesn't seem to be in a human-readable format.

As far as the PDB goes, the A, B, C etc. (the insertion codes) sit in
the column immediately after the residue number; that column is
otherwise left blank. (The PDB format was designed by Fortran
programmers, so everything is in fixed-width columns.) So you should
not have any trouble if your script also checks that column in every
case. What are you using?
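
Here is a rough sketch, in Python, of what checking that column could
look like. It assumes the standard fixed-width layout (residue number in
columns 23-26 and insertion code in column 27 of ATOM records; columns
22-25/26 and 34-37/38 for the start and end of a HELIX record). The
function names are made up for illustration and this is not your script,
just one way to do it. The point is to treat (chain, number, insertion
code) as the residue ID and to walk the residues in file order instead
of counting from the start number to the end number.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def atom_residue_id(line):
    """Return (chain, number, insertion_code) from an ATOM record."""
    chain = line[21:22]               # column 22
    number = int(line[22:26])         # columns 23-26
    icode = line[26:27].strip()       # column 27, blank if unused
    return chain, number, icode


def helix_range(line):
    """Return the start and end residue IDs from a HELIX record."""
    start = (line[19:20], int(line[21:25]), line[25:26].strip())
    end = (line[31:32], int(line[33:37]), line[37:38].strip())
    return start, end


def segment_sequence(atom_lines, start, end):
    """One-letter sequence from start to end, following file order."""
    seq, seen, inside = [], set(), False
    for line in atom_lines:
        if not line.startswith("ATOM"):
            continue
        rid = atom_residue_id(line)
        if rid in seen:               # further atoms of a residue already seen
            continue
        seen.add(rid)
        if rid == start:
            inside = True
        if inside:
            # Unknown residue names are written as X.
            seq.append(THREE_TO_ONE.get(line[17:20], "X"))
        if rid == end:
            break
    return "".join(seq)
```

With this, a residue numbered 15A is simply a different ID from 15, and
it is picked up because it lies between the start and end residues in
the file.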

Are you storing this in any RDBMS? If so then you might need to define
two more attributes for each residue ID. (A better method would be to
define your own data type for residue ID, which stores a number and a
character, if the database system allows that. PostgreSQL does.) Your
script would be able to directly dump the data into the database that
way.
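
To illustrate the first option (two extra attributes), here is a small
sketch using Python's built-in sqlite3 module as a stand-in for whatever
RDBMS you have. The table name, column names and the sample row are all
invented for the example. The insertion codes for the start and end
residues each get their own column next to the residue numbers.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical helix table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE helix (
            pdb_id      TEXT    NOT NULL,
            chain       TEXT    NOT NULL,
            start_num   INTEGER NOT NULL,
            start_icode TEXT    NOT NULL DEFAULT '',  -- extra attribute 1
            end_num     INTEGER NOT NULL,
            end_icode   TEXT    NOT NULL DEFAULT '',  -- extra attribute 2
            sequence    TEXT
        )
    """)
    return conn


def store_helix(conn, pdb_id, start, end, sequence):
    """Store one helix; start and end are (chain, number, icode) tuples."""
    conn.execute(
        "INSERT INTO helix VALUES (?, ?, ?, ?, ?, ?, ?)",
        (pdb_id, start[0], start[1], start[2], end[1], end[2], sequence),
    )


conn = make_db()
# Placeholder values, not taken from a real PDB entry.
store_helix(conn, "xxxx", ("A", 15, "A"), ("A", 16, "B"), "SGL")
rows = conn.execute(
    "SELECT start_num, start_icode, end_num, end_icode FROM helix"
).fetchall()
```

An empty string is used when there is no insertion code, so that 15 and
15A stay distinct and comparisons remain simple.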

IMHO this is hard work, so best wishes,
Indraneel

On Tue, Apr 17, 2001 at 02:40:56PM -0500, Mathura, Venkatarajan S. wrote:
> 
>  Thanks Kiran. I know I can do it with RasMol/MOLMOL, but I would like
> to do it on a very large scale: the entire PDB (proteins only), or at
> least the non-redundant structures. Currently I am using the HELIX and
> SHEET records in the PDB header. The script looks up the starting and
> ending residues of each helix and sheet, and reads out the
> single-letter sequence of the secondary-structure regions using the
> residue numbers and the start/end info. I have already done this for
> 2096 PDB files and was quite successful. But the script doesn't handle
> cases where residues are numbered 15A, 16B etc. (particularly in the
> case of mutations). I am looking for someone who has already extracted
> all helix sequences in the PDB (I mean 12000 structures) or who has
> scripts that handle the problem mentioned above better. I would like
> to stick to the secondary structure information from the PDB header
> (as supplied by the authors) rather than using other programs to
> derive secondary structure from the coordinates alone.

-- 
http://www.indialine.org



