[BiO BB] looking for conserved domain downloadable databases.

Mike Marchywka marchywka at hotmail.com
Sat Jul 26 15:30:13 EDT 2008

I was trying to find something like the PROSITE rules database, but one that may include more conserved domains.
That is, I've got a bunch of short peptides and I want to determine whether any of them have functional
significance. I would imagine that function prediction servers have such a database, but probably
not in downloadable form. In particular, I took about 3000 short sequences that have something to
do with cell cycle arrest (eutilsnew is my own script, but you get the idea):
eutilsnew -protein -v -out stuff '"cell cycle" arrest'
$progpath/file_parsing -fastas stuff  stuff_fasta

I have a way to get the most frequently occurring short strings. In this case, I got some interesting hits
(and also found that "M" occurs at the start quite often, which adds some confidence that the code is running correctly):

$progpath/string_test -fastas stuff_fasta -status -conserved | grep '[A-Z]' | sort -g -r -k 2 > cca_roots
$ head cca_roots
   M 2321
PENL 565
   L 545
FENL 461
YENL 458
   F 456
   W 455
WENL 454
  MS 425
RSPS 396
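
For anyone who wants to try something similar without my scripts, here is a minimal Python sketch of the counting step. It is not what string_test does internally (that output mixes string lengths); it just counts, for a fixed length k, how many sequences contain each substring. The names read_fasta and kmer_counts and the three demo sequences are made up for illustration.

```python
from collections import Counter

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)

def kmer_counts(seqs, k=4):
    """Count in how many sequences each length-k substring appears."""
    counts = Counter()
    for seq in seqs:
        # A set, so a k-mer repeated inside one sequence is counted once.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts

# Made-up demo input, not real data.
demo = """>a
MPENLKK
>b
MSFENLA
>c
MKPENLPENL
""".splitlines()

seqs = [seq for _, seq in read_fasta(demo)]
top = kmer_counts(seqs, k=4).most_common(1)
print(top)
```

Counting each k-mer once per sequence keeps one repetitive sequence from dominating the list; counting every occurrence instead would only need a list in place of the set.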

In any case, I wanted to see if the regular expression [PFYW]ENL means anything.
First, I got a control group
(only the first 1500; I used Ctrl-C to "select" the first few):

eutilsnew -v -protein -out some_hydo "hydroxylase"

$ head hydro_roots
   M 1418
GDAA 312
GAGL 308
DAAH 299
AGLL 283
GLLS 266
IGLA 263
PVAG 258
LLSS 253
AGQG 253
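
A rough way to put a number on the comparison between the two sets is to ask what fraction of sequences in each contains the motif. This is only a sketch under my own assumptions: motif_fraction is a hypothetical helper, the sequences below are made up, and a real comparison would need the full FASTA sets and some significance test on the counts.

```python
import re

# The candidate motif from the cell cycle arrest set.
MOTIF = re.compile(r"[PFYW]ENL")

def motif_fraction(seqs, pattern=MOTIF):
    """Return the fraction of sequences with at least one motif match."""
    seqs = list(seqs)
    if not seqs:
        return 0.0
    hits = sum(1 for s in seqs if pattern.search(s))
    return hits / len(seqs)

# Made-up stand-ins for the target and control sequence sets.
target = ["MPENLKK", "MSFENLA", "MKAAGLL"]
control = ["MGDAAHL", "MAGLLSS", "MWENLQ", "MPVAGQG"]

target_frac = motif_fraction(target)
control_frac = motif_fraction(control)
print(f"target {target_frac:.2f}  control {control_frac:.2f}")
```

If the motif is no more common in the target set than in the control, it is probably just background composition.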

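The PROSITE rule list that I have shows only two lines containing "ENL", and in both it is part of
the character class [DENL] (a single residue position) rather than the tripeptide; neither has
P, F, Y, or W as a leading acid. There may be more rules that match it in a more cryptic way: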

$ grep ENL /cygdrive/c/mydocs/scripts/cc/affx/prosite_rules
P.{2}[LIVMF]{2}[LIVMS].[GDN].{3}[DENL].{3}[LIVM].E.{4}[GNQKRH][LIVM][AP]>rule|216|PEPDTIDE Prosite RIBOSOMAL_S2_2
K[LIVMF]DG[LIVMAS][SAG].{4}Y.{2}[GRD].[LF].{4}[ST]RG[DN]G.{2}G[DE][DENL]>rule|832|PEPDTIDE Prosite DNA_LIGASE_N1
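
Since a plain grep also hits character classes such as [DENL], a slightly more careful check is to parse each rule line and look for the literal string outside of brackets. The sketch below assumes every line of my rule file has the form shown above (pattern, then ">rule|", an id, and a label); parse_rule and literal_outside_classes are hypothetical helpers, and the test peptide is made up to satisfy the second rule.

```python
import re

def parse_rule(line):
    """Split 'PATTERN>rule|ID|LABEL' into (compiled regex, id, name)."""
    pattern, _, meta = line.strip().partition(">rule|")
    rule_id, _, label = meta.partition("|")
    words = label.split()
    return re.compile(pattern), rule_id, words[-1] if words else ""

def literal_outside_classes(pattern_text, literal):
    """True if `literal` occurs in the pattern outside any [...] class."""
    stripped = re.sub(r"\[[^\]]*\]", "[]", pattern_text)
    return literal in stripped

# The two lines returned by the grep above.
lines = [
    "P.{2}[LIVMF]{2}[LIVMS].[GDN].{3}[DENL].{3}[LIVM].E.{4}[GNQKRH][LIVM][AP]"
    ">rule|216|PEPDTIDE Prosite RIBOSOMAL_S2_2",
    "K[LIVMF]DG[LIVMAS][SAG].{4}Y.{2}[GRD].[LF].{4}[ST]RG[DN]G.{2}G[DE][DENL]"
    ">rule|832|PEPDTIDE Prosite DNA_LIGASE_N1",
]
rules = [parse_rule(line) for line in lines]
names = [name for _, _, name in rules]
explicit = [literal_outside_classes(rx.pattern, "ENL") for rx, _, _ in rules]

# Made-up peptide built position by position to match the second rule.
peptide = "KLDGLSAAAAYAAGALAAAASRGDGAAGDE"
matched = [name for rx, _, name in rules if rx.search(peptide)]
print(names, explicit, matched)
```

For these two rules the literal check comes back False both times, so neither encodes the ENL tripeptide as such.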

but that is all I have to go on. I took a quick look at NCBI CDART and related Pfam resources but
couldn't figure out how to download anything useful. I couldn't immediately get BLAST to return
any hits on "ENL", and I'm not sure which parameters I'd need to tweak to search on short sequences.


Mike Marchywka
