[BiO BB] ortholog

Mike Marchywka marchywka at hotmail.com
Thu Sep 13 17:37:48 EDT 2007


If you are interested in doing custom comparisons or looking for differences
by functional area, it turns out that the Prosite rules seem to be
downloadable; they sent me a link today:

ftp://ftp.expasy.org/databases/prosite/prosite.dat
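For anyone who wants to pull the patterns out of that file, here is a minimal Python sketch. It assumes the usual prosite.dat flat-file layout: a two-letter line code, three spaces, then data, with "ID" lines naming the entry, "PA" lines holding the pattern (possibly over several lines), and "//" ending each entry. Entries that only carry matrix (profile) data have no PA lines and are skipped. The function name parse_prosite_patterns is hypothetical, not part of any Prosite tool.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_prosite_patterns(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (entry_name, pattern) for every prosite.dat entry that has PA lines.

    Assumes the flat-file layout: 2-letter code, 3 spaces, data; '//' ends an entry.
    Matrix-only entries (no PA lines) are skipped.
    """
    name: Optional[str] = None
    pattern_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        code, data = line[:2], line[5:]
        if code == "ID":
            # e.g. "ID   ASN_GLYCOSYLATION; PATTERN." -> "ASN_GLYCOSYLATION"
            name = data.split(";")[0].strip()
            pattern_parts = []
        elif code == "PA":
            # Long patterns continue over several PA lines; join them verbatim.
            pattern_parts.append(data.strip())
        elif line.startswith("//"):
            if name and pattern_parts:
                yield name, "".join(pattern_parts)
            name, pattern_parts = None, []
```

Usage would be along the lines of `with open("prosite.dat") as fh: rules = list(parse_prosite_patterns(fh))`.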

I had to ignore their matrix data, but their pattern library was easily
convertible into Perl regular expressions (as far as I have looked; the
obvious caveats about bugs apply, and the canned C++ regex code I lifted from
Microsoft may not be bug-free), and it gave me quick graphical and textual
comparison results on 1000+ rules.
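As a rough illustration of that kind of conversion (this is not the converter described above), here is a minimal Python sketch that turns a Prosite pattern into a regular expression. It handles the common syntax: elements separated by "-", "x" for any residue, "[..]" for allowed residues, "{..}" for forbidden residues, "(n)" or "(n,m)" repeats, and the "<" / ">" terminal anchors. It deliberately rejects rarer forms such as a ">" inside brackets (e.g. "[G>]"). The function name prosite_to_regex is hypothetical.

```python
import re

# One pattern element: x, a single residue, [ABC] or {ABC}, optionally followed by (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a Prosite pattern such as 'N-{P}-[ST]-{P}.' into a Python regex."""
    p = pattern.strip().rstrip(".")
    prefix = suffix = ""
    if p.startswith("<"):  # pattern must match at the N-terminus
        prefix, p = "^", p[1:]
    if p.endswith(">"):  # pattern must match at the C-terminus
        suffix, p = "$", p[:-1]
    parts = []
    for element in p.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported Prosite element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            rx = "."
        elif core.startswith("{"):
            rx = "[^" + core[1:-1] + "]"  # {P} means "anything but P"
        else:
            rx = core  # a single residue or a [ST] class is already regex syntax
        if lo is not None:
            rx += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(rx)
    return prefix + "".join(parts) + suffix
```

For example, `prosite_to_regex("N-{P}-[ST]-{P}.")` gives `N[^P][ST][^P]`, which can be passed straight to `re.search`.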
The point here is that you can make your own rules as you read the
literature (that is my plan, anyway) and implement ad hoc splicing or
translation schemes (say, if you want to model flaky ribosomes).
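To make the "ad hoc translation schemes" idea concrete, here is a minimal Python sketch. It assumes the standard genetic code (stop codons as "*", unknown codons as "X") and translates a DNA string in any of the six reading frames. The translate_with_slip helper is a hypothetical, very crude stand-in for a "flaky ribosome": at a chosen codon boundary the reading position jumps by a few nucleotides. All function names here are illustrative only.

```python
# Standard genetic code, codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate one forward frame (0, 1 or 2); incomplete trailing codons are dropped."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def six_frames(dna: str) -> list:
    """Three forward frames followed by the three reverse-complement frames."""
    rc = reverse_complement(dna)
    return [translate(dna, f) for f in range(3)] + [translate(rc, f) for f in range(3)]


def translate_with_slip(dna: str, slip_at: int, slip: int = -1) -> str:
    """Hypothetical frameshift model: read frame 0 up to nucleotide slip_at
    (assumed to be a codon boundary), then resume reading at slip_at + slip."""
    return translate(dna[:slip_at]) + translate(dna[slip_at + slip:])
```

For instance, `translate("ATGGCCTAA")` gives `MA*`, while a -1 slip after the first codon, `translate_with_slip("ATGGCCTAA", 3)`, gives `MGL`.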
Anyway, I get stuff like this:

Matching the rules against the translated sequence generates rule hit files:

$ $progpath/rules_annotater -clean -which 1 -fastas o2_fasta -xrules $progpath/prosite_rules > pro1
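The hit-file format of rules_annotater isn't shown here, so the following is only a minimal Python sketch of the scanning step under an assumed, hypothetical record layout (rule id, rule name, 0-based start, matched text). One detail worth getting right: Prosite-style sites can overlap, and a plain `re.finditer` resumes after the end of each match, so the sketch wraps each regex in a lookahead to report overlapping hits as well.

```python
import re
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    rule_id: int
    name: str
    start: int  # 0-based offset into the protein string
    matched: str


def scan(protein: str, rules: List[Tuple[int, str, str]]) -> List[Hit]:
    """Report every (possibly overlapping) match of each (rule_id, name, regex) rule."""
    hits = []
    for rule_id, name, regex in rules:
        # A zero-width lookahead lets finditer advance one position at a time,
        # so overlapping sites are all reported; group(1) holds the matched text.
        for m in re.finditer("(?=(" + regex + "))", protein):
            hits.append(Hit(rule_id, name, m.start(), m.group(1)))
    return hits
```

With the N-glycosylation-style regex `N[^P][ST][^P]`, scanning `NASNTSA` reports hits at offsets 0 and 3, whereas `re.findall` on the same string would find only the first.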

$ $progpath/mm_align_tool -fastas o2_fasta -rules pro0 -rules pro1 -stats
For Rules set 0:>ref|NW_876253.1|Cfa11_WGA39_2:47189155-47195387 Canis familiaris chromosome 11 genomic contig, whole genome shotgun sequence
97         >rule|13|PEPDTIDE Prosite MICROBODIES_CTER
68         >rule|3|PEPDTIDE Prosite PKC_PHOSPHO_SITE
64         >rule|6|PEPDTIDE Prosite MYRISTYL
47         >rule|4|PEPDTIDE Prosite CK2_PHOSPHO_SITE
46         >rule|11|PEPDTIDE Prosite PRENYLATION
30         >rule|1|PEPDTIDE Prosite ASN_GLYCOSYLATION
10         >rule|2|PEPDTIDE Prosite CAMP_PHOSPHO_SITE
10         >rule|5|PEPDTIDE Prosite TYR_PHOSPHO_SITE
6          >rule|7|PEPDTIDE Prosite AMIDATION
3          >rule|87|PEPDTIDE Prosite LEUCINE_ZIPPER
2          >rule|12|PEPDTIDE Prosite ER_TARGET
1          >rule|1087|PEPDTIDE Prosite THIONIN
1          >rule|973|PEPDTIDE Prosite TUBULIN_B_AUTOREG
For Rules set 1:>gb|AACN010493556.1|:1-1146 Canis familiaris ctg19866850213054, whole genome shotgun sequence
23         >rule|13|PEPDTIDE Prosite MICROBODIES_CTER
9          >rule|1|PEPDTIDE Prosite ASN_GLYCOSYLATION
8          >rule|11|PEPDTIDE Prosite PRENYLATION
8          >rule|3|PEPDTIDE Prosite PKC_PHOSPHO_SITE
8          >rule|6|PEPDTIDE Prosite MYRISTYL
7          >rule|4|PEPDTIDE Prosite CK2_PHOSPHO_SITE
2          >rule|5|PEPDTIDE Prosite TYR_PHOSPHO_SITE
1          >rule|12|PEPDTIDE Prosite ER_TARGET
1          >rule|7|PEPDTIDE Prosite AMIDATION
1          >rule|87|PEPDTIDE Prosite LEUCINE_ZIPPER
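Tables like the two above are essentially per-rule hit counts sorted by frequency. Here is a minimal Python sketch of that tally plus a side-by-side comparison of two rule sets, which is the "differences by functional area" idea. It assumes each hit has already been reduced to a rule label string; the function names and the padded layout are hypothetical and only loosely mimic the listing.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def tally(rule_labels: Iterable[str]) -> List[Tuple[str, int]]:
    """Count hits per rule label, most frequent first (ties keep first-seen order)."""
    return Counter(rule_labels).most_common()


def format_tally(rows: List[Tuple[str, int]]) -> str:
    """Render (label, count) rows as a left-justified count followed by the label."""
    return "\n".join(f"{count:<11}{label}" for label, count in rows)


def compare_tallies(set0: Iterable[str], set1: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map every rule label seen in either set to (count in set 0, count in set 1)."""
    c0, c1 = Counter(set0), Counter(set1)
    # Counter returns 0 for labels that never occurred in that set.
    return {label: (c0[label], c1[label]) for label in sorted(set(c0) | set(c1))}
```

Rules present in one set but absent from the other (e.g. a rule that only fires in set 0) show up in `compare_tallies` with a zero on one side, which makes them easy to filter for.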


This one turned out to be easy to align, as the sequences are largely
identical (the lone "G" marks the mismatch in this excerpt), but you get the
idea:

$ $progpath/mm_align_tool -fastas o2_fasta -rules pro0 -rules pro1 -use_rule 13 -align -output text
[...]
Start at 696 and 2373:
          GGCCATTTTGCAACTCATGCATGAGCTACCTTTAGTTCCCCTTCTACATCTGAGAACTGT

          CCCATATAGAATATTTTATAAAACAAGATGGCATTGTGCTAAGTAAAATGCAGAACAAAA
                                     G
          TCAGTATCCCATTAGACATGTCATATTCAGAGTTTATTTTTATCCTTGCACTGAAAGAAT

          GATTGTAAATCAATGGTTTCTTTTTGTTTCTTGACTGTGGCAGTGTTCTGGCTCCAAATG

          ATGGAGATTCCAAATAAGCATTACAGCTTGGCAGGAAATGCCAGTTCAGATATTTGTGAG

          ATCCTAAAGAATAGATCTGGACACATAT
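A text view like the one above can be produced from two already-aligned, equal-length strings. Here is a minimal Python sketch that prints one sequence in fixed-width rows with a marker row underneath, showing the other sequence's base only where the two differ. The 60-column width, the 10-space indent, and the choice of which sequence's base to show are assumptions for illustration, not necessarily what mm_align_tool does.

```python
from typing import List


def diff_lines(a: str, b: str, width: int = 60, indent: int = 10) -> List[str]:
    """Return display rows: a slice of `a`, then a marker row with b's base at mismatches."""
    if len(a) != len(b):
        raise ValueError("sequences must already be aligned to equal length")
    pad = " " * indent
    out = []
    for i in range(0, len(a), width):
        row_a, row_b = a[i:i + width], b[i:i + width]
        # Blank where the bases agree; show b's base where they differ.
        marks = "".join(cb if ca != cb else " " for ca, cb in zip(row_a, row_b))
        out.append(pad + row_a)
        out.append((pad + marks).rstrip())
    return out
```

Printing `"\n".join(diff_lines(seq0, seq1))` gives a layout where identical stretches leave the marker rows empty and each mismatch stands out as a single letter.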

More information about the BBB mailing list