[BiO BB] ortholog
Mike Marchywka
marchywka at hotmail.com
Thu Sep 13 17:37:48 EDT 2007
If you are interested in doing custom comparisons or looking for differences
by functional area, it turns out that the PROSITE rules are downloadable;
they sent me a link today:
ftp://ftp.expasy.org/databases/prosite/prosite.dat
I had to ignore their matrix data, but their pattern library was easily
converted into Perl (as far as I have looked; the usual caveats about bugs
apply, and the canned C++ regex code I lifted from Microsoft may not be bug
free either), and it gave me quick graphical and textual comparison results
on 1000+ rules.
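For anyone who wants to try the pattern conversion themselves, here is a minimal sketch in Python rather than Perl. It assumes the documented PROSITE pattern syntax (elements separated by "-", "x" for any residue, "[...]" for allowed residues, "{...}" for forbidden residues, "(n)" or "(n,m)" for repeats, "<" and ">" for the sequence ends). The function names prosite_to_regex and find_hits are made up for illustration and are not the tools used in this post. The rare case of ">" inside square brackets is not handled.

```python
import re

# One PROSITE element: x, a single residue, [allowed] or {forbidden},
# optionally followed by a repeat count (n) or (n,m).
_ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)

def prosite_to_regex(pattern):
    """Convert a PROSITE pattern string into a Python/Perl-style regex."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if lo is not None:
            repeat = "{" + lo + ("," + hi if hi is not None else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

def find_hits(regex, protein):
    """Return start offsets of all matches, including overlapping ones."""
    return [m.start() for m in re.finditer("(?=" + regex + ")", protein)]

if __name__ == "__main__":
    rx = prosite_to_regex("N-{P}-[ST]-{P}.")   # ASN_GLYCOSYLATION
    print(rx, find_hits(rx, "MNGTANVSP"))
```

The lookahead wrapper in find_hits is there so that overlapping sites are all reported; a plain finditer would skip any site that starts inside the previous match.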
The point here is that you can make your own rules as you read the
literature (that is my plan anyway) and implement ad hoc splicing or
translation schemes (pretend you want to model flaky ribosomes).
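As a toy illustration of an ad hoc translation scheme, the sketch below translates DNA with the standard genetic code and lets you name nucleotide offsets where the "ribosome" slips forward by one base (a +1 frameshift). This is only my guess at what such a scheme could look like; the function name translate and the slip_at parameter are hypothetical and have nothing to do with the tools in this post.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

def translate(dna, slip_at=frozenset()):
    """Translate dna starting at offset 0.

    slip_at is a set of nucleotide offsets; when the reading position
    lands on one of them, one base is skipped before the next codon is
    read. Codons containing anything other than T, C, A, G become 'X'.
    Stop codons are emitted as '*' and translation continues past them.
    """
    dna = dna.upper()
    protein = []
    pos = 0
    while pos + 3 <= len(dna):
        if pos in slip_at:
            pos += 1                     # the slip: skip one base
            if pos + 3 > len(dna):
                break
        protein.append(CODON_TABLE.get(dna[pos:pos + 3], "X"))
        pos += 3
    return "".join(protein)

if __name__ == "__main__":
    print(translate("ATGGCCTAA"))               # normal reading frame
    print(translate("ATGGCCTAA", slip_at={3}))  # slip before second codon
```

The output of either call can then be fed to whatever rule matcher you like, so you can see which motif hits appear or vanish under the altered scheme.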
Anyway, I get stuff like this.
Translated-rule matches generate rule hit files:
$ $progpath/rules_annotater -clean -which 1 -fastas o2_fasta -xrules $progpath/prosite_rules > pro1
$ $progpath/mm_align_tool -fastas o2_fasta -rules pro0 -rules pro1 -stats
For Rules set 0:>ref|NW_876253.1|Cfa11_WGA39_2:47189155-47195387 Canis familiaris chromosome 11 genomic contig, whole genome shotgun sequence
97 >rule|13|PEPDTIDE Prosite MICROBODIES_CTER
68 >rule|3|PEPDTIDE Prosite PKC_PHOSPHO_SITE
64 >rule|6|PEPDTIDE Prosite MYRISTYL
47 >rule|4|PEPDTIDE Prosite CK2_PHOSPHO_SITE
46 >rule|11|PEPDTIDE Prosite PRENYLATION
30 >rule|1|PEPDTIDE Prosite ASN_GLYCOSYLATION
10 >rule|2|PEPDTIDE Prosite CAMP_PHOSPHO_SITE
10 >rule|5|PEPDTIDE Prosite TYR_PHOSPHO_SITE
6 >rule|7|PEPDTIDE Prosite AMIDATION
3 >rule|87|PEPDTIDE Prosite LEUCINE_ZIPPER
2 >rule|12|PEPDTIDE Prosite ER_TARGET
1 >rule|1087|PEPDTIDE Prosite THIONIN
1 >rule|973|PEPDTIDE Prosite TUBULIN_B_AUTOREG
For Rules set 1:>gb|AACN010493556.1|:1-1146 Canis familiaris ctg19866850213054, whole genome shotgun sequence
23 >rule|13|PEPDTIDE Prosite MICROBODIES_CTER
9 >rule|1|PEPDTIDE Prosite ASN_GLYCOSYLATION
8 >rule|11|PEPDTIDE Prosite PRENYLATION
8 >rule|3|PEPDTIDE Prosite PKC_PHOSPHO_SITE
8 >rule|6|PEPDTIDE Prosite MYRISTYL
7 >rule|4|PEPDTIDE Prosite CK2_PHOSPHO_SITE
2 >rule|5|PEPDTIDE Prosite TYR_PHOSPHO_SITE
1 >rule|12|PEPDTIDE Prosite ER_TARGET
1 >rule|7|PEPDTIDE Prosite AMIDATION
1 >rule|87|PEPDTIDE Prosite LEUCINE_ZIPPER
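A per-rule tally like the ones above can be approximated in a few lines. The sketch below is not how mm_align_tool works (I am only illustrating the idea): it counts overlapping regex hits per rule on a protein string and lists the rules by descending count. The three regexes are hand translations of the PROSITE patterns N-{P}-[ST]-{P}, [ST]-x-[RK] and [ST]-x(2)-[DE]; the names count_hits and format_tally and the test peptide are made up.

```python
import re
from collections import Counter

# Hand-translated PROSITE patterns (a small, illustrative subset).
RULES = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
}

def count_hits(protein, rules=RULES):
    """Count matches of each rule, including overlapping ones."""
    counts = Counter()
    for name, regex in rules.items():
        # The lookahead consumes nothing, so overlapping sites all count.
        counts[name] = sum(1 for _ in re.finditer("(?=" + regex + ")", protein))
    return counts

def format_tally(counts):
    """Return 'count name' lines, highest count first, ties by name."""
    rows = [(n, name) for name, n in counts.items() if n > 0]
    rows.sort(key=lambda row: (-row[0], row[1]))
    return [f"{n} {name}" for n, name in rows]

if __name__ == "__main__":
    for line in format_tally(count_hits("MNGTASVKSAAENLSK")):
        print(line)
```

Running count_hits on two sequences and comparing the two tallies side by side gives a crude version of the set 0 versus set 1 comparison shown here.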
This turned out to be easy to align, as the sequences are largely identical
(the lone "G" is the mismatch in this excerpt), but you get the idea:
$ $progpath/mm_align_tool -fastas o2_fasta -rules pro0 -rules pro1 -use_rule 13 -align -output text
[...]
Start at 696 and 2373:
GGCCATTTTGCAACTCATGCATGAGCTACCTTTAGTTCCCCTTCTACATCTGAGAACTGT
CCCATATAGAATATTTTATAAAACAAGATGGCATTGTGCTAAGTAAAATGCAGAACAAAA
G
TCAGTATCCCATTAGACATGTCATATTCAGAGTTTATTTTTATCCTTGCACTGAAAGAAT
GATTGTAAATCAATGGTTTCTTTTTGTTTCTTGACTGTGGCAGTGTTCTGGCTCCAAATG
ATGGAGATTCCAAATAAGCATTACAGCTTGGCAGGAAATGCCAGTTCAGATATTTGTGAG
ATCCTAAAGAATAGATCTGGACACATAT