[BiO BB] looking for reference on DSCAM exon locations.

Mike Marchywka marchywka at hotmail.com
Mon Feb 4 08:22:45 EST 2008




Hi,
I'm using DSCAM, mostly fly DSCAM, as a test case for developing more general
tools for exploring base sequences. Some early results don't appear
to be trivially wrong, but I'm missing a few pieces of information that I need
to explore the initial output further. If you could point me to a link
that addresses these issues, that would be most helpful. Essentially,
I just need the exact location of each exon variant, preferably in
as many species as possible. So far I have only found this for exon 4
and have otherwise had to guess from the results in ref [3].

I'm trying to generalize results in [4] and [5] and search for DNA features that
may suggest splice rules or answer some questions posed in [6]. 
I'm searching for stem-loop structures as in [4] and [5],
as well as reverse-complement matches that may be well separated as in [8].

From [1], I gather that there are a certain number of exon variants for melanogaster.
Notably, 12 for exon 4, 48 for exon 6, and 33 for exon 9.
I can get exact locations for the exon 4 and exon 5 starts from [2], but for the rest I am stuck using
the ambiguous FlyBase exons. From [3], I end up with 98 exons, which is short of the
100+ I get from adding up the earlier variants, or the 115 cited in [6] (or [7]).
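
As a rough cross-check on these totals, the sketch below adds up the variant counts quoted from [1] and compares them with the 98 FlyBase exons and the 115 cited in [6]. The figures for exon 17 (2 variants) and for the constitutive exons (20) are not given in this post; they are the commonly cited breakdown and are assumptions here, to be verified against [1] and [7].

```python
# Variant counts quoted in the post from ref [1].
VARIANTS_FROM_POST = {"exon 4": 12, "exon 6": 48, "exon 9": 33}

# ASSUMPTIONS (not stated in the post): commonly cited figures for the
# fourth alternative cluster and for the constitutive exons.
ASSUMED_EXON_17_VARIANTS = 2
ASSUMED_CONSTITUTIVE_EXONS = 20

FLYBASE_EXONS = 98   # exon count obtained from ref [3] in the post
CITED_TOTAL = 115    # total cited in refs [6] and [7]


def exon_tally():
    """Return the quoted sum, the sum under the assumptions, and the FlyBase gap."""
    quoted = sum(VARIANTS_FROM_POST.values())
    with_assumptions = (
        quoted + ASSUMED_EXON_17_VARIANTS + ASSUMED_CONSTITUTIVE_EXONS
    )
    return {
        "quoted_variants": quoted,
        "total_with_assumptions": with_assumptions,
        "missing_from_flybase": CITED_TOTAL - FLYBASE_EXONS,
    }


if __name__ == "__main__":
    for key, value in exon_tally().items():
        print(f"{key}: {value}")
```

If those assumed figures hold, the quoted clusters alone account for 93 exons and the full tally reaches the cited 115, which would leave the FlyBase list 17 entries short.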


I tried a sloppy version of the stem-loop in [4] that relates to the pseudoexon.
In my bastardized regular-expression format (I'm using '[]' for a group, not the normal
Perl convention, don't ask..., and each group is implicitly matched to its reverse complement;
and yes, the quantifiers are redundant. Otherwise, this is just a Perl regex):
[\1]{6,6}.{2,10}[\2]{2,3}.{1,8}[\3]{1,5}.{0,4}[\3]{1,5}.{1,8}[\2]{2,3}.{2,10}[\1]{6,6}.{6,11}[\4]{4,7}.{0,4}[\4]{4,7}> RC5|5|CFTR
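
For readers unfamiliar with the idea, here is a minimal Python sketch of the simplest building block of such a rule: a stem of fixed length followed, after a short loop, by its own reverse complement. It is not the tool that produced the hits in this post and does not implement the full multi-group rule above; the function names and the default stem and loop lengths are illustrative assumptions.

```python
# Hypothetical helper names; exact (mismatch-free) stem matching is assumed.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_stem_loops(seq, stem_len=6, min_loop=0, max_loop=4):
    """List (start, stem, loop, partner) for every exact stem-loop in seq.

    A hit is a stem of stem_len bases whose reverse complement occurs
    again after a loop of min_loop..max_loop bases.
    """
    seq = seq.upper()
    hits = []
    last_start = len(seq) - 2 * stem_len - min_loop
    for start in range(last_start + 1):
        stem = seq[start:start + stem_len]
        if set(stem) - set("ACGT"):
            continue  # skip windows containing N or other ambiguity codes
        partner = reverse_complement(stem)
        for loop_len in range(min_loop, max_loop + 1):
            partner_start = start + stem_len + loop_len
            # A slice running past the end is shorter and cannot match.
            if seq[partner_start:partner_start + stem_len] == partner:
                loop = seq[start + stem_len:partner_start]
                hits.append((start, stem, loop, partner))
    return hits
```

For example, `find_stem_loops("AAGGATCATTTTGATCCAA")` reports one hit starting at offset 2, with stem `GGATCA`, loop `TTT`, and partner `TGATCC`. Chaining several such groups with bounded gaps would approximate the nested rule, at the cost of a much larger search.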

I was rather excited that these "hits" occur in many locations BUT are excluded from the
range of the exon 4 variants. The mish-mash of hits below shows where things seem to
occur. It appears that exon 6 may or may not obey a similar distribution of hits.
Each line is the location of some "rule hit": the first number is the location in the genome,
"Dscam" indicates the FlyBase exon number, "RC5" is my rule hit, and the
other entries are rule hits to known locations such as exon 4 starts
(I tried to make the numbers useful to an outside reader, but this confuses things
like the exon 4 rule hits that end where Dscam starts: my hits are leaders, the Dscam-labelled
hits are where the exon actually starts):

$ cat flybase_exon_starts ffx fg | sort -g | awk '{$1=3269374-$1; print $0;}' | more> mish_mash.txt
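
The awk step in the pipeline above replaces the first field of every line with 3269374 minus its value, after a numeric sort. A rough Python equivalent is sketched here for anyone who wants to reproduce the listing without awk. The constant is copied from the command; that it marks the start of the region on chromosome 2R is an assumption, and the function name is hypothetical.

```python
ORIGIN = 3269374  # constant copied from the awk command in the post


def flip_coordinates(lines, origin=ORIGIN):
    """Mimic `sort -g | awk '{$1 = origin - $1; print $0}'`.

    Assumes every non-blank line starts with an integer offset.
    Fields are re-joined with single spaces, as awk does when a
    field is reassigned.
    """
    rows = [line.split() for line in lines if line.strip()]
    rows.sort(key=lambda fields: float(fields[0]))  # numeric sort on field 1
    flipped = []
    for fields in rows:
        fields[0] = str(origin - int(fields[0]))
        flipped.append(" ".join(fields))
    return flipped
```

Because the offsets are sorted in ascending order before being subtracted from the constant, the resulting genome coordinates come out in descending order, which is why the listing counts down.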

3269374 Drosophila melanogaster chromosome 2R
3269374 Drosophila melanogaster chromosome 2R
3269375 Dscam:98
3268566 875 TATTTCATGCTACTTTTTATTTATAAATCGAGTTTTAGAGGAAATAATTGCAGTCCCTGAATTTTCAG> RC5|5|CFTR
3267836 1597 TAATTTCTGTTTACATTGATACTCCGCTTAATGTAAATTATTATACTTATTTTACAATAA> RC5|5|CFTR
3265322 4114 TTTATAAGCACAAAAGGAGTAGCCCCTATAAAAAATGTATAAACAAAATAAATCATATAATAT> RC5|5|CFTR
3265220 Dscam:97
3264108 5333 ATTTATTCCTCCATTTTACTTTTCCCTATTATCGTAATAATTGATAAATTGCATATGCAAACTATTTG> RC5|5|CFTR
3263246 6195 AATGCGATGTTTATGTTGTTGTTCCTGTCTCCGCTACAGTCGGACGCATTTAATTCGCAATTTCATTG> RC5|5|CFTR
3259916 9510 TTGCTTAAATTAATTAAAGCATTGGCTTAAAGAAGCAAAGAATCTATAATTAT> RC5|5|CFTR
3257467 11974 TTAAACTATTACTTTATAGATAAAAGTATATCCTCACAATAATTTTGTTTAACAAATGCATTCAAATG> RC5|5|CFTR
3257239 12192 AATTGTTCATTGCATTCACATTATTTAATTAACAATTAATAAATAATTTTATTTTAAA> RC5|5|CFTR
3257148 12285 TAAGAACATAACTATACTTATTCTGTGCCTTTGAGCTTTCTTATATTAATGGATTTAAAT> RC5|5|CFTR
3256238 Dscam:96
3255817 13612 TTAAAAAAGGATAGATATGAGCTTTATATATTTTTAAAAAGTTTAAAAAAATATTT> RC5|5|CFTR
3254485 14902 CGGCCTTTTCCCAG>local|i|DNA Fly DCAM Exon 4.1
3254472 Dscam:95
3254146 15241 TCCTACCTGTTTAG>local|i|DNA Fly DCAM Exon 4.2
3254133 Dscam:94
3253623 15764 CATTGCTGTTTTAG>local|i|DNA Fly DCAM Exon 4.3
3253610 Dscam:93
3253001 16386 GAACTCACCTTCAG>local|i|DNA Fly DCAM Exon 4.4
3252988 Dscam:92
3252698 16689 CTCTTGCTTTACAG>local|i|DNA Fly DCAM Exon 4.5
3252685 Dscam:91
3252412 16975 ATTTTAAATCGCAG>local|i|DNA Fly DCAM Exon 4.6
3252399 Dscam:90
3252136 17251 GCACACCTTTGCAG>local|i|DNA Fly DCAM Exon 4.7
3252123 Dscam:89
3251867 17520 TATTCGATTCAAAG>local|i|DNA Fly DCAM Exon 4.8
3251854 Dscam:88
3251567 17820 TTCTATCGACTCAG>local|i|DNA Fly DCAM Exon 4.9
3251554 Dscam:87
3251284 18103 CTGATTTCCTTCAG>local|i|DNA Fly DCAM Exon 4.10
3251271 Dscam:86
3251009 18378 CTCCCGTCTTGCAG>local|i|DNA Fly DCAM Exon 4.11
3250996 Dscam:85
3250713 18674 CGTACACTTTGCAG>local|i|DNA Fly DCAM Exon 4.12
3250700 Dscam:84
3249574 19855 ATTTTTGCACAATTAAAAGTAACACAAAATGAAAAATGATTACCAGCCATGTGGCT> RC5|5|CFTR
3249386 20001 TATCAAAATATCAG>local|i|DNA Fly DCAM Exon 5
3249373 Dscam:83
3248960 20486 TTTGTATCTTTTGGAGTTTTCTCATCTACAGCTCAAATAGAATAGATACAAATCAAGTATTAAAATACATATT> RC5|5|CFTR
3248760 20675 AATTTAAAACTTATCATATTTCAAATATTTTTGAACACATAAATTTAATGTCAAATTGTTTG> RC5|5|CFTR
3248545 20904 TTTACAAATATAAATATATATATAATTCAATATAAATATTGAAATATCAAAAATGTAAATATTTAAAATGATATTT> RC5|5|CFTR
3248155 Dscam:82
3247920 Dscam:81
3247711 Dscam:80
3247513 Dscam:79
3247296 Dscam:78
3247071 Dscam:77
3246851 Dscam:76
3246645 Dscam:75
3246436 Dscam:74
3246233 Dscam:73
3245845 Dscam:72
3245421 Dscam:71
3245220 Dscam:70
3245029 Dscam:69
3244602 Dscam:68
3244374 Dscam:67
3244156 Dscam:66
3243946 Dscam:65
3243736 Dscam:64
3243530 Dscam:63
3243315 Dscam:62
3242920 Dscam:61
3242716 Dscam:60
3242511 Dscam:59
3242315 Dscam:58
3242055 27370 ATAGAATACGTACGGCTGGGTGAAATCGTTTCTATAATGTGTCCTGCGCAGG> RC5|5|CFTR
3241906 Dscam:57
3241442 Dscam:56
3241198 Dscam:55
3240871 Dscam:54
3240528 Dscam:53
3239545 Dscam:52
3239328 Dscam:51
3238953 30482 ATATTTATGATACGGGAATGTTAGATTTGATATTCAAATATACTCCACTTCTTTATGTTAAA> RC5|5|CFTR
3238803 Dscam:50
3238210 Dscam:49
3238003 Dscam:48
3237466 31973 CTACAACATCAATAAGTCCCATAAGAAGCATATTGTTATTACTTTTGTAGAGCCAGTTGGCGCCAA> RC5|5|CFTR
3237417 Dscam:47
3237019 Dscam:46
3236481 Dscam:45
3235516 Dscam:44
3235203 Dscam:43
3234956 34477 CGTGTGTGGCCAGGAATGCGGCCGGGGTCATCTACCACACGGCAGAGCTGCGCGTTAACG> RC5|5|CFTR
3234817 34627 CCTCGCCCTCCTCCGCAGTTCTGCCCCAGATCGTGCCCTTCGATTTTGGCGAGGAGACCGTCAACGAGTTG> RC5|5|CFTR
3234800 Dscam:42
3234435 Dscam:41
3234062 Dscam:40
3233672 Dscam:39
3233281 Dscam:38
3233199 36235 TCAAGGGGGACCTGCCCTTGAGAATCCACTGGACCTTGAATGGTGAGCCTGTGGCAACAGG> RC5|5|CFTR
3232857 Dscam:37
3232742 36707 CACTAAACTCGGCTCTCATTGTAAACGGTGAAATGGGATTCACGTTAGTGCGGCTGAATAAGCGAACCAGTTCGCT> RC5|5|CFTR
3232460 Dscam:36
3232075 Dscam:35
3231673 37750 ATATGATATTTGTGCTGAATGTCATATAAATCAGAAAAATTAGGTGTAAT> RC5|5|CFTR
3231128 Dscam:34
3230754 Dscam:33
3230387 Dscam:32
3229897 Dscam:31
3229501 Dscam:30
3229124 Dscam:29
3228738 Dscam:28
3228338 Dscam:27
3227948 Dscam:26
3227576 Dscam:25
3227196 Dscam:24
3226762 42662 AGTCTCTGTGACTTGTTTGATATCCAGTGGAGACTTACCCATCGATATCGA> RC5|5|CFTR
3226434 Dscam:23
3226043 Dscam:22
3225665 Dscam:21
3225287 Dscam:20
3225060 44371 TAGTTGCCGGGCAAAGAACTACGCAGCAGCCGTCAACTACAGCACTGAACTCATAGTT> RC5|5|CFTR
3224228 45215 CCCGTGGACATCACCTGGTTGTTCAATGACTATGCCATCAACGAGTATCACGGGGTCACCTCTTCCAAGA> RC5|5|CFTR
3223509 Dscam:19
3222724 Dscam:18
3222172 Dscam:17
3219886 Dscam:16
3219708 Dscam:15
3219235 50201 TCCGGAGATGCCATATGCTTTGAAGGTACTCGACAAATCCGGACGTTCCGTGCAGCTGAGCTG> RC5|5|CFTR
3218320 Dscam:14
3218106 Dscam:13
3217357 Dscam:12
3217195 Dscam:11
3217178 52257 GCTTCTGACATTTTGAACACCCGGACCAAGGGACAGAAGCCCAAGCTGCCCGAGAAACCTCG> RC5|5|CFTR
3216961 Dscam:10
3216459 52976 AACAAATTGCACAGTATATAAAATTATATTATTCCTATTTTTTGTTGTTCAAACCAAGCTTG> RC5|5|CFTR
3216293 53130 AAAATCATTAGTGTAAAATAATAATGATTTTTCTTACGTAAATGCAATTT> RC5|5|CFTR
3216028 53416 TTTTGTTCAGTTTTTCAGCTCACGTAAGGTTAAAAAAAAAAAAACAAAAGTAGAGCTTTCTTAAATTTTAA> RC5|5|CFTR
3214571 54860 CGAAAACGACTACATATCGACAAGTTAACCTTTGAATTTTTCGCCTGCCACAGTCTGT> RC5|5|CFTR
3214290 Dscam:9
3213870 Dscam:8
3213504 55919 TATTATCCTTTCATTTACAAAGATAATATTTTGCATCCAATTAACTAATT> RC5|5|CFTR
3212243 Dscam:7
3211474 Dscam:6
3211209 58231 GGCTTAATATGTCTGGATTAGCTAGTCTATAATCTATGTTAAGCCATACTGCCTCTACTCTTTGAGT> RC5|5|CFTR
3210838 Dscam:5
3210462 Dscam:4
3210224 Dscam:3
3209155 Dscam:2
3208270 Dscam:1
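
To check the "no RC5 hits inside the exon 4 range" observation against a listing like the one above, a short Python sketch such as the following could be used. It assumes the line layout shown here (a coordinate, then either a `Dscam:N` label or an offset, a sequence, and a rule label); the function names are hypothetical.

```python
def parse_listing(lines):
    """Split a mish-mash listing into FlyBase exon starts and RC5 hit positions."""
    exons = {}  # FlyBase exon number -> genome coordinate
    hits = []   # genome coordinates of RC5 rule hits
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # skip blank or malformed lines
        position = int(fields[0])
        if fields[1].startswith("Dscam:"):
            exons[int(fields[1].split(":")[1])] = position
        elif "RC5" in line:
            hits.append(position)
    return exons, hits


def hits_between(exons, hits, first_exon, last_exon):
    """Return RC5 hit positions lying between two FlyBase exon starts (inclusive)."""
    low, high = sorted((exons[first_exon], exons[last_exon]))
    return [position for position in hits if low <= position <= high]
```

With the listing saved as `mish_mash.txt`, calling `parse_listing(open("mish_mash.txt"))` and then `hits_between(exons, hits, 95, 84)` would list any RC5 hits between the starts labelled Dscam:95 and Dscam:84, which bracket the exon 4.1 to 4.12 entries.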






References
==========


[1] Graveley 2004, http://www.rnajournal.org/cgi/reprint/10/10/1499
[2] Celotto and Graveley 2001, http://www.genetics.org/cgi/reprint/159/2/599.pdf
[3] http://flybase.bio.indiana.edu/reports/FBgn0033159.html
[4] Buratti 2007, http://nar.oxfordjournals.org/cgi/reprint/35/13/4369
[5] Kreahling and Graveley 2005, http://mcb.asm.org/cgi/content/full/25/23/10251
[6] Olson... Graveley 2007, http://www.nature.com/nsmb/journal/v14/n12/full/nsmb1339.html
[7] ref 5 in [6], Schmucker et al. 2000, http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6WSN-4194S59-F&_user=10&_rdoc=1&_fmt=&_orig=search&_sort=d&view=c&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=d159aee1d55f9b955b8a9dc96344a5f4
[8] Anastassiou 2006, http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1431710&blobtype=pdf






Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com
Note: Hotmail is blocking my mom's entire
ISP claiming it is to reduce spam but probably
to force users to use hotmail. Please DON'T
assume I am ignoring you and try
me on marchywka at yahoo.com if no reply
here. Thanks.

