<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML xmlns:o = "urn:schemas-microsoft-com:office:office"><HEAD>

<META http-equiv=Content-Type content="text/html; charset=us-ascii">

<META content="MSHTML 6.00.2900.2802" name=GENERATOR></HEAD>

<BODY>

<DIV dir=ltr align=left><SPAN class=679033218-10042006><FONT face=Verdana 

color=#0000ff size=2>Hi, </FONT></SPAN></DIV>

<DIV dir=ltr align=left><SPAN class=679033218-10042006>&nbsp;&nbsp;&nbsp; <FONT 

face=Verdana color=#0000ff size=2>I don't know exactly what you are looking for, 

but if you assume all polymorphisms are single base substitutions and that there 

are no insertions or deletions (is that correct?), then the basic code is 

pretty easy. Just look at each position in each sequence and see if it matches 

the reference. If so, keep going. If not, record a polymorphism. 

</FONT></SPAN></DIV>
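<DIV dir=ltr align=left><SPAN class=679033218-10042006>&nbsp;&nbsp;&nbsp; <FONT 
face=Verdana color=#0000ff size=2>As a minimal sketch of that loop in Python, 
assuming the two sequences are the same length and already in register: the 
function name find_substitutions is just something I made up for illustration, 
and the output uses the "reference base/position/polymorphic base" notation 
from your message with 1-based positions.</FONT></SPAN></DIV>
<PRE>
```python
def find_substitutions(reference, sequence):
    """Return substitutions such as 'A750G' (reference base, position, new base).

    Assumes both sequences have the same length and contain no insertions
    or deletions, so position i in one corresponds to position i in the other.
    """
    if len(reference) != len(sequence):
        raise ValueError("sequences differ in length; an indel is likely")
    variants = []
    # Positions are 1-based to match the usual mtDNA notation.
    pairs = zip(reference.upper(), sequence.upper())
    for position, (ref_base, seq_base) in enumerate(pairs, start=1):
        if ref_base != seq_base:
            variants.append(f"{ref_base}{position}{seq_base}")
    return variants


print(find_substitutions("ACGTACGT", "ACGAACGC"))  # prints ['T4A', 'T8C']
```
</PRE>
<DIV dir=ltr align=left><SPAN class=679033218-10042006>&nbsp;&nbsp;&nbsp; <FONT 
face=Verdana color=#0000ff size=2>Raising an error on a length mismatch is a 
deliberate choice here: it is the cheapest hint that the no-indel assumption 
does not hold for that sequence.</FONT></SPAN></DIV>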

<DIV dir=ltr align=left><SPAN class=679033218-10042006>&nbsp;&nbsp;&nbsp; <FONT 

face=Verdana color=#0000ff size=2>Allowing insertions or deletions is trickier 

because a single indel shifts every base after it, so your sequences will get 

out of alignment with each other and most later positions will look like 

mismatches. You would probably have to check the alignment at every position. I 

am not sure offhand what the best way to do this would be, but I think it would 

not be too hard...</FONT></SPAN></DIV>
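<DIV dir=ltr align=left><SPAN class=679033218-10042006>&nbsp;&nbsp;&nbsp; <FONT 
face=Verdana color=#0000ff size=2>One possible way to keep the sequences in 
register, offered only as a sketch: Python's standard difflib.SequenceMatcher 
reports equal, replaced, deleted and inserted runs between two strings. It is a 
general text-diff tool, not a biological aligner, so a dedicated pairwise 
aligner would probably be the better choice for real data. The function name 
find_variants and the "del"/"ins"/"delins" strings are my own placeholders, not 
a standard nomenclature. Note autojunk=False: with the default setting, difflib 
treats very common items in long sequences as junk, which with only four bases 
would be all of them.</FONT></SPAN></DIV>
<PRE>
```python
from difflib import SequenceMatcher


def find_variants(reference, sequence):
    """List differences between two sequences, tolerating indels.

    Substitutions are reported as e.g. 'G10A' (1-based reference position).
    The indel strings are placeholder formats chosen for this sketch.
    """
    # autojunk=False disables difflib's "popular item" heuristic, which
    # would otherwise ignore every base in a long DNA sequence.
    matcher = SequenceMatcher(None, reference, sequence, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        ref_part = reference[i1:i2]
        seq_part = sequence[j1:j2]
        if tag == "replace" and len(ref_part) == len(seq_part):
            # Same-length replacement: report base by base.
            for offset, (r, s) in enumerate(zip(ref_part, seq_part)):
                variants.append(f"{r}{i1 + offset + 1}{s}")
        elif tag == "delete":
            variants.append(f"del {i1 + 1}-{i2} {ref_part}")
        elif tag == "insert":
            variants.append(f"ins after {i1} {seq_part}")
        else:
            # Replacement where the two sides differ in length.
            variants.append(f"delins {i1 + 1}-{i2} {ref_part} to {seq_part}")
    return variants


ref = "AAAACCCCGGGGTTTT"
print(find_variants(ref, "AAAACCCCGAGGTTTT"))  # substitution
print(find_variants(ref, "AAAACCCCGGGTTTT"))   # one base deleted
```
</PRE>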

<DIV dir=ltr align=left><SPAN class=679033218-10042006><FONT face=Verdana 

color=#0000ff size=2>Ethan</FONT></SPAN></DIV>

<DIV dir=ltr align=left><SPAN class=679033218-10042006><FONT face=Verdana 

color=#0000ff size=2></FONT></SPAN>&nbsp;</DIV><FONT face=Verdana color=#0000ff 

size=2>Ethan Strauss Ph.D.<BR>Bioinformatics Scientist<BR>Promega 

Corporation<BR>2800 Woods Hollow Rd.<BR>Madison, WI 

53711<BR>608-274-4330<BR>800-356-9526<BR><A 

href="mailto:ethan.strauss@promega.com">ethan.strauss@promega.com</A></FONT><BR>

<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>

<HR tabIndex=-1>

<FONT face=Tahoma size=2><B>From:</B> 

biodevelopers-bounces+ethan.strauss=promega.com@bioinformatics.org 

[mailto:biodevelopers-bounces+ethan.strauss=promega.com@bioinformatics.org] 

<B>On Behalf Of </B>David Whyte<BR><B>Sent:</B> Saturday, April 08, 2006 4:31 

PM<BR><B>To:</B> biodevelopers@bioinformatics.org<BR><B>Subject:</B> 

[Biodevelopers] batch tool for finding mitochondrial 

DNA polymorphisms<BR></FONT><BR></DIV>

<DIV></DIV>

<DIV><FONT face=Arial size=2>

<DIV><FONT face=Arial size=2>

<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman" 

size=3>Hi,</FONT></P>

<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman" 

size=3>I have a bioinformatics project that involves finding polymorphisms in 

mitochondrial DNA (mtDNA).<SPAN style="mso-spacerun: yes">&nbsp; </SPAN>The 

polymorphisms are typically denoted as "reference base/position/polymorphic 

base", as in A750G.<SPAN style="mso-spacerun: yes">&nbsp; </SPAN>I'd like to add 

a software tool to our company website where a visitor could paste in a set of 

mitochondrial genomes, and a reference sequence, and get back a list of 

polymorphisms.<SPAN style="mso-spacerun: yes">&nbsp; </SPAN>Something 

like:</FONT></P>

<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT size=3><FONT 

face="Times New Roman">&nbsp;<o:p></o:p></FONT></FONT></P>

<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman" 

size=3>&gt;Seq1</FONT></P>

<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman" 

size=3>A458G, T4899A....</FONT></P>

<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman" 

size=3>&gt;Seq2</FONT></P>

<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman" 

size=3>T678C, G6789C....</FONT></P>

<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT size=3><FONT 

face="Times New Roman">etc.<SPAN style="mso-spacerun: yes">&nbsp;&nbsp; 

</SPAN></FONT></FONT></P>

<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT size=3><FONT 

face="Times New Roman">&nbsp;<o:p></o:p></FONT></FONT></P>

<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman" 

size=3>We sequence mitochondrial DNA for customers interested in learning about 

their ancient ancestry.</FONT></P>

<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT size=3><FONT 

face="Times New Roman">&nbsp;</FONT></FONT><FONT size=3><FONT 

face="Times New Roman">&nbsp;<o:p></o:p></FONT></FONT></P>

<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT 

face="Times New Roman"><FONT size=3>The site will be freely available.<SPAN 

class=562320221-08042006>&nbsp; It will be attached to our company site, <A 

href="http://www.argusbio.com/">www.argusbio.com</A>, which is still in 

development at LunarPages.&nbsp; The author's name and an email link 

could&nbsp;be listed on&nbsp;the page.</SPAN></FONT></FONT></P>

<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT size=3><FONT 

face="Times New Roman">&nbsp;<o:p></o:p></FONT></FONT></P>

<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman" 

size=3>A full-length genome is 16,569 bases long.<SPAN 

style="mso-spacerun: yes">&nbsp; </SPAN>Typically two people will have around 30 

to 50 differences in their mtDNAs, or more (but fewer than 100) if they have very 

different ancestry (African vs European, for example).<SPAN 

style="mso-spacerun: yes">&nbsp; </SPAN>These polymorphisms determine the 

person&#8217;s mitochondrial haplogroup.</FONT></P>

<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT size=3><FONT 

face="Times New Roman">&nbsp;<o:p></o:p></FONT></FONT></P>

<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman" 

size=3>It would be very helpful if the program were able to determine which 

haplogroup the mtDNA belongs in based on the list of polymorphisms.<SPAN 

style="mso-spacerun: yes">&nbsp; </SPAN>I have tables of diagnostic 

polymorphisms used for classifying mt genomes.</FONT></P>
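<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman" 
size=3>As a rough illustration of the lookup, here is a Python sketch that 
ranks haplogroups by the fraction of their diagnostic polymorphisms found in a 
sample. Everything in it is an assumption: the table layout, the function name 
rank_haplogroups, and above all the haplogroup names and markers, which are 
made-up placeholders rather than real diagnostic sites. Real classification is 
usually hierarchical, so a flat score like this is only a starting 
point.</FONT></P>
<PRE>
```python
# Placeholder table: these names and markers are invented for illustration
# and do NOT correspond to real haplogroup definitions.
DIAGNOSTIC = {
    "HG_EXAMPLE_1": {"A100G", "C200T"},
    "HG_EXAMPLE_2": {"A100G", "G300A"},
}


def rank_haplogroups(variants, table):
    """Rank haplogroups by the share of their diagnostic markers observed.

    Returns a list of (score, name, matched_markers), best score first,
    with ties broken alphabetically by name.
    """
    found = set(variants)
    scores = []
    for name, markers in table.items():
        matched = markers.intersection(found)
        scores.append((len(matched) / len(markers), name, sorted(matched)))
    scores.sort(key=lambda item: (-item[0], item[1]))
    return scores


for score, name, matched in rank_haplogroups(["A100G", "C200T", "T4A"], DIAGNOSTIC):
    print(name, score, matched)
```
</PRE>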

<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT size=3><FONT 

face="Times New Roman">&nbsp;<o:p></o:p></FONT></FONT></P>

<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman" 

size=3>It would also be very useful if there were an option to generate a FASTA 

file that consisted of just the polymorphic sites.<SPAN 

style="mso-spacerun: yes">&nbsp; </SPAN>So if someone put in 100 full-length 

genomes and a reference genome, the output would be FASTA sequences containing only the 

positions that vary from the reference in at least one test sequence.<SPAN 

style="mso-spacerun: yes">&nbsp; </SPAN>This output would be much easier to 

align with CLUSTALW than the full-length sequences, which are typically &gt; 99% 

invariant. </FONT></P>
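<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman" 
size=3>A sketch of what this option might look like in Python, assuming the 
input sequences are already the same length as the reference and in register 
with it (no insertions or deletions). The function name variable_sites and the 
dictionary input are hypothetical choices for the example. It also returns the 
original 1-based positions, since the reduced sequences no longer carry their 
coordinates.</FONT></P>
<PRE>
```python
def variable_sites(reference, sequences):
    """Reduce sequences to the columns that vary from the reference.

    sequences maps a name to a sequence of the same length as reference.
    Returns (positions, fasta_text), where positions are 1-based reference
    coordinates of the retained columns.
    """
    for name, seq in sequences.items():
        if len(seq) != len(reference):
            raise ValueError(f"{name} is not the same length as the reference")
    # Keep a column if at least one test sequence differs from the reference.
    columns = [
        i for i, ref_base in enumerate(reference)
        if any(seq[i] != ref_base for seq in sequences.values())
    ]
    # The reference goes first; a test sequence named "reference" would
    # replace it, so that name is reserved in this sketch.
    records = {"reference": reference}
    records.update(sequences)
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        lines.append("".join(seq[i] for i in columns))
    positions = [i + 1 for i in columns]
    return positions, "\n".join(lines) + "\n"


positions, fasta = variable_sites("ACGTAC", {"Seq1": "ACGAAC", "Seq2": "TCGTAC"})
print(positions)  # prints [1, 4]
print(fasta)
```
</PRE>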

<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT size=3><FONT 

face="Times New Roman">&nbsp;<o:p></o:p></FONT></FONT></P>

<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT size=3><FONT 

face="Times New Roman">I am looking for some ideas of how best to implement this 

web-based tool.<SPAN style="mso-spacerun: yes">&nbsp; </SPAN></FONT></FONT></P>

<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT size=3><FONT 

face="Times New Roman">&nbsp;<o:p></o:p></FONT></FONT></P>

<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman" 

size=3>Thanks,</FONT></P><SPAN 

style="FONT-SIZE: 12pt; FONT-FAMILY: 'Times New Roman'; mso-fareast-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: EN-US; mso-bidi-language: AR-SA"><SPAN 

style="mso-spacerun: yes"></SPAN></SPAN></FONT></DIV>

<P>David B. Whyte, Ph.D.<BR>Argus Biosciences, LLC<BR>650-954-1055</P>

<P><SPAN class=562320221-08042006></SPAN><A 

href="mailto:dwhyte@argusbio.com">d<SPAN 

class=562320221-08042006>whyte@argusbio.com</SPAN></A><BR><A 

href="http://www.argusbio.com/">www.argusbio.com</A><BR>&nbsp; </P></FONT></DIV>

<DIV>&nbsp;</DIV></BODY></HTML>