<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML xmlns:o = "urn:schemas-microsoft-com:office:office"><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.2900.2802" name=GENERATOR></HEAD>
<BODY>
<DIV dir=ltr align=left><SPAN class=679033218-10042006><FONT face=Verdana
color=#0000ff size=2>Hi, </FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=679033218-10042006> <FONT
face=Verdana color=#0000ff size=2>I don't know exactly what you are looking for,
but if you can assume that all polymorphisms are single-base substitutions and that
there are no insertions or deletions (is that correct?), then the basic code is
pretty easy. Just look at each position in each sequence and see whether it matches
the reference base at that position. If it does, keep going; if not, record a polymorphism.
</FONT></SPAN></DIV>
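<DIV dir=ltr align=left><FONT face=Verdana color=#0000ff size=2>For the substitution-only case, a minimal Python sketch of that position-by-position comparison might look like the following. It assumes each sample is the same length as the reference and already in reference coordinates, and it reports 1-based positions in the A750G style. Treating an N in the sample as a no-call rather than a polymorphism is an assumption, and the function name find_substitutions is only illustrative.</FONT></DIV>
<PRE>
```python
def find_substitutions(reference: str, sample: str) -> list[str]:
    """Compare two equal-length sequences base by base.

    Returns calls in reference-base/position/sample-base notation (e.g. "A750G"),
    with 1-based positions. Assumes there are no insertions or deletions.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences differ in length; indels are not handled here")
    calls = []
    for pos, (ref_base, base) in enumerate(zip(reference.upper(), sample.upper()), start=1):
        if base == "N":
            continue  # assumption: N means "no call", not a polymorphism
        if base != ref_base:
            calls.append(f"{ref_base}{pos}{base}")
    return calls
```
</PRE>
<DIV dir=ltr align=left><FONT face=Verdana color=#0000ff size=2>For example, find_substitutions("GATCACAGGT", "GATCACGGGT") returns ["A7G"].</FONT></DIV>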
<DIV dir=ltr align=left><SPAN class=679033218-10042006> <FONT
face=Verdana color=#0000ff size=2>Allowing insertions or deletions is trickier,
because a single indel can put your sequences out of alignment with each other:
every position after it would then be compared with the wrong reference base and
show up as a spurious polymorphism. You would probably have to check the alignment
at every position. I am not sure offhand what the best way to do this would be,
but I don't think it would be too hard...</FONT></SPAN></DIV>
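<DIV dir=ltr align=left><FONT face=Verdana color=#0000ff size=2>If indels do need to be handled, one standard way to keep the sequences in register is to globally align each sample to the reference first and then walk the alignment columns. The sketch below uses a plain Needleman-Wunsch alignment with a linear gap penalty; the scoring values, the function names, and the indel notation (523d for a deletion, 315.1C for a base inserted after reference position 315, loosely modeled on common mtDNA usage) are all assumptions for illustration. Because it fills an O(n*m) table in pure Python, it is only practical for short test sequences; full 16,569-base genomes would need a banded aligner or an existing alignment tool.</FONT></DIV>
<PRE>
```python
def align(ref: str, seq: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the two aligned strings, using '-' for gaps.
    """
    n, m = len(ref), len(seq)

    def sub(i: int, j: int) -> int:
        # Score for aligning ref[i - 1] with seq[j - 1].
        return match if ref[i - 1] == seq[j - 1] else mismatch

    # score[i][j] is the best score for aligning ref[:i] with seq[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # match or substitution
                score[i - 1][j] + gap,            # reference base deleted in sample
                score[i][j - 1] + gap,            # extra base inserted in sample
            )

    # Trace back from the bottom-right cell, preferring the diagonal on ties.
    a_ref, a_seq = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            a_ref.append(ref[i - 1])
            a_seq.append(seq[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a_ref.append(ref[i - 1])
            a_seq.append("-")
            i -= 1
        else:
            a_ref.append("-")
            a_seq.append(seq[j - 1])
            j -= 1
    return "".join(reversed(a_ref)), "".join(reversed(a_seq))


def call_variants(ref: str, seq: str) -> list[str]:
    """Align seq to ref, then report differences in reference coordinates."""
    a_ref, a_seq = align(ref.upper(), seq.upper())
    calls = []
    pos = 0        # 1-based reference position of the last reference base seen
    inserted = 0   # number of sample bases inserted since that reference base
    for r, s in zip(a_ref, a_seq):
        if r == "-":
            inserted += 1
            calls.append(f"{pos}.{inserted}{s}")  # insertion, e.g. 315.1C
            continue
        pos += 1
        inserted = 0
        if s == "-":
            calls.append(f"{pos}d")                # deletion, e.g. 523d
        elif r != s:
            calls.append(f"{r}{pos}{s}")           # substitution, e.g. A750G
    return calls
```
</PRE>
<DIV dir=ltr align=left><FONT face=Verdana color=#0000ff size=2>For instance, call_variants("ACGTACGT", "ACGACGT") returns ["4d"] and call_variants("ACGTACGT", "ACGTCACGT") returns ["4.1C"]. Where a deletion falls inside a run of identical bases, the reported position depends on how the traceback breaks ties, so a real tool would need a fixed placement convention.</FONT></DIV>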
<DIV dir=ltr align=left><SPAN class=679033218-10042006><FONT face=Verdana
color=#0000ff size=2>Ethan</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=679033218-10042006><FONT face=Verdana
color=#0000ff size=2></FONT></SPAN> </DIV><FONT face=Verdana color=#0000ff
size=2>Ethan Strauss Ph.D.<BR>Bioinformatics Scientist<BR>Promega
Corporation<BR>2800 Woods Hollow Rd.<BR>Madison, WI
53711<BR>608-274-4330<BR>800-356-9526<BR><A
href="mailto:ethan.strauss@promega.com">ethan.strauss@promega.com</A></FONT><BR>
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B>
biodevelopers-bounces+ethan.strauss=promega.com@bioinformatics.org
[mailto:biodevelopers-bounces+ethan.strauss=promega.com@bioinformatics.org]
<B>On Behalf Of </B>David Whyte<BR><B>Sent:</B> Saturday, April 08, 2006 4:31
PM<BR><B>To:</B> biodevelopers@bioinformatics.org<BR><B>Subject:</B>
[Biodevelopers] batch tool for finding mitochondrial
DNA polymorphisms<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV><FONT face=Arial size=2>
<DIV><FONT face=Arial size=2>
<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman"
size=3>Hi,</FONT></P>
<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman"
size=3>I have a bioinformatics project that involves finding polymorphisms in
mitochondrial DNA (mtDNA).<SPAN style="mso-spacerun: yes"> </SPAN>The
polymorphisms are typically denoted as "reference base/position/polymorphic
base", as in A750G.<SPAN style="mso-spacerun: yes"> </SPAN>I'd like to add
a software tool to our company website where a visitor could paste in a set of
mitochondrial genomes and a reference sequence and get back a list of
polymorphisms for each genome.<SPAN style="mso-spacerun: yes"> </SPAN>Something
like:</FONT></P>
<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT size=3><FONT
face="Times New Roman"> <o:p></o:p></FONT></FONT></P>
<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman"
size=3>>Seq1</FONT></P>
<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman"
size=3>A458G, T4899A....</FONT></P>
<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman"
size=3>>SEQ2</FONT></P>
<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman"
size=3>T678C, G6789C....</FONT></P>
<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT size=3><FONT
face="Times New Roman">etc.<SPAN style="mso-spacerun: yes">
</SPAN></FONT></FONT></P>
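<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman" size=3>One possible way to produce output in this format is to parse the pasted text as FASTA and print, for each sequence, its header line followed by a comma-separated line of calls. The sketch below assumes the reference is pasted separately and that every sample is already the same length as the reference (substitutions only); parse_fasta and format_report are hypothetical names, and the "(no differences)" placeholder is an arbitrary choice.</FONT></P>
<PRE>
```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse pasted FASTA text into {name: sequence}, preserving input order.

    The name is the first word after '>'. Sequence lines that appear before
    any header are ignored.
    """
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks).upper()
            words = line[1:].split()
            name = words[0] if words else f"seq{len(records) + 1}"
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks).upper()
    return records


def format_report(reference: str, samples: dict[str, str]) -> str:
    """Return a '>name' line per sample, each followed by its comma-separated calls."""
    reference = reference.upper()
    lines = []
    for name, seq in samples.items():
        calls = [
            f"{r}{pos}{s}"
            for pos, (r, s) in enumerate(zip(reference, seq), start=1)
            if r != s
        ]
        lines.append(f">{name}")
        lines.append(", ".join(calls) if calls else "(no differences)")
    return "\n".join(lines)
```
</PRE>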
<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT size=3><FONT
face="Times New Roman"> <o:p></o:p></FONT></FONT></P>
<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman"
size=3>We sequence mitochondrial DNA for customers interested in learning about
their ancient ancestry.</FONT></P>
<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT size=3><FONT
face="Times New Roman"> </FONT></FONT><FONT size=3><FONT
face="Times New Roman"> <o:p></o:p></FONT></FONT></P>
<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT
face="Times New Roman"><FONT size=3>The site will be freely available.<SPAN
class=562320221-08042006> It will be attached to our company site, <A
href="http://www.argusbio.com/">www.argusbio.com</A>, which is still in
development at LunarPages. The author's name and an email link
could be listed on the page.</SPAN></FONT></FONT></P>
<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT size=3><FONT
face="Times New Roman"> <o:p></o:p></FONT></FONT></P>
<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman"
size=3>A full-length genome is 16,569 bases long.<SPAN
style="mso-spacerun: yes"> </SPAN>Typically two people will have around 30
to 50 differences in their mtDNAs, or more (but fewer than 100) if they have very
different ancestry (African vs European, for example).<SPAN
style="mso-spacerun: yes"> </SPAN>These polymorphisms determine the
person's mitochondrial haplogroup.</FONT></P>
<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT size=3><FONT
face="Times New Roman"> <o:p></o:p></FONT></FONT></P>
<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman"
size=3>It would be very helpful if the program were able to determine which
haplogroup the mtDNA belongs to, based on the list of polymorphisms.<SPAN
style="mso-spacerun: yes"> </SPAN>I have tables of the diagnostic
polymorphisms used for classifying mt genomes.</FONT></P>
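<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman" size=3>Given such tables, one simple approach is to score each haplogroup by the fraction of its diagnostic polymorphisms found in a sample and report the best match. The sketch below is a flat matching scheme, not a full phylogenetic classifier: it assumes the tables can be loaded as a mapping from haplogroup name to a list of diagnostic calls in the same A750G notation, and it breaks ties in favor of the haplogroup defined by more markers as a rough proxy for the more specific subgroup. The function name and the tie-break rule are hypothetical; the real diagnostic markers would come from the tables mentioned above.</FONT></P>
<PRE>
```python
def assign_haplogroup(calls: list[str], table: dict[str, list[str]]) -> tuple[str, float]:
    """Return (haplogroup, fraction of its diagnostic polymorphisms present).

    table maps each haplogroup name to its diagnostic calls in A750G notation.
    Returns ("unassigned", 0.0) when no diagnostic polymorphism is observed.
    """
    observed = set(calls)
    scored = []
    for name, markers in table.items():
        marker_set = set(markers)
        if not marker_set:
            continue  # skip haplogroups with no diagnostic markers listed
        fraction = len(observed.intersection(marker_set)) / len(marker_set)
        # Tuples compare by fraction first, then by marker count (a rough proxy
        # for the more specific subgroup), then by name for a deterministic result.
        scored.append((fraction, len(marker_set), name))
    if not scored:
        return "unassigned", 0.0
    fraction, _, name = max(scored)
    if fraction == 0:
        return "unassigned", 0.0
    return name, fraction
```
</PRE>
<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman" size=3>With a made-up table such as {"HG_A": ["A100G", "C200T"], "HG_A1": ["A100G", "C200T", "T300C"]}, the calls ["A100G", "C200T", "T300C"] are assigned to HG_A1 with a score of 1.0.</FONT></P>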
<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT size=3><FONT
face="Times New Roman"> <o:p></o:p></FONT></FONT></P>
<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman"
size=3>It would also be very useful if there were an option to generate a FASTA
file consisting of only the polymorphic sites.<SPAN
style="mso-spacerun: yes"> </SPAN>So if someone put in 100 full-length
genomes and a reference genome, the output would be FASTA sequences containing only
the positions at which at least one test sequence differs from the reference.<SPAN
style="mso-spacerun: yes"> </SPAN>This output would be much easier to
align with CLUSTALW than the full-length sequences, which are typically more than 99%
invariant. </FONT></P>
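<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman" size=3>Assuming the test genomes are already in reference coordinates (same length as the reference, no insertions or deletions), the reduced FASTA file could be built by collecting every column where at least one sequence differs from the reference and keeping only those columns. This sketch also emits the reduced reference as the first record and returns the 1-based positions of the kept columns so the original coordinates are not lost; both are design choices rather than requirements from the description above, and the function name is hypothetical.</FONT></P>
<PRE>
```python
def polymorphic_sites_fasta(
    reference: str, samples: dict[str, str], ref_name: str = "reference"
) -> tuple[str, list[int]]:
    """Reduce every sequence to the columns that vary from the reference.

    Returns (FASTA text, 1-based reference positions of the kept columns).
    Assumes all sequences are aligned to, and the same length as, the reference.
    """
    reference = reference.upper()
    samples = {name: seq.upper() for name, seq in samples.items()}
    for name, seq in samples.items():
        if len(seq) != len(reference):
            raise ValueError(f"{name}: length differs from the reference")
    # A column is kept if any sample differs from the reference there.
    sites = [
        i for i, ref_base in enumerate(reference)
        if any(seq[i] != ref_base for seq in samples.values())
    ]
    lines = []
    for name, seq in [(ref_name, reference), *samples.items()]:
        lines.append(f">{name}")
        lines.append("".join(seq[i] for i in sites))
    return "\n".join(lines), [i + 1 for i in sites]
```
</PRE>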
<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT size=3><FONT
face="Times New Roman"> <o:p></o:p></FONT></FONT></P>
<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT size=3><FONT
face="Times New Roman">I am looking for some ideas of how best to implement this
web-based tool.<SPAN style="mso-spacerun: yes"> </SPAN></FONT></FONT></P>
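<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman" size=3>As one possible starting point for the web-based part, the sketch below wraps a substitution-only comparison in a small WSGI application using only the Python standard library. The form field names reference and sequences, the plain-text response, and the assumption that samples are already in reference coordinates are all illustrative choices; a hosted version would also need an HTML form page that posts to it, input size limits, and whatever deployment interface the web host supports.</FONT></P>
<PRE>
```python
from urllib.parse import parse_qs


def read_fasta(text: str, default_name: str) -> dict[str, str]:
    """Minimal FASTA reader; text pasted without a header gets default_name."""
    if not text.lstrip().startswith(">"):
        text = f">{default_name}\n{text}"
    records: dict[str, list[str]] = {}
    name = default_name
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else default_name
            records[name] = []
        elif line:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def app(environ, start_response):
    """WSGI app expecting POSTed form fields 'reference' and 'sequences' (FASTA)."""
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed", headers)
        return [b"POST the form fields 'reference' and 'sequences'.\n"]
    size = int(environ.get("CONTENT_LENGTH") or 0)
    form = parse_qs(environ["wsgi.input"].read(size).decode("utf-8"))
    ref_records = read_fasta(form.get("reference", [""])[0], "reference")
    reference = next(iter(ref_records.values()), "")  # first record only
    samples = read_fasta(form.get("sequences", [""])[0], "sample")
    out = []
    for name, seq in samples.items():
        # Substitutions only: zip() stops at the shorter sequence, so this
        # assumes each sample is already in reference coordinates.
        calls = [
            f"{r}{pos}{s}"
            for pos, (r, s) in enumerate(zip(reference, seq), start=1)
            if r != s
        ]
        out.append(f">{name}\n{', '.join(calls)}\n")
    start_response("200 OK", headers)
    return ["".join(out).encode("utf-8")]
```
</PRE>
<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman" size=3>To try it locally, wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever() from the standard library will serve it on port 8000.</FONT></P>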
<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT size=3><FONT
face="Times New Roman"> <o:p></o:p></FONT></FONT></P>
<P class=MsoNormal style="MARGIN: 0mm 0mm 0pt"><FONT face="Times New Roman"
size=3>Thanks,</FONT></P></FONT></DIV>
<P>David B. Whyte, Ph.D.<BR>Argus Biosciences, LLC<BR>650-954-1055</P>
<P><A href="mailto:dwhyte@argusbio.com">dwhyte@argusbio.com</A><BR><A
href="http://www.argusbio.com/">www.argusbio.com</A><BR> </P></FONT></DIV>
<DIV> </DIV></BODY></HTML>