[BiO BB] Observation: multiple sequence alignment affected by the input sequence order
Mike Marchywka
marchywka at hotmail.com
Thu Aug 16 20:45:01 EDT 2007
I've never bothered to check these details, but you really have to evaluate
these ill-defined fits in light of some objective. That is, given two
sequences, you really don't know if one was generated from the other by any
particular set of operations. It may even make sense, from the standpoint of
fitting to an evolution model, to assume one is derived from the other in
non-symmetric ways. Perhaps it would make more sense to output a list of
steps to turn one into the other?
Clustal source code is available.
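
To illustrate what a "list of steps" output could look like, here is a minimal
sketch built on Python's standard difflib module. It is not how Clustal works
and not the tool described in this message; the function name edit_steps is
made up for illustration, and difflib's matching heuristic is only one of many
possible ways to pick the steps.

```python
from difflib import SequenceMatcher


def edit_steps(a, b):
    """Return the non-trivial steps that turn string a into string b.

    Each step is (operation, text_removed_from_a, text_taken_from_b), where
    operation is 'replace', 'delete', or 'insert' as reported by difflib.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    steps = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged stretches are not steps
        steps.append((tag, a[i1:i2], b[j1:j2]))
    return steps


if __name__ == "__main__":
    for step in edit_steps("abcdefghijkl", "abdddefhjkl"):
        print(step)
```

Note that the step list is directional: an insertion going from the first
string to the second becomes a deletion going the other way, which is one
concrete sense in which the description is non-symmetric.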
Having said that, I think I've actually got what you mention, but only
because I was lazy and my needs don't care about evolution of one string from
another. If you take two strings and generate a matrix of all possible
comparisons, you can generate your own "best fits." This one, for example,
recursively takes the largest exact matches irrespective of offset (so I
think it is insensitive to order) and tries to align the leftovers in the
same way. I've compared this to clustalw, and the clustalw output "makes more
sense," as this thing seems to think nothing of inserting gaps (obviously,
adjustable parameters for a figure of merit are a nice feature...):
$ ./string_correlator abcdefghijkl abdddefhjkl
abc-defghijkl
abdddef-h-jkl
$ ./string_correlator abdddefhjkl abcdefghijkl
full one:11 12 132
ab{dd,c}def{,g}h{,i}jkl
abdddef-h-jkl
abc-defghijkl
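
The recursive largest-exact-match idea described above can be sketched in a
few lines of Python. This is not the source of string_correlator, which is not
shown in this message. The function names, the tie-breaking rule (the first
longest match found wins), and the handling of leftovers with nothing in
common (pad the shorter piece with gaps on the right) are all assumptions made
for illustration.

```python
def longest_common_substring(a, b):
    """Return (start_a, start_b, length) of the longest exact match."""
    best = (0, 0, 0)
    # prev[j] holds the length of the common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def align(a, b, gap="-"):
    """Align a and b by recursively anchoring on the longest exact match."""
    start_a, start_b, length = longest_common_substring(a, b)
    if length == 0:
        # Nothing in common (assumption): pad the shorter leftover with gaps.
        width = max(len(a), len(b))
        return a.ljust(width, gap), b.ljust(width, gap)
    # Align whatever lies to the left and to the right of the anchor.
    left_a, left_b = align(a[:start_a], b[:start_b], gap)
    right_a, right_b = align(a[start_a + length:], b[start_b + length:], gap)
    core = a[start_a:start_a + length]
    return left_a + core + right_a, left_b + core + right_b


if __name__ == "__main__":
    for row in align("abcdefghijkl", "abdddefhjkl"):
        print(row)
```

With these assumptions the sketch gives the same two aligned rows as the
example above, in either input order. That is not guaranteed in general: when
several matches tie for longest, the choice can depend on which string comes
first.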
I've been using this approach to make my own blast database of genomic
repeats. While it's too early to tell if this will be useful, initial
alignments with known stuff like ORFs seem encouraging (hits from this
database seem to occur in only a few consistent places in the few cases I've
examined and do not appear to just place litter in and around coding
sequences).
Anyway, my question is: now that I have my own text and graphical alignment
tools, what software exists for taking a bunch of notes from various sources
(blast hits, genome annotations, etc.) and aligning them in one picture or
text document? I have my own now that I'd like to discuss with interested
parties (I'd be willing to post some gzipped bmp files too).
Thanks.
Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com
Note: Hotmail is blocking my mom's entire
ISP claiming it is to reduce spam but probably
to force users to use hotmail. Please DON'T
assume I am ignoring you and try
me on marchywka at yahoo.com if no reply
here. Thanks.
>From: Hongyu Zhang <forward at hongyu.org>
[ deleted to meet size limits ]