[BiO BB] Observation: multiple sequence alignment affected by the input sequence order

Mike Marchywka marchywka at hotmail.com
Thu Aug 16 20:45:01 EDT 2007


I've never bothered to check these details, but you really have to evaluate
these ill-defined fits in light of some objective. That is, given two
sequences, you really don't know whether one was generated from the other by
any particular set of operations. It may even make sense, from the standpoint
of fitting to an evolution model, to assume one is derived from the other in
non-symmetric ways. Perhaps it would make more sense to output a list of
steps to turn one into the other?
Clustal source code is available.
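
To make the "list of steps" idea concrete, here is a minimal sketch in
Python using the standard library's difflib. The function names edit_steps
and apply_steps are made up for illustration, and this is not what Clustal
does internally; it only shows what a directed edit script from one string
to the other could look like.

```python
from difflib import SequenceMatcher


def edit_steps(src, dst):
    """Return the non-trivial edits that turn src into dst.

    Each step is (tag, position_in_src, old_text, new_text), where tag is
    'replace', 'delete' or 'insert' as reported by difflib.
    """
    matcher = SequenceMatcher(None, src, dst, autojunk=False)
    steps = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            steps.append((tag, i1, src[i1:i2], dst[j1:j2]))
    return steps


def apply_steps(src, steps):
    """Replay the steps on src; the result should equal dst."""
    out, pos = [], 0
    for _tag, i1, old, new in steps:
        out.append(src[pos:i1])   # copy the untouched stretch
        out.append(new)           # emit the replacement (may be empty)
        pos = i1 + len(old)       # skip what was replaced or deleted
    out.append(src[pos:])
    return "".join(out)


if __name__ == "__main__":
    for step in edit_steps("abcdefghijkl", "abdddefhjkl"):
        print(step)
```

The script is directional: the steps from A to B are not the same list as
the steps from B to A, which is the non-symmetric view mentioned above.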

Having said that, I think I've actually got what you mention, but only
because I was lazy and my needs don't care about evolution of one string from
another. If you take two strings and generate a matrix of all possible
comparisons, you can generate your own "best fits." This one, for example,
recursively takes the largest exact match irrespective of offset (so I think
it is insensitive to order) and tries to align the leftovers in the same way.
I've compared this to clustalw, and the clustalw output "makes more sense," as
this thing seems to think nothing of inserting gaps (obviously, adjustable
parameters for a figure of merit are a nice feature...):
$ ./string_correlator abcdefghijkl abdddefhjkl
abc-defghijkl
abdddef-h-jkl
$ ./string_correlator abdddefhjkl abcdefghijkl
full one:11 12 132
ab{dd,c}def{,g}h{,i}jkl
abdddef-h-jkl
abc-defghijkl
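
For anyone who wants to play with the idea, the following is a rough Python
sketch of "take the largest exact match, then align the leftovers the same
way." It is not the string_correlator source (which is not shown here); the
function names, the tie-breaking rule (first longest match found) and the
choice to pad unmatched leftovers on the right with gaps are all assumptions.

```python
def longest_common_substring(a, b):
    """Return (i, j, n) such that a[i:i+n] == b[j:j+n] is a longest exact match.

    This walks the full comparison matrix one row at a time; n == 0 means
    the two strings share no character.
    """
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1      # extend the diagonal run
                if cur[j] > best[2]:          # ties keep the first match found
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def align(a, b, gap="-"):
    """Recursively anchor on the longest exact match and align what is left."""
    i, j, n = longest_common_substring(a, b)
    if n == 0:
        # Nothing in common: put the leftovers in the same columns and pad.
        width = max(len(a), len(b))
        return a.ljust(width, gap), b.ljust(width, gap)
    left_a, left_b = align(a[:i], b[:j], gap)
    right_a, right_b = align(a[i + n:], b[j + n:], gap)
    return left_a + a[i:i + n] + right_a, left_b + b[j:j + n] + right_b


if __name__ == "__main__":
    for row in align("abcdefghijkl", "abdddefhjkl"):
        print(row)
```

On the two strings above this sketch prints the same two rows as the first
run, and swapping the arguments just swaps the rows. That is only one
example, though: with ties between equally long matches, the first-found
rule could in principle make the result depend on argument order.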

I've been using this approach to make my own blast database of genomic
repeats. While it's too early to tell if this will be useful, initial
alignments with known stuff like ORFs seem encouraging (hits from this
database seem to occur in only a few consistent places in the few cases I've
examined and do not appear to just place litter in and around coding
sequences).

Anyway, my question is: now that I have my own text and graphical alignment
tools, what software exists for taking a bunch of notes from various sources
(blast hits, genome annotations, etc.) and aligning them in one picture or
text document? I have my own now that I'd like to discuss with interested
parties (I'd be willing to post some gzipped bmp files too).

Thanks.


Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com
Note: Hotmail is blocking my mom's entire
ISP claiming it is to reduce spam but probably
to force users to use hotmail. Please DON'T
assume I am ignoring you and try
me on marchywka at yahoo.com if no reply
here. Thanks.





>From: Hongyu Zhang <forward at hongyu.org>
[ deleted to meet size limits ]
