[BiO BB] time efficient global alignment algorithm

Mike Marchywka marchywka at hotmail.com
Mon Aug 3 16:29:02 EDT 2009









----------------------------------------
> Date: Mon, 3 Aug 2009 13:45:57 -0400
> From: golhara
> To: bbb at bioinformatics.org
> Subject: [BiO BB] time efficient global alignment algorithm
>
> I'm trying to perform a large amount of sequence alignments of long DNA
> sequences, some up to 163,000+ bp in length. I was trying to use the
> standard Needleman-Wunsch algorithm, but the matrix used requires a
> large amount of memory...about 100 GB of memory. This obviously won't work.

How many were you trying to align? You mean 163kb or 163Mb?
I was looking for test or comparisons for some alignment code I 
had which indexed the target sequences, don't recall the suggestions
for that discussion but I was able to do simple genomes reasonably well 
( I think I used 2 strains of e coli or something about 5 megs long)
on a desktop. If you can find responses to my request from a few years 
ago that may ( or may not ) help. I'd offer my code, and indeed I think
I have it on a website, but I stopped development and not sure
it is nearly useful as-is unless you just want coarse alignment on
two similar sequences. 

Many implementations of just about anything are bad with
memory management- sometimes just blocking or sorting or
compacting the internal representation can make a big improvement.
Not sure what exists along these lines but often some simplifcations
don't change results but decrease time/memory on futile possibilities. 


>
> I tried using stretcher from the EMBOSS package, but it takes way too
> long to align each pair of sequences. I'm looking for something that
> can perform alignments fast using a reasonable amount of memory.
>
> I found one tool, called AVID, but have been unsuccessful in getting it
> to run to the sequence set I have.
>
> Before I go an try to develop a new solution to this, does anyone have
> or recommend a program to perform a large number of global pairwise
> alignments for long sequences?

Are all of these nominally the same or are you trying to align
noise to noise? 

>
> Ideally, something with the speed similar to BLAST.

I guess in an odd way my approach could get there as it essentially 
queries each string for "interesting" short sequences but I'd have to
check order ( howmany of these does it use etc). Last time I checked the
academic lit, IIRC this exact-string matching was an open research area maybe there have been advancements
in last few years that are trivial to code or exist in an academic's lab.

>
> Ryan
>
> _______________________________________________
> BBB mailing list
> BBB at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/bbb

_________________________________________________________________
Get your vacation photos on your phone!
http://windowsliveformobile.com/en-us/photos/default.aspx?&OCID=0809TL-HM



More information about the BBB mailing list