[Bioclusters] Assembly_contd

Rob Edwards redwards at salmonella.org
Thu Jul 6 21:50:29 EDT 2006

I have a wrapper for phrap that will take sequences and assemble them  
in batches in parallel using the SGE.

The wrapper is here:

and it uses the Schedule::SGE interface available from CPAN, and you  
will need to supply phrap, of course.

This was written for a specific assembly problem, but I think it may
work for others. Basically it takes a fasta file and a quality scores
file, and assembles them in user-defined subsets of sequences. Then it
takes all the resulting sequences, and can assemble those too. The key
is to randomize the input order of the sequences each time.
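
To make the batching idea concrete, here is a rough sketch in Python.
This is not the wrapper's actual code (the wrapper uses Perl and
Schedule::SGE). The function names, the seed parameter, and the
assumption that the quality file has the same read IDs in the same
order as the fasta file are all mine, for illustration only.

```python
import random


def read_fasta(text):
    """Parse FASTA-style text into a list of (header, body) pairs.

    Works for both the sequence file and the quality file, since both
    use '>' header lines followed by one or more data lines.
    """
    records = []
    header, lines = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, "\n".join(lines)))
            header, lines = line[1:].strip(), []
        elif line.strip() and header is not None:
            lines.append(line.strip())
    if header is not None:
        records.append((header, "\n".join(lines)))
    return records


def seq_id(header):
    """Return the first whitespace-delimited token of a header line."""
    parts = header.split()
    return parts[0] if parts else ""


def shuffled_batches(seqs, quals, batch_size, seed=None):
    """Shuffle reads (keeping each read with its quality scores) and
    split them into batches of at most batch_size reads.

    Assumes seqs and quals list the same reads in the same order.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if [seq_id(h) for h, _ in seqs] != [seq_id(h) for h, _ in quals]:
        raise ValueError("fasta and quality files do not match")
    paired = list(zip(seqs, quals))
    # A fresh shuffle on every round is the point: each round groups
    # different reads together.
    random.Random(seed).shuffle(paired)
    return [paired[i:i + batch_size]
            for i in range(0, len(paired), batch_size)]
```

Each batch would then be written out as its own fasta and quality
file pair and handed to phrap as a separate SGE job. Calling
shuffled_batches() again on the output of one round gives a
differently ordered input for the next.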

The usual disclaimers apply: use with caution, nothing is guaranteed
under any circumstances, the assemblies may be wrong, etc. But it
may work and they may not be :)


On Jul 6, 2006, at 5:46 PM, francois.fauteux2 at mail.mcgill.ca wrote:

> Hi; thanks very much for reply;
> There seem to be two main streams in assembly, apart from that of
> TIGR: one is CAP3 and the other is PHRAP (http://www.phrap.org/).
> For running jobs more efficiently, there is also PaCE (see
> Kalyanaraman NO. 12, DECEMBER 2003) and probably other
> tools/algorithms as well.
> We do prefer the PHRAP assembly suite of tools. It is available for
> several platforms, and apart from setting up a SOLARIS/SPARC SMP for
> that purpose, we aimed to try it locally in a small Mac cluster. I do
> not know if PHRAP is compatible with SGE and will find out as soon as
> the cluster's on by trying a small set of sequences for assembly. PHRAP
> runs OK with smaller sets of sequences on a single Mac.

