[Bioclusters] Assembly programs

don gilbert gilbertd at indiana.edu
Wed Jul 12 14:05:31 EDT 2006


We are using PCAP and Arachne to assemble Daphnia pulex (crustacean) and
to compare/refine the JGI JAZZ assembly of this water flea.  Arachne is
still designed mostly for 1 CPU, but PCAP (and its newer variant PCAPrep)
is designed for cluster/grid usage, although it has one step where most
of the work needs to be done on 1 CPU with all data in memory.

See some info in this proposal:
http://iubio.bio.indiana.edu/biogrid/teragrid-genomics06-proposal.pdf

We are using TeraGrid for most of the CPUs, including
running Arachne at rachel.PSC.teragrid.org with its Alpha processors.

X. Huang, the author of PCAP, needed to provide us with some small
software adjustments to run on the NCSA TeraGrid IA64 cluster (something
about mismatched system libraries with his binary release).  PCAP's
parallel parts work mainly as a set of independent processes (I don't
recall whether it uses MPI or not).

By the way, I'm working on general methods to use Grid systems, esp. the
US TeraGrid, as a replacement for maintaining your own local clusters in
genome analyses.  Anyone interested in using such methods, please feel
free to contact me.

In terms of assembly quality for Daphnia (about 200 Mb genome), in our
hands Arachne provides bigger, fewer scaffolds, but we are not yet sure
if they are a better assembly.  PCAP has generated more, smaller ones.
Daphnia has 12 chromosomes; we get a few thousand scaffolds from Arachne
and tens of thousands of scaffolds from PCAP for this 9X WGS data set.
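
For anyone wanting to put numbers on "bigger, fewer" versus "more,
smaller" scaffolds, one common summary is the scaffold count together
with N50 (the scaffold length at which the scaffolds of that size or
larger hold at least half of the assembled bases).  The Python below is
only a minimal sketch: the function name scaffold_stats and the toy
lengths are made up for illustration and are not part of PCAP, Arachne,
or our Daphnia data.

```python
def scaffold_stats(lengths):
    """Return (count, total_bases, largest, n50) for scaffold lengths.

    N50 is taken here as the length L such that scaffolds of length >= L
    together contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        if 2 * running >= total:  # reached half of the assembled bases
            n50 = length
            break
    return len(ordered), total, ordered[0], n50


# Hypothetical toy assemblies with the same total size (not real data).
few_large = [400, 300, 200, 100]
many_small = [100] * 6 + [50] * 8

for name, lengths in (("few_large", few_large), ("many_small", many_small)):
    count, total, largest, n50 = scaffold_stats(lengths)
    print(f"{name}: scaffolds={count} total={total} largest={largest} N50={n50}")
```

A larger N50 only says the assembly is more contiguous; it does not say
the joins are correct, which is why scaffold size alone does not settle
which assembly is better.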

- Don Gilbert
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at indiana.edu -- http://marmot.bio.indiana.edu/


