[Bioclusters] Assembly programs

Thu Jul 6 19:41:15 EDT 2006

I have not had time to dig into this further, but I'm pulling these
app names from notes I took during a recent conversation about
assembly with someone ...

The person was a current heavy user of "CAP3" on a 32GB Solaris/sparc  
system and was looking at a program called "PCAP" as a way of running  
across a cluster since the 32GB memory machine was no longer  
performing well on large assembly problems. Also mentioned repeatedly  
as a possible parallel, low-memory alternative was EULER.

CAP3: http://www.genome.org/cgi/content/full/9/9/868

PCAP and CAP3 seem to be from the same authors, but the main website
that Google turns up seems to be down at the moment.

EULER looks pretty interesting and seems to live here:
http://nbcr.sdsc.edu/euler/

-Chris

On Jul 6, 2006, at 7:26 PM, Joe Landman wrote:

> Hi folks:
>
>   Was asked recently about genome assembly, and I gave the answer  
> that Chris gave below.  What bugs me is that I haven't followed the  
> assembly work for a while, and all I remember are the TIGR tools.
>
>   Basically what I am asking is whether or not people have built
> assembly algorithms to run on smaller-memory machines, or do we
> still need large-memory SMPs to do the job?  64GB and up, or can
> we run some set of tools in under 16 GB on lots of cluster nodes?
>
>   Thanks!
>
> Joe
>
> Chris Dagdigian wrote:
>> Hi François,
>> First off, what assembly program are you trying to run on your  
>> cluster? Are you sure it is even capable of running in parallel  
>> across many machines? Most people I know doing assembly are doing  
>> it within a single large SMP system because shared memory is  
>> easier/faster and (I think...) there is a relative lack of "true  
>> parallel" assembly algorithms.
>> Here are some helpful official Grid Engine URLs:
>> - http://gridengine.sunsource.net (main site for the codebase)
>> - http://docs.sun.com/app/docs/coll/1017.3  (official  
>> documentation site)
>> I also run a site at http://gridengine.info but that may not be  
>> helpful until you are at least up and running.
>> Some specific suggestions for you and your current setup:
>> (1) Ignore the 'qmon' GUI. You won't be using it anyway with your
>> assembler and it just gets in the way of the more flexible command
>> line programs. Stick with the Unix binaries like "qstat", "qrsh"
>> and "qsub". You won't be able to use SGE to its fullest unless
>> you are comfortable with the command line programs.
>> (2) Send us (or me) the output of the command "qstat -f" when run  
>> on your system. It may explain why you could not run the simple.sh  
>> example job.
>> (3) Learn where your spool logs are; they will be invaluable in
>> debugging failures. The default location is something along the
>> lines of $SGE_ROOT/<cell>/spool/ -- in particular you want to look
>> at the last few lines of "qmaster/messages",
>> "qmaster/schedd/messages" and any messages files belonging to exec
>> hosts that are not behaving.
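
A minimal sketch of suggestion (3) above, in Python, for anyone who
would rather script the log check than open each file by hand. It
prints the last few lines of every "messages" file it finds under the
spool directory. Several things here are assumptions rather than
anything confirmed in this thread: the helper names (tail,
spool_message_files, show_recent) are made up for illustration, the
"/opt/sge" fallback path is hypothetical, the cell is taken from
SGE_CELL with a guess of "default", and the "*/messages" pattern
assumes each exec host spools into its own directory directly under
spool/. Adjust to match your install.

```python
import os
from collections import deque
from pathlib import Path


def tail(path, n=5):
    """Return the last n lines of a text file as a list of strings."""
    with open(path, "r", errors="replace") as fh:
        # deque with maxlen keeps only the final n lines while reading.
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


def spool_message_files(spool_dir):
    """Find spool/<name>/messages files, plus qmaster/schedd/messages."""
    spool = Path(spool_dir)
    # Assumption: qmaster and each exec host have a directory under spool/.
    found = sorted(spool.glob("*/messages"))
    schedd = spool / "qmaster" / "schedd" / "messages"
    if schedd.is_file():
        found.append(schedd)
    return found


def show_recent(spool_dir, n=5):
    """Print the last n lines of every messages file that was found."""
    for path in spool_message_files(spool_dir):
        print(f"==> {path} <==")
        for line in tail(path, n):
            print(line)


if __name__ == "__main__":
    # Hypothetical fallbacks; a real install sets SGE_ROOT itself.
    root = os.environ.get("SGE_ROOT", "/opt/sge")
    cell = os.environ.get("SGE_CELL", "default")
    show_recent(Path(root) / cell / "spool")
```

If the spool directory does not exist or holds no messages files, the
script simply prints nothing, which is itself a hint that the spool
location differs from the default on that machine.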
>> Regards,
>> Chris
>> On Jul 6, 2006, at 4:42 PM, francois.fauteux2 at mail.mcgill.ca wrote:
>>> Hi;
>>>
>>> I am totally new to grid computing. I recently tried to run a
>>> sequence assembly process on a G5 (8 GB RAM), but the process
>>> required more memory.
>>>
>>> I installed N1SGE6 on three G5 Macs running 10.4.7 (connected
>>> through a router, 13 GB RAM altogether), and I would like to run
>>> the assembly process in parallel across the cluster, hoping that
>>> memory resources would be sufficient for the process to complete.
>>>
>>> I would appreciate hints toward a quick "for-dummies" how-to on
>>> configuring the cluster and submitting the job properly.
>>>
>>> I installed the master and hosts with default settings. A first
>>> try with examples/simple.sh returns (with qmon):
>>> No free slots for interactive job!
>>> while 5 CPUs are available.
>>>
>>> Any hint as to how to properly configure the
>>> cluster/project/queues/parallel environments, or how to use qsub
>>> with useful options to get started quickly, would be greatly
>>> appreciated; thanks.
>>>
>>> François
>>>
>>> _______________________________________________
>>> Bioclusters maillist  -  Bioclusters at bioinformatics.org
>>> https://bioinformatics.org/mailman/listinfo/bioclusters
>
> -- 
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics LLC,
> email: landman at scalableinformatics.com
> web  : http://www.scalableinformatics.com
> phone: +1 734 786 8423
> fax  : +1 734 786 8452
> cell : +1 734 612 4615