[Bioclusters] Assembly programs

Joe Landman landman at scalableinformatics.com
Thu Jul 6 19:26:34 EDT 2006


Hi folks:

   Was asked recently about genome assembly, and I gave the answer that 
Chris gave below.  What bugs me is that I haven't followed the assembly 
work for a while, and all I remember are the TIGR tools.

   Basically, what I am asking is whether people have built assembly
algorithms that run on smaller-memory machines, or whether we still
need large-memory SMPs (64 GB and up) to do the job.  Can some set of
tools run in under 16 GB per node across lots of cluster nodes?

   Thanks!

Joe
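A back-of-the-envelope framing of this question and of François's setup quoted below: a serial or shared-memory assembler has to fit its working set into the RAM of a single machine, so a cluster's aggregate memory only helps an assembler written to distribute its data (for example over MPI). The Python sketch below is illustrative only. The 8 GB and 13 GB figures come from François's message, assuming the 8 GB G5 is one of his three Macs (it must then be the largest, since the other two share the remaining 5 GB); the 10 GB requirement is a hypothetical value.

```python
def placement_options(required_gb: float, largest_host_gb: float,
                      total_gb: float) -> dict:
    """Classify where a job needing `required_gb` of RAM could run.

    A serial or shared-memory (SMP) program must fit inside ONE host, so
    only the largest host matters.  The cluster's aggregate RAM helps only
    if the program itself splits its data across processes (e.g. via MPI);
    a batch scheduler places each process on a single host.
    """
    return {
        "fits_on_one_host": required_gb <= largest_host_gb,
        "fits_in_aggregate": required_gb <= total_gb,
        # True only when the spare capacity is reachable solely by a
        # distributed-memory assembler.
        "needs_distributed_assembler": largest_host_gb < required_gb <= total_gb,
    }


if __name__ == "__main__":
    # 8 GB (largest G5) and 13 GB (total) are from the thread;
    # the 10 GB requirement is hypothetical.
    print(placement_options(required_gb=10, largest_host_gb=8, total_gb=13))
```

With these numbers the job fits in aggregate but on no single host, which is exactly the case where installing a scheduler alone does not help. The same arithmetic applies to the 64 GB SMP versus 16 GB-per-node comparison above.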

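For the "No free slots for interactive job!" error quoted below, Chris's point (2) asks for `qstat -f` output. As a rough aid to reading that output, here is a Python sketch that counts free slots per queue instance and how many of them could take an interactive job. It assumes the SGE 6.x column layout (queue@host, qtype, slot counts, load_avg, arch, states), where an `I` in qtype means the queue accepts interactive jobs, and it treats any state flag (such as `a` for load alarm or `u` for an unreachable execd) as unavailable, which is a conservative simplification. The sample output, hostnames, and slot counts are hypothetical.

```python
import re
from typing import List, NamedTuple

# Slot column: "used/tot" (SGE 6.0) or "resv/used/tot" (later 6.x releases).
SLOTS = re.compile(r"^\d+(?:/\d+)+$")


class QueueInstance(NamedTuple):
    name: str     # queue@host
    qtype: str    # e.g. "BIP" = Batch, Interactive, Parallel
    used: int     # slots in use (reserved slots counted as used)
    total: int
    states: str   # "" when the queue instance has no state flags

    @property
    def free(self) -> int:
        return self.total - self.used

    @property
    def accepts_interactive(self) -> bool:
        return "I" in self.qtype


def parse_qstat_f(text: str) -> List[QueueInstance]:
    """Pull queue-instance lines out of `qstat -f` output.

    Headers, separator rules, job lines and the pending-jobs section are
    skipped because they lack a queue@host name or a slot column.
    """
    queues = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or "@" not in fields[0] or not SLOTS.match(fields[2]):
            continue
        counts = [int(n) for n in fields[2].split("/")]
        states = fields[5] if len(fields) > 5 else ""
        queues.append(QueueInstance(fields[0], fields[1],
                                    sum(counts[:-1]), counts[-1], states))
    return queues


def interactive_slots(queues: List[QueueInstance]) -> int:
    """Free slots in interactive-capable queue instances with no state flags."""
    return sum(q.free for q in queues if q.accepts_interactive and not q.states)


# Hypothetical output for three Macs; hostnames and numbers are made up.
SAMPLE = """\
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@g5-a.local               BIP   0/2       0.05     darwin-ppc
----------------------------------------------------------------------------
all.q@g5-b.local               BIP   0/2       -NA-     darwin-ppc    au
----------------------------------------------------------------------------
all.q@g5-c.local               B     0/1       0.01     darwin-ppc
"""

if __name__ == "__main__":
    queues = parse_qstat_f(SAMPLE)
    print("free slots:", sum(q.free for q in queues))
    print("usable for interactive jobs:", interactive_slots(queues))
```

In this made-up sample, 5 slots look free, but only 2 can take an interactive job: one host is flagged `au` and another queue is batch-only. A gap like that between "free CPUs" and usable interactive slots is the kind of thing the real `qstat -f` output would reveal.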
Chris Dagdigian wrote:
> 
> Hi François,
> 
> First off, what assembly program are you trying to run on your cluster? 
> Are you sure it is even capable of running in parallel across many 
> machines? Most people I know doing assembly are doing it within a single 
> large SMP system because shared memory is easier/faster and (I think...) 
> there is a relative lack of "true parallel" assembly algorithms.
> 
> Here are some helpful official Grid Engine URLs:
> 
> - http://gridengine.sunsource.net (main site for the codebase)
> 
> - http://docs.sun.com/app/docs/coll/1017.3  (official documentation site)
> 
> I also run a site at http://gridengine.info but that may not be helpful 
> until you are at least up and running.
> 
> Some specific suggestions for you and your current setup:
> 
> (1) Ignore the 'qmon' GUI. You won't be using it anyway with your 
> assembler, and it just gets in the way of the more flexible command-line 
> programs. Stick with the Unix binaries like "qstat", "qrsh" and 
> "qsub". You won't be able to use SGE to its fullest unless you are 
> comfortable with the command-line programs.
> 
> (2) Send us (or me) the output of the command "qstat -f" when run on 
> your system. It may explain why you could not run the simple.sh example 
> job.
> 
> (3) Learn where your spool logs are; they will be invaluable in 
> debugging failures. The default location is something along the lines of 
> $SGE_ROOT/<cell>/spool/. In particular, you want to look at the last 
> few lines of "qmaster/messages", "qmaster/schedd/messages" and any 
> messages files belonging to exec hosts that are not behaving.
> 
> Regards,
> Chris
> 
> On Jul 6, 2006, at 4:42 PM, francois.fauteux2 at mail.mcgill.ca wrote:
> 
>> Hi;
>>
>> I am totally new to grid computing. I recently tried to run a 
>> sequence assembly process on a G5 (8 GB RAM), but the process 
>> required more memory.
>>
>> I installed N1SGE6 on three Mac G5s running Mac OS X 10.4.7 (connected 
>> through a router; 13 GB RAM altogether), and I would like to run the 
>> assembly process in parallel across the cluster, hoping that the 
>> memory resources would be sufficient for the process to complete.
>>
>> I would appreciate a quick "for dummies" how-to on configuring the 
>> cluster and submitting the job properly.
>>
>> I installed the master and hosts with default settings. My first try 
>> with examples/simple.sh (submitted via qmon) returns:
>> No free slots for interactive job!
>> even though 5 CPUs are available.
>>
>> Any hints on how to properly configure the cluster, projects, queues, 
>> and parallel environments, or on which qsub options are useful for 
>> getting started quickly, would be greatly appreciated. Thanks.
>>
>> François
>>
>> _______________________________________________
>> Bioclusters maillist  -  Bioclusters at bioinformatics.org
>> https://bioinformatics.org/mailman/listinfo/bioclusters

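Chris's point (3) about spool logs is easy to script. This Python sketch collects the last few lines of `qmaster/messages`, `qmaster/schedd/messages`, and per-host `messages` files under `$SGE_ROOT/<cell>/spool/`, following the layout Chris describes. The cell name `default`, reading `SGE_ROOT`/`SGE_CELL` from the environment, and the host names passed in are assumptions; exec hosts configured with a local `execd_spool_dir` will keep their `messages` file elsewhere.

```python
import os
from collections import deque
from pathlib import Path
from typing import Dict, Iterable, List


def tail(path: Path, n: int = 20) -> List[str]:
    """Last `n` lines of a text file; deque(maxlen=n) keeps memory bounded."""
    with path.open("r", errors="replace") as fh:
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


def spool_tails(sge_root: str, cell: str = "default",
                exec_hosts: Iterable[str] = (), n: int = 20) -> Dict[str, List[str]]:
    """Tail the SGE messages files under $SGE_ROOT/<cell>/spool/.

    Checks qmaster/messages, qmaster/schedd/messages and <host>/messages for
    each name in `exec_hosts`; files that do not exist are silently skipped.
    """
    spool = Path(sge_root) / cell / "spool"
    candidates = [spool / "qmaster" / "messages",
                  spool / "qmaster" / "schedd" / "messages"]
    candidates += [spool / host / "messages" for host in exec_hosts]
    return {p.relative_to(spool).as_posix(): tail(p, n)
            for p in candidates if p.is_file()}


if __name__ == "__main__":
    root = os.environ.get("SGE_ROOT")
    if root:  # only runs where the SGE environment has been sourced
        cell = os.environ.get("SGE_CELL", "default")
        for name, lines in spool_tails(root, cell).items():
            print(f"==> {name} <==")
            print("\n".join(lines))
```

To include misbehaving exec hosts, pass their directory names as they appear in the spool tree, e.g. `spool_tails(root, exec_hosts=["g5-a"])`, where `g5-a` is a hypothetical host name.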
-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615

