[Bioclusters] Assembly_contd

francois.fauteux2 at mail.mcgill.ca francois.fauteux2 at mail.mcgill.ca
Thu Jul 6 20:46:10 EDT 2006


Hi; thanks very much for the reply.

There seem to be two main streams in assembly. Apart from TIGR's 
tools, one is CAP3 and the other is PHRAP (http://www.phrap.org/). 
For running jobs more efficiently, there is also PaCE (see 
Kalyanaraman et al., IEEE Transactions on Parallel and Distributed 
Systems, vol. 14, no. 12, December 2003) and probably other 
tools/algorithms as well.

We prefer the PHRAP assembly suite of tools. It is available for 
several platforms, and apart from setting up a Solaris/SPARC SMP for 
that purpose, we aimed to try it locally on a small Mac cluster. I do 
not know whether PHRAP is compatible with SGE; I'll find out as soon 
as the cluster is up, by trying assembly on a small set of sequences. 
PHRAP runs fine with smaller sets of sequences on a single Mac.
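For that first test, a minimal job script sketch like the following should do (this assumes phrap is on the PATH of the exec hosts; "reads.fasta" is a placeholder input name, and "-new_ace" just asks phrap for ace-format output for consed -- phrap itself is not cluster-aware, so SGE simply schedules one serial assembly on one node):

```shell
# Write a minimal SGE job script for a serial phrap run.
cat > phrap_job.sh <<'EOF'
#!/bin/sh
#$ -N phrap_test
#$ -cwd
#$ -j y
phrap reads.fasta -new_ace
EOF
# Submit from the head node with:  qsub phrap_job.sh
```

The "#$" lines are SGE directives: -N names the job, -cwd runs it in the submission directory, and -j y merges stderr into stdout.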

The qstat -f command outputs:

queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@mac2                     BIP   0/2       -NA-     -NA-          au
----------------------------------------------------------------------------
all.q@mac1                     BIP   0/1       -NA-     -NA-          au
----------------------------------------------------------------------------
all.q@mac3                     BIP   0/2       -NA-     -NA-          au
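(A note on that output: the "au" in the states column is SGE shorthand for alarm + unreachable, i.e. qmaster cannot contact sge_execd on those hosts -- with all three queue instances unreachable there are indeed no free slots. A quick, rough sketch for pulling the troubled queue instances out of "qstat -f" output, shown here against the sample above; in practice you would pipe the live command into awk:)

```shell
# Print queue instances that report a state in the 6th column;
# a 'u' in the state means qmaster cannot reach sge_execd there.
awk '$6 ~ /u/ {print $1, $6}' <<'EOF'
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@mac2                     BIP   0/2       -NA-     -NA-          au
----------------------------------------------------------------------------
all.q@mac1                     BIP   0/1       -NA-     -NA-          au
----------------------------------------------------------------------------
all.q@mac3                     BIP   0/2       -NA-     -NA-          au
EOF
# Expected output:
#   all.q@mac2 au
#   all.q@mac1 au
#   all.q@mac3 au
```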


As for the qmaster messages (many of them already!), I'll try to 
find out what's what...

Being able to run the simple.sh example will already be something; thanks!

- François Fauteux

**********


I have not had time to dig into this further, but I'm pulling these  
app names from notes I took during a recent conversation about  
assembly with someone ...

That person was a heavy user of "CAP3" on a 32 GB Solaris/SPARC  
system and was looking at a program called "PCAP" as a way of running  
across a cluster, since the 32 GB machine was no longer performing  
well on large assembly problems. Also mentioned repeatedly as a  
possible parallel, low-memory-requirements alternative was EULER.

CAP3: http://www.genome.org/cgi/content/full/9/9/868

PCAP and CAP3 seem to be from the same authors, but the main website  
cited by Google seems to be down at the moment.

EULER looks pretty interesting and seems to live here:
http://nbcr.sdsc.edu/euler/


-Chris



On Jul 6, 2006, at 7:26 PM, Joe Landman wrote:

> Hi folks:
>
>   I was asked recently about genome assembly, and I gave the answer  
> that Chris gave below.  What bugs me is that I haven't followed the  
> assembly work for a while, and all I remember are the TIGR tools.
>
>   Basically, what I am asking is whether people have built assembly  
> algorithms to run on smaller-memory machines, or do we still need  
> large-memory SMPs to do the job?  64 GB and up, or can we run some  
> set of tools in under 16 GB on lots of cluster nodes?
>
>   Thanks!
>
> Joe
>
> Chris Dagdigian wrote:
>> Hi François,
>>
>> First off, what assembly program are you trying to run on your  
>> cluster? Are you sure it is even capable of running in parallel  
>> across many machines? Most people I know doing assembly are doing  
>> it within a single large SMP system, because shared memory is  
>> easier/faster and (I think...) there is a relative lack of "true  
>> parallel" assembly algorithms.
>>
>> Here are some helpful official Grid Engine URLs:
>>
>> - http://gridengine.sunsource.net (main site for the codebase)
>> - http://docs.sun.com/app/docs/coll/1017.3 (official documentation site)
>>
>> I also run a site at http://gridengine.info but that may not be  
>> helpful until you are at least up and running.
>>
>> Some specific suggestions for you and your current setup:
>>
>> (1) Ignore the 'qmon' GUI. You won't be using it anyway with your  
>> assembler, and it just gets in the way of the more flexible  
>> command-line programs. Stick with the unix binaries like "qstat",  
>> "qrsh" and "qsub". You won't be able to use SGE to its fullest  
>> unless you are comfortable with the command-line programs.
>>
>> (2) Send us (or me) the output of the command "qstat -f" when run  
>> on your system. It may explain why you could not run the simple.sh  
>> example job.
>>
>> (3) Learn where your spool logs are; they will be invaluable in  
>> debugging failures. The default location is something along the  
>> lines of $SGE_ROOT/<cell>/spool/ -- in particular you want to look  
>> at the last few lines of "qmaster/messages",  
>> "qmaster/schedd/messages" and any messages files belonging to exec  
>> hosts that are not behaving.
>>
>> Regards,
>> Chris
>> On Jul 6, 2006, at 4:42 PM, francois.fauteux2 at mail.mcgill.ca wrote:
>>> Hi;
>>>
>>> I am totally new to grid computing. I recently tried to run a  
>>> sequence assembly process on a G5 (8 GB RAM), but the process  
>>> required more memory.
>>>
>>> I installed N1SGE6 on 3 Mac G5s under 10.4.7 (connected through a  
>>> router; 13 GB RAM altogether), and I would like to run the  
>>> assembly process in parallel through the cluster, hoping that  
>>> memory resources will be sufficient for the process to complete.
>>>
>>> I would appreciate hints, a "for-dummies fast how-to" on  
>>> configuring the cluster / submitting the job properly.
>>>
>>> I installed the master and hosts with default settings. A first  
>>> try with examples/simple.sh returns (with qmon):
>>> No free slots for interactive job!
>>> while 5 CPUs are available.
>>>
>>> Any hint on how to properly configure the  
>>> cluster/project/queues/parallel environments, or on using qsub  
>>> with useful options -for a fast getting started- would be greatly  
>>> appreciated; thanks.
>>>
>>> François
>>>
>>> _______________________________________________
>>> Bioclusters maillist  -  Bioclusters at bioinformatics.org
>>> https://bioinformatics.org/mailman/listinfo/bioclusters
>
> -- 
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics LLC,
> email: landman at scalableinformatics.com
> web  : http://www.scalableinformatics.com
> phone: +1 734 786 8423
> fax  : +1 734 786 8452
> cell : +1 734 612 4615



