[Bioclusters] SGE/MPI on OS X

David Adelson bioclusters@bioinformatics.org
Tue, 20 Jul 2004 16:20:37 -0500


Thanks Chris,

It is the start/stop/cleanup business that worries me, and the fact
that in loose integration mode SGE won't be able to relaunch a job
segment should a compute element hang.  I guess the first thing to do
is install the MPI libraries so that tree-puzzle compiles in parallel
mode and then proceed through the phases that you outlined.

Cheers,

Dave

On Jul 20, 2004, at 3:53 PM, Chris Dagdigian wrote:

> { My $.02 }
>
> SGE comes with preconfigured "example" templates for both MPICH and
> PVM integration. Take a look in $SGE_ROOT/examples/mpi/ for the MPI
> files.
>
> My experience with parallel environments within Grid Engine is that
> they are largely application-specific: most of the time you need to
> configure a separate parallel environment within Grid Engine for each
> app you hope to run on the cluster. The reason is that each app often
> needs customized start/stop/cleanup commands that don't generalize
> all that well.
>
> Your best bet initially is to take things in phases:
>
> First: get your MPI application running on the cluster outside of Grid
> Engine.
>
> Next: go for "loose integration" of a parallel environment with SGE.
>
> With "loose" integration, all SGE is responsible for is finding the
> correct number of hosts and job slots and then generating a custom MPI
> hostfile that your app must "honor" when it runs.
>
> The nice thing about loose integration is that it is easy to set up --
> SGE may output the hostfile in a format that your app can recognize
> and use right away, or you may have to take the simple extra step of
> writing a prolog method script in your PE that handles the task of
> "translating" the machinefile format into one that is recognized by
> your application.
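
[A minimal sketch of the "translating" step described above, not taken
from the original message. It assumes the SGE-generated hostfile has one
line per host in the form "<hostname> <slots> <queue> <processor-range>"
and that the launcher wants a machinefile with one hostname per slot.
The function name pe_hostfile_to_machinefile is hypothetical.]

```shell
#!/bin/sh
# Hypothetical helper: expand an SGE-style hostfile into a machinefile
# that repeats each hostname once per allocated slot.
# Assumed input format, one line per host:
#   <hostname> <slots> <queue> <processor-range>
pe_hostfile_to_machinefile() {
    # $1 = path to the SGE-generated hostfile
    # Field 1 is the hostname, field 2 the number of slots on that host.
    awk '{ for (i = 0; i < $2; i++) print $1 }' "$1"
}
```

[Inside a PE start script this might be called along the lines of
pe_hostfile_to_machinefile "$PE_HOSTFILE" > "$TMPDIR/machines", where
$PE_HOSTFILE is the variable SGE sets for the generated hostfile and the
output path is only an example.]
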
>
> The usage is pretty simple:
>
> $ qsub -pe myParallelEnvironment 10 ./my-10-CPU-parallel-job.sh
>
> When SGE launches the job, the location of the custom hostfile will be
> visible as an environment variable. Your script then takes that file
> and passes it to mpirun or the equivalent parallel program launcher.
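
[A rough sketch of what such a job script might contain, not taken from
the original message. It assumes the job runs inside a parallel
environment so that SGE has set $NSLOTS (the number of granted slots),
and that the launcher accepts MPICH-style -np and -machinefile options;
other MPI implementations may use different flags. The function name
launch_parallel and the MPIRUN override variable are hypothetical.]

```shell
#!/bin/sh
# Hypothetical wrapper: start an MPI program on the hosts SGE selected.
# Usage: launch_parallel <machinefile> <program> [args...]
# MPIRUN may be set to use a different launcher; it defaults to mpirun.
launch_parallel() {
    machinefile=$1
    shift
    # NSLOTS is only set when the job was submitted with "qsub -pe ...".
    if [ -z "$NSLOTS" ]; then
        echo "NSLOTS is not set; submit this job with qsub -pe" >&2
        return 1
    fi
    # Assumed MPICH-style options: -np <count> -machinefile <file>
    "${MPIRUN:-mpirun}" -np "$NSLOTS" -machinefile "$machinefile" "$@"
}
```

[In a job script submitted with qsub -pe, the last line could then be
something like launch_parallel "$TMPDIR/machines" ./my-parallel-program,
where both the machinefile path and the program name are placeholders.]
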
>
> The downside to loose integration is that SGE does not manage or deal
> with the parallel job at all and thus can't get good accounting stats
> or clean up the aftermath of runaway jobs.
>
> That is why people often try to achieve "tight integration", in which
> SGE is responsible for actually launching and managing the
> parallel job and all its children.
>
> Tight integration is often pretty hard to get going robustly.
>
>
> -Chris
>
> David Adelson wrote:
>
>> Does anyone on this listserv have any experience integrating SGE and
>> MPI within an OS X cluster?
>> Specifically, we have a user who wants to run the MPI-parallelized
>> version of tree-puzzle on our OS X cluster, which is currently managed
>> using SGE. While I have seen a preliminary integration of LAM-MPI
>> and SGE at
>> http://gridengine.sunsource.net/project/gridengine/howto/lam/SGE_LAM_Integration.html
>> it is not clear how straightforward this might be in the real world.
>> Any firsthand info and experience you want to share would be welcome.
>> Cheers,
>> Dave Adelson
>> Texas A&M University
>> _______________________________________________
>> Bioclusters maillist  -  Bioclusters@bioinformatics.org
>> https://bioinformatics.org/mailman/listinfo/bioclusters
>
> -- 
> Chris Dagdigian, <dag@sonsorol.org>
> Independent life science IT & informatics consulting
> Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193
> PGP KeyID: 83D4310E Yahoo IM: craffi Web: http://bioteam.net