[Bioclusters] SGE/MPI on OS X
David Adelson
bioclusters@bioinformatics.org
Tue, 20 Jul 2004 16:20:37 -0500
Thanks Chris,
It is the start, stop, and cleanup business that worries me, along with
the fact that in loose integration mode SGE won't be able to relaunch a
job segment should a compute element hang. I guess the first thing to do
is install the MPI libraries so that tree-puzzle compiles in parallel
mode and then proceed through the phases that you outlined.
Cheers,
Dave
On Jul 20, 2004, at 3:53 PM, Chris Dagdigian wrote:
> { My $.02 }
>
> SGE comes with preconfigured "example" templates for both mpich and
> pvm integration. Take a look in $SGE_ROOT/examples/mpi/ for the MPI
> files.
>
> My experience with parallel environments within grid engine is that
> they are largely application specific in that most times you need to
> configure a discrete parallel environment within Grid Engine for each
> app you hope to run in the cluster. The reason for this is that each
> app often needs customized start/stop/cleanup commands that often
> don't generalize all that well.
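
As a rough sketch of what such a per-application PE definition could look like, the shell function below prints one. The field names are the ones documented in sge_pe(5) as best recalled here; the exact set differs between SGE releases, so compare against `qconf -sp` output on your own install. The PE name, slot count, and script paths in the usage comment are hypothetical placeholders, not values from this thread.

```shell
#!/bin/sh
# Print a loose-integration PE definition to stdout.
# Arguments (all hypothetical placeholders chosen by the caller):
#   $1 PE name, $2 total slots, $3 start script path, $4 stop script path
write_pe_template() {
    pe_name=$1
    slots=$2
    start_script=$3
    stop_script=$4
    printf 'pe_name           %s\n' "$pe_name"
    printf 'slots             %s\n' "$slots"
    printf 'user_lists        NONE\n'
    printf 'xuser_lists       NONE\n'
    # $pe_hostfile is expanded by SGE, not by this shell, so it stays literal.
    printf 'start_proc_args   %s $pe_hostfile\n' "$start_script"
    printf 'stop_proc_args    %s\n' "$stop_script"
    printf 'allocation_rule   $fill_up\n'
    # FALSE here means SGE does not control the slave tasks (loose integration).
    printf 'control_slaves    FALSE\n'
    printf 'job_is_first_task TRUE\n'
}

# Example (paths are assumptions):
#   write_pe_template treepuzzle 16 /opt/pe/start.sh /opt/pe/stop.sh > treepuzzle.pe
```

The resulting file could then be loaded with `qconf -Ap treepuzzle.pe`. Your release may require additional fields that this sketch leaves out.
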
>
> Your best bet initially is to take things in phases:
>
> First: get your MPI application running on the cluster outside of Grid
> Engine
>
> Next: Go for "loose integration" of a parallel environment with SGE
>
> With "loose" integration all SGE is responsible for is finding the
> correct number of hosts and job slots and then generating a custom MPI
> hostfile that your app must "honor" when it runs.
>
> The nice thing about loose integration is that it is easy to set up --
> SGE may output the hostfile in a format that your app can recognize
> and use right away or you may have to take the simple extra step of
> writing a prolog method script in your PE that handles the task of
> "translating" the machinefile format into one that is recognized by
> the applications.
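
A minimal sketch of that "translating" step, assuming the SGE hostfile has one line per host in the form "hostname slots queue processor-range" (the layout described in sge_pe(5)) and that the application wants an MPICH-style machinefile with one hostname per slot. Other launchers expect other formats, so treat the output format as an assumption. The function name is made up for illustration.

```shell
#!/bin/sh
# Expand an SGE-style PE hostfile into a machinefile on stdout.
# Input lines look like:  node01 2 all.q UNDEFINED
# Output repeats each hostname once per slot granted on that host.
pe_hostfile_to_machinefile() {
    # $1: path to the PE hostfile
    while read -r host slots _rest; do
        # Skip blank or incomplete lines.
        [ -n "$host" ] && [ -n "$slots" ] || continue
        i=0
        while [ "$i" -lt "$slots" ]; do
            printf '%s\n' "$host"
            i=$((i + 1))
        done
    done < "$1"
}

# Inside a PE start script this might be called as (output path is an assumption):
#   pe_hostfile_to_machinefile "$1" > "$TMPDIR/machines"
```

Here `$1` would be the hostfile path that the PE's start_proc_args passes in via `$pe_hostfile`.
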
>
> The usage is pretty simple:
>
> $ qsub -pe myParallelEnvironment 10 ./my-10-CPU-parallel-job.sh
>
> When SGE launches the job the location of the custom hostfile will be
> visible as an environment variable. Your script then takes that file
> and passes it to mpirun or the equivalent parallel program launcher.
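
A sketch of that launch step as it might appear in the job script. It assumes SGE has set NSLOTS and TMPDIR for the job, that the PE's start script has already written a machinefile to $TMPDIR/machines, and that the launcher accepts MPICH-style `-np` and `-machinefile` options; LAM and other MPI implementations are started differently. The MPIRUN override variable and the program name are hypothetical.

```shell
#!/bin/sh
# Launch an MPI program on the hosts and slots that SGE granted to this job.
launch_mpi() {
    # NSLOTS is set by SGE for parallel jobs; refuse to guess without it.
    if [ -z "${NSLOTS:-}" ]; then
        echo "NSLOTS is not set; was the job submitted with -pe?" >&2
        return 1
    fi
    # Assumption: the PE start script wrote the machinefile to this path.
    machinefile="${TMPDIR:-/tmp}/machines"
    # MPIRUN is a hypothetical hook for pointing at a specific launcher.
    "${MPIRUN:-mpirun}" -np "$NSLOTS" -machinefile "$machinefile" "$@"
}

# In the job script itself one would then call, for example:
#   launch_mpi ./my-mpi-program
```

Because the script only reads what SGE hands it, the same file works for any slot count given to `qsub -pe`.
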
>
> The downside to loose integration is that SGE does not manage or deal
> with the parallel job at all and thus can't get good accounting stats
> or clean up the aftermath of runaway jobs.
>
> That is why people often try to achieve "tight integration" which is
> when SGE is responsible for actually launching and managing the
> parallel job and all its children.
>
> Tight integration is often pretty hard to get going robustly.
>
>
> -Chris
>
> David Adelson wrote:
>
>> Does anyone on this listserv have any experience integrating SGE and
>> MPI within an OS X cluster?
>> Specifically we have a user who wants to run the MPI-parallelized
>> version of tree-puzzle on our OS X cluster that is currently managed
>> using SGE. While I have seen a preliminary integration of LAM-MPI
>> and SGE on
>> http://gridengine.sunsource.net/project/gridengine/howto/lam/SGE_LAM_Integration.html
>> it is not clear how straightforward this might be in the real world.
>> Any firsthand info and experience you want to share would be welcome.
>> Cheers,
>> Dave Adelson
>> Texas A&M University
>> _______________________________________________
>> Bioclusters maillist - Bioclusters@bioinformatics.org
>> https://bioinformatics.org/mailman/listinfo/bioclusters
>
> --
> Chris Dagdigian, <dag@sonsorol.org>
> Independent life science IT & informatics consulting
> Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193
> PGP KeyID: 83D4310E Yahoo IM: craffi Web: http://bioteam.net