Thanks Chris,

It is the start, stop, cleanup business that worries me, and the fact
that in loose integration mode SGE won't be able to relaunch a job
segment should a compute element hang. I guess the first thing to do is
install the MPI libraries so that tree-puzzle compiles in parallel mode
and then proceed through the phases that you outlined.

Cheers,
Dave

On Jul 20, 2004, at 3:53 PM, Chris Dagdigian wrote:

> { My $.02 }
>
> SGE comes with preconfigured "example" templates for both mpich and
> pvm integration. Take a look in $SGE_ROOT/examples/mpi/ for the MPI
> files.
>
> My experience with parallel environments within Grid Engine is that
> they are largely application specific in that most times you need to
> configure a discrete parallel environment within Grid Engine for each
> app you hope to run in the cluster. The reason for this is that each
> app often needs customized start/stop/cleanup commands that often
> don't generalize all that well.
>
> Your best bet initially is to take things in phases,
>
> First: get your MPI application running on the cluster outside of Grid
> Engine
>
> Next: Go for "loose integration" of a parallel environment with SGE
>
> With "loose" integration all SGE is responsible for is finding the
> correct number of hosts and job slots and then generating a custom MPI
> hostfile that your app must "honor" when it runs.
>
> The nice thing about loose integration is that it is easy to set up --
> SGE may output the hostfile in a format that your app can recognize
> and use right away or you may have to take the simple extra step of
> writing a prolog method script in your PE that handles the task of
> "translating" the machinefile format into one that is recognized by
> the applications.
>
> The usage is pretty simple:
>
> $ qsub -pe myParallelEnvironment 10 ./my-10-CPU-parallel-job.sh
>
> When SGE launches the job the location of the custom hostfile will be
> visible as an environment variable.
> Your script then takes that file and passes it to mpirun or the
> equivalent parallel program launcher.
>
> The downside to loose integration is that SGE does not manage or deal
> with the parallel job at all and thus can't get good accounting stats
> or clean up the aftermath of runaway jobs.
>
> That is why people often try to achieve "tight integration" which is
> when SGE is responsible for actually launching and managing the
> parallel job and all its children.
>
> Tight integration is often pretty hard to get going robustly.
>
> -Chris
>
> David Adelson wrote:
>
>> Does anyone on this listserv have any experience integrating SGE and
>> MPI within an OS X cluster?
>> Specifically we have a user who wants to run the MPI parallelized
>> version of tree-puzzle on our OS X cluster that is currently managed
>> using SGE. While I have seen a preliminary integration of LAM-MPI
>> and SGE on
>> http://gridengine.sunsource.net/project/gridengine/howto/lam/SGE_LAM_Integration.html
>> it is not clear how straightforward this might be in the real world.
>> Any firsthand info and experience you want to share would be welcome.
>> Cheers,
>> Dave Adelson
>> Texas A&M University
>> _______________________________________________
>> Bioclusters maillist - Bioclusters@bioinformatics.org
>> https://bioinformatics.org/mailman/listinfo/bioclusters
>
> --
> Chris Dagdigian, <dag@sonsorol.org>
> Independent life science IT & informatics consulting
> Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193
> PGP KeyID: 83D4310E Yahoo IM: craffi Web: http://bioteam.net
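
The "translating" step Chris describes for loose integration might be sketched roughly as below. This is only an illustration under stated assumptions, not the template shipped with SGE: it assumes the SGE-generated hostfile has one line per host with the hostname in column 1 and the slot count in column 2, and that the MPI launcher wants one hostname per slot. The function name pe_to_machinefile is made up for this example.

```shell
#!/bin/sh
# Sketch only: expand an SGE-style PE hostfile into a machinefile with
# one hostname per slot.
# Assumed input format (one line per host): "hostname slots queue range"
# pe_to_machinefile is a hypothetical helper name, not part of SGE.
pe_to_machinefile() {
    # $1 = path to the hostfile generated by SGE for this job
    awk '{ for (i = 0; i < $2; i++) print $1 }' "$1"
}

# Inside a job script this might be used roughly as follows, assuming
# SGE exports PE_HOSTFILE, NSLOTS and TMPDIR to the job, and that the
# launcher accepts -np and -machinefile (MPICH-style mpirun):
#
#   pe_to_machinefile "$PE_HOSTFILE" > "$TMPDIR/machines"
#   mpirun -np "$NSLOTS" -machinefile "$TMPDIR/machines" ./my-parallel-app
```

Such a script would be submitted the same way as in the qsub line quoted above. Whether any translation is needed at all depends on the launcher, since some can read the SGE hostfile format directly.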