Bastian-

I usually don't see these types of errors. Here are a few questions:

How did you format the database for mpiblast?
Is the mpiblast database on a shared filesystem, like NFS (I don't think symlinks will work)?
How did you launch the job, SGE?
Can you try a smaller job using just the 6 compute nodes (and formatting the db into 6 pieces)?
Can you try a smaller blast job using p53 and p53db from ftp://ftp.bioinformatics.org/pub/biobrew/ and blastp?

Glen

On Apr 6, 2006, at 3:56 AM, Bastian Friedrich wrote:

> Hi,
>
> after installing our new cluster with Rocks 4.1 and BioBrew (plus a
> number of rolls; hpc roll included), I have a hard time getting
> mpiblast to run.
>
> The cluster consists of 7 machines (a head node plus 6 compute nodes),
> each equipped with two dual-core Opteron CPUs and 8 GB of RAM.
>
> These are the steps I took:
>
> * Extended sysctl.conf (on the frontend and all nodes) to provide more
>   shared memory:
>       # Shared mem = 1 GB!!
>       kernel.shmmax = 1099511627776
>       kernel.shmall = 1099511627776
> * Extended .bashrc to use mpich and increase P4_GLOBMEMSIZE:
>       export PATH=/opt/mpich/gnu/bin:$PATH
>       export P4_GLOBMEMSIZE=157286400
> * Put all nodes into /opt/mpich/gnu/share/machines.LINUX (hm... did I
>   do this manually? Don't remember)
>
> I was following Glen's "mpiblast introduction" as published on the
> Rocks-Discuss mailing list on 2005-03-24 and executed the following
> command line:
>
>   mpirun -np 30 /usr/local/bin/mpiblast -p blastn -d Hs.seq.uniq -i IL2RA -o blast_results
>
> ~/.ncbirc is configured like this:
> ======================================================================
> [NCBI]
> Data=/usr/share/ncbi/data/
>
> [BLAST]
> BLASTDB=/state/partition1/blastdb
> BLASTMAT=/usr/share/ncbi/data/
>
> [mpiBLAST]
> Shared=/state/partition1/blastdb
> Local=/tmp
> ======================================================================
>
> (/state/partition1/blastdb is a symlink to the blastdb path on the
> frontend, and contains the database on the nodes. I tried this via
> NFS, too.)
>
> Depending on the value of P4_GLOBMEMSIZE, I get different errors - but
> errors in all cases. The jobs are distributed among the nodes, though.
> For "smaller" values of P4_GLOBMEMSIZE (i.e. 104857600 == 100 MB, and
> most of the time also 200 MB), I get this error:
> ======================================================================
> p0_8400: (23.453125) xx_shmalloc: returning NULL; requested 22880510 bytes
> p0_8400: (23.453125) p4_shmalloc returning NULL; request = 22880510 bytes
> You can increase the amount of memory by setting the environment variable
> P4_GLOBMEMSIZE (in bytes); the current size is 104857600
> p0_8400: p4_error: alloc_p4_msg failed: 0
> ======================================================================
>
> For 200 MB, I sometimes (?) get the same error, sometimes this one:
> ======================================================================
> rm_21956: p4_error: semget failed for setnum: 19
> ======================================================================
>
> For 300 MB, I get this:
> ======================================================================
> p0_20214: p4_error: exceeding max num of P4_MAX_SYSV_SHMIDS: 256
> ======================================================================
>
> I tried to test my mpich installation with the sample programs
> included (mainly cpi.c). I was able to get it running with -np <small
> number>, but the errors described above occurred when I increased the
> process count.
>
> Yes, I executed "cleanipcs; cluster-fork cleanipcs" in advance in all
> cases.
>
> I frankly have not yet understood the correlation between the (possible)
> shmmax/shmall settings, P4_GLOBMEMSIZE and P4_MAX_SYSV_SHMIDS, or how
> to tune each one for successful mpich parallelization.
>
> Due to these mpich problems, I installed OpenMPI and compiled the
> mpiblast src.rpm against OpenMPI; the errors above did not occur, but
> the blast job seemed to get stuck somewhere, too (no error message,
> but the job seemed to run forever).
>
> As I am quite new to clusters, MPI and mpiblast, I feel a little lost.
> Do you have any ideas what the problems may be, and how to fix them?
>
> Thx and Regards,
>    Bastian
>
> --
> Bastian Friedrich                   bastian at bastian-friedrich.de
> Address & Fon available on my HP    http://www.bastian-friedrich.de/
> \~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\
> \  To learn more about paranoids, follow them around!
>
> _______________________________________________
> BioBrew-Users mailing list
> BioBrew-Users at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/BioBrew-Users
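For reference, the smaller test suggested above could look roughly like the
following. This is only a sketch: the p53 query and p53db database are the
ones on the biobrew FTP site mentioned earlier, but the mpiformatdb fragment
option and the mpirun process count are assumptions based on typical
mpiBLAST 1.x usage and may need adjusting for your installation.

  # Format the protein database into 6 fragments, one per compute node.
  # The fragment option is -N in some mpiBLAST releases and --nfrags in others.
  mpiformatdb -N 6 -i p53db -p T

  # Run the small blastp job. The process count here is a guess: the 6
  # database fragments plus the extra master/writer processes mpiBLAST uses.
  mpirun -np 8 /usr/local/bin/mpiblast -p blastp -d p53db -i p53 -o p53_results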