Bastian-

I usually don't see these types of errors. Here are a few questions:

How did you format the database for mpiblast?
Is the mpiblast database on a shared filesystem, like NFS (I don't think symlinks will work)?
How did you launch the job, SGE?
Can you try a smaller job using just the 6 compute nodes (and formatting the db into 6 pieces)?
Can you try a smaller blast job using p53 and p53db from ftp://ftp.bioinformatics.org/pub/biobrew/ and blastp?

Glen

On Apr 6, 2006, at 3:56 AM, Bastian Friedrich wrote:

> Hi,
>
> after installing our new cluster with Rocks 4.1 and BioBrew (plus a
> number of rolls; hpc roll included), I have a hard time getting
> mpiblast to run.
>
> The cluster consists of 7 machines (a head node plus 6 compute nodes),
> each equipped with two dual-core Opteron CPUs and 8 GB of RAM.
>
> These are the steps I took:
>
> * Extended sysctl.conf (on the frontend and all nodes) to provide more
>   shared memory:
>       # Shared mem = 1 GB!!
>       kernel.shmmax = 1099511627776
>       kernel.shmall = 1099511627776
> * Extended .bashrc to use mpich and increase P4_GLOBMEMSIZE:
>       export PATH=/opt/mpich/gnu/bin:$PATH
>       export P4_GLOBMEMSIZE=157286400
> * Put all nodes into /opt/mpich/gnu/share/machines.LINUX (hm... did I
>   do this manually? Don't remember)
>
> I was following Glen's "mpiblast introduction" as published on the
> Rocks-Discuss mailing list on 2005-03-24 and executed the following
> command line:
>
>   mpirun -np 30 /usr/local/bin/mpiblast -p blastn -d Hs.seq.uniq -i IL2RA -o blast_results
>
> ~/.ncbirc is configured like this:
> ======================================================================
> [NCBI]
> Data=/usr/share/ncbi/data/
>
> [BLAST]
> BLASTDB=/state/partition1/blastdb
> BLASTMAT=/usr/share/ncbi/data/
>
> [mpiBLAST]
> Shared=/state/partition1/blastdb
> Local=/tmp
> ======================================================================
>
> (/state/partition1/blastdb is a symlink to the blastdb path on the
> frontend, and contains the database on the nodes. I tried this via
> NFS, too.)
>
> Depending on the value of P4_GLOBMEMSIZE, I get different errors - but
> errors in all cases. The jobs are distributed among the nodes, though.
> For "smaller" values of P4_GLOBMEMSIZE (i.e. 104857600 == 100 MB, and
> most of the time also 200 MB), I get this error:
> ======================================================================
> p0_8400: (23.453125) xx_shmalloc: returning NULL; requested 22880510 bytes
> p0_8400: (23.453125) p4_shmalloc returning NULL; request = 22880510 bytes
> You can increase the amount of memory by setting the environment variable
> P4_GLOBMEMSIZE (in bytes); the current size is 104857600
> p0_8400: p4_error: alloc_p4_msg failed: 0
> ======================================================================
>
> For 200 MB, I sometimes (?) get the same error, sometimes this one:
> ======================================================================
> rm_21956: p4_error: semget failed for setnum: 19
> ======================================================================
>
> For 300 MB, I get this:
> ======================================================================
> p0_20214: p4_error: exceeding max num of P4_MAX_SYSV_SHMIDS: 256
> ======================================================================
>
> I tried to test my mpich installation with the sample programs
> included (mainly cpi.c). I was able to get it running with -np <small
> number>, but the errors described above occurred when I increased the
> process count.
>
> Yes, I executed "cleanipcs; cluster-fork cleanipcs" in advance in all
> cases.
>
> I frankly have not yet understood the correlation between the (possible)
> shmmax/shmall settings, P4_GLOBMEMSIZE and P4_MAX_SYSV_SHMIDS, or how
> to tune each one for successful mpich parallelization.
>
> Due to these mpich problems, I installed OpenMPI and compiled the
> mpiblast src.rpm against OpenMPI; the errors above did not occur, but
> the blast job seemed to get stuck somewhere, too (no error message,
> but the job seemed to run forever).
>
> As I am quite new to clusters, MPI and mpiblast, I feel a little lost.
> Do you have any ideas what the problems may be, and how to fix them?
>
> Thx and Regards,
>    Bastian
>
> --
> Bastian Friedrich                   bastian at bastian-friedrich.de
> Address & Fon available on my HP    http://www.bastian-friedrich.de/
> \~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\
> \  To learn more about paranoids, follow them around!
>
> _______________________________________________
> BioBrew-Users mailing list
> BioBrew-Users at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/BioBrew-Users
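For reference, the smaller test suggested above could look roughly like the
following. This is only a sketch: the p53 query and p53db database are the
ones on the biobrew FTP site mentioned earlier, but the mpiformatdb fragment
option and the mpirun process count are assumptions based on typical
mpiBLAST 1.x usage and may need adjusting for your installation.

  # Format the protein database into 6 fragments, one per compute node.
  # The fragment option is -N in some mpiBLAST releases and --nfrags in others.
  mpiformatdb -N 6 -i p53db -p T

  # Run the small blastp job. The process count here is a guess: the 6
  # database fragments plus the extra master/writer processes mpiBLAST uses.
  mpirun -np 8 /usr/local/bin/mpiblast -p blastp -d p53db -i p53 -o p53_results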