[Bioclusters] mpiBLAST configuration issues

Lucas Carey bioclusters@bioinformatics.org
Mon, 29 Mar 2004 07:37:45 -0500

On Monday, March 29, 2004 at 12:58 +0100, Micha Bayer wrote:
> Hi,
> We have three nodes reserved for jobs of less than one hour's wall time.
> I am part of the bio group and we have a share of 20% of the total
> compute time on this cluster. Jobs get submitted and queued via the
> OpenPBS batch system. The queue priority is worked out by a formula
> which among other things takes into account recent usage (if you had
> lots of jobs recently you get penalised) and job size (if your job is
> small it gets a higher priority).
> Questions:
> 1. How many database fragments should I generate?
You should generate 5 fragments and always run with '-np 6' (mpiBLAST dedicates one process to scheduling, so -np is the number of fragments plus one). If you instead want to run with a variable number of CPUs (<= 6), creating 15 fragments should let you do so with good load balancing. There is a small per-fragment overhead in moving from 5 to 15 fragments, but 15 could still be faster overall, depending on both the database and the queries.
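The load-balancing arithmetic can be sketched as follows (a hypothetical illustration, assuming one mpiBLAST rank acts as the master/scheduler, so '-np N' gives N - 1 workers):

```python
# Why 15 fragments balance better than 5 when the worker count varies.
# Assumption: one mpiBLAST rank schedules, so np CPUs = (np - 1) workers.
def fragments_per_worker(fragments, np):
    workers = np - 1
    # Ceiling division: the busiest worker searches this many fragments.
    return -(-fragments // workers)

# With 5 fragments, only np = 6 keeps every worker equally busy:
print([fragments_per_worker(5, np) for np in range(2, 7)])   # [5, 3, 2, 2, 1]
# With 15 fragments, several worker counts (1, 3, 5) divide evenly:
print([fragments_per_worker(15, np) for np in range(2, 7)])  # [15, 8, 5, 4, 3]
```

At -np 4 (3 workers), 5 fragments leave one worker searching 2 fragments while another sits idle sooner; 15 fragments split 5-5-5.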
> 2. How will the spasmodic traffic on the cluster affect the performance
> of mpiBLAST? 
Once the fragments are distributed to the nodes it shouldn't matter at all. If you keep running queries against the same database(s) and the fragments remain on local storage on those 3 nodes, mpiBLAST does very little communication.
> 3. How are jobs partitioned for queuing with PBS (given an input file
> with one sequence and a different scenario where the input file contains
> multiple query sequences)?
One 'run' of mpiBLAST will process an entire query file with multiple individual queries. PBS views this as a single job, no matter how many individual queries the file contains.
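A PBS submission for that single job might look like this sketch (queue name, paths, and file names are hypothetical; adjust to your site's setup):

```shell
#!/bin/sh
# Hypothetical PBS script: one job processes the whole query file,
# however many individual sequences queries.fasta contains.
#PBS -l nodes=6,walltime=01:00:00
#PBS -q short

cd $PBS_O_WORKDIR
# -np 6 = 1 scheduler rank + 5 workers, matching 5 database fragments.
mpirun -np 6 mpiblast -p blastn -d mydb -i queries.fasta -o results.txt
```

Submitted with `qsub`, PBS accounts for this as one job regardless of the query count.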
> 4. When I issue the mpirun command and I specify the number of nodes to
> be used, what does that do? Will this actually work on a cluster like
> this where I don't have any control over the scheduling process?
In the mpiBLAST documentation, a node refers to a CPU. As far as both mpiBLAST and PBS are concerned, your cluster has 6 nodes reserved for short jobs. The node count you give mpirun just sets how many MPI processes are started; PBS still decides when your job runs and which CPUs it gets, so it works fine without any control over scheduling.
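Concretely, the two commands involved might look like this (a sketch; the database and query file names are hypothetical, and flags may vary by mpiBLAST version):

```shell
# Hypothetical example: split the database into 5 fragments
# (-p F tells the underlying formatdb it is nucleotide data).
mpiformatdb -N 5 -i mydb.fasta -p F

# Then run with -np 6: here 'nodes' means 6 CPUs/processes,
# regardless of how PBS maps them onto physical machines.
mpirun -np 6 mpiblast -p blastn -d mydb.fasta -i query.fasta -o out.txt
```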