[Bioclusters] mpiBlast and NFS

Aaron Darling bioclusters@bioinformatics.org
Tue, 3 Feb 2004 19:55:21 -0600 (CST)

A disclaimer before I answer: I'm not an expert on caching in the various
NFS implementations.

That said, if you were to set the local storage path in the mpiBLAST
configuration file to a directory on shared storage, mpiBLAST should be
able to work without any local storage.

In such a situation, database fragments would be cached both in each
node's buffer-cache and on the NFS server.  The current implementation of
mpiBLAST assigns database fragments to workers based on which fragments
each worker has in its 'local storage' directory (in this case the shared
NFS directory).
Because all workers will appear to have all fragments available 'locally',
the master will have no preference for assigning the same fragment to the
same node during consecutive executions of mpiBLAST.
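
To illustrate that effect, here is a minimal Python sketch.  It is not the
actual mpiBLAST scheduler: assign_fragments, the worker and fragment names,
and the arrival_order argument (which stands in for run-to-run timing
differences in when workers ask for work) are all hypothetical.  The master
gives each requesting worker a fragment it already holds if possible.

```python
def assign_fragments(fragments, local_fragments, arrival_order):
    """Hand out fragments as idle workers ask the master for work.

    fragments: list of fragment names to search (hypothetical names).
    local_fragments: dict mapping worker -> set of fragments visible in
        that worker's 'local storage' directory.
    arrival_order: worker names in the order they request work.
    Returns a dict mapping fragment -> worker.
    """
    unassigned = list(fragments)
    assignment = {}
    for worker in arrival_order:
        if not unassigned:
            break
        # Prefer a fragment the worker already has in its storage directory.
        local = [f for f in unassigned if f in local_fragments[worker]]
        frag = local[0] if local else unassigned[0]
        unassigned.remove(frag)
        assignment[frag] = worker
    return assignment


frags = ["f0", "f1", "f2", "f3"]
run_a = ["w0", "w1", "w0", "w1"]  # request order in one run
run_b = ["w1", "w0", "w1", "w0"]  # request order in the next run

# Private local disks: each worker holds a distinct subset of fragments.
private = {"w0": {"f0", "f1"}, "w1": {"f2", "f3"}}
# Shared NFS directory: every worker appears to hold every fragment.
shared = {"w0": set(frags), "w1": set(frags)}

same_on_private = (assign_fragments(frags, private, run_a)
                   == assign_fragments(frags, private, run_b))
same_on_shared = (assign_fragments(frags, shared, run_a)
                  == assign_fragments(frags, shared, run_b))
```

With private directories the fragment-to-worker mapping is the same for
both request orders; with the shared directory it changes, because nothing
distinguishes one worker from another.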

If your worker nodes are dedicated and have enough RAM to cache the entire
database, there won't be a problem.  Otherwise, the master may assign
database fragments that the workers don't have cached locally, and even if
your NFS server has them cached, there will be some performance impact due
to the latency of accessing data over the network.

Of course, this situation could be remedied by slightly modifying the
mpiBLAST scheduler algorithm to store some persistent state information
about which nodes have most recently searched each fragment, so that it
can exploit the workers' buffer-caches effectively.
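
One way such a modification might look is sketched below in Python.  This
is an assumption-laden illustration, not mpiBLAST code: the JSON history
file, the function names, and the per-worker cap used for load balancing
are all hypothetical choices.

```python
import json
import os


def load_history(path):
    """Return {fragment: worker} saved by the previous run, or {} if none."""
    if not os.path.exists(path):
        return {}
    with open(path) as fh:
        return json.load(fh)


def save_history(path, assignment):
    """Persist this run's fragment -> worker mapping for the next run."""
    with open(path, "w") as fh:
        json.dump(assignment, fh)


def assign_with_history(fragments, workers, history):
    """Return each fragment to the worker that searched it last time.

    A fragment goes back to its previous worker if that worker is still
    present and below the per-worker cap; otherwise it goes to the
    currently least-loaded worker.
    """
    limit = -(-len(fragments) // len(workers))  # ceiling division
    load = {w: 0 for w in workers}
    assignment = {}
    leftovers = []
    for frag in fragments:
        prev = history.get(frag)
        if prev in load and load[prev] < limit:
            assignment[frag] = prev
            load[prev] += 1
        else:
            leftovers.append(frag)
    for frag in leftovers:
        worker = min(workers, key=lambda w: load[w])
        assignment[frag] = worker
        load[worker] += 1
    return assignment
```

A master would call load_history() at startup, assign_with_history() to
build the schedule, and save_history() once the search completes, so that
the next run sends each fragment to the node most likely to still have it
in its buffer-cache.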

In designing mpiBLAST, we opted to copy fragments to local storage because
in practice it significantly reduces the cost of a buffer-cache miss.
Reading a block from local storage is much faster (in both latency and
bandwidth) than reading it from the average NFS server, and it eliminates
the potential server contention that arises when several nodes make
requests to the NFS server simultaneously.


On Mon, 2 Feb 2004, Joydeep Sen Sarma wrote:

> Hi folks,
> I work on file systems and am doing some research into
> NFS issues when running Blast. I have read a number of
> posts on the bioclusters mailing list regarding usage
> of local disks being better.
> However, after reading the mpiBlast white paper, I
> got the impression that mpiBlast would avoid NFS read
> I/O after the server caches are warmed up. (Of course,
> as long as the data fits in the server memory pool.)
> So I guess I am a little curious as to whether people
> still feel NFS is not suited for mpiBlast and, if so,
> why? Do you have multiple databases against which
> searches are performed (so that the cache is purged
> periodically)? Or does the database not fit into the
> combined memory of a typical cluster?
> thanks in advance for your response,
> Joydeep