[Bioclusters] blast and nfs
Chris Dagdigian
bioclusters@bioinformatics.org
Mon, 21 Apr 2003 15:18:00 -0400
Duzlevski, Ognen wrote:
> Hi all,
>
> we have a 40 node cluster (2 cpus each) and a cluster master that has
> attached storage over fibre, pretty much a standard thingie.
>
> All of the nodes get their shared space from the cluster master over
> nfs. I have a user who has set-up an experiment that fragmented a
> database into 200,000 files which are then being blasted against the
> standard NCBI databases which reside on the same shared space on the
> cluster master and are visible on the nodes (he basically rsh-s into all
> the nodes in a loop and starts jobs). He could probably go about his
> business in a better way but for the sake of optimizing the setup, I am
> actually glad that testing is being done the way it is.
>
> I noticed that the cluster master itself is under heavy load (it is a
> 2 CPU machine), and most of the load comes from the nfsd threads (kernel
> space nfs used).
>
> Are there any usual tricks or setup models utilized in setting up
> clusters? For example, all of my nodes mount the shared space with
> rw/async/rsize=8192,wsize=8192 options. How many nfsd threads usually
> run on a master node? Any advice as to the locations of NCBI databases
> vs. shared space? How would one go about measuring/observing for the
> bottlenecks?
Hi Ognen,
There are many people on this list who have similar setups and have
worked around NFS related bottlenecks in various ways depending on the
complexity of their needs.
One easy way to avoid NFS bottlenecks is to realize that BLAST is
_always_ going to be performance bound by IO speeds and that generally
your IO access to local disk is going to be far faster than your NFS
connection. Done right, local IDE drives in a software RAID
configuration can get you better speeds than a direct GigE connection to
a NetApp filer or fibrechannel SAN.
Another way to put this: You will NEVER (well, without exotic storage
hardware) be able to build an NFS fileserver that cannot be swamped by
lots of cheap compute nodes doing long sequential reads against
network-mounted BLAST databases. You need to engineer around the NFS
bottleneck that is slowing you down.
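If you want to see the gap for yourself, a crude way is to time a long
sequential read off local scratch and then off the NFS mount. Something
like the rough Python sketch below would do it; the paths are just
examples, and you want files bigger than RAM so the page cache doesn't
hide the real disk or network speed.

#!/usr/bin/env python
# Rough sequential-read timer: run it against a big file on local scratch
# and against a same-sized file on the NFS mount, then compare the MB/s.
import sys
import time

CHUNK = 1024 * 1024   # read 1 MB at a time, like a long BLAST database scan

def read_throughput(path):
    start = time.time()
    total = 0
    f = open(path, 'rb')
    while True:
        buf = f.read(CHUNK)
        if not buf:
            break
        total += len(buf)
    f.close()
    elapsed = time.time() - start
    return (total / (1024.0 * 1024.0)) / elapsed

if __name__ == '__main__':
    # e.g.  read_bench.py /scratch/nt.00 /shared/blastdb/nt.00
    for path in sys.argv[1:]:
        print("%s: %.1f MB/s" % (path, read_throughput(path)))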
All you need to do is have enough local disk in each of your compute
nodes to hold all (or some) of your BLAST datasets. The idea is that you
use the NFS mounted blast databases only as a 'staging area' for
rsync'ing or copying your files to scratch or temp space on your compute
nodes. Given the cheap cost of 40-80gb IDE disk drives this is a quick
and easy way to get around NFS related bottlenecks.
Each search can then be done against local disk on each compute node
rather than all nodes hitting the NFS fileserver and beating it to death...
This is generally what most BLAST farm operators will do as a "first
pass" approach. It works very well and is pretty much standard practice
these days.
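Concretely, the staging step on each node can be as simple as the rough
Python sketch below. The directory and database names are just examples
(not anything from Ognen's setup); the jobs then point blastall at the
local copies instead of the NFS mount.

#!/usr/bin/env python
# Staging sketch: run on each compute node before the BLAST jobs start.
# Copies the formatted NCBI databases from the NFS 'staging area' onto
# local scratch; rsync only transfers files that changed since the last
# run, so repeating this before every batch of jobs is cheap.
import glob
import os
import subprocess

NFS_DB_DIR   = '/shared/blastdb'    # NFS-mounted staging area (example path)
LOCAL_DB_DIR = '/scratch/blastdb'   # local IDE scratch space (example path)
DATABASES    = ['nr', 'nt']         # whichever formatted databases the jobs use

def stage_databases():
    if not os.path.isdir(LOCAL_DB_DIR):
        os.makedirs(LOCAL_DB_DIR)
    for db in DATABASES:
        # pick up all of the formatdb output files (nr.phr, nr.pin, nr.psq, ...)
        sources = glob.glob(os.path.join(NFS_DB_DIR, db) + '.*')
        if sources:
            subprocess.check_call(['rsync', '-a'] + sources + [LOCAL_DB_DIR + '/'])

if __name__ == '__main__':
    stage_databases()
    # the jobs then run against the local copy, e.g.:
    #   blastall -p blastn -d /scratch/blastdb/nt -i query.fa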
The "second pass" approach is more complicated and involves splitting up
your blast datasets into RAM-sized chunks, distributing them across the
nodes in your cluster and then multiplexing your query across all the
nodes to get faster throughput times. This is harder to implement and is
useful only for long queries against big databases as there is a certain
amount of overhead required to merge your multiplexed query results back
into one human- or machine-parsable file.
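For what it's worth, the merge step is the fiddly part. If the jobs write
tabular output (blastall -m 8), a rough merge can look like the Python
sketch below: gather the per-chunk files and re-sort each query's hits by
e-value. Keep in mind this is only a sketch; e-values computed against a
chunk reflect that chunk's size, so a real setup either rescales them or
searches the chunks with the full database's effective size (blastall's
-z option).

#!/usr/bin/env python
# Merge sketch for the 'second pass' approach: each node searched one chunk
# of the database and wrote tabular output (blastall -m 8).  This gathers
# the per-chunk files and re-sorts every query's hits by e-value (column 11
# of the tabular format) so the result reads like a single search.
import sys
from collections import defaultdict

def merge(chunk_files):
    hits = defaultdict(list)            # query id -> [(e-value, raw line), ...]
    for path in chunk_files:
        for line in open(path):
            fields = line.rstrip('\n').split('\t')
            evalue = fields[10]
            if evalue.startswith('e'):  # some old blastall builds print 'e-100'
                evalue = '1' + evalue   # instead of '1e-100'
            hits[fields[0]].append((float(evalue), line))
    for query_id in sorted(hits):
        for evalue, line in sorted(hits[query_id]):
            sys.stdout.write(line)

if __name__ == '__main__':
    # e.g.  merge_chunks.py query123.chunk*.out > query123.out
    merge(sys.argv[1:])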
People only implement the 'second pass' approach when they really need
to, usually in places where pipelines are constantly repeating the same
big searches over and over again.
My $.02 of course
-Chris
www.bioteam.net