[Bioclusters] NFS performance with multiple clients.

Joe Landman landman at scalableinformatics.com
Wed Nov 24 09:50:27 EST 2004

Hi Humberto:

Humberto Ortiz Zuazaga wrote:

>We've only got 12 nodes or so up (AC problems), and our users are
>already complaining about lousy disk IO.

I have a rule of thumb when building clusters:  Never more than about 10 
machines (20 CPUs) per single pipe into the file server.  If you need 
more performance, you should look at alternatives for the file server and 
its connection to the cluster.  A SAN will not help you here.

>I've written up a summary of some tests I've run. I'd like people to
>read it and tell me if this performance is adequate for our hardware, or
>if we should be looking for problems.
>Here is a summary of bonnie++ results for 1, 2, 4, 6 and 8 simultaneous
>bonnie runs on the cluster. These are the average over however many bonnie
>processes were run simultaneously; results are in KB/sec.
>#Procs   ch-out   blk-out   rw      ch-in   blk-in
>1        10285    10574     12116   28753   71982
>2         4296     4386       954   16965   22997
>4         2336     2266       412    7870    7913
>6         1098      602       286    2789    3545
>8         1322      970       181    2518    2750
I see a number of issues, but first, this beautifully (and sadly) 
illustrates what I have been calling the "1/N" problem for years.  When 
you take a single shared resource of fixed size (the single network pipe 
into your server) and share it among N requestors (N processors or 
nodes requesting traffic on that pipe), you introduce contention, and 
each node gets less of the resource on average as you increase the 
number of nodes.  That is, you are sharing a slow pipe among N heavy 
users of that pipe, and each heavy user will on average get about 1/N 
of that pipe.
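The numbers above make the point.  A quick sketch of the 1/N rule against
the measured blk-in averages from the bonnie++ table (the model is just
back-of-the-envelope arithmetic, not a claim about the real bottleneck):

```python
# Back-of-the-envelope 1/N model: N clients share one fixed pipe.
# Measured blk-in averages (KB/sec) are from the bonnie++ table above.

def per_client_share(pipe_kb_per_s, n_clients):
    """Average bandwidth each of n_clients sees on one shared pipe."""
    return pipe_kb_per_s / n_clients

measured_blk_in = {1: 71982, 2: 22997, 4: 7913, 6: 3545, 8: 2750}

for n, measured in measured_blk_in.items():
    predicted = per_client_share(measured_blk_in[1], n)
    print(f"{n} clients: 1/N predicts {predicted:8.0f} KB/s, "
          f"measured {measured} KB/s")
```

Note that the measured numbers fall off even faster than 1/N predicts;
that extra drop is the contention overhead itself.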

Second, it appears that your performance through your switch is 
abysmal.  One processor reading and writing should be able to hit about 
25-35 MB/s over gigabit from a network-mounted RAID 5 on a 3ware card.  
This is what I see on mine, connected to an older/slower Athlon.  You are 
getting 10 MB/s.  In fact it looks suspiciously like your network has 
dropped to 100 Mbit mode somewhere along the line.  Check each machine 
with mii-tool or ethtool to see what state its link is in.  I had a 
switch that continuously renegotiated speed until I turned 
autonegotiation off on the head node and compute nodes.  It looks like 
your head node may have negotiated down.
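On a reasonably current Linux kernel you can also read the negotiated
speed straight out of sysfs rather than parsing mii-tool/ethtool output.
A minimal sketch (the interface name eth0 and the sysfs path are
assumptions about your setup):

```python
# Sketch: read the negotiated link speed from sysfs.
# /sys/class/net/<iface>/speed reports the speed in Mbit/s on modern
# Linux kernels; "eth0" is an assumed interface name.
import os

def link_speed_mbit(path):
    """Parse a sysfs 'speed' file and return the link speed in Mbit/s."""
    with open(path) as f:
        return int(f.read().strip())

sysfs = "/sys/class/net/eth0/speed"
if os.path.exists(sysfs):
    speed = link_speed_mbit(sysfs)
    if speed < 1000:
        print(f"warning: link negotiated down to {speed} Mbit/s")
```

Run it on the head node and each compute node; any box reporting 100
instead of 1000 is your suspect.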

Third, there are a number of kernel tunables that can improve the disk 
IO performance.  If you bug me, I can find a link for you.

Fourth, RAID 5 is not a high-performance architecture.  It is safe, just 
not fast on writes (even with 3ware units).  RAID 5 can tolerate a 
single disk failure.  RAID 1 is a mirror and can likewise tolerate the 
failure of one drive.  RAID 10 is a RAID 0 (stripe) of RAID 1 (mirror) 
sets: it consumes lots of disk, but it is quite fast.  RAID 50 is a 
RAID 0 (stripe) of RAID 5 (parity) sets: it consumes less disk, and is 
fast. 

RAID 5 is slow on writes, as each write is a read-modify-write 
operation.  Best I have seen out of 3ware 75xx/85xx series units for 
RAID5 writes is about 30-40 MB/s.  Considering that your network pipe 
should be capable of about 100 MB/s (gigabit), you should be 
bottlenecked at the disk.  You might want to consider moving to a RAID 
10 if possible.  You lose storage space, but gain speed.
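The read-modify-write cost is easy to quantify: a small write on RAID 5
costs four disk I/Os (read old data, read old parity, write new data,
write new parity), versus two on RAID 10 (one write to each half of a
mirror).  A rough sketch of the speed/capacity tradeoff (the disk count
and sizes are hypothetical illustrative numbers, not your hardware):

```python
# Illustrative small-write cost and usable capacity, RAID 5 vs RAID 10.
# Disk count and per-disk size below are hypothetical.

def raid5_usable(n_disks, disk_gb):
    return (n_disks - 1) * disk_gb   # one disk's worth of space holds parity

def raid10_usable(n_disks, disk_gb):
    return n_disks // 2 * disk_gb    # half the disks mirror the other half

RAID5_WRITE_IOS = 4    # read data, read parity, write data, write parity
RAID10_WRITE_IOS = 2   # write both halves of the mirror

n, size = 8, 250       # e.g. eight 250 GB drives (hypothetical)
print(f"RAID 5 : {raid5_usable(n, size)} GB usable, "
      f"{RAID5_WRITE_IOS} I/Os per small write")
print(f"RAID 10: {raid10_usable(n, size)} GB usable, "
      f"{RAID10_WRITE_IOS} I/Os per small write")
```

You trade roughly a disk's worth of capacity for parity on RAID 5 versus
half your capacity on RAID 10, but the RAID 10 write path does half the
I/Os and never has to recompute parity.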

You can also look at alternative architecture disks, or server mods.  If 
you contact me offline, I can give you some ideas.


>The rw (rewrite) results are especially lousy. Even with two bonnie
>clients, performance drops precipitously.
>Any comments, tips or suggestions?

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 612 4615

More information about the Bioclusters mailing list