[Bioclusters] NFS performance with multiple clients.

Malay mbasu at mail.nih.gov
Wed Nov 24 13:48:14 EST 2004

Looks like you have a network problem. For all practical purposes I can 
say that I routinely run BLAST against a complete database shared from a 
NetApp NFS server across ~100 nodes and still get decent performance. So 
I gave up splitting databases.


Joe Landman wrote:
> Hi Humberto:
> Humberto Ortiz Zuazaga wrote:
>> We've only got 12 nodes or so up (AC problems), and our users are
>> already complaining about lousy disk IO.
> I have a rule of thumb when building clusters:  Never more than about 10 
> machines (20 CPUs) to a single pipe on the file server.  If you need 
> more performance, you should look at alternatives for the file server 
> and its connection to the cluster.  A SAN will not help you here.
>> I've written up a summary of some tests I've run; I'd like people to
>> read it and tell me if this performance is adequate for our hardware, or
>> if we should be looking for problems.
>> http://plone.hpcf.upr.edu/Members/humberto/Wiki_Folder.2003-07-17.5848/NfsPerformanceTests 
>> Here is a summary of bonnie++ results for 1, 2, 4, 6, and 8 simultaneous
>> bonnie++ runs on the cluster. Each row is the average over however many
>> bonnie++ processes were run simultaneously; results are in KB/sec.
>>
>> #Procs   ch-out   blk-out      rw    ch-in   blk-in
>>      1    10285     10574   12116    28753    71982
>>      2     4296      4386     954    16965    22997
>>      4     2336      2266     412     7870     7913
>>      6     1098       602     286     2789     3545
>>      8     1322       970     181     2518     2750
> I see a number of issues, but first, this beautifully (and sadly) 
> illustrates what I have been calling the "1/N" problem for years.  As 
> you take a single shared resource of fixed size (the single network pipe 
> into your server) and start sharing it among N requestors (N processors 
> or nodes requesting traffic on that pipe), you introduce contention, and 
> each node gets less of the resource on average as N grows.  That is, you 
> are sharing a slow pipe among N heavy users, and each heavy user will on 
> average get about 1/N of that pipe.
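
The 1/N effect described above can be put in rough numbers. A minimal sketch, assuming ~100 MB/s of usable gigabit bandwidth (an assumed figure for illustration, not a measurement from this thread):

```shell
# Rough per-node share of one gigabit pipe under full contention.
# pipe_mb (~100 MB/s usable) is an assumed figure, not measured here.
pipe_mb=100
for n in 1 2 4 8 12; do
    echo "N=$n: ~$(( pipe_mb / n )) MB/s per node"
done
```

At 12 contending nodes the per-node share is already in single-digit MB/s, which is consistent with the complaints above.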
> Second, it appears that your performance through your switch is 
> abysmal.  A single processor reading and writing should be able to hit 
> about 25-35 MB/s over gigabit from an NFS-mounted RAID 5 on a 3ware 
> card.  This is what I see on mine, connected to an older/slower Athlon.  
> You are getting 10 MB/s.  In fact, it looks suspiciously like your 
> network has negotiated down to 100 Mb/s somewhere along the line.  Check 
> each machine with mii-tool or ethtool to see what state the links are 
> in.  I had a switch that continuously renegotiated speed until I turned 
> autonegotiation off on the head node and compute nodes.  It looks like 
> your head node may have negotiated down.
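
One hedged way to automate that check is to parse ethtool's output for a downgraded link. In the sketch below, `sample` stands in for the real output of `ethtool eth0` (on an actual cluster you would run ethtool on every node, e.g. over ssh; the interface name and the exact output format are assumptions):

```shell
# Flag a NIC that negotiated down to 10/100 Mb/s or half duplex.
# 'sample' simulates `ethtool eth0` output; replace it with the real
# command run on each node in practice.
sample="Speed: 100Mb/s
Duplex: Half"
if echo "$sample" | grep -Eq "Speed: (10|100)Mb/s|Duplex: Half"; then
    echo "WARNING: link negotiated down"
fi
```

The regex deliberately does not match `1000Mb/s`, so a healthy gigabit link produces no warning.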
> Third, there are a number of kernel tunables that can improve the disk 
> IO performance.  If you bug me, I can find a link for you.
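
Joe does not name the tunables here, but one commonly suggested knob from this era is the NFS client read/write block size. A hedged example fstab entry (the server name, export path, and values are illustrative assumptions, not the specific tunables referenced in the post):

```
# /etc/fstab on a compute node: larger NFS block sizes over TCP.
# server:/export, mount point, and sizes are illustrative only.
server:/export  /mnt/nfs  nfs  rsize=32768,wsize=32768,tcp,hard,intr  0 0
```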
> Fourth, RAID 5 is not a high-performance architecture.  It is safe, just 
> not fast on writes (even with 3ware units).  RAID 5 can tolerate a 
> single disk failure.  RAID 1 is a mirror and can also tolerate a single 
> drive failure.  RAID 10 is a RAID 0 (stripe) of RAID 1 (mirrors); it 
> consumes lots of disk, but it is quite fast.  RAID 50 is a RAID 0 
> (stripe) of RAID 5 (parity) sets; it consumes less disk and is fast.
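
The space trade-off between the layouts can be made concrete. A small sketch with an assumed disk count and size (illustrative numbers, not from the thread):

```shell
# Usable capacity for 4 x 250 GB drives under each layout.
# Disk count and size are assumptions for illustration.
disks=4; size=250
echo "RAID 5 : $(( (disks - 1) * size )) GB usable (one disk's worth of parity)"
echo "RAID 10: $(( disks / 2 * size )) GB usable (half lost to mirrors)"
```

RAID 10 gives up an extra disk's worth of capacity here in exchange for avoiding the read-modify-write penalty described below.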
> RAID 5 is slow on writes, as each write is a read-modify-write 
> operation.  Best I have seen out of 3ware 75xx/85xx series units for 
> RAID5 writes is about 30-40 MB/s.  Considering that your network pipe 
> should be capable of about 100 MB/s (gigabit), you should be 
> bottlenecked at the disk.  You might want to consider moving to a RAID 
> 10 if possible.  You lose storage space, but gain speed.
> You can also look at alternative architecture disks, or server mods.  If 
> you contact me offline, I can give you some ideas.
> joe
>> The rw (rewrite) results are especially lousy. Even with two bonnie
>> clients, performance drops precipitously.
>> Any comments, tips or suggestions?
>> ------------------------------------------------------------------------
>> _______________________________________________
>> Bioclusters maillist  -  Bioclusters at bioinformatics.org
>> https://bioinformatics.org/mailman/listinfo/bioclusters
