[Bioclusters] Urgent advice on RAID design requested
Tim Cutts
tjrc at sanger.ac.uk
Thu Jan 18 11:54:45 EST 2007
On 18 Jan 2007, at 2:55 pm, Angulo, David wrote:
> you say your point stands. I say it does not. Please compare the
> actual MTBF figures.
Practical experience of running a petabyte of storage arrays here is
what I'm basing my opinion on, not claims for device MTBF. Besides,
MTBF figures are meaningless unless you also consider the duty cycle
they were measured at. You need to make sure the spindles are
designed for a 24/7 duty cycle. Cheap SATA drives normally are not.
Anyway, you asked for figures, so here we are, from a spindle
manufacturer and (later) from Microsoft:
http://www.seagate.com/docs/pdf/marketing/tp_544_tiered_storage.pdf
Their nearline fibrechannel drives have an MTBF of 1 million hours,
24/7 duty cycle, read/write. Their desktop SATA drives have an MTBF
of 600,000 hours, but that's for an 8x5 largely read-only duty
cycle. If you abuse them by running them 24x7 read-write, I dare say
it will be considerably less. But ignoring that, by my rough
estimate, a double drive failure causing data loss on a RAID array
built from desktop SATA drives will probably happen about four times
more often than on one built from fibrechannel disks. Of course, you
can buy high MTBF SATA drives but (surprise!) they cost about the
same as the fibrechannel ones.
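For what it's worth, here is a sketch of how a "roughly four times"
figure can fall out of those two MTBF numbers. The assumptions are
mine, not necessarily Tim's exact arithmetic: per-drive failure rate
is taken as 1/MTBF, the rate of a second failure landing inside the
same rebuild window scales with the square of the per-drive rate, and
the duty-cycle derating factor for abused desktop drives is purely
illustrative (nobody publishes the real penalty, which is the point):

```python
HOURS_FC = 1_000_000    # nearline FC drive MTBF, 24/7 duty cycle (Seagate paper)
HOURS_SATA = 600_000    # desktop SATA MTBF, quoted for an 8x5 duty cycle

rate_fc = 1 / HOURS_FC
rate_sata = 1 / HOURS_SATA

# Double-failure frequency ratio, taking both MTBF figures at face value.
# Two near-simultaneous failures scale with the square of the single-drive rate.
ratio_raw = (rate_sata / rate_fc) ** 2
print(f"raw ratio: {ratio_raw:.2f}x")        # about 2.8x

# Now derate the SATA MTBF for 24x7 read/write use. The 0.83 factor is an
# illustrative guess; any plausible duty-cycle penalty pushes the ratio up.
effective_sata_mtbf = HOURS_SATA * 0.83
ratio_derated = (HOURS_FC / effective_sata_mtbf) ** 2
print(f"derated ratio: {ratio_derated:.2f}x") # about 4x
```

Even before any derating, the square law means the cheap drives lose
by more than their MTBF numbers alone would suggest.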
Deskside SATA units like the Lacie have their uses (archival, for
example) but they just are not suitable for 24x7 cluster service.
More details here about what Seagate put in their cheap drives vs.
the expensive ones:
http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf
There are also some sobering graphs in this presentation from Seagate
and Microsoft:
http://download.microsoft.com/download/9/8/f/98f3fe47-dfc3-4e74-92a3-088782200fe7/TWST05005_WinHEC05.ppt
The graph showing the probability of second disk error during RAID5
rebuild on desktop and server drives is slightly scary even for
server drives, but positively terrifying for the cheap drives. This,
of course, is why we use RAID6 and a hot spare in our large Lustre
filesystems. RAID5 is simply not reliable enough with the SATA
drives which underlie our SFS servers.
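You can get a feel for why that rebuild graph looks the way it does
with a back-of-envelope calculation. The sketch below is my own
illustration, not the figures from the presentation: I assume an
unrecoverable read error (URE) rate of 1 in 10^14 bits for desktop
drives and 1 in 10^15 for enterprise drives (typical spec-sheet
values of that era), and a hypothetical 7-drive RAID5 of 500 GB
spindles, where a rebuild must successfully read all six surviving
drives end to end:

```python
import math

def p_ure_during_rebuild(surviving_drives: int, capacity_bytes: float,
                         ber: float) -> float:
    """Probability of at least one unrecoverable read error while
    reading every bit of every surviving drive during a rebuild."""
    bits_to_read = surviving_drives * capacity_bytes * 8
    # 1 - (1 - ber)**bits, computed stably via log1p/expm1
    return -math.expm1(bits_to_read * math.log1p(-ber))

# Hypothetical array: 7 x 500 GB drives in RAID5, one drive failed.
p_desktop = p_ure_during_rebuild(6, 500e9, 1e-14)
p_server = p_ure_during_rebuild(6, 500e9, 1e-15)

print(f"desktop drives: {p_desktop:.1%}")  # roughly a 1-in-5 chance
print(f"server drives:  {p_server:.1%}")   # a few percent
```

A one-in-five chance of the rebuild tripping over a bad sector on the
cheap drives is exactly the sort of number that makes a second parity
stripe (RAID6) attractive: a single URE during a RAID5 rebuild loses
data, whereas RAID6 can still recover it.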
As many others have said in this thread, you get cheap or reliable.
You do not get both.
Tim