[Bioclusters] Urgent advice on RAID design requested
Tim Cutts
tjrc at sanger.ac.uk
Thu Jan 18 11:54:45 EST 2007
On 18 Jan 2007, at 2:55 pm, Angulo, David wrote:
> you say your point stands. I say it does not. Please compare the
> actual MTBF figures.
Practical experience of running a petabyte of storage arrays here is
what I'm basing my opinion on, not claims for device MTBF. Besides,
MTBF figures are meaningless unless you also consider the duty cycle
they were measured at. You need to make sure the spindles are
designed for a 24/7 duty cycle. Cheap SATA drives normally are not.
Anyway, you asked for figures, so here we are, from a spindle
manufacturer and (later) from Microsoft:
http://www.seagate.com/docs/pdf/marketing/tp_544_tiered_storage.pdf
Their nearline fibrechannel drives have an MTBF of 1 million hours,
24/7 duty cycle, read/write. Their desktop SATA drives have an MTBF
of 600,000 hours, but that's for an 8x5 largely read-only duty
cycle. If you abuse them by running them 24x7 read-write, I dare say
it will be considerably less. But ignoring that, by my rough
estimate, a double drive failure causing data loss on a RAID array
built from desktop SATA drives will probably happen about four times
more often than on one built from fibrechannel disks. Of course, you
can buy high MTBF SATA drives but (surprise!) they cost about the
same as the fibrechannel ones.
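For what it's worth, here is a sketch of how a "roughly four times"
figure can fall out of those two MTBF numbers. The assumptions are
mine, not necessarily Tim's exact arithmetic: per-drive failure rate
is taken as 1/MTBF, the rate of a second failure landing inside the
same rebuild window scales with the square of the per-drive rate, and
the duty-cycle derating factor for abused desktop drives is purely
illustrative (nobody publishes the real penalty, which is the point):

```python
HOURS_FC = 1_000_000    # nearline FC drive MTBF, 24/7 duty cycle (Seagate paper)
HOURS_SATA = 600_000    # desktop SATA MTBF, quoted for an 8x5 duty cycle

rate_fc = 1 / HOURS_FC
rate_sata = 1 / HOURS_SATA

# Double-failure frequency ratio, taking both MTBF figures at face value.
# Two near-simultaneous failures scale with the square of the single-drive rate.
ratio_raw = (rate_sata / rate_fc) ** 2
print(f"raw ratio: {ratio_raw:.2f}x")        # about 2.8x

# Now derate the SATA MTBF for 24x7 read/write use. The 0.83 factor is an
# illustrative guess; any plausible duty-cycle penalty pushes the ratio up.
effective_sata_mtbf = HOURS_SATA * 0.83
ratio_derated = (HOURS_FC / effective_sata_mtbf) ** 2
print(f"derated ratio: {ratio_derated:.2f}x") # about 4x
```

Even before any derating, the square law means the cheap drives lose
by more than their MTBF numbers alone would suggest.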
Deskside SATA units like the Lacie have their uses (archival, for
example) but they just are not suitable for 24x7 cluster service.
More details here about what Seagate put in their cheap drives vs.
the expensive ones:
http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf
There are also some sobering graphs in this presentation from Seagate
and Microsoft:
http://download.microsoft.com/download/9/8/f/98f3fe47-dfc3-4e74-92a3-088782200fe7/TWST05005_WinHEC05.ppt
The graph showing the probability of second disk error during RAID5
rebuild on desktop and server drives is slightly scary even for
server drives, but positively terrifying for the cheap drives. This,
of course, is why we use RAID6 and a hot spare in our large Lustre
filesystems. RAID5 is simply not reliable enough with the SATA
drives which underlie our SFS servers.
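You can get a feel for why that rebuild graph looks the way it does
with a back-of-envelope calculation. The sketch below is my own
illustration, not the figures from the presentation: I assume an
unrecoverable read error (URE) rate of 1 in 10^14 bits for desktop
drives and 1 in 10^15 for enterprise drives (typical spec-sheet
values of that era), and a hypothetical 7-drive RAID5 of 500 GB
spindles, where a rebuild must successfully read all six surviving
drives end to end:

```python
import math

def p_ure_during_rebuild(surviving_drives: int, capacity_bytes: float,
                         ber: float) -> float:
    """Probability of at least one unrecoverable read error while
    reading every bit of every surviving drive during a rebuild."""
    bits_to_read = surviving_drives * capacity_bytes * 8
    # 1 - (1 - ber)**bits, computed stably via log1p/expm1
    return -math.expm1(bits_to_read * math.log1p(-ber))

# Hypothetical array: 7 x 500 GB drives in RAID5, one drive failed.
p_desktop = p_ure_during_rebuild(6, 500e9, 1e-14)
p_server = p_ure_during_rebuild(6, 500e9, 1e-15)

print(f"desktop drives: {p_desktop:.1%}")  # roughly a 1-in-5 chance
print(f"server drives:  {p_server:.1%}")   # a few percent
```

A one-in-five chance of the rebuild tripping over a bad sector on the
cheap drives is exactly the sort of number that makes a second parity
stripe (RAID6) attractive: a single URE during a RAID5 rebuild loses
data, whereas RAID6 can still recover it.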
As many others have said in this thread, you get cheap or reliable.
You do not get both.
Tim