[Bioclusters] Urgent advice on RAID design requested

George N. White III gnw3 at acm.org
Fri Jan 19 13:48:57 EST 2007


On 1/19/07, Joe Landman <landman at scalableinformatics.com> wrote:

> Malay wrote:
>
> > Interesting discussion everyone. My limited experience says given the
> > price of redundant but cheap systems and reliable but expensive system,
> > one should go for cheapest systems that serves your purpose and
> > redundancy than reliable and more expensive system. To elaborate, two
>
> There is a story running around about this.  Some airplane manufacturer
> built a small plane with your choice of engines.  First engine was a
> single (unknown manufacturer) high quality and more expensive turboprop.
>   Second was a dual (also unknown manufacturer) "reasonable" quality
> piston engine.  Turns out that the company sold the benefits of "cheaper
> but redundant" to its audience.  The buyers who purchased them, looked
> at the statistics for failures, noted that even with the redundant pair,
> if one failed, you were pretty much in quite a bit of trouble.

MTBF is a statistical measure based on failure rates for a large
number of fresh units.  You may have a component with a 10-year MTBF
whose mechanical bits wear out in 5 years.  Vendors have become very
adept at designing hardware that wears out a couple of days after the
warranty expires.  I hear horror stories about the 2nd disk failing
while rebuilding a RAID, but how many sites have a schedule to replace
drives before they actually fail in service, rather than waiting until
the first one fails?  I've experienced too many cases where a number
of identical parts (disks, power supplies, fans) in workstations
purchased at the same time all fail at roughly the same time.
Sometimes there is a trigger event (an A/C failure) that stresses
systems within limits they would have handled when new, but after 2-3
years cooling fans are less effective due to dust buildup, added
components have increased heat production in the machine room, etc.,
so you get a cluster of failures.  Rebuilding a RAID is also a
stressor.

> The point being, if you are going to bet your life, or your data on
> something, it makes sense to go with hard data as compared to speculation.
>
> The cheapest drives around, Maxtors and their ilk have seen failure
> rates higher than 3-4% in desktop and other apps.  Sure, you will save a
> buck or two on the front end (acquisition).  Unless you can tolerate
> data loss, do you want to deal with the impact on the back end?  Without
>   trying to FUD here, how much precisely is your data worth, how many
> thousands or millions of dollars (or euros, or ...) have been spent
> collecting it?  Once you frame the question in terms of how much risk
> you can afford, you start looking at how to ameliorate the risk.
>
> There are simple, (relatively) inexpensive methods.  N+1 supplies adds
> *marginal* additional cost to a unit.  Using better drives (notice I
> didn't say FC/SCSI/SATA), adds minute costs to the unit.  Using
> intelligent redundancy (RAID6 with hot spares, mirrored,...) reduces
> risk at an increase in cost.

So does a sensible schedule to replace older units before they fail.
For organizations where unscheduled downtime is expensive, the
benefits include being able to schedule replacements to minimize
disruptions.

> We are not talking about EMC costs here.  Or NetAPP.  If you are
> spending north of $2.5/GB of space you are probably overspending, though
> this is a function of what it is and what technology you are buying.
>
> > separate machines with cheap components (chapest SATA drives with single
> > power supply) is better that one expensive machine (higher quality hard
> > drives, redundant power supply). What you Gurus say?

You have asked an ill-posed question.  The answer is very sensitive to
the I/O profile of your workload.  There can be a big performance hit
from the I/O it takes to replicate the data between the boxes.  Some
workloads have low-I/O windows where replication can be done.  How
robust is your processing if the separate machines get out of sync?
One approach is to keep the filesystem metadata on a small, highly
reliable machine.
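
To make the replication point concrete, the arithmetic I would run
first looks something like this (every figure below is a placeholder;
plug in your own workload):

    # Sketch: can a nightly low-I/O window absorb the replication traffic
    # needed to keep the second cheap box in sync?  All numbers are
    # hypothetical placeholders, not measurements.
    daily_churn_gb = 500.0     # data written or changed per day
    throughput_mb_s = 60.0     # sustained transfer rate after overheads
    window_hours = 6.0         # length of the nightly low-I/O window

    hours_needed = daily_churn_gb * 1024.0 / throughput_mb_s / 3600.0
    print("need %.1f h of a %.1f h window" % (hours_needed, window_hours))

If the hours needed exceed the window, the "cheap redundant pair"
either falls behind or steals I/O from production, which is why the
answer depends so heavily on the workload's I/O profile.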

> I believe that you can save money at the most appropriate places to do
> so.  I'm not sure this is it.  It's your data, and you have to deal
> with/answer for what happens if a disk or machine demise makes it
> un-recoverable.  People who have not had a loss event usually don't get
> this (e.g. it hasn't bitten them personally).  If you have ever lost data
> due to a failure, and it cost you lots of time/energy/sweat/money to
> recover or replicate it, you quickly realize that the "added" cost is
> a steal, a bargain in comparison with your time.  Which you should value
> highly (your employer does, and rarely do they want you spending time on
> data recovery, unless that is your job, as compared to what you are paid
> to do).

There are usually people (who won't be around when the problems
appear) telling management "cheap, secure, and reliable? -- no
problem!".  In large organizations, the time/energy/sweat includes
sitting on the committees that make the recommendations to management.
Many large organizations have people running spreadsheets to compare
the cost of data storage/processing at various sites.  The results are
then used to require every site to use the approach that looks
cheapest -- often without appropriate consideration of the risks or of
differences in workloads.

-- 
George N. White III <aa056 at chebucto.ns.ca>
Head of St. Margarets Bay, Nova Scotia

