[Bioclusters] file server for cluster
Chris Dagdigian
bioclusters@bioinformatics.org
Thu, 18 Apr 2002 19:55:03 -0400
Hi Ivo,
This is just my $.02 of course.
First of all I always prefer external drive shelves and external RAID
controllers -- having that stuff internal to a single server can be
problematic from both a future capacity and a "what do I do when the
server breaks" standpoint. A good small-budget compromise on the
internal-vs-external argument can be an internal RAID controller card
that drives an external shelf of SCSI disks. With all RAID controllers
the size of the cache and having it backed up by battery (so you can
safely enable write-back operations) are critical to getting the best
throughput and performance. Make sure you max out the cache in whatever
product you end up building or buying.
I'm a huge NAS fan because being able to have simple and reliable shared
read/write access to the same filesystem is key in most life science
settings.
This is especially true in clusters where (1) your cluster software may
require a shared filesystem and (2) you don't want to keep many
different copies of genbank etc. lying around. I'll keep using NFSv3 for
as long as it takes the various parallel distributed filesystems to get
more reliable and easy to deploy.
For cluster fileservers I've done both (a) and (c) depending on size,
budget and the needs of the customer/end-user. In practice I've seen
many people choose (a) for cost reasons and then end up with regrets
because they ended up with a fileserver that was either too unreliable,
too small or not fast enough. It really sucks to see your expensive
cluster sit 99% idle because you have 100 processes all blocking on
pending I/O requests.
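
A rough back-of-the-envelope sketch of that failure mode, in Python. The
numbers (100 jobs, each wanting 10 MB/s, and the three server throughput
figures) are made-up assumptions for illustration, not measurements, and
the model ignores client-side caching and network effects entirely.

```python
def cpu_utilization(n_jobs, demand_mb_s_per_job, server_mb_s):
    """Fraction of time each job spends computing rather than waiting on I/O.

    Toy model: every job needs `demand_mb_s_per_job` to run at full speed,
    and the file server's sustained throughput is shared equally among jobs.
    """
    total_demand = n_jobs * demand_mb_s_per_job
    if total_demand <= server_mb_s:
        return 1.0  # the server keeps up, so nobody blocks
    return server_mb_s / total_demand


if __name__ == "__main__":
    # Hypothetical sustained throughputs for three sizes of file server.
    for server in (10, 100, 1000):
        busy = cpu_utilization(100, 10, server)
        print(f"server {server:>4} MB/s -> cluster busy {busy:.0%}")
```

In this toy model an undersized server leaves the compute nodes almost
entirely idle, which is the kind of regret described above.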
In big-project/big-budget situations where I've had to use a very large
NAS unit I've almost always gone with Network Appliance systems. Not
cheap but they are the company that every other NAS vendor is trying to
knock down. Of course if I was building my own system I'd make a
different choice due simply to budget.
The primary difference between a dedicated NAS box like a NetApp and a
build-your-own linux NFS server boils down to OS and cache. The OS stuff
is not a big deal- you can make Windows/Solaris/BSD/Linux/whatever into
a decent NFS server without all that much trouble (just cram your box
with as much RAM as possible + a fast NIC). The OS inside a
dedicated NAS box will likely give you more software bells and whistles
like snapshots/remote-mirroring etc. This can be nice or not,
depending on whether you actually need the software add-ons.
Many of the low-end and midrange NAS appliances likely run Windows or
Linux internally. The higher end NAS boxes like NetApp tend to run
dedicated OSes that have been engineered from the ground up to do
nothing but fileserving.
Heck- most of the 'value' in a NetApp is not the hardware- if you crack
the case you can see pretty generic/commodity parts inside. What you are
paying for is their incredibly well engineered operating system and the
WAFL filesystem layout.
Cache is a big deal and is the reason why high end dedicated NAS
appliances outperform the general purpose servers. In a dedicated NAS
appliance you will likely see a very large cache (gigabytes in size)
that is backed up by internal batteries and redundant power supplies.
Having that redundancy internally allows the system to do tricks like
acknowledge client write operations without having to wait for the
mechanical disks to physically write the data to media. This is why the
higher end systems outperform the lower end general purpose systems-
they have a couple extra bits of internal hardware plus some cleverness
in their software that allow them to do some funky tricks to get lots
more performance and I/O throughput.
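
To make the write-acknowledgement trick concrete, here is a minimal toy
model in Python. It is only a sketch: the class name, the latency figures
(8 ms for a disk write, 0.05 ms for a copy into protected memory) and the
slot-based cache are assumptions for illustration, and nothing here
reflects how NetApp or any other vendor actually implements it.

```python
class ToyFileServer:
    """Toy model of write latency as seen by a client.

    A write is acknowledged as soon as it lands in the battery-backed
    cache. Once the cache is full of unflushed (dirty) data, the client
    has to wait for a real disk write instead.
    """

    def __init__(self, cache_slots, disk_ms=8.0, cache_ms=0.05):
        self.cache_slots = cache_slots  # 0 models a server with no safe cache
        self.disk_ms = disk_ms          # assumed latency of one disk write
        self.cache_ms = cache_ms        # assumed latency of one cache write
        self.dirty = 0                  # writes held in cache, not yet on disk

    def write(self):
        """Return the latency (ms) before the client gets its ack."""
        if self.dirty < self.cache_slots:
            self.dirty += 1
            return self.cache_ms
        return self.disk_ms

    def flush(self, n):
        """Background flush: move up to n dirty writes from cache to disk."""
        self.dirty = max(0, self.dirty - n)


if __name__ == "__main__":
    plain = ToyFileServer(cache_slots=0)
    cached = ToyFileServer(cache_slots=1000)
    print("no cache :", sum(plain.write() for _ in range(500)), "ms")
    print("cached   :", sum(cached.write() for _ in range(500)), "ms")
```

The battery matters because an acknowledged write that exists only in
volatile memory would be lost on a power failure. With protected cache
the server can safely acknowledge first and write to disk later.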
Speaking of competitors to NetApp - forget EMC and their IP4700 NAS
product. It's junk. The only person I know who actually bought one (he's
on this list...) regretted the decision. I faced the hard sell from Dell
on this recently because they have (unwisely IMHO) chosen to resell EMC
kit for midrange and enterprise storage. Fortunately the customer made
the right choice despite some last minute price gouging from some really
aggressive salespeople.
There are many small existing and startup companies who are bringing
radical "NetApp killers" to the NAS market. Two of the semi-stealthy
companies that I've talked to and come away impressed are: Panasas
(www.panasas.com) and Ibrix (www.ibrix.com). Both have been pretty quiet
although Panasas is squarely targeting the life sciences market for their
first products.
For cheap NAS and external SCSI or IDE RAID there are many, many
companies to choose from.
What I would do:
Given a need for simplicity, a big budget and a conservative datacenter
IT staff who demands stuff that is easy to manage and supportable 24/7
I'd choose Network Appliance every time.
If I was under budget constraints yet had a bit more freedom to pursue
more flexible options I'd probably end up building a small SAN with
good quality fibrechannel arrays hanging off the FC switch. I could then
hang N number of beefy linux boxes off the same switch and have a pretty
powerful/flexible/scalable fileserving infrastructure. The downside to
this of course is that it is more complicated and you have more stuff to
manage. The upside is that you don't lock yourself into any particular
vendor and you will probably get a really good price/performance ratio.
SANs are only expensive when very large or when Compaq/IBM/EMC are
trying to sell them to you. You can go a long way for short money with a
small FC switch and some good quality drive arrays.
-Chris
Ivo Grosse wrote:
> Hi all,
>
> we want to buy a new fileserver (for our cluster) with about 1 TB, and
> we are thinking of a Linux machine. My question is: which kind of
> fileserver do YOU use (and why)?
>
> (a) NAS (network-attached storage)?
>
> (b) regular Linux machine with internal RAID?
>
> (c) regular Linux machine with external RAID?
>
> Thanks!!!
>
> Ivo
--
Chris Dagdigian, <dag@sonsorol.org>
Independent life science IT & research computing consulting
Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193
Work: http://BioTeam.net PGP KeyID: 83D4310E Yahoo IM: craffi