[Bioclusters] Urgent advice on RAID design requested

George Magklaras georgios at biotek.uio.no
Fri Jan 19 09:54:38 EST 2007

I would start by commenting on the following aspect that you mentioned:
> Our sysadmin is
> really overloaded and would prefer something that does not suck up all her
> time in maintenance and configuration.

I know of no SAN technology that does not suck up a substantial portion 
of sysadmin time :-) , especially something that is point-and-click and 
has all the reliability features. If your sysadmin is already overloaded 
and you hand her the task of monitoring an entirely new technology in a 
production environment, be very afraid of the amount of time she will 
need to devote to it. If you run a one- or two-person shop, you might 
need either extra consultancy services from your chosen vendor (pointing 
to the budget), or temporary extra manpower to disengage your sysadmin 
from already established duties, so she can feel comfortable enough to 
play with the new kit and establish a production baseline. Don't believe 
in point-and-click interfaces that work wonders and put everything on 
autopilot, or in "expert" monitoring interfaces that do the thinking for 
you.

Now, about the equipment. Capacity-wise you mention roughly 4 TB. By SAN 
standards that is not really big enough to justify a technology switch 
(RAID/NAS/DAS ---> SAN). If projections (which you do not mention) say 
it will jump to tens of terabytes within the next 24 months, then yes, 
do go for a SAN solution. Alternatively, if within the next couple of 
years you will stay well under the 10 TB mark, there are chained 
SAS/SATA controller configurations on the market that offer adequate 
levels of redundancy when duplicated (typical scenario: RAID BOX 1 at 
location 1 <-----> rsync <-----> RAID BOX 2 at location 2), without the 
extra expense of SAN replication protocols. When we passed that limit, 
we looked at SANs.

In addition, whether you need certain features, and how you define your 
config parameters (RAID level at the disk layer, fabric connections, 
replication parameters), will also depend on the actual applications. 
Capacity is one factor, but average I/O throughput and the number of 
IOPS (I/O operations per second) are also important parameters of 
workload characterization. Flat-file environments typically need stripes 
(RAID 10) at the controller level, but again, how you choose to spread 
them, and how that interacts with file system parameters (block size, 
number of inodes, or other SAN FS parameters), can vary if you choose to 
do formatting and allow, say, massive FTP-ing or file syncing from the 
same volumes simultaneously. What I suggest you do is get your sysadmin 
to baseline current activity using tools like iostat, vmstat, or other 
relevant system performance toolkits, ask the vendor what the equivalent 
tools would be on the SAN kit/new box, or get them to give you a "try 
and buy" deal, before you commit to performance choices beyond the "I 
consolidate, therefore I gain" approach.
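As a minimal sketch of such a baseline (assuming the iostat from sysstat 
and the vmstat found on most Linux distributions; the log file names are 
arbitrary), something like the following collects a day of device-level 
statistics that a vendor can then be asked to match:

```shell
#!/bin/sh
# Capture a 24-hour I/O baseline: one extended iostat sample and one
# vmstat sample every 60 seconds, 1440 samples in total, run in parallel.
iostat -x 60 1440 > iostat_baseline.log &
vmstat    60 1440 > vmstat_baseline.log &
wait
```

The -x output gives per-device reads and writes per second (i.e. IOPS) 
and average request sizes, which is exactly the workload 
characterization discussed above.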

HP SAN kit tends to be good, especially at larger storage capacities. 
For smaller SANs or SAS/SATA solutions, have a look at EMC and Dell, 
which have products for Linux. I would steer clear (personal opinion) of 
Xsan-based kits from Apple. My experiences were not good when it came to 
performance and file compatibility (even though Apple claims it can talk 
to all sorts of Unices), but excellent when it came to user 
friendliness, which for me was not very helpful. There are people in the 
UK who are large Apple server customers, especially in the life science 
arena, who are aware of its limitations and could probably say similar 
things.

If you would like more specific answers, you will have to be more 
specific about your exact workload. You mention flat-file storage. OK, 
but are you going to be formatting the files on the same volume, or 
formatting them and accessing them via FTP? And how many simultaneous 
users will there be on the access node?

Best Regards,

George B. Magklaras

Senior Computer Systems Engineer/UNIX Systems Administrator
The Biotechnology Centre of Oslo,
University of Oslo

EMBnet Norway: http://www.biotek.uio.no/EMBNET/

martin goodson wrote:
> I'd like to ask for some advice on the design of a new storage system:
> We are looking to buy a basic SAN storage system with ~ 4 TB usable
> capacity. Our total budget is £15,000 (~$25,000?). The filesystem is for
> bioinformatics computational work including a fair amount of database access
> but also typical bioinformatics flat file access (>1Gb files).
> We would like good performance but really reliability is the number one
> issue. The SAN would be in use day and night by a 60 node cluster so I guess
> we would be looking at enterprise level reliability if not 24/7 (is there a
> difference?). We plan to attach 4 servers to the SAN which all would be
> linux intel/AMD. 
> We have been using RAID5 SATA with an adaptec fs4500 box with really bad
> experiences so we would really like to get this right. (We have had problems
> with the controller as well as drives failing during RAID5 rebuild.) Good
> hardware monitoring would be a must. The controller and basically the whole
> system must be really well supported, especially in Linux. Our sysadmin is
> really overloaded and would prefer something that does not suck up all her
> time in maintenance and configuration.
> Just to be perfectly clear, our priorities are reliability >>> size >
> performance.
> We already have a quote from HP for a SCSI Modular Storage Array with SAN
> Switch 2Gbit/8 port BASE SAN KIT.
> Is this a reasonable setup. Does anyone have any experience with this kit or
> can suggest alternatives? 
> Is SCSI over-specifying? Are enterprise SATA drives / controllers /systems
> now up to scratch? Should we be using RAID6 or RAID10?
> We would really really appreciate some help here.
> Thanks in advance,
> Martin Goodson
> Functional Genetics Unit
> Oxford University
> _______________________________________________
> Bioclusters maillist  -  Bioclusters at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters
