[Bioclusters] Linux cluster storage question (SAN/NAS/GPFS)

John Van Workum bioclusters@bioinformatics.org
Thu, 19 Aug 2004 18:04:01 -0400 (EDT)


Another direction that might be worth looking at is solid state disks (SSDs).
These "disks" act like hard drives but are built from RAM modules rather than
moving parts. I have not personally used SSDs, but from what I have read you
can't get anything faster, and they are becoming more mainstream as prices come
down. A company called Cenatek makes a RAMDisk card that plugs into a PCI slot,
and I've been particularly interested in this product. They claim 100,000 I/Os
per second and an 80-115 MB/s sustained transfer rate. It supports 4GB of
storage, and they have plans for an 8GB version. The last I looked, the card
was about $3,000. I've seen larger SSDs (over 1TB) on the market, but the price
is probably astronomical.
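
Incidentally, if anyone wants to sanity-check vendor numbers like these against
the kind of small-file workload Anand describes below, a quick-and-dirty test is
easy to put together. Here is a minimal Python sketch, assuming the device under
test is mounted at /mnt/ramdisk (that path is just a placeholder; point it at
whatever RAMDisk, SAN LUN, or NFS/GPFS mount you want to compare). It writes and
then reads back a few thousand ~16 KB files, fsync'ing each write, and reports
operations per second and MB/s:

#!/usr/bin/env python
# Rough small-file I/O test: many ~16 KB files, written with fsync and read
# back, reporting ops/s and MB/s.  Single-node only; it says nothing about
# 48 nodes hammering the same file system at once.
import os, time

MOUNT = "/mnt/ramdisk"   # placeholder: mount point of the device under test
NFILES = 5000            # number of small files to create
SIZE = 16 * 1024         # ~16 KB, matching the 10-20 KB workload discussed

payload = os.urandom(SIZE)

# Write phase: create NFILES files and force each one out to the device.
start = time.time()
for i in range(NFILES):
    f = open(os.path.join(MOUNT, "bench_%05d.dat" % i), "wb")
    f.write(payload)
    f.flush()
    os.fsync(f.fileno())      # make sure we measure the device, not the page cache
    f.close()
elapsed = time.time() - start
print("write: %.0f ops/s, %.1f MB/s"
      % (NFILES / elapsed, NFILES * SIZE / elapsed / 1e6))

# Read phase: read every file back.  (Reads may still be served from the
# page cache unless you drop caches or unmount between the two phases.)
start = time.time()
for i in range(NFILES):
    f = open(os.path.join(MOUNT, "bench_%05d.dat" % i), "rb")
    f.read()
    f.close()
elapsed = time.time() - start
print("read:  %.0f ops/s, %.1f MB/s"
      % (NFILES / elapsed, NFILES * SIZE / elapsed / 1e6))

# Clean up the test files.
for i in range(NFILES):
    os.remove(os.path.join(MOUNT, "bench_%05d.dat" % i))

Obviously a single-node loop like this is only a first cut; the interesting part
of the original question is what happens when 48 nodes generate that load
against shared storage at the same time.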

Does anyone have any experience with the Cenatek RAMDisks or SSDs in a cluster
environment?

John Van Workum
Tsunamic Technologies Inc.
http://www.clusterondemand.com


>
> Anand S Bisen wrote:
>
> Hello,
>
> I would like to know which is the better storage alternative, performance-wise,
> for a 48-node (dual processor) cluster that runs 24x7 on life science problems
> involving extensive I/O on small files. The I/O I am talking about consists of
> small reads and writes (10-20 KB each), with tens of thousands of these
> operations hitting the file system simultaneously. How well does a distributed
> file system such as GPFS on a SAN work compared with a NAS device?
>
> We are in the process of designing a cluster for a life science problem that
> will work on tens of thousands of files simultaneously from across the Linux
> cluster, and we are hung up on the storage options: the pros and cons of GPFS
> on a SAN versus a NAS device. If somebody could point me in the right
> direction it would be great; a few sites I have read say that NAS devices are
> the preferred option, but I couldn't find the reasons to support either one.
>
> Thanks
>
> ASB
> ------------------------------------------------------------------------
>
> From a pure performance perspective, using a file system such as GFS or
> Lustre on direct-attached SAN nodes would be #1; however, this could be
> cost prohibitive and would require a fair amount of administrative
> overhead. If you are doing a lot of I/Os, I would not recommend SATA
> drives: the performance of lots of small transactions will degrade very
> quickly once the cache on your SAN front end is exhausted. SCSI would be
> the only way to go here.
>
> I would definitely not recommend going with any NFS solution, as this
> type of I/O will bring your filer to maximum capacity in a hurry (unless
> you buy a very high-end, load-balanced system).
>
> Some other issues to take into account: file services will share
> bandwidth with the programs actually running, some codes are fairly
> network intensive, and MPI is very sensitive to latency. So once again,
> from a pure performance standpoint, direct-attached disk is the way to
> go.
>
> If it is cost prohibitive to build this kind of infrastructure, I would
> recommend using IBM's GPFS on a network separate from the computational
> network. I have been using GPFS for around a year and have been pleased
> with its performance and scalability. But the thing to remember is that
> IBM charges for support on a yearly basis, so this can end up costing
> quite a bit of money over the long haul. (However, the other solutions
> would no doubt require similar support.)
>
> So to sum up:
> Cost is no object: direct SAN-attached disk with a parallel file system
> such as Lustre.
>
> Hardware cost doesn't matter, admin costs are limited: buy a big NetApp
> or BlueArc filer.
>
> Constrained to a smaller budget for both admin and hardware: buy a
> decent SAN and front-end it with GPFS, Lustre, or GFS. (I recommend
> GPFS from experience; I can't speak for Lustre or GFS.)
>
>
> James Lowey
> Lead, High Performance Computing Systems
> TGen,  The Translational Genomics Research Institute
> 400 N. Fifth Street,  Suite 1600
> Phoenix, AZ 85004
> http://www.tgen.org
> _______________________________________________
> Bioclusters maillist  -  Bioclusters@bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters
>