Another direction that might be worth looking at is solid-state disks (SSDs). These "disks" act like hard drives but are built from RAM rather than moving parts. I have not used SSDs personally, but from what I have read you can't get anything faster, and they are becoming more mainstream as prices start to come down.

A company called Cenatek makes a RAMDisk that plugs into a PCI slot, and I have been particularly interested in this product. They claim 100,000 I/Os per second and an 80-115 MB/s sustained rate. It supports 4GB of storage, and they have plans for an 8GB disk. The last I looked, the card was about $3,000. I have seen larger SSDs (over 1TB) on the market, but the price is probably astronomical.

Does anyone have any experience with the Cenatek RAMDisks, or with SSDs in general, in a cluster environment?
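Before dropping $3,000 on a card, it would be worth checking the vendor's IOPS numbers against the workload you actually care about. I have not run this against a Cenatek card or any other SSD; it is just a rough sketch of the kind of small-file test I would start with, and the 16KB size and 1,000-file count are arbitrary, not anybody's measured workload:

/* smallfile_bench.c -- rough small-file write benchmark.
 * Writes NFILES files of FSIZE bytes each into the current
 * directory and reports files/sec and MB/s.  Leaves the bench.*
 * files behind; clean them up afterwards.
 * Compile: gcc -O2 -o smallfile_bench smallfile_bench.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

#define NFILES 1000
#define FSIZE  (16 * 1024)

int main(void)
{
    char buf[FSIZE];
    char name[64];
    struct timeval t0, t1;
    int i, fd;
    double secs;

    memset(buf, 'x', FSIZE);
    gettimeofday(&t0, NULL);
    for (i = 0; i < NFILES; i++) {
        snprintf(name, sizeof(name), "bench.%d", i);
        fd = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); exit(1); }
        if (write(fd, buf, FSIZE) != FSIZE) { perror("write"); exit(1); }
        fsync(fd);          /* force it to the device, not the page cache */
        close(fd);
    }
    gettimeofday(&t1, NULL);
    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("%d files in %.2f s: %.0f files/sec, %.2f MB/s\n",
           NFILES, secs, NFILES / secs,
           (double)NFILES * FSIZE / (1024 * 1024) / secs);
    return 0;
}

On a RAM-backed device the fsync() calls should be close to free; on a rotating disk they are what hurts, which is the point of the comparison.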
John Van Workum
Tsunamic Technologies Inc.
http://www.clusterondemand.com

> Anand S Bisen wrote:
>
> Hello,
>
> I wanted to know which is the better alternative, for performance, for
> a cluster of 48 dual-processor nodes working 24x7 on life-science
> problems with extensive I/O on small files. The I/O I am talking about
> is small-file reads and writes, say 10-20KB each, with tens of
> thousands of these operations hitting the file system simultaneously.
> How well does a distributed file system like GPFS on a SAN work
> compared with NAS storage?
>
> We are in the process of designing a cluster for a life-science problem
> that will work on tens of thousands of files simultaneously from across
> the Linux cluster, and we are hung up on the storage options: the pros
> and cons of GPFS on a SAN versus a NAS device. If somebody could point
> me in the right direction it would be great, because the few sites I
> have read say NAS devices are the preferred option, but I couldn't find
> the reasons to support either one of them.
>
> Thanks
>
> ASB
> ----------------------------------------------------------------------
>
> From a pure performance perspective, using a file system such as GFS or
> Lustre on direct-attached SAN nodes would be #1; however, this could be
> cost-prohibitive and would require a fair amount of administrative
> overhead. If you are doing a lot of I/Os, I would not recommend SATA
> drives, as performance under lots of transactions will degrade very
> quickly once the cache on your SAN front end is exhausted; SCSI is the
> only way to go here.
>
> I would definitely not recommend any NFS solution, as this type of I/O
> will bring your filer to maximum capacity in a hurry (unless you buy
> very high-end, load-balanced systems).
>
> Some other issues to take into account are sharing bandwidth between
> file services and the actual programs running: some codes are fairly
> network-intensive, and MPI is very sensitive to latency. So, once
> again, from a pure performance standpoint, direct-attached disk is the
> way to go.
>
> If it is cost-prohibitive to build that kind of infrastructure, I would
> recommend using IBM's GPFS on a network separate from the computational
> network. I have been using GPFS for around a year and have been pleased
> with the performance and scalability. But the thing to remember is that
> IBM charges for support on a yearly basis, so this can end up costing
> quite a bit of money over the long haul. (However, the other solutions
> would no doubt require similar support.)
>
> So, to sum up:
>
> Cost is no object: direct SAN-attached disk with a parallel file system
> such as Lustre.
>
> Hardware cost doesn't matter, but admin costs are limited: buy a big
> NetApp or BlueArc filer.
>
> Constrained to a smaller budget for both admin and hardware: buy a
> decent SAN and front-end it with GPFS, Lustre, or GFS. (I recommend
> GPFS from experience; I can't speak for Lustre or GFS.)
>
> James Lowey
> Lead, High Performance Computing Systems
> TGen, The Translational Genomics Research Institute
> 400 N. Fifth Street, Suite 1600
> Phoenix, AZ 85004
> http://www.tgen.org
> _______________________________________________
> Bioclusters maillist - Bioclusters@bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters
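P.S. On Anand's question: spec sheets probably won't settle GPFS-on-SAN versus NAS for this kind of small-file load, but running the candidates head-to-head under an approximation of the real workload might. The sketch below is only my guess at that workload -- the mount point, the 16 workers, and the file counts and 16KB size are placeholders, not Anand's numbers. Run one copy per node against the shared mount and compare wall-clock times:

/* parfile_stress.c -- crude concurrent small-file stress test.
 * Forks NPROCS workers; each writes NFILES 16KB files into its own
 * directory under the target mount point, then reads them back.
 * Compile: gcc -O2 -o parfile_stress parfile_stress.c
 * Usage:   ./parfile_stress /mnt/shared/scratch
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <sys/time.h>

#define NPROCS 16
#define NFILES 500
#define FSIZE  (16 * 1024)

static void worker(const char *base, int id)
{
    char dir[256], name[256], buf[FSIZE];
    int i, fd;

    memset(buf, 'x', FSIZE);
    snprintf(dir, sizeof(dir), "%s/worker.%d", base, id);
    mkdir(dir, 0755);
    for (i = 0; i < NFILES; i++) {            /* write phase */
        snprintf(name, sizeof(name), "%s/f.%d", dir, i);
        fd = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0 || write(fd, buf, FSIZE) != FSIZE) exit(1);
        fsync(fd);
        close(fd);
    }
    for (i = 0; i < NFILES; i++) {            /* read-back phase */
        snprintf(name, sizeof(name), "%s/f.%d", dir, i);
        fd = open(name, O_RDONLY);
        if (fd < 0 || read(fd, buf, FSIZE) != FSIZE) exit(1);
        close(fd);
    }
    exit(0);
}

int main(int argc, char **argv)
{
    struct timeval t0, t1;
    double secs;
    int i;

    if (argc != 2) { fprintf(stderr, "usage: %s <dir>\n", argv[0]); return 1; }
    gettimeofday(&t0, NULL);
    for (i = 0; i < NPROCS; i++)
        if (fork() == 0)
            worker(argv[1], i);              /* child does the I/O, then exits */
    for (i = 0; i < NPROCS; i++)
        wait(NULL);
    gettimeofday(&t1, NULL);
    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("%d workers x %d files (%d KB): %.2f s, %.0f ops/sec\n",
           NPROCS, NFILES, FSIZE / 1024, secs,
           2.0 * NPROCS * NFILES / secs);
    return 0;
}

The ops/sec figure counts each write and each read-back as one operation. Scaling NPROCS and NFILES up until the numbers stop improving should show roughly where each file system falls over for this access pattern.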