Hi Ivo,

This is just my $.02 of course.

First of all I always prefer external drive shelves and external RAID controllers -- having that stuff internal to a single server can be problematic from both a future-capacity and a "what do I do when the server breaks" standpoint. A good small-budget compromise on the internal-vs-external argument can be an internal RAID controller card that drives an external shelf of SCSI disks.

With all RAID controllers, the size of the cache and having it backed up by battery (so you can safely enable write-back operations) are critical to getting the best throughput and performance. Make sure you max out the cache in whatever product you end up building or buying.

I'm a huge NAS fan because being able to have simple and reliable shared read/write access to the same filesystem is key in most life science settings. This is especially true in clusters where (1) your cluster software may require a shared filesystem and (2) you don't want to keep many different copies of GenBank etc. lying around. I'll keep using NFSv3 for as long as it takes the various parallel distributed filesystems to get more reliable and easier to deploy.

For cluster fileservers I've done both (a) and (c) depending on size, budget and the needs of the customer/end-user. In practice I've seen many people choose (a) for cost reasons and then regret it because they ended up with a fileserver that was too unreliable, too small or not fast enough. It really sucks to see your expensive cluster sit 99% idle because you have 100 processes all blocking on pending I/O requests.

In big-project/big-budget situations where I've had to use a very large NAS unit I've almost always gone with Network Appliance systems. Not cheap, but they are the company that every other NAS vendor is trying to knock down. Of course if I was building my own system I'd make a different choice due simply to budget.
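
To put rough numbers on the idle-cluster scenario above, here is a back-of-the-envelope sketch, not a benchmark. It assumes a single gigabit link on the fileserver shared evenly by the 100 clients, that the network rather than the disks is the bottleneck, and a hypothetical 1 GB dataset per job. The function names are made up for illustration.

```python
def per_client_mb_per_s(link_gbit_per_s: float, clients: int) -> float:
    """Even share of the fileserver's network link per client, in MB/s.

    Assumes perfect fair sharing and no protocol overhead, so real
    numbers will be lower than this.
    """
    link_mb_per_s = link_gbit_per_s * 1000 / 8  # 1 Gbit/s = 125 MB/s
    return link_mb_per_s / clients


def minutes_to_read(dataset_gb: float, mb_per_s: float) -> float:
    """Time for one client to pull a dataset at the given rate, in minutes."""
    return dataset_gb * 1000 / mb_per_s / 60


if __name__ == "__main__":
    share = per_client_mb_per_s(link_gbit_per_s=1.0, clients=100)
    wait = minutes_to_read(dataset_gb=1.0, mb_per_s=share)
    print(f"each client gets about {share:.2f} MB/s")
    print(f"reading 1 GB takes about {wait:.1f} minutes per client")
```

Under those assumptions each client gets a little over 1 MB/s, so every job spends many minutes just waiting on input before any CPU work starts. A faster NIC, more cache or more fileservers changes the inputs, not the arithmetic.
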

The primary difference between a dedicated NAS box like a NetApp and a build-your-own Linux NFS server boils down to OS and cache.

The OS stuff is not a big deal - you can make Windows/Solaris/BSD/Linux/whatever into a decent NFS server without all that much trouble (just cram your box with as much RAM as possible plus a fast NIC). The OS inside a dedicated NAS box will likely give you more software bells and whistles like snapshots, remote mirroring etc. This can be nice or not depending on whether you actually need the software add-ons.

Many of the low-end and midrange NAS appliances likely run Windows or Linux internally. The higher-end NAS boxes like NetApp tend to run dedicated OSes that have been engineered from the ground up to do nothing but fileserving. Heck - most of the 'value' in a NetApp is not the hardware - if you crack the case you can see pretty generic/commodity parts inside. What you are paying for is their incredibly well engineered operating system and the WAFL filesystem layout.

Cache is a big deal and is the reason why high-end dedicated NAS appliances outperform general-purpose servers. In a dedicated NAS appliance you will likely see a very large cache (gigabytes in size) that is backed up by internal batteries and redundant power supplies. Having that redundancy internally allows the system to do tricks like acknowledging client write operations without having to wait for the mechanical disks to physically write the data to media. This is why the higher-end systems outperform the lower-end general-purpose systems - they have a couple of extra bits of internal hardware plus some cleverness in their software that lets them get a lot more performance and I/O throughput.

Speaking of competitors to NetApp - forget EMC and their IP4700 NAS product. It's junk. The only person I know who actually bought one (he's on this list...) regretted the decision.
I faced the hard sell from Dell on this recently because they have (unwisely IMHO) chosen to resell EMC kit for midrange and enterprise storage. Fortunately the customer made the right choice despite some last-minute price gouging from some really aggressive salespeople.

There are many small existing and startup companies who are bringing radical "NetApp killers" to the NAS market. Two of the semi-stealthy companies that I've talked to and come away impressed by are Panasas (www.panasas.com) and Ibrix (www.ibrix.com). Both have been pretty quiet, although Panasas is squarely targeting the life sciences market for their first products.

For cheap NAS and external SCSI or IDE RAID there are many, many companies to choose from.

What I would do:

Given a need for simplicity, a big budget and a conservative datacenter IT staff who demand stuff that is easy to manage and supportable 24/7, I'd choose Network Appliance every time.

If I was under budget constraints yet had a bit more freedom to pursue more flexible options, I'd probably end up building a small SAN with good quality Fibre Channel arrays hanging off the FC switch. I could then hang any number of beefy Linux boxes off the same switch and have a pretty powerful/flexible/scalable fileserving infrastructure. The downside to this of course is that it is more complicated and you have more stuff to manage. The upside is that you don't lock yourself into any particular vendor and you will probably get a really good price/performance ratio.

SANs are only expensive when very large or when Compaq/IBM/EMC are trying to sell them to you. You can go a long way for short money with a small FC switch and some good quality drive arrays.

-Chris

Ivo Grosse wrote:
> Hi all,
>
> we want to buy a new fileserver (for our cluster) with about 1 TB, and
> we are thinking of a Linux machine. My question is: which kind of
> fileserver do YOU use (and why)?
>
> (a) NAS (network-attached storage)?
>
> (b) regular Linux machine with internal RAID?
>
> (c) regular Linux machine with external RAID?
>
> Thanks!!!
>
> Ivo

--
Chris Dagdigian, <dag@sonsorol.org>
Independent life science IT & research computing consulting
Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193
Work: http://BioTeam.net PGP KeyID: 83D4310E Yahoo IM: craffi