Some of the justifications will depend on your audience: are they IT people who control your hardware budget, or senior scientists who need to approve new research computing directives? Here are some of the justifications I've used for both types of groups:

(1) Bioclusters preserve any current investment you may have in big, expensive unix SMP machines by significantly reducing the computational load on your legacy hardware. Basically, you use the large-memory and big SMP systems for things like EST clustering and data warehouses that need such environments, and you offload everything else you can onto piles of cheap mass-market hardware. I know several companies that were able to postpone or actually cancel plans to upgrade or replace large Sun, Alpha and SGI machines because they were able to extend the useful server life by migrating load to the far cheaper cluster or compute farm. Not having to replace or upgrade one of those large systems can save hundreds of thousands or even millions of dollars in capital expense.

(2) Fine-grained scaling on demand. In a biocluster it is trivial to add additional CPUs. As long as your architecture is correct you can incrementally scale easily and cheaply from tens of CPUs to hundreds or thousands. Compare this to the problem of upgrading a large unix machine. That 64-CPU enterprise unix system may be great, but what happens when you need that 65th CPU? It may require the purchase of a whole new cabinet and expensive interconnects just to get that next processor fired up.

The other nice thing about scaling with bioclusters is that it is easy to take advantage of newer and faster hardware. Load management layers like LSF, PBS etc. can trivially handle heterogeneous hardware environments, so it is not a problem to have your cluster composed of different machine classes. This allows you to efficiently purchase the fastest available commodity CPU power each year with little waste.
Plus, if you work the proper magic with the load management software layer, your end users will never know or have to understand the back end. All they know is that their jobs get done.

(3) For high-throughput, embarrassingly parallel situations like massive BLAST and hmmsearch searches, a biocluster will blow away any enterprise unix system you can think of. As a concrete example, when I was at Blackstone Computing we were able to build a proof-of-concept dedicated BLAST farm with $30,000 USD worth of commodity hardware. That $30,000 demo BLAST farm was tested by the customer (a large pharma company) and was found to be significantly faster than the $300,000+ unix system they were currently using. The system was so fast (throughput, not turnaround) that the customer was able to perform calculations and experiments that had not been possible before due to time and horsepower constraints.

This (#3) is the primary reason that I see people building bioclusters. They know that they have a huge requirement to run lots of conveniently embarrassingly parallel applications in a high-throughput mode. As it turns out, a loosely coupled cluster or compute farm tends to be a really nice and effective platform for doing this. Many of the first "bioclusters" were actually dedicated BLAST, genescan, hmm etc. resources, although these days they are being used for more.

(4) Linux on commercial mass-market hardware is _incredibly_ powerful from a price/performance standpoint. The Intel/AMD CPUs are amazing. If you have a software application or algorithm that runs well under Linux and you need to run lots of instances of it, then a cluster is a great choice.

(5) What it comes down to is that leveraging piles of inexpensive commodity hardware is the only cost-effective way that life science researchers can really get the flexible "supercomputer scale" CPU power they need to perform their work.

(6) A hell of a lot of bioinformatics software is now being developed primarily on, or ported to, linux-on-i386.

I do have some links that may be useful, particularly Matthew Trunnel's article in Scientific Computing World, but I don't have the URLs handy and I need to run out to a meeting :) I'll follow up with URLs when I get back.

Anyone else with comments?

-Chris

Paul Gardner wrote:
> Hi All,
>
> I have to give a talk on thurs 12pm (NZST) that justifies the expense of
> purchasing 128 PentiumIVs for a BioCluster at our weekly Research Group
> meeting.
>
> I already know a bit about using the MPI compiler and PBS queuing system.
> What I'm really interested in is the solutions BioClusters are currently
> being used for. Any URLs, papers, and/or suggestions would be greatly
> appreciated.
>

-- 
Chris Dagdigian, <dag@sonsorol.org>
Independent life science IT & research computing consulting
Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193
Work: http://BioTeam.net
PGP KeyID: 83D4310E  Yahoo IM: craffi