[Bioclusters] BioInform Linux Cluster User Survey

Bernadette Toner bioclusters@bioinformatics.org
Mon, 30 Sep 2002 19:34:02 -0400


Hi everybody -

Many thanks to those of you who responded to my request for feedback on your
Linux cluster experiences a few weeks ago. The BioInform article based on
responses from this list and other sources is attached below. Hope you find
it useful. The print version of the newsletter has a page of snazzy graphs
that detail some of the results, but since that won't work here, I added a
breakdown after the text of the article. Apologies for the length. I'll be
happy to mail out hard copies to anyone who requests one.

Thanks again to everyone -

Bernadette

......................................
Bernadette Toner
Editor, BioInform
GenomeWeb LLC
212-651-5636
btoner@genomeweb.com
www.bioinform.com

Bioinformatics Linux Clusters Gain Ground (Literally): Users Report Rapid
Expansion

Bioclusters are big. And most of them are getting bigger, according to a
recent user snapshot compiled by BioInform.

A relative rarity just three years ago, Linux clusters have quickly gained
popularity in the bioinformatics community as an effective, low-cost,
high-performance computing option. No longer limited to small, underfunded
academic groups seeking compute power on the cheap, clusters have also taken
root within biotechs and pharmaceutical firms looking for a scalable
complement to other supercomputing resources. An entire sub-industry has
sprouted as a result, with everyone from IBM to small, independent
consulting firms making their services available to the biocluster
community.

BioInform recently polled 20 members of this growing population to get a
better sense of how well Linux clusters are delivering on their promise.
Users from 11 academic groups and nine biopharmas responded to an informal
survey on how well the technology has lived up to their expectations so far
and where it fits into their future infrastructure plans.

More a pulse-taking exercise than a statistically valid portrait of the user
landscape, the survey nevertheless revealed some interesting trends. Most
significantly, respondents indicated that they are taking full advantage of
one of the primary selling points of the approach — its scalability — by
regularly adding new CPUs to their existing clusters.

Of the 20 groups surveyed, 17 have had a cluster in place for just three
years or less, but 14 have already added new CPUs. The six groups who have
not yet expanded their clusters said they plan to do so in the next year, as
do seven other groups (see full results below). The average cluster size for
the group increased from 81 CPUs for the initial installation to a current
size of 426 CPUs. The starting size for academic clusters averaged 49 CPUs,
vs. 126 for biotechs and pharmas. The current average size for the two
sectors has grown to 167 CPUs for academic groups and 783 for biopharmas.

Almost half of the clusters in our survey (nine) were originally home-grown
systems. Three of these groups opted for a vendor or consultant when it came
time to upgrade the system. One academic group that started out with a
homemade system and then turned to a vendor for an upgrade at the two-year
mark said it is going back to a homemade approach for round three. IBM, VA
Linux, and Rackable came in as the most common choices for vendor-built
systems, although it should be noted that firms like Linux Networx,
Blackstone Computing, Microway, and others have sold a number of Linux
clusters in the life science market, even though their customers did not
respond to the survey.

Keep it Simple (and Cheap)

The home-grown flavor of our sample may explain the surprisingly poor
showing of Platform’s pricey LSF when it came to distributed resource
management systems. An equal number of respondents (six) opted for
home-grown job scheduling software or the open source Sun Grid Engine
instead, with PBS (five) and Mosix (four) following close behind.

There were few surprises in the applications category, however. Proving that
bioclusters are often dubbed “Blast farms” for a reason, 14 out of 20 groups
run some flavor of Blast on their clusters, with the usual suspects of
Fasta, HMMer, ClustalW, and RepeatMasker also appearing regularly.

Interestingly, none of the survey respondents opted for a commercial
parallel Blast application such as TurboGenomics’ (now TurboWorx)
TurboBlast, Paracel Blast, or Blackstone PowerBlast. This, again, may be due
to the DIY leanings of the sample group: Four respondents indicated that
they had developed their own parallel versions of the bioinformatics
workhorse. One user noted that these commercial offerings “are only wrappers
around NCBI/Wu-Blast and we are not very happy with them because of the
costs or the programs they use.”
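
For those curious what such a home-grown wrapper looks like, the sketch below
illustrates the usual split-and-scatter approach: divide the query file into
chunks and submit one independent Blast job per chunk to the scheduler. The
file names, chunk count, and the qsub/blastall options are illustrative
assumptions only, not any respondent's actual setup.

    # A minimal sketch of a home-grown parallel Blast wrapper: split the
    # multi-FASTA query file into chunks and submit each chunk as an
    # independent job. File names, chunk count, and the qsub/blastall
    # options are assumptions for illustration.
    import os

    QUERY_FILE = "queries.fa"   # hypothetical query file
    CHUNKS = 32                 # e.g. one chunk per node

    # Collect the individual FASTA records (header line plus sequence).
    records = []
    for line in open(QUERY_FILE):
        if line.startswith(">"):
            records.append(line)
        elif records:
            records[-1] += line

    # Deal the records round-robin into chunk files.
    for i in range(CHUNKS):
        chunk = open("chunk_%02d.fa" % i, "w")
        chunk.writelines(records[i::CHUNKS])
        chunk.close()

    # Submit one NCBI blastall run per chunk; Sun Grid Engine's qsub is
    # shown here, and a PBS submission would look much the same.
    for i in range(CHUNKS):
        os.system("qsub -b y -cwd blastall -p blastp -d nr "
                  "-i chunk_%02d.fa -o chunk_%02d.out" % (i, i))

Gathering the chunk_*.out files afterward yields the same per-query results
as a single serial run; roughly speaking, that is the step the commercial
wrappers automate.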

The Biocluster Bottom Line

When it came to judging how much bang Linux clusters deliver for their buck,
the results were a bit mixed. While more than half the respondents (11)
indicated that the price/performance ratio of their cluster beat that of
other computational options as well as their expectations, almost half
(nine) said that issues such as cooling and maintenance costs bumped the
total cost of ownership for the system a bit higher than anticipated. Those
who did their homework before installing the cluster — by speaking to other
users and investigating all their available options — encountered fewer
surprises, however.

One user was surprised by “how much heat the new AMD Athlon machines put
out,” which led to “a few one-time startup expenses that relate to cooling.”
Another simply noted that cooling is a “big deal.”

Many of those who opted to build their own underestimated “the effort of
building and administering a cluster by ourselves.” The head of an academic
research lab noted that despite the benefits of the cluster, “I am quite
dependent on the expertise of one person (the PhD student who built it, who
will leave the lab shortly).” Another bemoaned the “time required to
customize applications to run on clusters,” while one user wished for “more
off-the-shelf cookbooks on how to set up and maintain a cluster.”

Conversely, most who opted for vendor-installed systems seemed pleased with
their choice. As one respondent put it: “The cost might have been much less
if we had built the cluster ourselves. But this would have resulted in
additional headaches in terms of maintenance of the machines. The cluster we
have now has been running non-stop and no downtime in the last 12 months!”

While maintenance costs, I/O bottlenecks, and fileserver limitations were
listed among the top drawbacks of the technology, for the majority of survey
respondents, Linux clusters deliver a combination of low cost, scalability,
and speed that far outweighs these inconveniences. One user explained, “we
were able to do full human genome analysis in one month using only 16 Intel
Pentium machines. Now [our] 26 new machines can do the exact same analysis
in two weeks. All 42 machines together should be able to do that same
analysis in little over a week. All this, for a cost much less than one
mid-range computing system that would have an equal number of processors and
comparable computing time.”
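
The arithmetic behind that estimate is worth spelling out: for an
embarrassingly parallel analysis, each pool of machines contributes its own
throughput, and the rates simply add. A quick back-of-the-envelope check,
treating the quoted “one month” as four weeks and assuming roughly linear
scaling (an assumption on our part, not the respondent's):

    # Back-of-the-envelope check of the respondent's scaling claim,
    # assuming the two machine pools add their throughput linearly.
    old_pool = 1 / 4.0    # 16 older machines: one analysis per ~4 weeks
    new_pool = 1 / 2.0    # 26 newer machines: one analysis per 2 weeks
    combined = old_pool + new_pool
    print(1 / combined)   # ~1.3 weeks for all 42 machines together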

As another respondent summed up, the equation that describes why Linux
clusters are growing so rapidly is very simple: “Need more power: buy more
nodes.”

— BT

How long have you been using a Linux cluster?
0-12 months: 5 (25%)
13-24 months: 8 (40%)
25-36 months: 4 (20%)
37+ months: 3 (15%)

Original number of CPUs:
Range: 4-475
Average: 81
Most common number: 32 (4 responses)

Current number of CPUs:
Range: 30-2,300
Average: 426
Average increase: 5.2X
Only 6 (30%) respondents did not add to their cluster.
Of those who added CPUs, all but two more than doubled the size of their
original cluster. Of those, 7 had more than a 5X increase in the number of
CPUs, and 3 had an increase of 10X or more.

Do you plan to add new CPUs to your cluster over the next year?
Yes: 13 (65%)
No response/maybe: 6 (30%)
No: 1 (5%)
The 8 respondents who specified their expansion plans expect to add an average
of 195 CPUs, with a range of 20-1,000.

Processor type:
All respondents said that all or part of their cluster used Intel chips.
Of those who specified:
Pentium III: 14 (70%)
Pentium 4/Xeon: 4 (20%)
AMD Athlon: 3 (15%)
Pentium II: 1 (5%)
Mac G4: 1 (5%)
Sparc II: 1 (5%)

Distributed resource management*:
In-house systems: 6 (30%)
Sun Grid Engine: 6 (30%)
PBS: 5 (25%)
Mosix: 4 (20%)
LSF: 1 (5%)
Globus Grid: 1 (5%)
Parasol: 1 (5%)
Condor: 1 (5%)
*Total is greater than 20 because systems are used in combination.

Who built your cluster*?
Homemade: 9 (45%)
IBM: 4 (20%)
Rackable Systems: 2 (10%)
VA Linux: 2 (10%)
Unspecified vendor: 2 (10%)
Unspecified consultant: 1 (5%)
Sun: 1 (5%)
Western Scientific: 1 (5%)
ICT: 1 (5%)
Quant-X: 1 (5%)

*Of those who built their own cluster, 3 (33%) responded that they had hired
a vendor for an upgrade.

Applications:
Blast (NCBI, Wu-Blast, Psi-Blast, etc.): 14 (70%)
Fasta: 4 (20%)
Protein folding/molecular dynamics: 4 (20%)
HMMer: 3 (15%)
ClustalW: 3 (15%)
RepeatMasker: 3 (15%)
Phred: 2 (10%)
Phrap: 2 (10%)
Emboss: 2 (10%)
Sim4: 2 (10%)

There were 4 respondents using parallelized versions of Blast. These were
all in-house adaptations. None of the survey respondents indicated they were
using a commercial parallel Blast application.

Price/Performance:
Better than expected: 11 (55%)
As expected: 8 (40%)
Worse than expected: 1 (5%)

Total cost of ownership:
Better than expected: 6 (30%)
As expected: 5 (25%)
Worse than expected*: 9 (45%)
*Of those who indicated that TCO was worse than expected, 3 (33%) said this
was due to higher-than-anticipated cooling costs.

Advantages:
Price/performance: 14 (70%)
Scalability: 5 (25%)
Speed/compute power: 3 (15%)
Availability of embarrassingly parallel bioinformatics applications: 2 (10%)

Drawbacks:
Systems administration overhead: 7 (35%)
I/O bottleneck: 3 (15%)
None: 3 (15%)
Hardware problems: 2 (10%)
Lack of parallelized bioinformatics software: 3 (15%)
Usability problems: 1 (5%)
Lack of support for Linux: 1 (5%)

Wish list:
Better/more parallelized bioinformatics applications: 5 (25%)
Shared memory: 4 (20%)
Cheaper/better fileserver: 3 (15%)
Improved distributed data mechanisms: 2 (10%)
Better workflow management systems: 1 (5%)
Faster/cheaper interconnects: 1 (5%)
Instruction manual: 1 (5%)

Copyright © 2001,2002 GenomeWeb LLC. All Rights Reserved.