[Bioclusters] cluster hardware question

Joe Landman bioclusters@bioinformatics.org
15 Jan 2003 20:41:33 -0500


On Wed, 2003-01-15 at 19:44, Duzlevski, Ognen wrote:
> Hi,
> 
> I noticed a huge difference in pricing of dual 2.8 GHz Xeons as opposed
> to 2.4 GHz ones. Is there an equal difference in performance between the
> two? Or is it just a matter of hot-off-the-press vs.
> been-there-for-some-time? :)

The latter.  

> Additionally, what is the general feeling on dual-athlons vs. dual-xeons,
> both in terms of performance and possible upgradeability, within a
> cluster?

This is very application dependent.  If your code chases pointers and
suffers from fetch latency, you are likely to be somewhat more pleased
with Athlons.  If your code walks through memory with regular access
patterns that are cache friendly (i.e. memory bandwidth, not latency,
is the bottleneck), the P4 could be quite compelling.
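
To make the two access patterns concrete, here is a rough sketch in
Python.  The function names (make_chain, chase, stream) and the array
size are my own placeholders, not anything from a real benchmark suite.
The interpreter overhead hides most of the hardware effect, so treat
this as an illustration of "dependent loads" versus "regular walk"
rather than a measurement tool; a C version of the same loops would
show the difference far more clearly.

```python
import random
import time


def make_chain(n, seed=0):
    """Build a 'next index' table that forms one random cycle over 0..n-1."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    nxt = [0] * n
    for i in range(n):
        nxt[order[i]] = order[(i + 1) % n]
    return nxt


def chase(nxt, start=0):
    """Pointer chase: every lookup depends on the result of the previous one."""
    steps = 0
    i = start
    while True:
        i = nxt[i]
        steps += 1
        if i == start:
            return steps


def stream(values):
    """Regular walk: elements are visited in storage order."""
    total = 0
    for v in values:
        total += v
    return total


if __name__ == "__main__":
    n = 1_000_000  # assumed size; pick something well beyond your cache
    table = make_chain(n)

    t0 = time.perf_counter()
    chase(table)
    t1 = time.perf_counter()
    stream(table)
    t2 = time.perf_counter()

    print(f"chase : {t1 - t0:.3f} s")
    print(f"stream: {t2 - t1:.3f} s")
```

If your real code looks more like chase(), latency matters most; if it
looks more like stream(), bandwidth does.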

What I usually recommend to my customer base is to test the code on
single machines first to get a feeling for the relative performance. 
Use the available tools to look at the system performance (vmstat, sar,
atop, etc.).  Try to compare similar machines with similar OSes, and
similar memory and IO configurations.  Understand where your test takes
its time.  Try to have your test reflect what you really wish to
accomplish with the machine.  Try to avoid making the tests too short
(under 1 minute) or too long (over 1 day).
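
A minimal sketch of the kind of timing wrapper I mean, assuming the
test can be launched as a single command.  The script name bench.py,
the helper names, and the default of three repeats are illustrative
choices only.  It records wall-clock time per run and flags runs that
fall outside the one-minute to one-day window; run vmstat or sar
alongside it to see where the time actually goes.

```python
import statistics
import subprocess
import sys
import time

MIN_SECONDS = 60          # lower bound suggested above
MAX_SECONDS = 24 * 3600   # upper bound suggested above


def classify_duration(seconds):
    """Label a run time against the one-minute / one-day guideline."""
    if seconds < MIN_SECONDS:
        return "too short"
    if seconds > MAX_SECONDS:
        return "too long"
    return "ok"


def time_command(cmd, repeats=3):
    """Run cmd `repeats` times; return the wall-clock seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raises if the test itself fails
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Hypothetical usage: python bench.py ./my_test --some-option
    # With no arguments it times a trivial command as a smoke test.
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    runs = time_command(cmd)
    median = statistics.median(runs)
    print(f"runs  : {[round(r, 2) for r in runs]}")
    print(f"median: {median:.2f} s ({classify_duration(median)})")
```

Run the same command, with the same inputs, on each candidate machine
and compare the medians rather than single runs.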

> Any opinions, experiences and advice appreciated!

I usually buy a rev or two back of CPU to avoid the price shock.  The
performance difference isn't all that large (and most codes' performance
is a function not only of the CPU, but also of the memory, the IO, etc).

If you go with an Athlon, buy really good DDR memory.  Yes, it will cost
you more up front.  Yes, it will save you significant headaches later
on.  Also, make sure the units run cool.  Remember, racks of these
systems can generate LOTS of heat.  Make sure you have enough AC.
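
For a back-of-the-envelope check on the AC, essentially all of the
electrical power the nodes draw ends up as heat in the room.  The
sketch below converts watts to BTU/hr and tons of cooling.  The node
count and per-node wattage are made-up placeholders; measure your own
systems under load before sizing anything.

```python
WATTS_TO_BTU_PER_HOUR = 3.412142   # 1 W dissipated is about 3.412 BTU/hr
BTU_PER_HOUR_PER_TON = 12000.0     # 1 ton of refrigeration = 12,000 BTU/hr


def heat_btu_per_hour(total_watts):
    """Heat load in BTU/hr for a given electrical draw in watts."""
    return total_watts * WATTS_TO_BTU_PER_HOUR


def cooling_tons(total_watts):
    """Cooling capacity, in tons, needed to remove that heat load."""
    return heat_btu_per_hour(total_watts) / BTU_PER_HOUR_PER_TON


if __name__ == "__main__":
    nodes = 32             # assumed node count, not from any real cluster
    watts_per_node = 250   # assumed draw per dual-CPU node under load
    total = nodes * watts_per_node
    print(f"{total} W -> {heat_btu_per_hour(total):.0f} BTU/hr, "
          f"{cooling_tons(total):.2f} tons of cooling")
```

This covers the compute nodes only; switches, file servers, and the
UPS add to the load, so leave headroom.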

If you go with P4, look to see if you can get them with DDR, unless your
application really makes use of the additional bandwidth of the RDRAM. 
Cooling is more important for P4s.

Power is quite important (as is the overall room cooling).  The power
needs to be clean, and you should consider a UPS on the head node and
any file servers.  Remote power control is nice if the machine room is
in another building, as is remote console terminal access.

Disk and IO in general are often overlooked in clusters until too late.
Chris Dagdigian at BioTeam.net is a good person to speak to about this.

Also very much overlooked is the issue of cluster management.  This
tends to guide the choice of Linux distribution.  Management gets to be
painful after the 8th compute node; the old models don't work well on
multiple system image machines.

There are other issues as well, specifically networking, system backup,
extra IO capabilities, etc.

-- 
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615