[Bioclusters] altivec-HMMER benchmark results

Joe Landman bioclusters@bioinformatics.org
11 Dec 2002 12:23:39 -0500


Looking over the code, it makes use of the SIMD (aka vector) registers
on the G4.  The Athlon code is not making use of its SIMD capability.  

The altivec-enabled code makes use of compact data structures and
re-organized memory for better performance.  There is nothing specific
(apart from the calls to the altivec instructions) that precludes
doing the same thing on other machines.  In fact, one should be able to
build function prototypes for the altivec instructions that map them
onto the SIMD capability of the x86 instruction set (look at libSIMD
for x86).
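
A rough sketch of what such a mapping could look like, for illustration
only.  The wrapper names (v4f, v4f_load, v4f_store, v4f_add, v4f_max,
max_plus) are made up here; they are not libSIMD's API and not anything
from the altivec-HMMER code.  The altivec branch uses the real
vec_ld/vec_st/vec_add/vec_max calls, the x86 branch uses the SSE
intrinsics _mm_add_ps/_mm_max_ps, and a scalar fallback keeps it
compiling elsewhere.  Floats are used for simplicity, and the altivec
branch assumes 16-byte aligned pointers.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ALTIVEC__)
#include <altivec.h>
typedef vector float v4f;
/* vec_ld/vec_st need 16-byte aligned addresses */
static inline v4f  v4f_load(const float *p)     { return vec_ld(0, p); }
static inline void v4f_store(float *p, v4f a)   { vec_st(a, 0, p); }
static inline v4f  v4f_add(v4f a, v4f b)        { return vec_add(a, b); }
static inline v4f  v4f_max(v4f a, v4f b)        { return vec_max(a, b); }
#elif defined(__SSE__)
#include <xmmintrin.h>
typedef __m128 v4f;
/* unaligned load/store so callers need no special alignment on x86 */
static inline v4f  v4f_load(const float *p)     { return _mm_loadu_ps(p); }
static inline void v4f_store(float *p, v4f a)   { _mm_storeu_ps(p, a); }
static inline v4f  v4f_add(v4f a, v4f b)        { return _mm_add_ps(a, b); }
static inline v4f  v4f_max(v4f a, v4f b)        { return _mm_max_ps(a, b); }
#else
/* plain C fallback: same interface, no SIMD */
typedef struct { float f[4]; } v4f;
static inline v4f v4f_load(const float *p)
{ v4f r; for (int i = 0; i < 4; i++) r.f[i] = p[i]; return r; }
static inline void v4f_store(float *p, v4f a)
{ for (int i = 0; i < 4; i++) p[i] = a.f[i]; }
static inline v4f v4f_add(v4f a, v4f b)
{ v4f r; for (int i = 0; i < 4; i++) r.f[i] = a.f[i] + b.f[i]; return r; }
static inline v4f v4f_max(v4f a, v4f b)
{ v4f r; for (int i = 0; i < 4; i++) r.f[i] = a.f[i] > b.f[i] ? a.f[i] : b.f[i]; return r; }
#endif

/* dst[i] = max(a[i] + b[i], c[i]), four elements per step.
 * A max-of-sums update, written once against the wrappers above. */
void max_plus(float *dst, const float *a, const float *b,
              const float *c, size_t n)
{
    assert(n % 4 == 0);  /* sketch only: no handling of a ragged tail */
    for (size_t i = 0; i < n; i += 4) {
        v4f sum = v4f_add(v4f_load(a + i), v4f_load(b + i));
        v4f_store(dst + i, v4f_max(sum, v4f_load(c + i)));
    }
}
```

The calling code only ever sees the v4f_* names, so the same loop
builds against whichever SIMD unit the compiler is targeting.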

Now why would one wish to do that?

Well, the benchmark data seems to indicate that when (pardon the pun)
apples are compared to apples (e.g. non-vector code vs non-vector code)
a single-CPU Athlon (an old one at that) is about 27% faster than a
single-CPU G4.  As someone else pointed out, the latest Athlon is a
2800+ (not 1900+) and has a different core.  The latest P4 is 3.06 GHz
and also has a different core than the earlier P4s.  When going to dual
vs dual in non-vectorized code, the old dual Athlon enjoys a 37%
advantage over the dual G4.

Now in the (still punning) apples vs oranges case, where we vectorize
one code set (i.e. use SIMD) and exploit the features of the chip set,
of course we should expect that the feature-exploiting (i.e. vector)
architecture will outperform the non-vector one.  There are two issues
going on here though:

1) Better memory organization.  Kudos to Eric for doing a good job on
re-organizing the memory access patterns.  For those who don't know,
things like BLAST, HMMER, et al. are dominated by memory latency, so the
next cache line you have to wait for will usually wind up dominating the
run time.  Reorganizing the code so that cache lines are not being
thrown away, and so that prefetching can work (take a look at his
indexing changes), will help all cache-based architectures.  I wouldn't
be surprised if this is a major component of the speedup; I have
obtained 4-8x on various codes by fixing the way they walk through
memory.

2) SIMD instructions.  These are very important if you can stream them
and keep the pipelines in the machine full.  The G4 has its SIMD stuff
easily available (kudos to Apple/IBM/Motorola for getting it into GCC).
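
To make point 1 concrete, here is a generic illustration of "the way
they walk through memory".  It is not Eric's indexing change; the
function names are invented for this example.  Both functions compute
the same sum over a row-major matrix, but the first strides across rows
in its inner loop (touching a new cache line on nearly every access once
the rows are long enough), while the second walks consecutive addresses
so each fetched cache line is fully used and hardware prefetch has a
chance to help.

```c
#include <assert.h>
#include <stddef.h>

/* m holds a rows x cols matrix in row-major order: m[i * cols + j]. */

/* Cache-hostile walk: the inner loop jumps cols elements per step. */
long sum_column_order(const int *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long total = 0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            total += m[i * cols + j];
    return total;
}

/* Cache-friendly walk: the inner loop touches consecutive addresses. */
long sum_row_order(const int *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long total = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            total += m[i * cols + j];
    return total;
}
```

The results are identical; only the order of memory accesses differs.
How much that matters depends on the matrix size relative to the cache,
so time it on your own data before drawing conclusions.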

It is important to note that the code that Eric supplies doesn't seem to
compile under GCC 3.2 on an Athlon.  We need a definition of the vector
keyword.  We can fake it if need be.
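
One possible way to fake the type side of it, as a sketch under
assumptions: GCC's generic vector extension (the vector_size attribute)
gives a 16-byte, four-int type on non-altivec targets, in compiler
releases that support arithmetic on such types.  The names v4si,
v4si_from, v4si_to and add4 are hypothetical, and this only stands in
for the vector type; the vec_* calls in the altivec code would still
need their own replacements.

```c
#include <assert.h>
#include <string.h>

/* Four 32-bit ints in one 16-byte vector, via GCC's vector extension.
 * Hypothetical stand-in for altivec's `vector signed int`. */
typedef int v4si __attribute__((vector_size(16)));

/* memcpy avoids alignment and aliasing assumptions about the arrays */
static v4si v4si_from(const int *p)
{
    v4si v;
    memcpy(&v, p, sizeof v);
    return v;
}

static void v4si_to(int *p, v4si v)
{
    memcpy(p, &v, sizeof v);
}

/* dst[0..3] = a[0..3] + b[0..3]; the + is element-wise on vector types */
void add4(int *dst, const int *a, const int *b)
{
    assert(dst != NULL && a != NULL && b != NULL);
    v4si_to(dst, v4si_from(a) + v4si_from(b));
}
```

Whether the compiler turns the addition into a single SIMD instruction
depends on the target flags; without SIMD enabled it is free to expand
it into scalar code.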

The point is that when someone decides they want to do this on the
Athlon or on the P4, I suspect that the advantage that the G4 claims
today will be gone.  This is likely to happen quite a bit sooner rather
than later.

Joe

On Wed, 2002-12-11 at 12:03, chris dagdigian wrote:
> Dave Waddell wrote:
> > Don't want to start a processor war but the current Intel chip is a
> > 3.06GHz Pentium 4 not a 1GHz Pentium 3. IMHO, benchmark results should
> > match current hardware or don't make them at all.
> 
> I agree, but:
> 
> Current hardware != fastest possible hardware, especially for this 
> mailing list...
> 
> Since a huge percentage of the bioclusters that I've seen, built and 
> learned about are running dual Pentium III 1U rackmount systems the 
> benchmarks are real and actually useful for many people in this forum, 
> especially for those who are considering purchasing hardware in the near 
> term. The data is interesting enough to make some people consider 
> eval'ing a G4 box in their decision process. Apple is not something one 
> normally considers when building such systems.
> 
> I already know of one Boston-area cluster that will be built early next 
> year that is going to go heterogeneous Intel/Apple with respect to the 
> compute node selection.
> 
> A general question:
> 
> How many people on this list have put life-science specific Pentium 4 
> clusters into production? How many whitebox or big name vendors offer 
> high-density well cooled Pentium 4 chassis at a reasonable price? I'm 
> actually pretty curious as to this because it seems that the trends 
> that favored lots of cheap SMP PIII boxes for our application mix are 
> going away. I think the future clusters that I get involved with may be 
> going single Pentium 4 for a bit or possibly dual-AMD. It will be an 
> interesting few months.
> 
> I should have added the usual disclaimer though: "do your own tests if 
> you want to draw real conclusions" heh.
> 
> -Chris
-- 
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615