[Bioclusters] quick look see at fractal computing.

Wed Feb 22 17:58:16 EST 2006

Hi all,

I was reading GenomeWeb News this morning, and an article about the
Howard Fractal-based computing(tm) and fractal-based communication(tm)
models rather caught my eye.

So I decided to take the new MPT Blast Query server over at
http://www.mptbiotech.com/ for an outing, just for a quick look see.

Standard disclaimers apply: this was just a quick test and it is probably
full of holes, for which I apologise in advance.

I sort of consider myself a 'DNA man' these days, so I decided to look
at the old faithful DNA/DNA blastn code, which always runs fairly badly on
clusters because of I/O, etc. etc. yada yada.

Anyway, my first big problem came when I found that there is a limit
to the amount of DNA one can put into the 'power user portal':

Errors Encountered
# Query (1) is 207954 aa long; this exceeds maximum allowable length of 7000 aa

No worries, I'll carry on.  So as a test we searched the last 6,700-odd
bases of zebrafish chr5 against NT:

node209 /tmp/ wc -c test2.mpt
   6737 test2.mpt
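For anyone wanting to cut a similar query to fit under the portal limit, here is a minimal sketch. It assumes a plain FASTA file and takes the 7000 limit from the error message above; the helper name `tail_query` and its defaults are hypothetical, not anything MPT provides. Note that `wc -c` counts bytes, so the 6737 above includes the header line and newlines, not just bases.

```python
def tail_query(fasta_text: str, n_bases: int = 6700, max_len: int = 7000) -> str:
    """Return a FASTA record holding the last n_bases of the first record.

    Hypothetical helper: assumes plain FASTA text and a portal limit of
    max_len residues, taken from the portal's error message.
    """
    if n_bases < 1 or n_bases > max_len:
        raise ValueError(f"n_bases must be between 1 and {max_len}")
    lines = fasta_text.strip().splitlines()
    if lines and lines[0].startswith(">"):
        header, body = lines[0], lines[1:]
    else:
        header, body = ">query", lines
    parts = []
    for line in body:
        if line.startswith(">"):  # stop at the next record, keep only the first
            break
        parts.append(line.strip())
    tail = "".join(parts)[-n_bases:]
    # Re-wrap at 60 columns, a common FASTA line width.
    wrapped = "\n".join(tail[i:i + 60] for i in range(0, len(tail), 60))
    return f"{header} last {len(tail)} bases\n{wrapped}\n"
```

Usage would be something like `tail_query(open("chr5.fa").read())` (the file name is made up), with the result written out as the query file.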

As a baseline we took a single machine with 4GB of memory and the
current NT database split into 5 chunks (nt.00 nt.01 nt.02 nt.03 nt.04),
read in over a fairly heavily loaded production NFS server; there is not
enough memory to cache it all.  I would like to point out that this is a
*really* bad configuration, but for the test it will do: I just wanted a
worst-case baseline scenario.

This was the result of our basic run:

time blastall -a2 -nT -p blastn -i test2.mpt -d nt > ourtest.out
46.250u 7.900s 0:30.33 178.5%   0+0k 0+0io 391341pf+0w

The copies of NT here and at MPT were slightly different sizes, so I
report throughput in letters/second below:

*  MPT total RAIS time 10.45s for 14,192,730,777 letters
   (1,358,156,055 letters/second)

*  A dual-CPU Intel box took 30.33s for 15,994,705,008 letters
   (527,355,918 letters/second)

So I make that a speed-up of only about 2.58x over a single
dual-processor server.
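The arithmetic behind those figures is just letters divided by wall-clock seconds, and the speed-up is the ratio of the two rates. A small sketch using only the numbers quoted above, so anyone can rerun it with their own database size and timing:

```python
def letters_per_second(letters: int, seconds: float) -> float:
    """Throughput as database letters scanned per wall-clock second."""
    return letters / seconds

# Figures from the two runs above (database sizes differ, hence the rates).
mpt = letters_per_second(14_192_730_777, 10.45)
local = letters_per_second(15_994_705_008, 30.33)
speedup = mpt / local

print(f"MPT:      {mpt:,.0f} letters/s")    # 1,358,156,055 letters/s
print(f"local:    {local:,.0f} letters/s")  # 527,355,918 letters/s
print(f"speed-up: {speedup:.3f}x")          # 2.575x
```

Normalising by letters rather than comparing raw times is what makes the two slightly different copies of NT comparable at all.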

Our run also produced 250 alignments (the blastall default), whereas the
MPT server found only 156, even with its limits set to ask for more.  So
something might also be slightly wrong there.
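To check alignment counts on either side, here is a quick sketch. It assumes the legacy blastall default pairwise text report (-m 0), where each database sequence's alignment section starts with a line beginning with '>' and the one-line summary table does not; I have not checked what format the MPT server returns, so treat that part as an assumption.

```python
def count_alignments(report: str) -> int:
    """Count alignment sections in a legacy blastall pairwise (-m 0) report.

    Assumption: each aligned database sequence opens with a '>' definition
    line; several HSPs against the same sequence share one '>' line.
    """
    return sum(1 for line in report.splitlines() if line.startswith(">"))
```

For example, `count_alignments(open("ourtest.out").read())` on our output file.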

I guess the proof of the pudding would be to use much larger data sizes
and do a real bake-off to see the real performance difference.

I'd love to see one of the vendor-agnostic groups that hang out on this
list work with MPT to really nail this down in an independent report.

I'm sure my simple-minded test here does not reflect the true power of
the method.

Best regards,

J.