[Bioclusters] quick look see at fractal computing.

Wed Feb 22 19:48:48 EST 2006

Hi James,

What Kathleen stated about the query size limit was true, but we have
removed the sequence length restriction for DNA sequences for now. Because
we are in a beta phase, restrictions like this may be imposed from time to
time during our shake-down. At this stage our overall performance is also
less important to us than ensuring the accuracy of our results.

That is why your statement that we did not produce the expected number of
alignments for your query concerns me most. We've done extensive testing
and have processed millions of sequences for a research group with the
requisite data integrity checking. If you have found a condition that
produces anomalous results, this is unacceptable to us and I want to
explore this issue with you thoroughly. If you could please post your input
query and the results file produced by MPT, I will re-run this experiment
and attempt to reproduce these results. If you have indeed uncovered an
issue with our system, I would also like permission to give you full credit
for the find on our website. Our greatest goal is to be accepted and used
by the community to advance research.

I would also like to extend an open invitation to others who may be
monitoring this thread to run their own test jobs on our system to assist us
with wider validation of our processing.

Thank You,

Nick

-----Original Message-----
From: Kathleen [mailto:kathleen at massivelyparallel.com] 
Sent: Wednesday, February 22, 2006 5:00 PM
To: 'Clustering, compute farming & distributed computing in life science
informatics'
Cc: 'Nick Robertson'
Subject: RE: [Bioclusters] quick look see at fractal computing.

We initially limited the query size to manage a surge in usage, which we
experienced today.  If you really want to blast us, please
contact Nick Robertson at nick at massivelyparallel.com.  He'll get ya hooked
up so you can test our system with a massive query.  You can also talk with
one of our mega users who blasts us at least once a quarter.  

K

-----Original Message-----
From: James Cuff [mailto:jcuff at broad.mit.edu] 
Sent: Wednesday, February 22, 2006 3:58 PM
To: bioclusters at bioinformatics.org
Subject: [Bioclusters] quick look see at fractal computing.

Hi all,

I was reading GenomeWeb News this morning, and an article about the Howard
Fractal-based computing(tm) and fractal-based communication(tm) models
rather caught my eye.

So I decided to take the new MPT Blast Query server over at
http://www.mptbiotech.com/ for an outing, just for a quick look see.

Standard disclaimers apply: this was just a quick test, and it is probably
full of holes, for which I apologise in advance.

I sort of consider myself a 'DNA man' these days, so I decided to look at
the old faithful DNA/DNA blastn code, which always runs fairly badly on
clusters because of I/O, etc. etc. yada yada.

Anyway, my first big problem started when I found that there was a limit to
the amount of DNA one can put in the 'power user portal':

Errors Encountered
# Query (1) is 207954 aa long; this exceeds maximum allowable length of 7000 aa
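One possible workaround for the 7,000-letter cap, sketched here as an
assumption rather than anything the MPT portal is documented to support, is
to cut a long query into windows that fit under the limit and submit them
one at a time. The helper `split_query` and its `overlap` parameter are
hypothetical; only the 7,000 figure comes from the error message above.

```python
def split_query(seq: str, max_len: int = 7000, overlap: int = 0) -> list[str]:
    """Cut seq into windows of at most max_len letters (hypothetical helper).

    Consecutive windows share `overlap` letters so that a hit spanning a
    window boundary can still appear whole in one of them.
    """
    if max_len <= 0 or not 0 <= overlap < max_len:
        raise ValueError("need max_len > 0 and 0 <= overlap < max_len")
    step = max_len - overlap
    windows = []
    start = 0
    while True:
        windows.append(seq[start:start + max_len])
        if start + max_len >= len(seq):  # this window reached the end
            break
        start += step
    return windows


query = "ACGT" * 51988 + "AC"  # placeholder 207,954-letter query
windows = split_query(query, max_len=7000)
print(len(windows), max(len(w) for w in windows))
```

With `overlap` greater than zero, a boundary-spanning hit is not lost, at
the cost of some hits being reported twice and needing de-duplication.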

No worries, I'll carry on.  So as a test we compared the last 6,700-odd
bases of zebrafish chr5:

node209 /tmp/ wc -c test2.mpt
   6737 test2.mpt
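Note that `wc -c` counts bytes, not bases. If test2.mpt is a FASTA file,
the total includes the header line and every newline, so the residue count
is somewhat lower than 6737. The sketch below uses hypothetical helpers
(`tail_as_fasta`, `read_fasta_sequence`), a placeholder sequence and an
assumed 60-column line width: it takes the last n letters of a sequence,
writes them as a wrapped FASTA record, and counts residues separately from
bytes.

```python
def read_fasta_sequence(text: str) -> str:
    """Concatenate the sequence lines of a single-record FASTA string."""
    return "".join(
        line.strip() for line in text.splitlines() if not line.startswith(">")
    )


def tail_as_fasta(seq: str, n: int, header: str, width: int = 60) -> str:
    """Return the last n letters of seq as a FASTA record wrapped at width."""
    tail = seq[-n:] if n > 0 else ""
    body = "\n".join(tail[i:i + width] for i in range(0, len(tail), width))
    return f">{header}\n{body}\n"


chrom = "ACGT" * 2000  # placeholder standing in for a real chromosome
record = tail_as_fasta(chrom, 6700, "chr5_tail")
print(len(record.encode("ascii")), "bytes")  # what wc -c would report
print(len(read_fasta_sequence(record)), "bases")
```

For a 6,700-letter tail with a 9-character header, the record is 6,823
bytes: 6,700 bases, 112 newlines and 11 header bytes.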

As a comparison we took a single machine with 4GB of memory and the current
NT database split into 5 chunks (nt.00 nt.01 nt.02 nt.03 nt.04), which were
read in over a pretty heavily loaded production NFS server; there is not
enough memory to cache it all.  I would like to point out that this is a
*really* bad configuration, but for the test it will do: I just wanted a
worst-case baseline scenario.

This was the result of our basic run:

time blastall -a2 -nT -p blastn -i test2.mpt -d nt > ourtest.out
46.250u 7.900s 0:30.33 178.5%   0+0k 0+0io 391341pf+0w
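The csh/tcsh `time` line reports user seconds, system seconds, wall-clock
time and CPU utilisation, where utilisation is (user + system) / elapsed.
Here (46.250 + 7.900) / 30.33 is about 1.785, i.e. 178.5% out of a possible
200% with `-a2`, which suggests both CPUs were busy for most of the run
despite the NFS reads. The parser below is a minimal sketch that assumes
exactly this field layout; `parse_csh_time` is a hypothetical name, and
other `time` output formats are not handled.

```python
def parse_csh_time(line: str) -> dict:
    """Parse '<user>u <system>s <elapsed> <cpu>%' from csh/tcsh time output."""
    # Only the first four whitespace-separated fields are used.
    user, system, elapsed, cpu = line.split()[:4]
    # Elapsed is written as [h:]m:ss.ss; fold each field into seconds.
    seconds = 0.0
    for part in elapsed.split(":"):
        seconds = seconds * 60 + float(part)
    return {
        "user": float(user.rstrip("u")),
        "system": float(system.rstrip("s")),
        "elapsed": seconds,
        "cpu_pct": float(cpu.rstrip("%")),
    }


t = parse_csh_time("46.250u 7.900s 0:30.33 178.5%   0+0k 0+0io 391341pf+0w")
utilisation = (t["user"] + t["system"]) / t["elapsed"]
print(f"{utilisation:.1%}")  # recomputes the reported CPU percentage
```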

The two copies of NT available here and at MPT were slightly different
sizes, so I report throughput in letters per second below:

*  MPT total RAIS time 10.45s for 14,192,730,777 letters
   (1,358,156,055 letters/second)

*  A dual-CPU Intel box took 30.33s for 15,994,705,008 letters
   (527,355,918 letters/second)

So I make that a speed-up of only 2.57 times over a single dual-processor
server.
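Because the two NT copies differ in size, the comparison normalises each
run to letters searched per second and takes the ratio. This assumes search
time scales roughly linearly with database size, which is an approximation
rather than something measured here. The arithmetic, as a quick check:

```python
# Figures copied from the two runs reported above.
MPT_LETTERS, MPT_SECONDS = 14_192_730_777, 10.45
LOCAL_LETTERS, LOCAL_SECONDS = 15_994_705_008, 30.33


def letters_per_second(letters: int, seconds: float) -> float:
    """Database letters searched per wall-clock second."""
    return letters / seconds


mpt = letters_per_second(MPT_LETTERS, MPT_SECONDS)
local = letters_per_second(LOCAL_LETTERS, LOCAL_SECONDS)
speedup = mpt / local

print(f"MPT:      {mpt:,.0f} letters/second")
print(f"dual CPU: {local:,.0f} letters/second")
print(f"speed-up: {speedup:.3f}x")
```

This prints roughly 1,358,156,055 and 527,355,918 letters/second and a
ratio of 2.575.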

We also produced 250 alignments (the blast default), whereas the MPT server
managed to find only 156, even with the limits set to ask for more.  So
something might also be slightly wrong there.
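By default `blastall` shows alignments for at most 250 database sequences
(its `-b` option), which is where the 250 figure comes from. One way to
compare the two reports like for like is to count alignment blocks in each.
The sketch below assumes the legacy plain-text report layout, in which each
database sequence's alignments start with a line beginning `>` and each HSP
starts with a ` Score =` line; `count_blast_hits` is a hypothetical helper,
not part of any BLAST toolkit.

```python
def count_blast_hits(report: str) -> tuple[int, int]:
    """Count subjects and HSPs in a plain-text (legacy blastall) report.

    Assumes each subject's alignment block opens with a '>' line and each
    HSP within it opens with a ' Score =' line.
    """
    subjects = hsps = 0
    for line in report.splitlines():
        if line.startswith(">"):
            subjects += 1
        elif line.lstrip().startswith("Score ="):
            hsps += 1
    return subjects, hsps
```

Running it over both result files, for example
`count_blast_hits(open("ourtest.out").read())`, separates the number of
subjects from the number of HSPs, since one subject can carry several HSPs.
This only works if the MPT results file uses the same layout.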

I guess the proof of the pudding would be to use much larger data sizes
and do a real bake-off to see the true performance difference.

I'd love to see one of the vendor-agnostic groups that hang out on this
list work with MPT to really nail this down in an independent report.

I'm sure my simple minded test here does not reflect the true power of
the method.

Best regards,

J.

_______________________________________________
Bioclusters maillist  -  Bioclusters at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bioclusters