We initially limited the query size to manage the surge in usage we experienced today. If you want to really blast us, please contact Nick Robertson at nick at massivelyparallel.com. He'll get ya hooked up so you can test our system with a massive query. You can also talk with one of our mega users, who blasts us at least once a quarter.

K

-----Original Message-----
From: James Cuff [mailto:jcuff at broad.mit.edu]
Sent: Wednesday, February 22, 2006 3:58 PM
To: bioclusters at bioinformatics.org
Subject: [Bioclusters] quick look see at fractal computing

Hi all,

I was reading GenomeWeb News this morning, and an article about the Howard Fractal-based computing(tm) and fractal-based communication(tm) models rather caught my eye. So I decided to take the new MPT Blast Query server over at http://www.mptbiotech.com/ for an outing, just for a quick look see.

Standard disclaimers apply: this was just a quick test, it is probably full of holes, and I apologise in advance.

I sort of consider myself a 'DNA man' these days, so I decided to look at the old faithful DNA/DNA blastn code, which always runs fairly badly on clusters because of I/O, etc. etc. yada yada.

Anyway, my first big problem was finding that there is a limit to the amount of DNA one can put into the 'power user portal':

    Errors Encountered
    # Query (1) is 207954 aa long; this exceeds maximum allowable length of 7000 aa

No worries, I'll carry on. So as a test we compared the bottom 6,700-odd bases of zebrafish chr5:

    node209 /tmp/ wc -c test2.mpt
    6737 test2.mpt

As a comparison we took a single machine with 4GB of memory and the current NT database split into 5 chunks (nt.00, nt.01, nt.02, nt.03, nt.04), which were also read in over a pretty loaded production NFS server; there is not enough memory to cache it all. I would like to point out that this is a *really* bad configuration, but for the test it will do. I just wanted a worst-case baseline scenario.
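The query was cut down to fit under the portal's 7000-residue limit by keeping only the tail of the sequence. A minimal sketch of that step, assuming a single-record FASTA input; the function names, the 60-column wrapping and the header suffix are hypothetical choices, not the procedure actually used. Note that `wc -c` counts every byte, including the FASTA header line and newlines, so a 6737-byte file holds somewhat fewer than 6,737 bases.

```python
def read_fasta_sequence(text: str) -> tuple[str, str]:
    """Return (header, sequence) for a single-record FASTA string."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA record starting with '>'")
    return lines[0][1:], "".join(lines[1:])


def tail_query(text: str, n_bases: int, max_len: int = 7000, width: int = 60) -> str:
    """Keep only the last n_bases of a FASTA record, re-wrapped at `width` columns.

    max_len mirrors the 7000-residue limit in the portal's error message;
    width=60 is an arbitrary line-length choice (assumption).
    """
    if n_bases <= 0:
        raise ValueError("n_bases must be positive")
    header, seq = read_fasta_sequence(text)
    if not seq:
        raise ValueError("record contains no sequence")
    tail = seq[-n_bases:]  # the whole sequence if it is shorter than n_bases
    if len(tail) > max_len:
        raise ValueError(f"query is {len(tail)} long; limit is {max_len}")
    body = "\n".join(tail[i:i + width] for i in range(0, len(tail), width))
    return f">{header} last {len(tail)} bases\n{body}\n"


if __name__ == "__main__":
    import sys

    # Read one FASTA record on stdin, write its last 6,700 bases to stdout.
    sys.stdout.write(tail_query(sys.stdin.read(), 6700))
```

Run as, for example, `python tail_query.py < chr5_segment.fa > test2.mpt` (file names hypothetical). Checking the length before submission avoids a round trip to the portal only to get the error back.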
This was the result of our basic run:

    time blastall -a2 -nT -p blastn -i test2.mpt -d nt > ourtest.out
    46.250u 7.900s 0:30.33 178.5% 0+0k 0+0io 391341pf+0w

The two copies of NT available here and at MPT were slightly different sizes, so I report a letters/second number below:

 * MPT total RAIS time: 10.45s for 14,192,730,777 letters (1,358,156,055 letters/second)
 * A dual-CPU Intel box: 30.33s for 15,994,705,008 letters (527,355,918 letters/second)

So I make that a speedup of only 2.57 times over a single dual-processor server.

We also produced 250 alignments (the blast default), whereas the MPT server only managed to find 156, even with the limits set to ask for more. So something might also be slightly wrong there.

I guess the proof of the pudding would be to use much larger data sizes and do a real bake-off to see the real performance difference. I'd love to see one of the vendor-agnostic groups that hang out on this list work with MPT to really nail this down in an independent report. I'm sure my simple-minded test here does not reflect the true power of the method.

Best regards,

J.

_______________________________________________
Bioclusters maillist  -  Bioclusters at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bioclusters
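The letters/second figures, the speedup, and the 178.5% CPU figure in the `time` output can all be rechecked with a little arithmetic. A minimal sketch, assuming the csh/tcsh default `time` format (user seconds, system seconds, elapsed, %CPU, then I/O and page-fault counters); the class and function names are hypothetical, and the numbers are copied from the message.

```python
from dataclasses import dataclass


@dataclass
class BlastRun:
    """One timed search: wall-clock seconds and database size in letters."""
    label: str
    seconds: float
    letters: int

    @property
    def letters_per_second(self) -> float:
        # Normalise by database size, since the two copies of NT differ.
        return self.letters / self.seconds


def speedup(fast: BlastRun, slow: BlastRun) -> float:
    """Ratio of throughputs (letters/second), not of raw wall-clock times."""
    return fast.letters_per_second / slow.letters_per_second


def parse_elapsed(text: str) -> float:
    """Convert csh elapsed time such as '0:30.33' or '1:02:03' to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def csh_cpu_percent(time_line: str) -> float:
    """Recompute %CPU from a csh/tcsh `time` line: (user + system) / elapsed."""
    fields = time_line.split()
    user = float(fields[0].rstrip("u"))
    system = float(fields[1].rstrip("s"))
    elapsed = parse_elapsed(fields[2])
    return 100.0 * (user + system) / elapsed


# Figures copied from the message above.
mpt = BlastRun("MPT (RAIS time)", 10.45, 14_192_730_777)
local = BlastRun("dual-CPU Intel box", 30.33, 15_994_705_008)
TIME_LINE = "46.250u 7.900s 0:30.33 178.5% 0+0k 0+0io 391341pf+0w"

if __name__ == "__main__":
    for run in (mpt, local):
        print(f"{run.label}: {run.letters_per_second:,.0f} letters/second")
    print(f"speedup: {speedup(mpt, local):.3f}x")
    print(f"CPU: {csh_cpu_percent(TIME_LINE):.1f}%")
```

Normalising by letters rather than comparing raw times matters here: the wall-clock ratio alone (30.33 / 10.45, about 2.9) would overstate MPT's advantage, because the local NT copy was the larger of the two. The throughput ratio works out to roughly 2.575.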