Hi James,

What Kathleen stated about the query size limit was true, but we have removed the sequence length restriction for DNA sequences for now. Because we are in a beta phase, restrictions like this may be imposed from time to time during our shake-down. Our overall performance is also not as important to us at this stage as ensuring the efficacy of our results.

This is why what concerns me most is your statement that we did not produce the expected number of alignments for your query. We've done extensive testing and have processed millions of sequences for a research group with the requisite data integrity checking. If you have found a condition that produces anomalous results, this is unacceptable to us and I want to explore this issue with you thoroughly.

If you could please post your input query and the results file produced by MPT, I will re-run this experiment and attempt to reproduce these results. If you have indeed uncovered an issue with our system, I would also like permission to give you full credit for the find on our website. Our greatest goal is to be accepted and used by the community to further research advancements.

I would also like to extend an open invitation to others who may be monitoring this thread to run their own test jobs on our system to assist us with wider validation of our processing.

Thank You,
Nick

-----Original Message-----
From: Kathleen [mailto:kathleen at massivelyparallel.com]
Sent: Wednesday, February 22, 2006 5:00 PM
To: 'Clustering, compute farming & distributed computing in life science informatics'
Cc: 'Nick Robertson'
Subject: RE: [Bioclusters] quick look see at fractal computing.

We have limited the query size initially in order to manage a surge in usage, which we experienced today. If you want to really blast us, please contact Nick Robertson at nick at massivelyparallel.com. He'll get ya hooked up so you can test our system with a massive query.
You can also talk with one of our mega users who blasts us at least once a quarter.

K

-----Original Message-----
From: James Cuff [mailto:jcuff at broad.mit.edu]
Sent: Wednesday, February 22, 2006 3:58 PM
To: bioclusters at bioinformatics.org
Subject: [Bioclusters] quick look see at fractal computing.

Hi all,

I was reading GenomeWeb News this morning, and an article about the Howard Fractal-based computing(tm) and fractal-based communication(tm) models rather caught my eye.

So I decided to take the new MPT Blast Query server over at http://www.mptbiotech.com/ for an outing, just for a quick look see. Standard disclaimers apply: this was just a quick test, it is probably full of holes, for which I apologise in advance.

I sort of consider myself a 'DNA man' these days, so I decided to look at the old faithful DNA/DNA blastn code, which always runs fairly badly on clusters because of I/O, etc. etc. yada yada.

Anyway, my first big problem started when I found that there was a limit to the amount of DNA one can put in the 'power user portal':

  Errors Encountered
  # Query (1) is 207954 aa long; this exceeds maximum allowable length of 7000 aa

No worries, I'll carry on. So as a test we compared the bottom 6,700 odd bases of chr5 of zebrafish:

  node209 /tmp/ wc -c test2.mpt
  6737 test2.mpt

As a comparison we took a single machine with 4GB memory, and the current NT database split into 5 chunks:

  nt.00 nt.01 nt.02 nt.03 nt.04

which were also read in over a pretty loaded production NFS server; there is not enough memory to cache it all. I would like to point out that this is a *really* bad configuration, but for the test it will do. I just wanted a worst-case baseline scenario.
This was the result of our basic run:

  time blastall -a2 -nT -p blastn -i test2.mpt -d nt > ourtest.out
  46.250u 7.900s 0:30.33 178.5% 0+0k 0+0io 391341pf+0w

The two copies of NT available here and at MPT were slightly different sizes, so I report a letters/second number below:

* MPT total RAIS time 10.45s for 14,192,730,777 letters (1358156055 letters / second)
* A dual CPU Intel box took 30.33s for 15,994,705,008 letters (527355918 letters / second)

So I make that a speed-up of only 2.57 times over a single dual processor server.

We also produced 250 (blast default) alignments, whereas the MPT server only managed to find 156, even with the limits set to ask for more. So something might also be slightly wrong there.

I guess the proof of the pudding would be to use much larger data sizes and do a real bake-off to see the real performance difference. I'd love to see one of the vendor-agnostic groups that hang out on this list work with MPT to really nail this down in an independent report.

I'm sure my simple-minded test here does not reflect the true power of the method.

Best regards,

J.

_______________________________________________
Bioclusters maillist - Bioclusters at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bioclusters
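
For anyone wanting to recheck the letters/second comparison in James's message, the arithmetic can be sketched as follows. This is only a minimal illustration: the timings and database sizes are copied from the message, while the constant and function names are made up here and are not part of any MPT or NCBI tool.

```python
# Figures quoted in the message above; the names are illustrative only.
MPT_SECONDS = 10.45               # "MPT total RAIS time"
MPT_LETTERS = 14_192_730_777      # size of MPT's copy of NT
LOCAL_SECONDS = 30.33             # wall-clock time of the local blastall run
LOCAL_LETTERS = 15_994_705_008    # size of the local copy of NT


def letters_per_second(letters: int, seconds: float) -> float:
    """Database letters searched per second of elapsed time."""
    return letters / seconds


# Normalising by database size matters because the two NT copies differ.
mpt_rate = letters_per_second(MPT_LETTERS, MPT_SECONDS)
local_rate = letters_per_second(LOCAL_LETTERS, LOCAL_SECONDS)
speedup = mpt_rate / local_rate

if __name__ == "__main__":
    print(f"MPT:     {mpt_rate:,.0f} letters/second")
    print(f"Local:   {local_rate:,.0f} letters/second")
    print(f"Speedup: {speedup:.3f}x")
```

Comparing the raw wall-clock times alone (30.33s against 10.45s) would overstate the difference slightly, since the local run searched the larger copy of NT; dividing throughput by throughput is what gives the figure of roughly 2.57 quoted in the message.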