On Sep 29, 2004, at 3:22 PM, Juan Carlos Perin wrote:

> This is very disappointing considering a single G5 can search the NT
> database in under 3 minutes, while running on multiple nodes actually
> takes well over ten minutes.

This seems like a great opportunity to bring up the old parallel computing saw: parallelizing a computational task adds overhead. When you use multiple CPUs on a single problem, you almost always end up doing more total work than you would have done by running the task on a single processor. The parallel cost can include time spent in the scheduler, time spent reading files from a shared fileserver, time spent partitioning the target set, and time spent merging the results back together. At least in BLAST, there is little to no interprocess communication to slow things down, thank goodness.

The classic formulation was done by Gene Amdahl many years ago:

    Time to run on one CPU = serial_portion + parallel_portion
    Time to run on N CPUs  = serial_portion + (parallel_portion / N) + parallel_cost(N)

Total work done increases, but the time to complete any single job drops, provided the parallel cost stays smaller than the time saved by dividing the parallel portion. Speedup is limited by the non-parallelizable portion of the code, in this case partitioning the target and merging the results.

There are lots of exceptions to this rule. The big ones are all points where performance as a function of problem size is discontinuous. This usually happens when the memory requirements cross a hardware boundary: Cache -> RAM -> Disk. A target that spills to disk on one node may fit in RAM once it is split across several, and then the parallel run can do better than the formula predicts.

Any time that tasks are trivially parallel (a large batch of input files to be searched against the same target, for example), it will almost always be more efficient, in terms of CPU-minutes spent on the problem as a whole, to run each job as a single thread on a single CPU. This is easier to implement (submit a bunch of jobs to the queuing system), easier to tune (tune once, run everywhere), and easier to debug.

The vast majority of the users of BLAST farms are more interested in throughput than response time. They have thousands of query sequences, and they want results for all of those queries.

There are some users who really want response time from BLAST. Most users of the NCBI BLAST server fall into this category. Parallelized BLAST is for these folks. The process of tuning a cluster to run a single BLAST job as fast as it possibly can is non-trivial, as lots of people on this list know.

So the question really comes down to "what do your users want, batch throughput or response time?"

Chris Dwan
The BioTeam
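
For anyone who wants to play with the timing formula from the message, here is a minimal Python sketch. The function names (run_time, speedup, cpu_time) and every number in the example are made up for illustration; they are not measurements from any real cluster or BLAST run. The sketch also assumes that a single-CPU run pays no parallel cost at all, to match the one-CPU line of the formula.

```python
def run_time(serial, parallel, n, parallel_cost=None):
    """Wall-clock time on n CPUs:
    serial_portion + (parallel_portion / n) + parallel_cost(n).

    parallel_cost is a caller-supplied function of n (hypothetical here);
    it is ignored when n == 1, since a single CPU pays no parallel overhead.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    overhead = parallel_cost(n) if (parallel_cost is not None and n > 1) else 0.0
    return serial + parallel / n + overhead


def speedup(serial, parallel, n, parallel_cost=None):
    """Single-CPU time divided by n-CPU time (values below 1 mean a slowdown)."""
    return run_time(serial, parallel, 1) / run_time(serial, parallel, n, parallel_cost)


def cpu_time(serial, parallel, n, parallel_cost=None):
    """Total CPU time consumed: n CPUs held for the whole run."""
    return n * run_time(serial, parallel, n, parallel_cost)


if __name__ == "__main__":
    # Illustrative numbers only (minutes): a job that takes 3 minutes on one
    # CPU, with an assumed overhead that grows linearly with node count.
    serial, parallel = 0.5, 2.5
    cost = lambda n: 0.4 * n  # assumed overhead model, not measured

    for n in (1, 2, 4, 8, 16):
        t = run_time(serial, parallel, n, cost)
        s = speedup(serial, parallel, n, cost)
        c = cpu_time(serial, parallel, n, cost)
        print(f"N={n:2d}  time={t:6.2f}  speedup={s:5.2f}  cpu_minutes={c:7.2f}")
```

With an overhead term that grows with N, the wall-clock time eventually rises again and the CPU-minutes climb steadily, which is the throughput-versus-response-time tradeoff described above. Swapping in a different parallel_cost function is the quickest way to see how sensitive the result is to that assumption.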