> This test (quick hack) shows how one can use multiple
> computers, including spare MacOSX, Windows, and Linux
> workstations, to distribute and speed up large biosequence
> analyses, BLAST in this example. If you can split large data
> sets to small subsets distributed to many computers, analyze each
> subset and reassemble subset results to a whole, you should
> be able to trade time for compute nodes.

I apologize if this is obvious or redundant.

The scheme you've coded (split the target into N chunks, where N is
the number of available compute nodes, ship one chunk to each node,
and then assemble the results at the end) may well reduce response
time on any single query. It is also highly susceptible to transient
or permanent failures in the network, the compute nodes, or the
re-assembly stage, and it adds a great deal of overhead to each job.
The fact that the target needs to be re-formatted every time we gain
or lose a compute node seems particularly iffy.

A simpler model (used by many of the folks on this list) exploits the
parallelism between jobs rather than within each query: any single
search runs on a single computational node. A queuing system of your
choice schedules jobs onto nodes, manages transient and permanent
failures, stages data, and handles all that other neat stuff. Some
sites even share jobs between clusters, idle workstations, and
servers by establishing a common repository of larger, un-split
targets.

The only objection I've run into with this simpler scheme is that
it's not "grid" enough.

-Chris Dwan
 Center for Computational Genomics and Bioinformatics
 University of Minnesota
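For what it's worth, the job-level scheme can be sketched in a few lines. This is only an illustration, not anyone's production setup: `run_blast` is a hypothetical stand-in for a full search against the un-split target, and a thread pool stands in for a real queuing system (SGE, PBS, Condor, etc.), which would also handle staging and failure recovery.

```python
from concurrent.futures import ThreadPoolExecutor

def run_blast(query, target):
    # Hypothetical stand-in for one complete search (e.g. a blastall
    # invocation) run on a single node against the full, un-split target.
    return f"hits for {query} vs {target}"

def schedule(queries, target, workers=4):
    # The pool plays the role of the queuing system: one query per job,
    # no target splitting, and no result re-assembly step afterward.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {q: pool.submit(run_blast, q, target) for q in queries}
        # A job that failed would simply be resubmitted whole;
        # here we just collect the results.
        return {q: f.result() for q, f in futures.items()}

if __name__ == "__main__":
    for q, hits in schedule(["query1", "query2", "query3"], "nr").items():
        print(q, "->", hits)
```

The point of the sketch is that adding or losing a worker changes nothing about the data: every job still sees the same whole target, so there is nothing to re-format.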