Hi Andy:

It is run and job dependent, but I have found that numbers between
7-20 sequences per run give the best throughput. I have done this
study a number of times, and it definitely changes with each algorithm
and database. You are fighting the database load (actually an mmap)
time, as well as the queue latency, against the sequence comparison
time, which is dominated by the search portion.

Your wall clock time at 1 sequence per queued job will be worse than
(for an N-CPU run) using N bins to collect the sequences. N bins is
also not optimal (subject to the fuzziness of the information I have
here).

What I did to find the optimum was to take a set of jobs and partition
them into chunks of 1, 2, 4, 8, 16, 32, 64, ..., 2**12 sequences (I
had a large number of tomato ESTs that I had been using for this). I
then measured the wall clock time for run completion as a function of
the chunk size. Using this, I built a finer grid (e.g. 4, 5, 6, 7, 8,
9, ...) around the maximum and reran. I was able to eyeball the
maximum from the chart.

Joe

On Tue, 2003-03-18 at 04:23, andy law (RI) wrote:
> All,
>
> As we start to use our compute farm for bigger and bigger tasks, I
> have come to realise that the way we are currently thinking about
> submitting our BLAST jobs is considerably sub-optimal. Obviously 1
> run of 100 sequences against a database is much more efficient than
> 100 separate runs against the same database. Has anyone developed
> scripts to sit inside some part of a queue submission system (in this
> case SGE) to make these things more efficient? I'm thinking along the
> lines of something that monitors the size and number of queries,
> notes the number of available nodes and batches the jobs up to match
> one against the other?

--
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615
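
P.S. For anyone who wants to repeat the measurement, here is a minimal
Python sketch of the chunk-size sweep described above. The input file,
database name, and blastall command line are illustrative assumptions,
and it times the chunks serially in-process rather than through SGE,
so treat it as a starting point, not a drop-in script.

#!/usr/bin/env python
# Chunk-size sweep for batched BLAST runs: split a query set into
# chunks of k sequences, run each chunk, and record total wall clock
# time as a function of k. All paths and the blastall invocation are
# placeholders.
import subprocess
import time

def read_fasta(path):
    """Yield individual FASTA records (header + sequence lines)."""
    record = []
    with open(path) as fh:
        for line in fh:
            if line.startswith(">") and record:
                yield "".join(record)
                record = []
            record.append(line)
    if record:
        yield "".join(record)

def run_chunked(seqs, chunk_size, db="est_db"):
    """Run the search over seqs in batches of chunk_size; return seconds."""
    start = time.time()
    for i in range(0, len(seqs), chunk_size):
        with open("chunk.fa", "w") as fh:
            fh.writelines(seqs[i:i + chunk_size])
        # Legacy NCBI blastall invocation; substitute your own command.
        subprocess.call(["blastall", "-p", "blastn", "-d", db,
                         "-i", "chunk.fa", "-o", "chunk.out"])
    return time.time() - start

if __name__ == "__main__":
    seqs = list(read_fasta("tomato_ests.fa"))  # hypothetical input file
    # Coarse sweep over powers of two: 1, 2, 4, ..., 2**12.
    for chunk_size in [2 ** k for k in range(13)]:
        elapsed = run_chunked(seqs, chunk_size)
        print("%5d sequences/job: %8.1f s" % (chunk_size, elapsed))

Plot chunk size against wall clock time, then rerun with a finer grid
(e.g. 4, 5, 6, ..., 12) around the best point and eyeball the optimum
from the chart, exactly as in the procedure above.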