On Fri, 2003-05-02 at 09:16, Jeremy Mann wrote:
> > On Thu, 2003-05-01 at 18:29, Jeremy Mann wrote:
> >> I am curious if anyone knows of any commercial or open source
> >> solution to breaking up the NCBI dbs into various sizes. Here, our
> >> present solution is
> >
> > You can use the "formatdb -v N" option to have the database
> > automatically divided into groups of N x 10**6 letters. I would
> > recommend this route for the database formatting side. Keep the
> > original db around for the other tools.
>
> Then how would you tell blastall which nodes have which *piece* of the
> database?

If you want to do MIMD-type processing (e.g. node 1 has db chunk 1,
node 2 has db chunk 2, etc.) and have each BLAST job work on one chunk
of the data, you will need to create a method to

  a) distribute the relevant bits to the compute nodes in question
  b) perform the actual run against the smaller data set
  c) aggregate the results back to the submitter
  d) reassemble the result for final presentation (optional, depending
     upon how it is being used)

mpiBLAST will do some of this for you. You need a shared storage
location for the smaller bits, but you could easily push the "shared"
bits to local storage prior to running mpiBLAST. You would then simply
need to use a modified mpiblast.conf file (easy) to point to where the
bits sit.

As for how to do this in general for NCBI BLAST (and other codes), some
of us are working on products to enable exactly this across clusters
et al. To avoid having this become an advertisement, please feel free
to contact me off-list and I can explain more.

--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615
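
In shorthand, steps a) through d) are just split / scatter / run /
gather. A rough Python sketch of the two ends of that pipeline follows;
the function names are mine, the real volume split is what
"formatdb -v N" does internally, and the hit tuples stand in for actual
blastall output:

```python
def split_fasta(records, max_letters):
    """Group (header, sequence) records into volumes holding at most
    max_letters of sequence, keeping each sequence whole -- roughly
    what formatdb -v does with N x 10**6 letters per volume."""
    volumes, current, count = [], [], 0
    for header, seq in records:
        if current and count + len(seq) > max_letters:
            volumes.append(current)       # close this volume
            current, count = [], 0
        current.append((header, seq))
        count += len(seq)
    if current:
        volumes.append(current)
    return volumes

def gather_hits(per_node_hits):
    """Steps c/d: merge the per-chunk hit lists returned by each node
    and re-rank globally by e-value (lower is better)."""
    merged = [hit for hits in per_node_hits for hit in hits]
    return sorted(merged, key=lambda hit: hit[1])  # (subject, evalue)

# Toy database: five 1000-letter sequences, 2000-letter volumes.
records = [("seq%d" % i, "ACGT" * 250) for i in range(5)]
volumes = split_fasta(records, 2000)
print([len(v) for v in volumes])  # [2, 2, 1]
```

The middle steps (a/b) are then a push of each volume to its node and
one blastall run per node against the local volume; gather_hits shows
why the final re-sort matters, since each node only ranks its own
chunk's hits.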