If you have space on your nodes then you should definitely store the
BLAST databases locally. Depending on the BLAST parameters, you may
want to split the database into chunks that are cache-able and then
merge the results. BLASTN with highly insensitive parameters on a fast
machine can become IO-limited (and if you're running such insensitive
parameters then perhaps you ought to use a different algorithm). BLASTP
and others are not going to benefit much from caching as they spend
quite a bit of time exploring alignments. Sounds like you're running
BLASTN though.

Now if you don't have room on your nodes to hold the database (maybe
the nodes are diskless) then you definitely want to split your database
into cache-able chunks and then you'll only have to read each slice
once over NFS. If you're a WU-BLAST user, you can actually do the
splitting dynamically with command line parameters and you don't have
to physically split the database.

One advantage of keeping the data on an NFS server is that it is easier
to manage updates. I think dynamic splitting over NFS is a very good
idea for diskless nodes. The server ought to have loads of RAM so it
can serve from its memory rather than disk, and it ought to have
gigabit Ethernet (the nodes can have 100 Mbit since they are only going
to be IO-limited when requesting the first slice).

<shameless_plug>Such topics are covered in chapter 12 "Hardware and
Software Optimizations" of the O'Reilly BLAST book.</shameless_plug>

-Ian

On Friday, November 7, 2003, at 06:07 AM, Michael.James@csiro.au wrote:

> We have a problem with 66 nodes becoming NFS bound
> when blasting many (>10,000) sequences
> against the same database set.
>
> One approach (which we are trying) is to cache database files locally,
> so nodes can re-read their files without bottlenecking on NFS.
>
> A totally different approach, with even better performance potential,
> would be if a blast process could start up, load its database(s)
> and process multiple queries until told to exit.
>
> This dilutes the startup cost across all the jobs to be run on that
> node.
>
> Does NCBI blast do this?
> Is there a blast that does?
> Anyone interested in writing one?
> What's involved?
>
> Thanks for any pointers,
> michaelj
>
> --
> Michael James                         michael.james@csiro.au
> System Administrator                  voice: 02 6246 5040
> CSIRO Bioinformatics Facility         fax:   02 6246 5166
> _______________________________________________
> Bioclusters maillist - Bioclusters@bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters
>
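
As a rough illustration of the static splitting mentioned in the reply
(this is not WU-BLAST's dynamic splitting, and it is not taken from the
original message), here is a minimal Python sketch that groups FASTA
records into chunks under a residue budget. The names parse_fasta,
chunk_records and max_residues are made up for this example; the budget
would presumably be chosen so that one formatted chunk fits in a node's
file cache.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record type: (header without the leading '>', sequence).
Record = Tuple[str, str]


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    seq_parts: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(seq_parts)
            header, seq_parts = line[1:], []
        elif header is not None:
            seq_parts.append(line)
    if header is not None:
        yield header, "".join(seq_parts)


def chunk_records(
    records: Iterable[Record], max_residues: int
) -> Iterator[List[Record]]:
    """Group records into chunks of at most max_residues letters.

    A single record longer than the budget gets a chunk of its own.
    """
    chunk: List[Record] = []
    used = 0
    for header, seq in records:
        # Start a new chunk when adding this record would exceed the budget.
        if chunk and used + len(seq) > max_residues:
            yield chunk
            chunk, used = [], 0
        chunk.append((header, seq))
        used += len(seq)
    if chunk:
        yield chunk
```

Each chunk would then be written out and formatted as its own database,
searched separately, and the per-chunk reports merged; none of that is
shown here.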