[Bioclusters] Daemonizing blast, ie running many sequences through 1 process

Farul Mohd. Ghazali bioclusters@bioinformatics.org
Sat, 8 Nov 2003 06:10:34 +0800 (MYT)


On Fri, 7 Nov 2003, Chris Dwan (CCGB) wrote:

> It may be that my experience with Solaris is out of date, or that I failed
> to properly parameterize it, but I remember there being a limit on the
> volume of data that CacheFS would accept (the cache size, as it were).
> That limit was well below the size of any of the larger target sets we
> deal with, so using cachefs as a solution to data staging led to
> thrashing, particularly when we started splitting up the targets to better
> parallelize our searches.

In theory CacheFS can use up to 90% of the filesystem it's configured
on as its cache. That's theoretical of course, since I've never used
CacheFS in a Blast environment before; my nodes run Linux. Come to
think of it, it doesn't really make much sense to run CacheFS if you
have that much disk space to spare; you might as well store the
databases locally.
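
As an aside, the 90% figure is just the default for cfsadmin's
maxblocks parameter. For anyone who wants to try it anyway, here's a
rough sketch of the usual cfsadmin/mount sequence on a Solaris node,
wrapped in Python. The paths and server name are made up and I
haven't run this myself:

    #!/usr/bin/env python
    # Sketch: put an NFS-served Blast database directory behind a
    # CacheFS cache on a Solaris node. Paths and server are
    # hypothetical.
    import subprocess

    CACHE_DIR = "/var/cache/blastdb"           # front (local) filesystem
    NFS_SOURCE = "fileserver:/export/blastdb"  # back filesystem (made up)
    MOUNT_POINT = "/blastdb"

    # Create the cache. maxblocks=90 is the default: the cache may
    # grow to 90% of the front filesystem, the figure quoted above.
    subprocess.run(["cfsadmin", "-c", "-o", "maxblocks=90", CACHE_DIR],
                   check=True)

    # Mount the NFS export through the cache.
    subprocess.run(["mount", "-F", "cachefs",
                    "-o", "backfstype=nfs,cachedir=" + CACHE_DIR,
                    NFS_SOURCE, MOUNT_POINT],
                   check=True)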

> Of course, a truly brilliant resource scheduler would take into
> account the contents of the file cache when deciding where to run a
> particular job...

We've been playing around with splitting up the load by storing
certain databases locally and serving others over NFS. Scheduling is
done by specifying certain resource flags in SGE. It's a lot easier to
do if you have control over how people are submitting jobs (e.g. over
a web page), but not so easy with command-line submissions.
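
To make that concrete, here's a minimal sketch of the kind of wrapper
a web submission page can hide behind a form (command-line users would
have to remember the flags themselves). The "local_db" complex, the
database list and run_blast.sh are made up for illustration; the
complex would have to be defined with qconf and set in each node's
complex_values:

    #!/usr/bin/env python
    # Sketch: route Blast jobs to nodes holding a local copy of the
    # requested database via an SGE resource flag; everything else
    # falls through to the NFS-served copies.
    import subprocess

    LOCAL_DATABASES = {"nr", "nt"}  # on node-local disk (made up)

    def submit_blast(query_file, database):
        cmd = ["qsub"]
        if database in LOCAL_DATABASES:
            # Only run on hosts where the admin set local_db=1.
            cmd += ["-l", "local_db=1"]
        cmd += ["run_blast.sh", database, query_file]
        subprocess.run(cmd, check=True)

    submit_blast("query.fa", "nr")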