On 07-Nov-03, Chris Dwan (CCGB) wrote:

> It may be that my experience with Solaris is out of date, or that I
> failed to properly parameterize it, but I remember there being a limit
> on the volume of data that CacheFS would accept (the cache size, as it
> were).  That limit was well below the size of any of the larger target
> sets we deal with, so using CacheFS as a solution to data staging led
> to thrashing, particularly when we started splitting up the targets to
> better parallelize our searches.
>
> I'm curious to know if this is still the case.
>
> Of course, a truly brilliant resource scheduler would take into account
> the contents of the file cache when deciding where to run a particular
> job...

Quite.  CacheFS seems a bit pointless; the OS usually caches disk access
anyway.

I have to say that we've always gone with distributing the data set to all
the machines anyway; NFS, or relying on caching at all, only helps if the
users arrange their work in a way that takes advantage of caching, and in
my experience that's not the case.  They tend to do this:

  foreach (@sequence) {
    foreach (@dbs) {
      blastall ...
    }
  }

which totally wrecks caching, rather than:

  foreach (@dbs) {
    foreach (@sequence) {
      blastall ...
    }
  }

which we all know runs much more efficiently, since each database only has
to be read and cached once (especially on sites with blastable databases
on shared storage).

Tim

--
Dr Tim Cutts
Informatics Systems Group
Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
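
A minimal sketch of the database-outer ordering described above, assuming
blastall is on the PATH; the database names, query file locations and
blastall options are placeholders for illustration, not anyone's actual
setup:

  #!/usr/bin/env perl
  # Cache-friendly ordering: loop over databases on the outside, so each
  # database is read (and cached) once, then reused for every query.
  use strict;
  use warnings;

  my @dbs      = qw(est_human swissprot);   # placeholder database names
  my @sequence = glob("queries/*.fa");      # placeholder query files

  foreach my $db (@dbs) {
      foreach my $query (@sequence) {
          # Derive an output file name per query/database pair.
          (my $out = $query) =~ s/\.fa$/.$db.out/;
          system("blastall", "-p", "blastp", "-d", $db,
                 "-i", $query, "-o", $out) == 0
              or warn "blastall failed for $query against $db: $?\n";
      }
  }

Run like this against a locally staged copy of the databases, the kernel's
own page cache keeps each database hot for the whole inner loop, with no
CacheFS needed.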