[Bioclusters] OS X and NFS
Tim Cutts
tjrc at sanger.ac.uk
Thu Jul 14 04:32:54 EDT 2005
On 13 Jul 2005, at 7:01 pm, M. Michael Barmada wrote:
> Hi Carlos,
>
> If it's any help, we also had similar problems with our cluster. Our
> solution was to train the users to include code in their scripts that
> would create local directories (on the compute node - in /tmp) and
> copy the files they needed to those directories, then do their
> computing locally and copy back the results.
Absolutely. And preferably do the copying with something other than
NFS too - rcp or rsync work well, as does the scheduler's built-in
mechanism. Most batch schedulers have built-in facilities for this -
LSF certainly does, in the form of lsrcp and various options to bsub.
I'm not familiar with SGE, but I imagine it offers the same sort of
features.
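Roughly, the pattern looks like this - a rough Python sketch only; the
/nfs paths, the query file, the database name and the blastall
invocation are just placeholders for whatever your jobs actually run:

#!/usr/bin/env python
"""Stage data to node-local scratch, compute there, copy results back."""
import os
import shutil
import subprocess
import tempfile

# Placeholder paths - substitute your real shared storage locations
SHARED_DB_DIR = "/nfs/data/blastdb/uniprot/"   # formatdb-formatted db files
SHARED_RESULTS = "/nfs/results/"

def main():
    # Node-local scratch; many schedulers hand you a per-job directory instead
    scratch = tempfile.mkdtemp(prefix="job-", dir="/tmp")
    try:
        # Stage in: one sequential copy now, instead of random NFS reads later
        subprocess.run(["rsync", "-a", SHARED_DB_DIR,
                        os.path.join(scratch, "db") + "/"], check=True)
        output = os.path.join(scratch, "hits.out")

        # Compute against the local copy only (placeholder blastall invocation)
        subprocess.run(["blastall", "-p", "blastp",
                        "-d", os.path.join(scratch, "db", "uniprot"),
                        "-i", "query.fa", "-o", output], check=True)

        # Stage out: one copy back to shared storage
        subprocess.run(["rsync", "-a", output, SHARED_RESULTS], check=True)
    finally:
        shutil.rmtree(scratch, ignore_errors=True)

if __name__ == "__main__":
    main()

Substitute lsrcp (or whatever transfer mechanism your scheduler
provides) for the rsync calls if you'd rather let the scheduler do the
copying for you.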
It really is quite amazing how badly NFS scales. I remember having
serious problems with it on the first Linux cluster I built at
Incyte's UK office about 6 years ago, and that was just 7 dual-CPU
nodes talking to a Sun E3000 NFS server. It didn't crash, but it got
*really* slow - and that was while deliberately caching the data locally (I
wrote wrapper scripts around blastall and other applications to cache
the databases locally, blowing them away by a least-recently-used
method if there wasn't room).
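For illustration, the idea behind those caching wrappers, as a rough
Python sketch - the /tmp/dbcache location, the 20 GB budget and the
cached_copy name are made up for the example, and a real version also
needs locking so concurrent jobs don't evict each other's files:

import os
import subprocess

def cached_copy(shared_path, cache_dir="/tmp/dbcache",
                budget_bytes=20 * 2**30):
    """Return a node-local copy of shared_path, evicting LRU files as needed."""
    os.makedirs(cache_dir, exist_ok=True)
    local_path = os.path.join(cache_dir, os.path.basename(shared_path))

    if os.path.exists(local_path):
        # Cache hit: bump the timestamp so this file counts as recently used
        os.utime(local_path, None)
        return local_path

    needed = os.path.getsize(shared_path)

    def usage():
        return sum(os.path.getsize(os.path.join(cache_dir, f))
                   for f in os.listdir(cache_dir))

    # Evict the least-recently-used files until the new one will fit
    while os.listdir(cache_dir) and usage() + needed > budget_bytes:
        oldest = min(os.listdir(cache_dir),
                     key=lambda f: os.path.getmtime(os.path.join(cache_dir, f)))
        os.remove(os.path.join(cache_dir, oldest))

    # Single sequential copy from the NFS server; reads are local after this
    subprocess.run(["rsync", "-a", shared_path, local_path], check=True)
    return local_path

The wrapper around blastall (or whatever the application is) then just
calls cached_copy() on the database argument before running the real
binary.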
Sanger's current 1100-node cluster still has NFS in places, and it
regularly causes us grief. Our medium-term aim is to remove pretty
much all NFS from the cluster altogether, with the possible exception
of automounted home directories, and use cluster filesystems like
Lustre for shared data.
Tim
--
Dr Tim Cutts
Informatics Systems Group, Wellcome Trust Sanger Institute
GPG: 1024D/E3134233 FE3D 6C73 BBD6 726A A3F5 860B 3CDD 3F56 E313 4233