[Bioclusters] OS X and NFS
David Kramer
dave at princetonsolutionsgroup.com
Thu Jul 14 18:24:45 EDT 2005
Respectfully, and at the risk of sounding ridiculously naive: why not
consider upgrading the I/O switching technology to Myrinet or InfiniBand, for
higher bandwidth and ultra-low latency, before buying more servers?
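Whichever route you take, it may help first to confirm where the 'stuck' processes are actually waiting. Below is a minimal, hypothetical helper (not part of SGE or OS X) that lists processes `ps` reports in an uninterruptible wait, which is often where processes blocked on a hard-mounted NFS server end up. It assumes `ps -A -o pid=,stat=,args=` output, and that OS X marks this state with `U` while Linux uses `D`.

```python
import subprocess

# Assumption: BSD/OS X ps shows uninterruptible wait as "U"; Linux ps uses "D".
BLOCKED_STATES = ("U", "D")


def parse_ps(output):
    """Parse `ps -A -o pid=,stat=,args=` output into (pid, stat, command) tuples."""
    rows = []
    for line in output.splitlines():
        # Split into at most three fields so the command keeps its spaces.
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        pid, stat = int(parts[0]), parts[1]
        command = parts[2] if len(parts) == 3 else ""
        rows.append((pid, stat, command))
    return rows


def blocked_processes(rows):
    """Keep only rows whose state letter indicates an uninterruptible wait."""
    return [row for row in rows if row[1][:1] in BLOCKED_STATES]


def snapshot():
    """Run ps once and return the processes currently in uninterruptible wait."""
    result = subprocess.run(
        ["ps", "-A", "-o", "pid=,stat=,args="],
        capture_output=True,
        text=True,
        check=True,
    )
    return blocked_processes(parse_ps(result.stdout))
```

Calling `snapshot()` every few seconds on a node while a batch job runs, and logging how many processes it returns, would show whether that count climbs in step with the slowdown described in the quoted message.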
DK
Quoting Juan Carlos Perin <bic at genome.chop.edu>:
>
> I wanted to ask whether anyone has run into a problem we've been
> encountering: our Apple cluster seems to be crashing because of NFS.
> When we run large batch jobs that frequently access an NFS mount, the
> system accumulates 'stuck' processes. If the job is able to finish,
> it eventually cleans up the 'stuck' processes and all is well. But
> if a job runs long enough for these stuck processes to keep
> accumulating, the system slowly deteriorates, becomes less and less
> responsive, and eventually freezes, with nothing functioning at all.
>
> We started the maximum number of NFS servers (20), which improved
> things but didn't fix them. We also limited jobs to 10 nodes (20
> processors) so that, in theory, each node would access only one NFS
> pipeline at any given time. Has anyone run into this before, or does
> anyone have ideas on how to approach fixing it? The only other errors
> we're seeing are in the system log, complaining about PasswordService
> not matching the client's response.
>
> We're still running OS X 10.3.8, and our jobs run through SGE 5.3.
> We have a 16-node (32-processor) G5 system with at least 2 GB of RAM
> per node. The programs are a mixture of text-mining algorithms in
> both Perl and Java, both of which require frequent reads of large
> .txt files residing in NFS-shared directories.
>
> Thanks in advance for any ideas or suggestions.
>
> Juan Perin
> Children's Hospital of Philadelphia
> _______________________________________________
> Bioclusters maillist - Bioclusters at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters
>
David Kramer
Managing Director
Princeton Solutions Group, LLC
856.642.1724
www.princetonsolutionsgroup.com