[Bioclusters] Login & home directory strategies for PVM?
Tim Cutts
tjrc at sanger.ac.uk
Mon Feb 7 04:14:12 EST 2005
On 6 Feb 2005, at 11:04 am, Tony Travis wrote:
> Hello, Tim.
>
> We only have a 'small' 64-node cluster here :-)
>
> However, I've opted to use BOBCAT architecture:
>
> http://www.epcc.ed.ac.uk/bobcat/
>
> Although the original EPCC BOBCAT no longer exists, its spirit lives
> on in our RRI/BioSS cluster:
>
> http://bobcat.rri.sari.ac.uk
>
> The important thing is to have TWO completely separate private network
> fabrics: One for DHCP/NFS, the other for IPC. The main problem we have
> is that IPC (i.e. Inter Process Communication) can swamp the bandwidth
> of a single network fabric and you rapidly lose control of the
> cluster.
We don't have any IPC. We don't run any parallel code. Each job runs
on a single CPU, and NFS *still* causes problems occasionally; at this
scale the problem really isn't a myth. For example, we have to keep
separate copies of the LSF binaries on all of the machines, because
doing it the Platform-endorsed way, with everything NFS-mounted, is
unusably flaky: the NFS contention from LSF's housekeeping alone can be
enough to break the cluster.
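(For what it's worth, keeping those local copies in sync doesn't have to
be a manual job. A sketch of the sort of mirroring script you could push
out from cron or the scheduler is below; the paths are made-up examples
rather than our actual layout, and it's only an illustration, not how we
really distribute the binaries.)

    #!/usr/bin/env python
    """Mirror a tree of binaries from an NFS master copy onto local disk,
    copying only files whose size or mtime differ.  SRC and DST are
    hypothetical paths; adjust for the real install locations."""

    import os
    import shutil

    SRC = "/nfs/software/lsf"   # hypothetical NFS master copy
    DST = "/usr/local/lsf"      # hypothetical local copy on each node

    def needs_copy(src, dst):
        """True if dst is missing, or differs from src in size or mtime."""
        if not os.path.exists(dst):
            return True
        s, d = os.stat(src), os.stat(dst)
        return s.st_size != d.st_size or int(s.st_mtime) > int(d.st_mtime)

    def mirror(src_root, dst_root):
        """Walk the master copy and refresh any stale files locally."""
        copied = 0
        for dirpath, dirnames, filenames in os.walk(src_root):
            rel = os.path.relpath(dirpath, src_root)
            target_dir = os.path.join(dst_root, rel)
            os.makedirs(target_dir, exist_ok=True)
            for name in filenames:
                src = os.path.join(dirpath, name)
                dst = os.path.join(target_dir, name)
                if needs_copy(src, dst):
                    shutil.copy2(src, dst)  # copy2 preserves mtime/permissions
                    copied += 1
        return copied

    if __name__ == "__main__":
        print("updated %d files" % mirror(SRC, DST))

The point is simply that each node reads the master copy once per update,
instead of every LSF daemon hammering the NFS mount all day.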
I suspect that if you're running large parallel jobs, the number of NFS
operations involved is relatively low. The issue for us is sometimes
hundreds of jobs completing every minute, each trying to read some data
files and then create three or four output files on an NFS-mounted
disk. That's a lot of separate NFS operations, a large proportion of
which are the particularly painful directory operations. I plead with
the users not to write code like this, but you know what users are
like.
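Purely to put rough numbers on that, here's the back-of-envelope
arithmetic; every per-file RPC count in it is a guess for illustration,
not something measured on our cluster:

    # Back-of-envelope NFS load estimate; all figures are assumptions
    # chosen to illustrate the arithmetic, not measurements.

    jobs_per_minute = 300      # "hundreds of jobs completing every minute"
    inputs_per_job = 2         # data files each job reads
    outputs_per_job = 4        # "three or four output files"

    # Rough NFS RPCs per file; real counts depend on client caching,
    # attribute timeouts, file sizes and so on.
    ops_per_read = 3           # lookup + getattr + read (small file)
    ops_per_create = 5         # lookup + create + write + commit + getattr

    ops_per_job = (inputs_per_job * ops_per_read
                   + outputs_per_job * ops_per_create)
    ops_per_second = jobs_per_minute * ops_per_job / 60.0

    print("~%d NFS ops per job, ~%.0f ops/second across the cluster"
          % (ops_per_job, ops_per_second))
    # With these guesses: 26 ops per job, ~130 ops/second, most of them
    # synchronous directory/metadata operations -- and that's before any
    # LSF housekeeping, retries, or actual file data.

Even modest-looking per-job numbers add up to a steady stream of
metadata traffic that the NFS server has to serialise.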
> I think there are some MYTHS about NFS and clusters around because of
> the bandwidth contention on a single network fabric. The NFS network
> traffic on our cluster is completely segregated from the IPC traffic
> which is throttled by the bandwidth of its own network fabric. The
> switches on the two network fabrics are NOT connected in any way...
Our approach is actually similar to yours: we're moving towards cluster
filesystems like GPFS and Lustre, and in those cases we run the cluster
filesystem traffic over a second network. It's a VLAN on the same
switches, but that's not the performance problem you might think,
because the Extreme switches we use are fully non-blocking. You can
throw an absolutely obscene number of packets at them and they cope
fine. They even coped when a Ganglia bug caused a machine to emit
thousands of multicast packets to all 1000 machines every second: the
Ganglia daemons went to 100% CPU handling the incoming packets, which
made the cluster almost unusable, but the network itself was still
going strong.
Tim
--
Dr Tim Cutts
Informatics Systems Group, Wellcome Trust Sanger Institute
GPG: 1024D/E3134233 FE3D 6C73 BBD6 726A A3F5 860B 3CDD 3F56 E313 4233