On 13 Jul 2004, at 4:39 am, Chen Peng wrote:

> That's exactly what we did for our cluster. However, we did it at a
> higher level with rsync. Basically, when a node finishes
> synchronization, it can serve others as the golden copy. We automated
> the process and parallelized rsync across the cluster. For our 64-node
> cluster, it generally gives a speedup of 800%-900%.
>
> As you have explained, there is a limitation set by the switch. We
> are using gigabit Ethernet, and one 700 MB file can be "parallel
> sync-ed" to the 64-node cluster in 10-12 minutes, where the average
> speed is 70-80 MB/sec.

We use tree-based rsync as well, to our 1000-node cluster, but it still takes an age, especially since almost 800 of the nodes have only 100 Mbit connections, and even that is oversubscribed: each chassis of 24 RLX blades has only a single 100 Mbit uplink to the rest of the network. So we do as you do, and push to one node within each chassis, and then have that node rsync to the others in its chassis. But a full update of the complete 70 GB local data filesystem still takes a couple of days.

However, due to the nature of the code, we don't have total downtime for those two days: our rsync scripts open each node to jobs as it completes receiving its data, so we always have *some* machines available for work.

As some of you will have seen from Guy's presentation at the Bioclusters workshop, we're moving towards the use of cluster filesystems such as GPFS, GFS or Lustre to get around this problem.

Tim

--
Dr Tim Cutts
Informatics Systems Group, Wellcome Trust Sanger Institute
GPG: 1024D/E3134233  FE3D 6C73 BBD6 726A A3F5 860B 3CDD 3F56 E313 4233
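[A sketch of the fan-out idea discussed above, for readers following the thread. Neither poster published their scripts, so this is a hypothetical reconstruction: once a node has a complete copy it becomes a golden copy and pushes to a node that does not, so the number of seeded nodes roughly doubles each round. The function below only computes the per-round (source, target) pairs; in practice each pair would be turned into something like `ssh src rsync -a /data/ dst:/data/`, run in parallel within a round.]

```python
def fanout_schedule(nodes):
    """Compute a tree-based sync schedule for a list of hostnames.

    nodes[0] is assumed to already hold the data (the golden copy).
    Returns a list of rounds; each round is a list of (source, target)
    pairs that can run concurrently.  Every node that finished in an
    earlier round acts as a source in later rounds, so the seeded set
    doubles per round and n nodes need about log2(n) rounds instead of
    n-1 sequential pushes.
    """
    seeded = [nodes[0]]          # nodes that already have the data
    pending = list(nodes[1:])    # nodes still waiting for it
    rounds = []
    while pending:
        pairs = []
        for src in list(seeded):     # snapshot: this round's sources
            if not pending:
                break
            pairs.append((src, pending.pop(0)))
        # every target just synced becomes a golden copy for next round
        seeded.extend(dst for _, dst in pairs)
        rounds.append(pairs)
    return rounds


if __name__ == "__main__":
    for i, rnd in enumerate(fanout_schedule([f"node{i}" for i in range(8)]), 1):
        print(f"round {i}: {rnd}")
```

With 64 nodes this gives 6 rounds rather than 63 serial pushes, which is in the same ballpark as the 800%-900% speedup mentioned above (the last rounds saturate the switch, so the gain is less than the naive 10x).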