On 13 Jul 2004, at 4:39 am, Chen Peng wrote:

> That's exactly what we did for our cluster. However, we did it at a
> higher level with rsync. Basically, when a node finishes
> synchronization, it can serve others as the golden copy. We automated
> the process and parallelized rsync across the cluster. For our 64-node
> cluster, it generally gives a speedup of 800%-900%.
>
> As you have explained, there is a limitation set by the switch. We
> are using gigabit Ethernet, and one 700 MB file can be "parallel
> sync-ed" to the 64-node cluster in 10-12 minutes, where the average
> speed is 70-80 MB/sec.

We use tree-based rsync as well, to our 1000-node cluster, but it still takes an age, especially since almost 800 of the nodes have only 100 Mbit connections, and even that is oversubscribed: each chassis of 24 RLX blades has only a single 100 Mbit uplink to the rest of the network. So we do as you do, and push to one node within each chassis, and then have that node rsync to the others in its chassis. But a full update of the complete 70 GB local data filesystem still takes a couple of days.

However, due to the nature of the code, we don't have total downtime for those two days: our rsync scripts open each node to jobs as it completes receiving its data, so we always have *some* machines available for work.

As some of you will have seen from Guy's presentation at the Bioclusters workshop, we're moving towards the use of cluster filesystems such as GPFS, GFS or Lustre to get around this problem.

Tim

--
Dr Tim Cutts
Informatics Systems Group, Wellcome Trust Sanger Institute
GPG: 1024D/E3134233  FE3D 6C73 BBD6 726A A3F5 860B 3CDD 3F56 E313 4233
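[A sketch of the fan-out idea discussed above, for readers following the thread. Neither poster published their scripts, so this is a hypothetical reconstruction: once a node has a complete copy it becomes a golden copy and pushes to a node that does not, so the number of seeded nodes roughly doubles each round. The function below only computes the per-round (source, target) pairs; in practice each pair would be turned into something like `ssh src rsync -a /data/ dst:/data/`, run in parallel within a round.]

```python
def fanout_schedule(nodes):
    """Compute a tree-based sync schedule for a list of hostnames.

    nodes[0] is assumed to already hold the data (the golden copy).
    Returns a list of rounds; each round is a list of (source, target)
    pairs that can run concurrently.  Every node that finished in an
    earlier round acts as a source in later rounds, so the seeded set
    doubles per round and n nodes need about log2(n) rounds instead of
    n-1 sequential pushes.
    """
    seeded = [nodes[0]]          # nodes that already have the data
    pending = list(nodes[1:])    # nodes still waiting for it
    rounds = []
    while pending:
        pairs = []
        for src in list(seeded):     # snapshot: this round's sources
            if not pending:
                break
            pairs.append((src, pending.pop(0)))
        # every target just synced becomes a golden copy for next round
        seeded.extend(dst for _, dst in pairs)
        rounds.append(pairs)
    return rounds


if __name__ == "__main__":
    for i, rnd in enumerate(fanout_schedule([f"node{i}" for i in range(8)]), 1):
        print(f"round {i}: {rnd}")
```

With 64 nodes this gives 6 rounds rather than 63 serial pushes, which is in the same ballpark as the 800%-900% speedup mentioned above (the last rounds saturate the switch, so the gain is less than the naive 10x).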