[Bioclusters] Ethernet Performance

Joe Landman bioclusters@bioinformatics.org
Mon, 12 Jul 2004 23:11:10 -0400

PCP and similar codes build a tree structure out of their connections.  
Each node in the tree has an incoming connection from its parent, and 
outgoing connections (2 or possibly more) to its children.  Each bucket 
(not packet, but container of data) is moved along the tree, stored at 
a node, and retransmitted to that node's children (if any).  This code 
and similar codes effectively diffuse the data to the edges of the tree.

If you measure the total amount of data moved to the nodes of the 
network, and divide by the total transfer (or diffusion) time, you will 
get the aggregate transfer rate.  This rate increases as the size of the 
network increases, because many links carry data at the same time; no 
single link has to exceed its own line rate.  At some point, the 
aggregate rate may become comparable to the switch backplane bandwidth 
(the amount of data you can push through the switch per unit time). 
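
As a rough check of that arithmetic, here is a small Python sketch using the 
gigabit figures quoted below (32 nodes, 1024 MB each, 46.179 s).  The 
function names are hypothetical, and the per-node number is only the average 
inbound rate; interior nodes also send to their children at the same time.

```python
GIGABIT_MB_PER_S = 1000 / 8  # 1000 Mb/s line rate expressed in MB/s


def aggregate_rate(n_nodes, size_mb, seconds):
    """Total data delivered to all nodes divided by the wall-clock time."""
    return n_nodes * size_mb / seconds


def per_node_rate(size_mb, seconds):
    """Average inbound rate seen by any single node."""
    return size_mb / seconds


agg = aggregate_rate(32, 1024, 46.179)
per = per_node_rate(1024, 46.179)
print(f"aggregate: {agg:.1f} MB/s")
print(f"per node:  {per:.1f} MB/s (gigabit line rate is {GIGABIT_MB_PER_S:.0f} MB/s)")
```

The aggregate comes out near 709 MB/s, while each node's inbound link 
averages only about 22 MB/s, well below what one gigabit port can carry.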


Chen Peng wrote:

> This seems to be interesting software.
> We in TLL implemented a similar solution for parallel data 
> synchronization, but your statistics are really surprising to us. For 
> gigabit ethernet, the switch can handle at most 100MB/s (~=1000mbps) 
> in theory. How can it achieve 709MB/s with PCP?
> -- 
> Chen Peng <chenpeng@tll.org.sg>
> Senior System Engineer
> Temasek Life Sciences Laboratory
> On 12-Jul-04, at PM 11:36, Rene Storm wrote:
>     Hi Bioclusters,
>     maybe
>     http://www.theether.org/pcp/
>     is a solution for you.
>     It's very good for distributing files to a whole cluster.
>     copy a testfile (1GB) from one frontend to 32 Nodes with
>     gigaethernet (e1000)
>     real 0m46.179s
>     datasize 32x1024MB
>     -------------------------
>     ~ 709 MB/sec
>     copy a testfile (1GB) from one frontend to 32 Nodes with myrinet2k
>     real 0m23.202s
>     datasize 32x1024MB
>     -------------------------
>     ~1423 MB/sec
>     With pcp it is important to have a really good gigabit backplane
>     or, even better, a full-crossbar myrinet switch.
>     Overview
>     pcp is a system for replicating files on multiple nodes of a PC
>     cluster. Replication is done by building an n-ary tree of TCP
>     sockets and using parallelized, pipelined data transfers which use
>     RSA authentication. For large file transfers or replication on many
>     nodes, pcp provides highly efficient data transfers when compared
>     to existing alternatives (e.g., NFS).
>     -- 
>     Regards,
>     Rene Storm
>     emplics AG

Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615