[Bioclusters] Re: Rsync and NCBI and bio-mirror.net

James Cuff jcuff at broad.mit.edu
Thu Feb 2 11:50:34 EST 2006


On Thu, 2006-02-02 at 10:02 -0500, Lee Watkins wrote:

> People interested in high-performance file transfer might check out
> multicast data distribution tools from Micah Beck's "Logistical
> Computing and Internetworking Lab"  http://loci.cs.utk.edu/
> Collectively they call it a "Logistical Distribution Network" (LoDN -
> "lowdown").  I've seen demos at Internet2 and Supercomputing meetings
> (mainly, like BitTorrent, for sucking down huge digital video files)
> and it's quite impressive.  


So this is an awesome idea.  

(Oops, I must have been in the US for way too long: I've started to use
words like 'awesome' in normal conversation [very worrying].)


Anyway, I was rather excited about this, so I had a quick look, grabbed
and compiled the SDK, and started to have a poke around.

There was great documentation, and the web site is really well laid out.
I must say I was up and running in no time at all. Albeit for some value
of running, as we will see later on...


Here is a quick review as I see it, not that these points are in any
real order:

1. The idea of searching, locating and then downloading a 1.3 MB XML file
of file chunk locations before you can start the download proper does
not excite me at all.

2. It currently does not work.  At least for me, but I have messed
things up before, so I'd be happy to see other results.  We are on a
pretty standard network setup here at MIT.

3. The 'public' file system will end up as a total shambles pretty fast;
it is already rather messy.  In time, serious controls will be needed.

4. The data gets out of date really fast, and someone will need to keep
it up to date and care for and tend it.  This is a critical issue for
life science data.

5. I do still think it will be great when it is finished, and if lots of
groups buy in to the whole data-depot idea.  Maybe it's worth a try
today, but my quick looksee failed enough for me to want to wait a
while.

6. I look at bio-mirror and I see something that functions, and not only
that, functions really well.  

7. Most of this probably also has to be taken with a hefty pinch of
salt: I am a self-confessed Luddite, esp. when things don't work for me
from the get-go.


However, here is the gory detail in my hands:

jcuff@bill ~ $ time ncftpget \
ftp://distro.ibiblio.org/pub/linux/distributions/gentoo/releases/x86/2005.1-r1/installcd/install-x86-universal-2005.1-r1.iso
install-x86-universal-2005.1-r1.iso:   398.09 MB    1.13 MB/s  

real    5m53.188s
user    0m0.916s
sys     0m15.017s

So old-school technology gets me an up-to-date version of a great
operating system that I can pretty much trust (md5sum etc.), and it takes
ca. 6 minutes.
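
As a rough sketch of the md5sum check mentioned above: the snippet below
records a digest for a file and then verifies it with `md5sum -c`.  It
assumes GNU coreutils.  The file name demo.iso and its contents are
stand-ins for illustration only; for a real download the digest file
would come from the mirror, not be computed locally.

```shell
#!/bin/sh
# Stand-in for a downloaded image; name and contents are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'pretend this is an ISO image\n' > demo.iso

# Normally the .md5 file is published by the mirror; we create it here
# only so that the example is self-contained.
md5sum demo.iso > demo.iso.md5

# md5sum -c re-hashes the file and compares it with the recorded digest.
if md5sum -c demo.iso.md5 >/dev/null 2>&1; then
    verify_result="OK"
else
    verify_result="FAILED"
fi
echo "checksum: $verify_result"

# Simulate corruption in transit and verify again.
printf 'extra bytes\n' >> demo.iso
if md5sum -c demo.iso.md5 >/dev/null 2>&1; then
    corrupt_result="OK"
else
    corrupt_result="FAILED"
fi
echo "after corruption: $corrupt_result"
```

The point of the second check is that a damaged or truncated file is
caught immediately, whichever transfer tool fetched it.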

So in comparison:

After setup, my lors attempt was still "running" and fetching an old
version of the operating system in a pretty poor manner.  So much so I
decided to put it out of its misery when tcpdump never managed to
actually pass any data packets...

./lors/bin/lors_download install-x86-universal-2004.3.iso.xnd -f -t 20 \
    -b 2048k -C 40

<snip output>
Load missed on one attempt (non-critical):    -10013
Load          acre.cs.utk.edu:6714  2621440   524288

^C
real    22m56.408s
user    0m0.056s
sys     0m0.008s

jcuff@bill ~ $ ls -ltra install-x86-minimal-2004.3.iso 
-rw-r--r--  1 jcuff users 0 Feb  2 11:16 install-x86-minimal-2004.3.iso
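
A stalled transfer like this one does not have to be killed by hand.  A
minimal sketch, assuming GNU coreutils `timeout` is available: wrap the
transfer in a time limit, then check that the output file is non-empty.
The helper name fetch_with_deadline, the `sleep` stand-in for a stalled
client, and the file names are all hypothetical; substitute the real
download command.

```shell
#!/bin/sh
# fetch_with_deadline LIMIT_SECONDS OUTFILE COMMAND...
# Hypothetical helper: run COMMAND under a time limit, then require that
# OUTFILE exists and is non-empty.
fetch_with_deadline() {
    limit=$1; outfile=$2; shift 2
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        # GNU timeout exits with 124 when the limit is reached.
        echo "transfer timed out after ${limit}s" >&2
        return 1
    fi
    if [ "$status" -ne 0 ]; then
        echo "transfer command failed with status $status" >&2
        return 1
    fi
    if [ ! -s "$outfile" ]; then
        echo "transfer produced an empty file: $outfile" >&2
        return 1
    fi
    return 0
}

workdir=$(mktemp -d)

# Stand-in for a stalled client: a 0-byte file and a command that hangs.
: > "$workdir/stalled.iso"
fetch_with_deadline 1 "$workdir/stalled.iso" sleep 5
stalled_rc=$?

# Stand-in for a working client: writes some data and exits promptly.
fetch_with_deadline 5 "$workdir/good.iso" \
    sh -c "echo data > '$workdir/good.iso'"
good_rc=$?

echo "stalled: $stalled_rc, good: $good_rc"
```

A fixed wall-clock limit is a blunt instrument; it only makes sense when
the expected transfer time is roughly known, as with the ncftpget run
above.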








