[Bioclusters] Re: Rsync and NCBI and bio-mirror.net

Don Gilbert gilbertd at bio.indiana.edu
Wed Feb 1 17:23:16 EST 2006


Jeremy,

This may be a common misunderstanding of the value of rsync. rsync's
'delta transmission' of changed blocks comes at a high cost in disk
reads and checksumming: basically your computer checksums all the
blocks of each file and sends those checksums to the rsync server,
which checksums its own copy of the file and then sends only the
changed blocks.  This reduces network transport, but at the cost of
lots of disk access and CPU computation (on both server and client).
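
To make the cost concrete, here is a minimal Python sketch of the idea
described above. It is a simplification, not rsync's actual algorithm:
real rsync uses a rolling weak checksum plus a stronger hash so it can
match blocks at any offset, while this sketch only compares blocks at
fixed positions. The function names, the tiny block size, and the sample
data are assumptions chosen for illustration. The point to notice is
that both sides must read and checksum every block of their copy, even
when only one block differs.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4  # unrealistically small, chosen so the example is easy to read


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Checksum every fixed-size block of the local copy (client side)."""
    return [
        hashlib.sha256(data[start:start + block_size]).hexdigest()
        for start in range(0, len(data), block_size)
    ]


def changed_blocks(new_data: bytes, old_sums: List[str],
                   block_size: int = BLOCK_SIZE) -> Dict[int, bytes]:
    """Checksum every block of the current file (server side) and keep
    only the blocks whose checksum differs from the client's."""
    delta = {}
    for index, start in enumerate(range(0, len(new_data), block_size)):
        block = new_data[start:start + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(old_sums) or digest != old_sums[index]:
            delta[index] = block
    return delta


old_copy = b"AAAABBBBCCCCDDDD"   # what the client already has
new_copy = b"AAAABBBBXXXXDDDD"   # what the server now holds

old_sums = block_checksums(old_copy)        # client reads its whole file
delta = changed_blocks(new_copy, old_sums)  # server reads its whole file
print(delta)  # {2: b'XXXX'}
```

Only one block crosses the network here, but four blocks were read and
hashed on each side to find it.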

For a busy data server, rsync costs much more time (in disk and CPU
use) and actually gets the file to you more slowly than simple but
robust FTP.

Rsync typically uses a full CPU for minutes per file, while FTP is very
lightweight on the server.  IUBio/Bio-mirror can support multiple FTP
processes from one client (I recommend no more than 15). Running
multiple rsync processes from the same client is not a nice thing to
do, because of the high CPU and disk cost to the server.

Try this test with a 100+ MB file:

  /usr/bin/time rsync rsync://bio-mirror.net/biomirror/blast/env_nr.tar.gz . -au
         51.51 real
  touch -t 200109110825 env_nr.tar.gz
  /usr/bin/time rsync rsync://bio-mirror.net/biomirror/blast/env_nr.tar.gz . -au --no-whole-file
         58.87 real         2.53 user         1.52 sys

  /usr/bin/time wget -nv -nH --mirror ftp://bio-mirror.net/biomirror/blast/env_nr.tar.gz
          7.30 real         0.04 user         1.56 sys
  touch -t 200109110825 biomirror/blast/env_nr.tar.gz
  /usr/bin/time wget -nv -nH --mirror ftp://bio-mirror.net/biomirror/blast/env_nr.tar.gz
          6.61 real         0.02 user         1.72 sys

(My times are on local gigabit Ethernet.)
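
Put as ratios, the wall-clock ("real") times quoted above work out
roughly as follows. This is only arithmetic on those four numbers; the
variable names are mine, and a single run on one network is not a
benchmark.

```python
# Wall-clock ("real") seconds copied from the timings quoted above.
rsync_first = 51.51    # rsync, first fetch
rsync_delta = 58.87    # rsync --no-whole-file, after touch
wget_first = 7.30      # wget over FTP, first fetch
wget_repeat = 6.61     # wget over FTP, after touch

ratio_first = rsync_first / wget_first
ratio_repeat = rsync_delta / wget_repeat

print(f"{ratio_first:.1f}x, {ratio_repeat:.1f}x")  # 7.1x, 8.9x
```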
-- Don
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/

