[Bioclusters] Re: Rsync and NCBI and bio-mirror.net

Jeremy Mann jeremy at biochem.uthscsa.edu
Wed Feb 1 18:45:03 EST 2006


Interesting idea, Gilbert. I tried the rsync commands on our server first:

time rsync rsync://bio-mirror.net/biomirror/blast/env_nr.tar.gz . -au

real    0m58.704s
user    0m0.735s
sys     0m1.009s

Then with --no-whole-file:

time rsync rsync://bio-mirror.net/biomirror/blast/env_nr.tar.gz . -au --no-whole-file

real    0m0.847s
user    0m0.002s
sys     0m0.004s

With wget:

real    0m49.270s
user    0m0.320s
sys     0m1.506s

Then wget with --mirror:

real    0m1.085s
user    0m0.004s
sys     0m0.001s

rsync seems to be faster with --no-whole-file than wget with --mirror. I'm
going to have to check exactly what --mirror does in wget: whether it
downloads only the changes or the entire file again.
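One way to read these numbers is to ask whether a transfer happened at all. The Python sketch below models, in simplified form, the skip checks the two tools document: rsync's default "quick check" skips a file whose size and modification time already match (and `-u` also skips files that are newer on the receiving side), while wget's `--mirror` implies `-N` timestamping, which re-fetches the whole file when the local copy is missing, differs in size, or is older than the remote one. `FileInfo`, `rsync_transfers` and `wget_mirror_downloads` are hypothetical names for illustration; neither tool exposes such an API, and details such as rsync's `--modify-window` are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    """Hypothetical stand-in for the metadata both tools compare."""
    size: int      # bytes
    mtime: float   # modification time, seconds since the epoch


def rsync_transfers(src: FileInfo, dst: Optional[FileInfo], update: bool = False) -> bool:
    """Simplified rsync quick check, plus the -u/--update rule."""
    if dst is None:
        return True   # nothing on the receiver yet
    if update and dst.mtime > src.mtime:
        return False  # -u: the receiver's copy is newer, leave it alone
    # Quick check: transfer only if size or modification time differ.
    return dst.size != src.size or dst.mtime != src.mtime


def wget_mirror_downloads(remote: FileInfo, local: Optional[FileInfo]) -> bool:
    """Simplified wget -N check (implied by --mirror); True means a full re-download."""
    if local is None or local.size != remote.size:
        return True
    return remote.mtime > local.mtime


remote = FileInfo(size=100_000_000, mtime=1_138_800_000.0)
unchanged = FileInfo(size=100_000_000, mtime=1_138_800_000.0)
print(rsync_transfers(remote, unchanged, update=True))  # False
print(wget_mirror_downloads(remote, unchanged))         # False
```

Under this model a repeat run against an unchanged file transfers nothing with either tool, so a sub-second second run may mostly reflect the metadata comparison rather than transfer speed.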

Thanks for telling me about this!



Don Gilbert said:
>
> Jeremy,
>
> This may be a common misunderstanding of the value of rsync. rsync's
> 'delta transmission' of changed records comes at a high cost in disk
> file checksumming: basically, your computer checksums all the blocks of
> each file and sends those checksums to the rsync server, which checksums
> its copy of the file the same way and then sends only the changed blocks.
> This reduces network transport, but at the cost of lots of disk access
> and CPU computation (on both server and client).
>
> For a busy data server, rsync costs much more time (in disk and CPU use)
> and actually gets the file to you more slowly than simple but robust
> FTP.
>
> Rsync typically uses a full CPU for minutes per file, while FTP is very
> lightweight on the server. IUBio/Bio-mirror can support multiple FTP
> processes from one client (I recommend no more than 15). Running
> multiple rsync processes from the same client is not nice to the
> server because of their high CPU and disk cost.
>
> Try this test with a 100+ MB file:
> /usr/bin/time rsync rsync://bio-mirror.net/biomirror/blast/env_nr.tar.gz . -au
>        51.51 real
> touch -t 200109110825 env_nr.tar.gz
> /usr/bin/time rsync rsync://bio-mirror.net/biomirror/blast/env_nr.tar.gz . -au --no-whole-file
>        58.87 real         2.53 user         1.52 sys
>
> /usr/bin/time wget -nv -nH --mirror ftp://bio-mirror.net/biomirror/blast/env_nr.tar.gz
>         7.30 real         0.04 user         1.56 sys
> touch -t 200109110825 biomirror/blast/env_nr.tar.gz
> /usr/bin/time wget -nv -nH --mirror ftp://bio-mirror.net/biomirror/blast/env_nr.tar.gz
>         6.61 real         0.02 user         1.72 sys
>
> (My times are on local gigabit Ethernet.)
> -- Don
> -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
> -- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/
> _______________________________________________
> Bioclusters maillist  -  Bioclusters at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters
>
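
Don's description of delta transmission can be made concrete with a toy version of the rsync algorithm (Tridgell and Mackerras). In this simplified Python sketch the receiver splits its old copy into fixed-size blocks and records a weak rolling checksum plus a strong hash (MD5 here) for each; the sender then slides a window over its own file one byte at a time, updating the weak checksum in constant time, and emits either a block reference or literal bytes. The tiny block size, the omission of a trailing partial block, and all function names are assumptions for illustration; this is not rsync's actual implementation or wire format.

```python
import hashlib
from typing import Dict, List, Tuple, Union

MOD = 1 << 16  # weak-checksum components are kept to 16 bits
BLOCK = 4      # toy block size; real rsync uses far larger blocks

Op = Union[int, bytes]  # int: copy that block index from the old file; bytes: literal data
Table = Dict[Tuple[int, int], List[Tuple[int, bytes]]]


def weak_checksum(block: bytes) -> Tuple[int, int]:
    """Adler-style weak checksum (a, b) of one window."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> Tuple[int, int]:
    """Slide an n-byte window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b


def signatures(old: bytes, block: int = BLOCK) -> Table:
    """Receiver side: checksum every full block of the old file."""
    table: Table = {}
    for idx in range(len(old) // block):  # a trailing partial block is ignored here
        chunk = old[idx * block:(idx + 1) * block]
        table.setdefault(weak_checksum(chunk), []).append((idx, hashlib.md5(chunk).digest()))
    return table


def delta(new: bytes, table: Table, block: int = BLOCK) -> List[Op]:
    """Sender side: test every byte offset, emitting block refs or literals."""
    ops: List[Op] = []
    literal = bytearray()
    i, ab = 0, None
    while i + block <= len(new):
        if ab is None:
            ab = weak_checksum(new[i:i + block])
        match = None
        candidates = table.get(ab)
        if candidates:  # weak hit: confirm with the strong hash
            strong = hashlib.md5(new[i:i + block]).digest()
            match = next((idx for idx, s in candidates if s == strong), None)
        if match is not None:
            if literal:
                ops.append(bytes(literal))
                literal.clear()
            ops.append(match)
            i += block
            ab = None  # restart the checksum at the next window
        else:
            literal.append(new[i])
            if i + block < len(new):
                ab = roll(ab[0], ab[1], new[i], new[i + block], block)
            i += 1
    literal.extend(new[i:])  # bytes too close to the end to fill a window
    if literal:
        ops.append(bytes(literal))
    return ops


def patch(old: bytes, ops: List[Op], block: int = BLOCK) -> bytes:
    """Receiver side: rebuild the new file from block refs and literals."""
    out = bytearray()
    for op in ops:
        out += old[op * block:(op + 1) * block] if isinstance(op, int) else op
    return bytes(out)
```

Note that the sender reads and checksums its entire file even when almost nothing has changed, which is the server-side disk and CPU cost Don describes; only the network traffic shrinks.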

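Don's advice to cap parallel FTP sessions per client can be followed with a bounded worker pool. This is a minimal Python sketch assuming anonymous FTP access and the `biomirror/...` path layout seen in the URLs above; `ftp_fetch` and `fetch_all` are hypothetical helpers, and the cap of 15 comes from Don's recommendation, not from any server setting.

```python
from concurrent.futures import ThreadPoolExecutor
from ftplib import FTP
from pathlib import Path
from typing import Callable, Iterable, List

MAX_PARALLEL = 15  # Don's suggested per-client ceiling for IUBio/Bio-mirror


def ftp_fetch(path: str, host: str = "bio-mirror.net", dest_dir: str = ".") -> str:
    """Download one file over anonymous FTP (server layout is an assumption)."""
    local = Path(dest_dir) / Path(path).name
    with FTP(host) as ftp:
        ftp.login()  # anonymous login
        with open(local, "wb") as fh:
            ftp.retrbinary(f"RETR {path}", fh.write)
    return str(local)


def fetch_all(paths: Iterable[str],
              fetch: Callable[[str], str] = ftp_fetch,
              max_parallel: int = MAX_PARALLEL) -> List[str]:
    """Run fetch over all paths with at most max_parallel in flight; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(fetch, paths))
```

For example, `fetch_all(["biomirror/blast/env_nr.tar.gz"])` would download a single file. Taking `fetch` as a parameter keeps the concurrency limit testable without touching the network.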

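Don's test relies on `touch -t 200109110825` to make the local copy look stale, so that both tools perform a real transfer instead of skipping the file. The same backdating can be done from Python with `os.utime`; `backdate` is a hypothetical helper, and it accepts only the full `CCYYMMDDhhmm` form of `touch -t`, interpreted in local time as touch does.

```python
import os
import time


def backdate(path: str, stamp: str = "200109110825") -> float:
    """Set access and modification time like `touch -t CCYYMMDDhhmm` (local time)."""
    t = time.mktime(time.strptime(stamp, "%Y%m%d%H%M"))
    os.utime(path, (t, t))  # (atime, mtime)
    return t
```

After backdating, rsync's quick check no longer matches and wget's `-N` check sees the remote file as newer, so the timings reflect a full checksum pass (rsync) or a full re-download (wget).
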
-- 
Jeremy Mann
jeremy at biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu
Phone: (210) 567-2672


