[Bioclusters] Re: Rsync and NCBI and bio-mirror.net
Jeremy Mann
jeremy at biochem.uthscsa.edu
Wed Feb 1 18:45:03 EST 2006
Interesting idea, Gilbert. I tried the rsync runs first on our server:
time rsync rsync://bio-mirror.net/biomirror/blast/env_nr.tar.gz . -au
real 0m58.704s
user 0m0.735s
sys 0m1.009s
Then with --no-whole-file:
time rsync rsync://bio-mirror.net/biomirror/blast/env_nr.tar.gz . -au --no-whole-file
real 0m0.847s
user 0m0.002s
sys 0m0.004s
With wget:
real 0m49.270s
user 0m0.320s
sys 0m1.506s
Then wget with --mirror:
real 0m1.085s
user 0m0.004s
sys 0m0.001s
rsync with --no-whole-file seems to be faster than wget with --mirror. I'm
going to have to check exactly what --mirror does with wget: whether it
only downloads changes or downloads the entire file again.
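
For reference, the wget manual describes --mirror as turning on recursion and
time-stamping (-N). With -N a file is fetched again, in full, when there is no
local copy, when the remote copy is newer, or when the sizes differ; otherwise
it is skipped. wget has no delta transfer. A minimal sketch of that decision,
assuming a hypothetical helper needs_download (not part of wget) that is handed
the remote mtime and size directly:

```shell
#!/bin/sh
# Sketch of a timestamp-and-size check in the style of wget -N.
# needs_download is a hypothetical helper for illustration, not a wget command.
# Arguments: local_path remote_mtime_epoch remote_size_bytes
needs_download() {
    local_path=$1
    remote_mtime=$2
    remote_size=$3

    # No local copy: the whole file has to be fetched.
    [ -f "$local_path" ] || { echo yes; return; }

    local_size=$(wc -c < "$local_path" | tr -d ' ')
    # GNU stat first, BSD/macOS stat as a fallback.
    local_mtime=$(stat -c %Y "$local_path" 2>/dev/null || stat -f %m "$local_path")

    # Remote newer, or sizes differ: fetch the whole file again.
    if [ "$remote_mtime" -gt "$local_mtime" ] || [ "$remote_size" -ne "$local_size" ]; then
        echo yes
    else
        echo no
    fi
}
```

If the local copy is already current, a check like this answers "no" without
moving any data, which would be consistent with a sub-second second run.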
Thanks for telling me about this!
Don Gilbert said:
>
> Jeremy,
>
> This may be a common misunderstanding of the value of rsync. rsync's
> 'delta transmission' of changed records comes at a high cost of disk
> file checksumming: basically your computer checksums all the blocks of
> each file and sends the checksums to the rsync server, which does the
> same file checksum and then sends only the changed blocks. This reduces
> network transport, but at the cost of lots of disk access and CPU
> computation (on both server and client).
>
> For a busy data server, rsync costs much more time (in disk, cpu use)
> and actually gets the file to you more slowly than simple but robust
> FTP.
>
> Rsync typically uses a full CPU for minutes per file, while FTP is very
> lightweight on the server. IUBio/Bio-mirror can support multiple FTP
> processes from one client (I recommend no more than 15). Using
> multiple rsync processes from the same client is a not-nice thing due
> to the high CPU/disk cost to the server.
>
> Try this test with a 100+ MB file:
> /usr/bin/time rsync rsync://bio-mirror.net/biomirror/blast/env_nr.tar.gz . -au
> 51.51 real
> touch -t 200109110825 env_nr.tar.gz
> /usr/bin/time rsync rsync://bio-mirror.net/biomirror/blast/env_nr.tar.gz . -au --no-whole-file
> 58.87 real 2.53 user 1.52 sys
>
> /usr/bin/time wget -nv -nH --mirror ftp://bio-mirror.net/biomirror/blast/env_nr.tar.gz
> 7.30 real 0.04 user 1.56 sys
> touch -t 200109110825 biomirror/blast/env_nr.tar.gz
> /usr/bin/time wget -nv -nH --mirror ftp://bio-mirror.net/biomirror/blast/env_nr.tar.gz
> 6.61 real 0.02 user 1.72 sys
>
> (my times are on local gigabit Ethernet)
> -- Don
> -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
> -- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/
> _______________________________________________
> Bioclusters maillist - Bioclusters at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters
>
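
The checksumming cost Don describes above can be seen in a small local sketch.
This is a simplification under stated assumptions: it compares fixed-size
blocks at fixed offsets with cksum, whereas real rsync uses rolling plus strong
checksums and can match blocks at shifted offsets. changed_blocks is a
hypothetical helper, not an rsync command.

```shell
#!/bin/sh
# Count fixed-size blocks that differ between an old and a new copy of a file.
# changed_blocks is a hypothetical helper for illustration only.
# Arguments: old_file new_file block_size_bytes
changed_blocks() {
    old=$1
    new=$2
    bs=$3

    size=$(wc -c < "$new" | tr -d ' ')
    nblocks=$(( (size + bs - 1) / bs ))
    changed=0
    i=0
    while [ "$i" -lt "$nblocks" ]; do
        # Every block of both copies is read and checksummed, changed or not.
        a=$(dd if="$old" bs="$bs" skip="$i" count=1 2>/dev/null | cksum)
        b=$(dd if="$new" bs="$bs" skip="$i" count=1 2>/dev/null | cksum)
        [ "$a" = "$b" ] || changed=$((changed + 1))
        i=$((i + 1))
    done
    echo "$changed"
}
```

Only the blocks counted here would need to cross the network, but both copies
still have to be read from disk and checksummed in full, which is the
disk and CPU cost in question.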
--
Jeremy Mann
jeremy at biochem.uthscsa.edu
University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu
Phone: (210) 567-2672