[Bioclusters] rsync of NCBI formatted databases

Michael James bioclusters@bioinformatics.org
Thu, 5 Dec 2002 13:26:24 +1100


> Heck, even trading 'wget' scripts on this list may save people time &
> effort.

> > >We are currently using a simple update script that uses wget to download
> > >the Formatted Databases for use with NCBI blast. Rather than continue
> > >with this script, I am curious if there is an rsync server or rsync
> > >mirror at NCBI?

> > We provide a mirror of this data, which can be retrieved via ftp, http,
> > and rsync.  See http://www.bio-mirror.net/ for details.

Mirroring with wget is not practical for sites outside America
 that pay AU$60 per gigabyte for the data.  Rsync is much less wasteful,
 as it transfers only the changed sections of the updated files.
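
As a rough illustration of why that saves traffic, here is a minimal Python
sketch of an rsync-style delta. It is not rsync's actual implementation:
real rsync pairs a cheap rolling checksum with a stronger hash, whereas this
sketch simply hashes every offset with SHA-256 for brevity. The block size
and the names signature, delta and patch are made up for the example.

```python
import hashlib

BLOCK = 16  # hypothetical block size, kept tiny for the demonstration


def signature(old, block=BLOCK):
    """Receiver side: map the digest of each full block of `old` to its index."""
    sig = {}
    for start in range(0, len(old) - block + 1, block):
        digest = hashlib.sha256(old[start:start + block]).digest()
        sig.setdefault(digest, start // block)
    return sig


def delta(new, sig, block=BLOCK):
    """Sender side: describe `new` as block copies plus literal bytes."""
    ops = []
    literal = bytearray()
    pos = 0
    while pos < len(new):
        chunk = new[pos:pos + block]
        idx = None
        if len(chunk) == block:
            idx = sig.get(hashlib.sha256(chunk).digest())
        if idx is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal.clear()
            ops.append(("copy", idx))  # receiver already holds this block
            pos += block
        else:
            literal.append(new[pos])   # no match here, send the byte itself
            pos += 1
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old, ops, block=BLOCK):
    """Receiver side: rebuild the new file from its old copy and the delta."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            out += old[value * block:(value + 1) * block]
        else:
            out += value
    return bytes(out)


# Synthetic stand-in for a database file, with a small insertion mid-file.
old = "".join(f">seq{i:05d} ACGTACGT\n" for i in range(200)).encode("ascii")
new = old[:1900] + b"NNNNN" + old[1900:]

ops = delta(new, signature(old))
literal_bytes = sum(len(v) for kind, v in ops if kind == "data")
rebuilt = patch(old, ops)
```

Only the literal bytes (plus the short copy instructions) would need to
cross the expensive link; everything else is rebuilt from the old copy.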

For the gzipped files that constitute Biomirror, that means the receiving site
 keeps an exact mirror of the directory of gzipped files.
It can unpack and redistribute them elsewhere as necessary for its own use.

The sending site needs to provide rsync access
 and use an rsync-friendly version of gzip.

Patches have gone into gzip over the last couple of years to (I think)
reset the string tables at regular intervals. This means that the effects
of a change in one part of the source file will peter out.
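
To make the "peter out" effect concrete, here is a small Python sketch. It
is a simplification under stated assumptions, not the gzip patch itself:
each fixed-size block is compressed on its own with zlib, which is the
bluntest way of making sure no compressor state crosses a boundary.
BLOCK_SIZE and the helper names are hypothetical.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical reset interval, chosen for illustration


def compress_with_resets(data, block_size=BLOCK_SIZE):
    """Compress each block independently so no state crosses a boundary."""
    return [zlib.compress(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def decompress_blocks(blocks):
    """Inverse of compress_with_resets."""
    return b"".join(zlib.decompress(b) for b in blocks)


def changed_blocks(old_blocks, new_blocks):
    """Indices of compressed blocks that differ between two versions."""
    return [i for i, (a, b) in enumerate(zip(old_blocks, new_blocks)) if a != b]


def make_sample(n_records=2000):
    """Synthetic text standing in for a database file."""
    return "".join(f">seq{i}\nACGT{i:08d}TTGACA\n"
                   for i in range(n_records)).encode("ascii")


original = make_sample()
mid = len(original) // 2
edited = bytearray(original)
edited[mid] ^= 0x01          # change a single byte in the middle
edited = bytes(edited)

old_blocks = compress_with_resets(original)
new_blocks = compress_with_resets(edited)
diff = changed_blocks(old_blocks, new_blocks)  # only the block holding `mid`
```

Fixed offsets only localise in-place changes. An insertion shifts every
later boundary, so a practical scheme would want the reset points derived
from the content rather than from byte offsets.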

http://www.google.com/search?q=rusty%20gzip%20patch&sourceid=mozilla-search&start=0&start=0&ie=utf-8&oe=utf-8

Which will tell you about Rusty's gzip --rsyncable patch:

http://pserver.samba.org/cgi-bin/cvsweb/rsync/patches/gzip-rsyncable.diff

What are other mirrors using?
Is there a map of the biomirror chain?

michaelj
-- 
Michael James                           michael.james@csiro.au
System Administrator                    voice:  02 6246 5040
CSIRO Bioinformatics Facility           fax:    02 6246 5166