[Bioclusters] download blast db with rsync in uncompressed format

Fabien Steinmetz bioclusters@bioinformatics.org
Tue, 2 Dec 2003 14:22:31 +0100


On Monday 1 December 2003 18:14, elijah wright wrote:
> > > in fact rsync can't be used at its "best performances" because the
> > > databases are already compressed. Thus the transmitted data to update
> > > a local version is very high and could be much lower if using rsync
> > > with uncompressed databases (by using the rsync switch to compress
> > > data that is being transmitted). Is there any server on which it would
> > > be possible to get such uncompressed files (in fasta or precompressed
> > > format) ? I couldn't find any with a google. Or do you know a better
> > > way to lower the transmitted data ?
>
> erm, if you're syncing compressed databases against compressed databases,
> then rsync's compression should gain you *nothing*.  you just want to be
> able to compare the blocks and update the ones that aren't the same.
> since they're already compressed, the net amount of data should be LESS,
> rather than more...

Of course the transmitted data is less than the size of the file; however,
it is still very close to the size of the file.

> i am assuming, of course, that you are syncing two compressed versions of
> the dataset and not trying to do something truly odd.

That's what I'm doing: syncing two versions of the same file. That's not so
odd ;-)

> [if i remember correctly, rsync copes well with gzip / bzip data because
> there are pretty clear block boundaries within which data tends to remain
> the same if only some files change...]

Yes, that's what is written in rsync's algorithm description and man pages.
However, I noticed it doesn't handle it well in this case. Maybe because
the .gz files are always created 'de novo', so even a small change in the
database changes most of the compressed bytes that follow it.
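
To illustrate the 'de novo' point, here is a minimal sketch in Python, not a
measurement of rsync itself. Everything in it is an assumption made for the
example: the random "ACGT" data stands in for a real database, zlib stands in
for gzip (both use deflate), the 512-byte block size and the names
literal_fraction, old, new and entry are hypothetical, and blocks are compared
directly instead of with rsync's rolling and strong checksums. It estimates
what fraction of the new file would have to be sent as literal data when the
receiver already holds the old file.

```python
import random
import zlib

BLOCK = 512  # hypothetical block size; rsync chooses its own


def literal_fraction(old: bytes, new: bytes, block: int = BLOCK) -> float:
    """Estimate the share of `new` that cannot be rebuilt from blocks of `old`.

    Simplified rsync-style matching: `old` is cut into aligned blocks, then
    `new` is scanned at every byte offset for a window equal to one of them.
    """
    known = {old[i:i + block] for i in range(0, len(old) - block + 1, block)}
    literal = 0
    pos = 0
    while pos < len(new):
        if new[pos:pos + block] in known:
            pos += block      # the receiver already has this block
        else:
            literal += 1      # this byte has to be transmitted
            pos += 1
    return literal / len(new)


# Synthetic stand-in for a sequence database (seeded, so it is repeatable).
rng = random.Random(0)
old = "".join(rng.choices("ACGT", k=200_000)).encode()

# "Update" the database by inserting one small entry every 20,000 bytes.
entry = b">new_entry\n" + b"ACGT" * 20 + b"\n"
step = 20_000
new = entry.join(old[i:i + step] for i in range(0, len(old), step))

plain = literal_fraction(old, new)
packed = literal_fraction(zlib.compress(old), zlib.compress(new))

print(f"uncompressed: {plain:.1%} of the new file sent as literal data")
print(f"compressed:   {packed:.1%} of the new file sent as literal data")
```

On data like this I would expect the uncompressed figure to stay small and
the compressed one to be close to the whole file, which matches what I see
with the real .gz databases. The exact numbers depend on the data and on the
block size.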

Fabien