On Wednesday 03 December 2003 00:22, Fabien Steinmetz wrote:
> On Monday 1 December 2003 18:14, elijah wright wrote:
> > > > in fact rsync can't be used at its "best performances" because the
> > > > databases are already compressed.
>
> Of course the transmitted data is less than the size of the file, however
> it's very near the size of the file.

Unfortunately, that's not necessarily true. I have an arrangement with my upstream database provider, bio-mirror.au.apan.net, to get rsync access to the databases, and I often see speedups of LESS than 1. That is, rsync moves more data than simply copying the file would.

Rsync is a fantastic tool and stands to benefit us all, but for it to work well we need some changes upstream.

Considering FASTA databases first: updates seem to come out with the entries in a new order. Even before any new information is added, permuting the entries breaks any chance of rsync helping. New entries also seem to be inserted at random points throughout the file. Rsync compares the file block by block; if most blocks have changed, all rsync can do is ADD the overhead of exchanging per-block checksums. If updates left the beginning of the file as unchanged as possible and appended new entries at the end, rsync would work well.

Once the files are compressed, the compression also has to be considered. Rusty has written some patches to gzip that add a --rsyncable option. These periodically flush the compression codebook, so a change to an early part of the file does not change the entire compressed file. Again, if the original file pushed its changes to the end, this wouldn't matter.

By the time the file has been indexed (formatdb) and tarred, it might as well be feathered; all is lost. For most of my databases, I prefer to go back to NCBI and get the FASTA version. Anyway, in some cases (est_*) it is different (usable).

My .02,
michaelj

PS: Don't get me started on the need to put comments into FASTA files with version info.
i.e.: # est_others.fasta, generated by <institute> on <date>

-- 
Michael James                    michael.james@csiro.au
System Administrator             voice: 02 6246 5040
CSIRO Bioinformatics Facility    fax: 02 6246 5166
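A rough illustration of the block argument above: the Python sketch below imitates rsync-style matching (the old file is cut into fixed-size blocks, each with a weak rolling checksum plus an MD5 confirmation, and the new file is scanned byte by byte) and counts how many bytes of the new file would have to be sent literally. It is a simplified sketch, not rsync itself: the helper names `make_entries` and `literal_bytes`, the toy FASTA entries, and the 512-byte block size are all hypothetical, and real rsync chooses its block size from the file size and also matches the trailing short block, which this sketch ignores. The key assumption is that blocks are larger than a typical entry, so reordering entries leaves almost no old block intact, while appending leaves every full old block reusable.

```python
import hashlib
import random

MOD = 65536


def weak_checksum(data: bytes) -> tuple[int, int]:
    # Adler-style pair (a, b): a = sum of bytes, b = position-weighted sum
    # (first byte weighted len(data), last byte weighted 1).
    a = sum(data) % MOD
    b = sum((len(data) - i) * x for i, x in enumerate(data)) % MOD
    return a, b


def block_signatures(old: bytes, block_size: int) -> dict:
    # Map weak checksum -> {strong digest: offset} for each full block of `old`.
    # Simplification: the trailing partial block is ignored.
    sigs: dict = {}
    for off in range(0, len(old) - block_size + 1, block_size):
        block = old[off:off + block_size]
        sigs.setdefault(weak_checksum(block), {})[hashlib.md5(block).digest()] = off
    return sigs


def literal_bytes(old: bytes, new: bytes, block_size: int) -> int:
    """Count bytes of `new` not covered by a matching block of `old` (hypothetical helper)."""
    n = block_size
    if len(new) < n:
        return len(new)
    sigs = block_signatures(old, n)
    literals, k = 0, 0
    a, b = weak_checksum(new[0:n])
    while True:
        bucket = sigs.get((a, b))
        if bucket is not None and hashlib.md5(new[k:k + n]).digest() in bucket:
            k += n  # whole block matched: skip past it
            if k + n > len(new):
                literals += len(new) - k  # leftover tail is sent literally
                break
            a, b = weak_checksum(new[k:k + n])
        else:
            literals += 1  # new[k] has to be sent literally
            if k + n >= len(new):
                literals += len(new) - k - 1  # rest of the final window
                break
            out_byte, in_byte = new[k], new[k + n]
            # Roll the window forward by one byte in O(1).
            a = (a - out_byte + in_byte) % MOD
            b = (b - n * out_byte + a) % MOD
            k += 1
    return literals


def make_entries(count: int, seed: int) -> list[bytes]:
    # Toy FASTA entries of roughly 70 bytes each (hypothetical data).
    rng = random.Random(seed)
    return [
        f">seq{seed}_{i}\n{''.join(rng.choice('ACGT') for _ in range(60))}\n".encode()
        for i in range(count)
    ]


if __name__ == "__main__":
    old_entries = make_entries(2000, seed=1)
    old = b"".join(old_entries)
    appended = old + b"".join(make_entries(100, seed=2))
    shuffled = old_entries[:]
    random.Random(3).shuffle(shuffled)
    permuted = b"".join(shuffled)
    for name, new in [("appended", appended), ("permuted", permuted)]:
        lit = literal_bytes(old, new, 512)
        print(f"{name}: {lit} of {len(new)} bytes sent literally")
```

With the same entries, the appended update should reuse nearly every block, while the reordered one should fall back to sending almost the whole file literally, on top of the checksum exchange.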