[Bioclusters] what are people using to mirror large ftp repositories with 2gb+ files?

Donald Becker bioclusters@bioinformatics.org
Wed, 10 Sep 2003 19:34:50 -0400 (EDT)


On Wed, 10 Sep 2003, Nathan O. Siemers wrote:

> 	we use unholy combos of wget, GET (from lwp), and the ancient perl4 
> 'mirror' code at bms.  Can't address the 32 bit limits 'cause we run the 
> code on origins, but you might try those other tools (GET is not 
> recursive though).

Scyld had the first Linux distribution released and tested with LFS
support on 32 bit machines.  Some of the things we learned circa 2000:
  Perl 4 was broken.
  Most FTP clients were broken.
     - And most, specifically ncftp, are now fixed.
  Most GNU applications were very robust
     - bash, which you wouldn't expect to care about file size, required
       several patches, all fixed in v2.0+
  A few kernel bits and tools were obviously only tested with sparse
    files, and some fencepost errors were missed.
   - We tested with both dense 5GB+ files and sparse files
     (a sparse file is what you get when a program seeks to an offset
     and writes a single byte)
  Most FTP servers transferred files fine, but were very confused
    internally 

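For anyone who wants to repeat the sparse-file half of that testing, here
is a minimal sketch in Python. It is not the test code we used; the names
make_sparse_file, describe, crosses_32bit_limit and LFS_LIMIT are made up
for illustration. It assumes a POSIX-style filesystem that supports holes
and reports st_blocks in 512-byte units.

```python
import os

# Largest offset a signed 32-bit off_t can hold (assumed here to be the
# limit that non-LFS tools trip over).
LFS_LIMIT = 2**31 - 1


def make_sparse_file(path, offset):
    """Seek to `offset` and write a single byte, leaving a hole before it."""
    with open(path, "wb") as f:
        f.seek(offset)
        f.write(b"\0")
    return os.stat(path)


def describe(st):
    """Return (apparent size, bytes actually allocated) for a stat result."""
    # st_blocks is missing on some platforms, so fall back to 0 there.
    allocated = getattr(st, "st_blocks", 0) * 512
    return st.st_size, allocated


def crosses_32bit_limit(size):
    """True if a file of `size` bytes cannot be described by a 32-bit off_t."""
    return size > LFS_LIMIT
```

Calling make_sparse_file("big.bin", 5 * 2**30) should give a file whose
apparent size is just over 5GB while the allocated size stays tiny, if the
filesystem supports holes. Mirroring that file exercises the offset
handling but not the dense-data paths, which is why dense files need to be
tested separately. Sizes of exactly LFS_LIMIT and LFS_LIMIT + 1 are the
natural places to look for fencepost errors.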
The very good news is that every significant kernel and application
issue we identified in 2000 was fixed within a year.  Many of them had
already been fixed in the development versions of the packages.

-- 
Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
914 Bay Ridge Road, Suite 220		Scyld Beowulf cluster system
Annapolis MD 21403			410-990-9993