[Bioclusters] What are people using to mirror large FTP repositories with 2 GB+ files?

Chris Dagdigian bioclusters@bioinformatics.org
Wed, 10 Sep 2003 16:17:29 -0400


Hi folks,

I've turned a bunch of Seagate 160 GB IDE disks into a large software 
RAID 5 volume and am trying to mirror the raw FASTA data from 
ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/ for use on a personal 
development project.

The 'wget' utility (Red Hat 9 on a dual PIII system with an ext3 
filesystem) is bombing out on a few of the remote files, which are 
larger than 2 GB even when compressed.

My kernel and ext3 filesystem support large files, but 'wget' or my 
shell seems to have trouble with them.
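
One way to double-check the filesystem side independently of wget is to 
create a sparse file just past the 2 GiB mark and read its size back. 
This is only a minimal sketch: the function name probe_large_file is 
made up for illustration, and it assumes a Python build with large file 
support and a filesystem that handles sparse files (otherwise it really 
will write about 2 GiB of zeros).

```python
import os
import tempfile


def probe_large_file(directory: str, size: int = 2**31 + 1) -> bool:
    """Create a sparse file of `size` bytes in `directory`; report whether it worked."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.seek(size - 1)  # jump past the 2 GiB mark without writing data
            f.write(b"\0")    # a single byte at the end fixes the file length
        # Compare the length the filesystem reports with what we asked for.
        return os.path.getsize(path) == size
    except (OSError, OverflowError):
        # e.g. EFBIG from a filesystem or process without large file support
        return False
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(probe_large_file(tempfile.gettempdir()))
```

A True result only says the kernel, filesystem, and this one process 
can cope with a file over 2 GiB; it says nothing about how wget or the 
shell were built.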

I've recompiled the wget .src.rpm with the usual compiler flags to add 
large file support, and wget _seems_ to be working, but I don't really 
trust it because it now reports negative file sizes like this:

 > RETR nt.Z...done
 > Length: -1,668,277,957 [-1,705,001,669 to go]
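
Those negative values look like what you would get if the byte counts 
were held in (or printed as) a signed 32-bit integer. The sketch below 
is an assumption about the cause, not something confirmed from the wget 
source: it supposes the counter wrapped exactly once, and the helper 
names are invented for illustration.

```python
def as_signed_32(n: int) -> int:
    """Interpret the low 32 bits of n as a signed two's-complement value."""
    n &= 0xFFFFFFFF
    return n - (1 << 32) if n >= (1 << 31) else n


def undo_wrap(reported: int) -> int:
    """Smallest non-negative byte count that a 32-bit counter would show as `reported`."""
    return reported & 0xFFFFFFFF


if __name__ == "__main__":
    length = undo_wrap(-1_668_277_957)   # the "Length" figure from the wget output
    to_go = undo_wrap(-1_705_001_669)    # the "to go" figure from the wget output
    print(length)                        # 2626689339
    print(to_go)                         # 2589965627
    print(length - to_go)                # 36723712 bytes already on disk, if the guess holds
    print(as_signed_32(length))          # -1668277957, matching what wget printed
```

If that guess is right, nt.Z is about 2.6 GB and only the reporting is 
off, but the true size could also be larger by any multiple of 4 GiB, 
so the arithmetic alone does not prove the downloaded data is intact.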

What are others doing? Would 'curl' be better? Any recommendations would 
be appreciated.

-Chris


-- 
Chris Dagdigian, <dag@sonsorol.org>
BioTeam Inc. - Independent Bio-IT & Informatics consulting
Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193
PGP KeyID: 83D4310E iChat/AIM: 'bioteamdag' Web: http://bioteam.net