[Bioclusters] what are people using to mirror large ftp repositories with 2gb+ files?

Joe Landman bioclusters@bioinformatics.org
Wed, 10 Sep 2003 05:01:01 -0400


  You also need to look out for tcsh if you are using it.  A tcsh built
without large file support can have its own trouble with files past
2 GB, so a rebuild may be needed.

  I am using curl in my db_dlaf (database download and format) utility
(http://scalableinformatics.com/downloads/db_dlaf.pl).  I have been
bitten by some odd bugs in wget in the past, and now prefer curl.
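
  For anyone who wants to try curl on a single large file first, here is
a minimal sketch (Python wrapping the curl command line).  It is not how
db_dlaf.pl does it; the helper names curl_command and fetch are made up
for illustration, and the nt.Z URL in the usage comment is only an
example pieced together from the original question.  It assumes a curl
binary on the PATH that was itself built with large file support.

```python
import subprocess


def curl_command(url, dest):
    """Build a curl command line that downloads url into dest.

    Hypothetical helper for illustration only.
    """
    return [
        "curl",
        "--fail",               # treat HTTP error responses as failures
        "--remote-time",        # keep the server's timestamp on the local file
        "--continue-at", "-",   # resume from the current size of dest, if any
        "--output", dest,       # write to this file instead of stdout
        url,
    ]


def fetch(url, dest):
    """Run curl and raise CalledProcessError if it exits non-zero."""
    subprocess.run(curl_command(url, dest), check=True)


# Example use (needs network access, so it is left commented out):
# fetch("ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nt.Z", "nt.Z")
```

  Resuming with --continue-at means an interrupted multi-gigabyte
transfer does not have to start over, and check=True makes a failed
transfer loud instead of leaving a short file behind silently.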

2 >./db_dlaf.pl --help
db_dlaf.pl:  copyright 2003 Scalable Informatics LLC
             web: http://scalableinformatics.com
           email: landman@scalableinformatics.com
        db_dlaf.pl [--l | --list] [--path=/path] [--db=db1:db2:...] \
                [--url={http|ftp}://host/path] [--tmp=/path] \
                [--formatdb="options"] [--help]
                --l             list of files
                --list          longer list
                --path          where to put the database indices
                --db            list of databases to grab, use : to
                --url           http or ftp path to databases
                --tmp           temporary disk space
                --formatdb      formatdb options to use on each db

  Whether files over 2 GB work is also very much a function of which
glibc version you are using (assuming a Linux/BSD-like OS).
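
  The negative lengths in the wget output quoted below are consistent
with a byte count that passed through a signed 32-bit integer somewhere
along the way.  That is an assumption about that particular wget build,
not something verified against its source.  A minimal sketch of the
arithmetic in Python (the function names are made up for illustration):

```python
def as_unsigned_32(n):
    """Reinterpret a signed 32-bit value as the unsigned value with the same bits."""
    return n & 0xFFFFFFFF


def as_signed_32(n):
    """Reinterpret the low 32 bits of n as a signed two's-complement value."""
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n


# Figures taken from the quoted wget output
reported_length = -1668277957
reported_to_go = -1705001669

print(as_unsigned_32(reported_length))  # 2626689339
print(as_unsigned_32(reported_to_go))   # 2589965627
```

  Read as unsigned, both figures come out at roughly 2.6e9 bytes, which
fits a compressed file somewhat over 2 GB.  The same bits only look
negative because anything at or above 2**31 bytes (2147483648) does not
fit in a signed 32-bit count.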


On Wed, 2003-09-10 at 16:17, Chris Dagdigian wrote:
> Hi folks,
> I've turned a bunch of Seagate 160gb IDE disks into a large software 
> RAID5 volume and am trying to mirror the raw fasta data from 
> ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/ for use on a personal 
> development project.
> The 'wget' utility (Redhat 9 on a dual PIII system w/ ext3 filesystem) 
> is bombing out on a few of the remote files which even when compressed 
> are greater than 2gb in size.
> My kernel and ext3 filesystem support large filesizes but 'wget' or my 
> shell seem to have issues.
> I've recompiled the wget .src.rpm with the usual compiler flags to add 
> large file support and wget _seems_ to be working but I don't really 
> trust it as it is reporting negative filesizes like this now:
>  > RETR nt.Z...done
>  > Length: -1,668,277,957 [-1,705,001,669 to go]
> What are others doing? Would 'curl' be better? Any recommendations would 
> be appreciated.
> -Chris
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615