[Bioclusters] ack. I'm getting bitten by the 2gb filesize problem on a linux cluster...

chris dagdigian bioclusters@bioinformatics.org
Thu, 30 Jan 2003 15:47:55 -0500


Hi folks,

I thought these problems were long past me with modern kernels and 
filesystems --

We as a community have learned to deal with uncompressed sequence 
databases that are greater than 2GB -- it's pretty simple to gzcat the 
file and pipe it through formatdb via STDIN to avoid having to 
uncompress the database file on disk at all.
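
(For anyone who hasn't done this, the trick looks roughly like the 
following -- the database name and paths are just placeholders:)

    # stream the compressed FASTA straight into formatdb without
    # ever writing the uncompressed file to disk
    gzcat nt.gz | formatdb -i stdin -p F -t "nt" -n nt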

Now, however, I've got a new problem: the compressed archive file 
someone is trying to download is itself greater than 2GB in size :)

The database in question is:

ftp://ftp.ncbi.nlm.nih.gov/blast/db/FormattedDatabases/htgs.tar.gz

The file is mirrored via 'wget' from a cron script, and wget has 
recently started core dumping. An FTP session fetching the same file 
also seemed to bomb out, but I have not verified that fully.
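
(For reference, the mirror job is nothing fancy -- basically a nightly 
cron entry along these lines, with the local path made up for this 
example:)

    # crontab entry -- fetch/update the archive at 03:00 each night
    0 3 * * *  wget -N -P /data/blastdb \
        ftp://ftp.ncbi.nlm.nih.gov/blast/db/FormattedDatabases/htgs.tar.gz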

I did the usual things that one does: verified that the wget binary 
core dumps regardless of which shell one is using (Joe Landman found 
this issue a while ago...), and verified that the error occurs when 
downloading to an NFS-mounted NetApp filesystem as well as to a local 
ext3-formatted filesystem. The node is running Red Hat 7.2 with a 
2.4.18-18.7 kernel.
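
(If anyone wants to rule out the filesystems themselves, a quick sanity 
check is to write a >2GB file directly with dd -- the output path below 
is just an example:)

    # write a 3GB file of zeros; if this succeeds, the kernel/filesystem
    # side is fine and the limit is in the download client
    dd if=/dev/zero of=/mnt/netapp/lfs-test.bin bs=1024k count=3072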

The next step was to recompile 'wget' from the source tarball with the 
usual "-D_ENABLE_64_BIT_OFFSET" and "-D_LARGE_FILES" compiler directives.

Still no love. The wget binary still fails once the downloaded file 
gets a little larger than 2GB.
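
(For anyone repeating the exercise: the standard glibc large-file 
macros are actually -D_FILE_OFFSET_BITS=64 and -D_LARGEFILE_SOURCE, so 
a rebuild would look roughly like this:)

    # rebuild with the usual Linux/glibc large-file-support flags
    CFLAGS="-O2 -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE" ./configure
    make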

Anyone seen this before? What FTP or HTTP download clients are people 
using to download large files?

-Chris




-- 
Chris Dagdigian, <dag@sonsorol.org>
BioTeam Inc. - life science IT & informatics consulting
Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193
PGP KeyID: 83D4310E Yahoo IM: craffi Web: http://bioteam.net