[Bioclusters] ack. I'm getting bitten by the 2gb filesize problem on a linux cluster...

Donald Becker bioclusters@bioinformatics.org
Thu, 30 Jan 2003 16:29:26 -0500 (EST)


On Thu, 30 Jan 2003, chris dagdigian wrote:

> I thought these problems were long past me with modern kernels and 
> filesystems --

This is an application problem.

> We as a community have learned to deal with uncompressed sequence 
> databases that are greater than 2gb -- it's pretty simple to gzcat the 

I can see this coming...

> file and pipe it through formatdb via STDIN to avoid having to 
> uncompress the database file at all.
>
> Now however I've got a problem that the compressed archive file that 
> someone is trying to download is greater than 2gb in size :)
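
(For reference, the pipeline Chris describes looks roughly like the
sketch below.  The database name and the "-i stdin" spelling are my
assumptions; check your own formatdb documentation for the exact
switches.)

    # Sketch only: stream the compressed database straight into formatdb
    # so the uncompressed file never touches the disk.
    gzcat nt.gz | formatdb -i stdin -p F -n nt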

Having shipped the first commercial Linux distribution with tested LFS
(large file support), back in the 2.2 kernel days, we have extensive
experience with large files.

The first place to look is -- surprise! -- the shell.  With some shells,
including older versions of Bash, you can't push more than 2GB
through a pipe.  This is one of those bugs that you wouldn't guess.  We
found it because we had a defined test plan that included end-to-end
tests: we were not doing targeted testing for bugs we thought were
likely.
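
A quick end-to-end check, run under the shell you actually use (the
sizes below are only illustrative):

    # Sketch: push ~3GB of zeros through a pipe and count what arrives.
    # Anything short of 3221225472 bytes, or a stall near the 2GB mark,
    # points at the shell/pipe path rather than the application.
    dd if=/dev/zero bs=1M count=3072 2>/dev/null | wc -c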

The next likely culprits are the FTP servers and clients.  Almost all of
them required minor updates for large file support.  Don't forget that
reading and writing are not the only problem spots: the code that lists
files, passes the length, and tracks transfer progress is more likely to
be buggy than the trivial bulk data transfer itself.
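
Before blaming the network, a cheap local test is a sparse file just
past the 2GB boundary; it exercises the same size-reporting paths
(listings, lengths, progress counters) without eating disk space.  The
filename and sizes here are only an illustration:

    # Sketch: create a sparse ~3GB file, then see whether local tools
    # report its length sanely.
    dd if=/dev/zero of=bigtest.dat bs=1M count=1 seek=3071
    ls -l bigtest.dat    # should show 3221225472, not a negative or
                         # wrapped 32-bit value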


-- 
Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Scyld Beowulf cluster system
Annapolis MD 21403			410-990-9993