[Bioclusters] ack. I'm getting bitten by the 2gb filesize problem on a linux cluster...

Joseph Landman bioclusters@bioinformatics.org
Thu, 30 Jan 2003 18:03:46 -0500


Sanity check:

The transcript below shows what happens on an XFS-based RedHat 7.3 
machine with an upgraded kernel.  One of the odd issues I ran into back 
in the RH7.2 days was that ext2 and ext3 had trouble with > 2GB files on 
RH; this was often a mixture of several different problems.  RH7.3 is a 
"modern" distro in the sense that things work right with mostly 
up-to-date tools.

Punchline:  no problems on this system.

Note:  the strace output shows wget using fstat64 and related 64-bit 
functions.  One thing that confused me in past debugging sessions: if 
any of the libraries or programs in the pipeline had been compiled 
without large file support, that one piece became the weak link in the 
chain.
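A quick sanity check on a suspect binary (a sketch; the paths and the 
test URL are just examples) is to look for the 64-bit interfaces in its 
dynamic symbols and to watch the file-related syscalls it actually makes:

   # was it built with LFS?  look for open64/fstat64/lseek64 references
   nm -D /usr/bin/wget | grep 64
   ldd /usr/bin/wget

   # trace only the file-related syscalls during a short transfer
   strace -e trace=file -o /tmp/wget.trace \
       wget -O /dev/null http://127.0.0.1/db/nt.Z
   grep -E 'open|stat' /tmp/wget.trace | head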

Also note:  for both raw performance and sanity reasons, I try to use 
XFS where I can.  On large sequential file reads it is hard to beat.  On 
top of that, fsck times on ext2 (or ext3 without the data=journal mount 
option) are painful even on small systems.  I do not (and will not) use 
ReiserFS anymore for anything.  I only use ext3 when raw performance 
matters less than keeping the ability to run the next RH kernel.  Life 
is made harder by RH not including XFS (yet).  This will change with the 
2.6 kernels.
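For reference, data=journal is just a mount option; a hypothetical 
/etc/fstab entry for a data partition (device and mount point made up) 
would look something like:

   /dev/sdb1   /data   ext3   defaults,data=journal   1 2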

Synopsis: it looks like you need either a newer wget or a newer glibc 
(or one of the other libraries that RH7.2 hands to wget via the runtime 
loader).
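A quick way to see what the RH7.2 box is actually handing to wget 
(assuming a stock rpm-based install):

   wget --version | head -1
   rpm -q wget glibc
   ldd `which wget`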

--- target ---

[landman@squash.scalableinformatics.com:~]
5 >wget -v http://127.0.0.1/db/nt.Z
--17:28:31--  http://127.0.0.1/db/nt.Z
            => `nt.Z'
Connecting to 127.0.0.1:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2,112,977,407 [text/plain]

100%[=================================================>] 2,112,977,407 
  34.23M/s    ETA 00:00

17:29:30 (34.23 MB/s) - `nt.Z' saved [2112977407/2112977407]

[landman@squash.scalableinformatics.com:~]
6 >md5sum nt.Z
b88443e2fb32bd3b593fa39000e7e18a  nt.Z

[landman@squash.scalableinformatics.com:~]
7 >wget -O - http://127.0.0.1/db/nt.Z | md5sum -
--17:33:48--  http://127.0.0.1/db/nt.Z
            => `-'
Connecting to 127.0.0.1:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2,112,977,407 [text/plain]

100%[=================================================>] 2,112,977,407 
  21.42M/s    ETA 00:00

17:35:22 (21.42 MB/s) - `-' saved [2112977407/2112977407]

b88443e2fb32bd3b593fa39000e7e18a  -


[landman@squash.scalableinformatics.com:~]
8 >uname -a
Linux squash.scalableinformatics.com 2.4.20-rc4 #1 Tue Nov 26 23:38:45 
EST 2002 i686 unknown
[landman@squash.scalableinformatics.com:~]
9 >rpm -qa | grep -i wget
wget-1.8.2-4.73

--- source ---
[root@squash db]# md5sum nt.Z
b88443e2fb32bd3b593fa39000e7e18a  nt.Z


--- strace of wget ---
execve("/usr/bin/wget", ["wget", "-v", "http://127.0.0.1/db/nt.Z"], [/* 
46 vars */]) = 0
uname({sys="Linux", node="squash.scalableinformatics.com", ...}) = 0
brk(0)                                  = 0x80725f0
open("/etc/ld.so.preload", O_RDONLY)    = -1 ENOENT (No such file or 
directory)
open("/usr/X11R6/lib/i686/mmx/libssl.so.2", O_RDONLY) = -1 ENOENT (No su

[...]

select(4, NULL, [3], [3], {900, 0})     = 1 (out [3], left {900, 0})
write(3, "GET /db/nt.Z HTTP/1.0\r\nUser-Agen"..., 103) = 103
write(2, "HTTP request sent, awaiting resp"..., 40HTTP request sent, 
awaiting response... ) = 40
select(4, [3], NULL, [3], {900, 0})     = 1 (in [3], left {900, 0})
read(3, "HTTP/1.1 200 OK\r\nDate: Thu, 30 J"..., 4096) = 4096
write(2, "200 OK", 6200 OK)                   = 6
write(2, "\n", 1
)                       = 1
write(2, "Length: ", 8Length: )                 = 8
write(2, "2,112,977,407", 132,112,977,407)           = 13
write(2, " [text/plain]\n", 14 [text/plain]
)         = 14
open("nt.Z.2", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
fstat64(4, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 
-1, 0) = 0x40014000

chris dagdigian wrote:
> Hi folks,
> 
> I thought these problems were long past me with modern kernels and 
> filesystems --
> 
> We as a community have learned to deal with uncompressed sequence 
> databases that are greater than 2gb -- it's pretty simple to gzcat the 
> file and pipe it through formatdb via STDIN to avoid having to 
> uncompress the database file at all.
> 
> Now, however, I've got a problem: the compressed archive file that 
> someone is trying to download is itself greater than 2gb in size :)
> 
> The database in question is:
> 
> ftp://ftp.ncbi.nlm.nih.gov/blast/db/FormattedDatabases/htgs.tar.gz
> 
> The file is mirrored via 'wget' and a cron script, and the wget run has 
> recently started core dumping. An ftp session for this file also seemed 
> to bomb out, but I have not verified this fully.
> 
> I did the usual things that one does: verified that the wget binary core 
> dumps regardless of which shell one is using (Joe Landman found this 
> issue a while ago...). I also verified that the error occurs when 
> downloading to an NFS-mounted NetApp filesystem as well as to a local 
> ext3-formatted filesystem.  The node is running RedHat 7.2 with a 
> 2.4.18-18.7 kernel.
> 
> Next step was to recompile 'wget' from the source tarball with the usual 
>  "-D_ENABLE_64_BIT_OFFSET" and "-D_LARGE_FILES"  compiler directives.
> 
> Still no love. The wget binary still fails once the downloaded file gets 
> a little larger than 2gb in size.
> 
> Anyone seen this before? What FTP or HTTP download clients are people 
> using to download large files?
> 
> -Chris
> 
> 
> 
> 
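For anyone hitting this later: the gzcat-into-formatdb trick Chris 
describes above looks roughly like the sketch below.  The filename and 
flags are illustrative and depend on your formatdb version (-p F marks a 
nucleotide database, -n sets the output base name); check whether your 
build accepts "stdin" as the input name.

   gzip -dc nt.gz | formatdb -i stdin -p F -n nt -t "nt (piped)"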

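On the recompile attempt: on glibc the macros that actually switch the 
32-bit file calls to their 64-bit versions are _FILE_OFFSET_BITS=64 and 
_LARGEFILE_SOURCE, so a rebuild would look more like the sketch below. 
Whether that alone is enough depends on wget itself using off_t 
consistently, which is why a newer wget may still be the real fix.

   cd wget-1.8.2
   CPPFLAGS="-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE" ./configure
   make
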
-- 
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615