[Bioclusters] problem downloading large files from NCBI

andy law (RI) bioclusters@bioinformatics.org
Fri, 3 Oct 2003 09:58:19 +0100


A word of caution - the NCBI FTP server has a bug in it.

The circumstances under which it appears are limited, but...

... if you try to download a file > 2GB ...
... and it fails after transferring more than 2GB, but not the whole file ...
... and you try to issue a REST command to get it to carry on where it left off ...

... then it tells you it has done so (return code 350) but actually starts transmitting at the 2GB point.
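
A client-side guard against this is to compare the final local size with the size the server reports. A resume that silently restarts at the wrong offset leaves the local file longer than the original. Below is a minimal sketch in Python using the interface of the standard ftplib module. The function name resume_and_verify and its arguments are hypothetical, and the sketch assumes the server answers the SIZE command correctly for files this large, which has not been verified here.

```python
import os


def resume_and_verify(ftp, remote_name, local_path):
    """Resume a download with REST, then check the resulting file size.

    `ftp` is a connected, logged-in ftplib.FTP object (or anything that
    offers the same voidcmd/size/retrbinary methods).
    """
    ftp.voidcmd("TYPE I")  # binary mode, so that SIZE counts bytes
    remote_size = ftp.size(remote_name)
    if remote_size is None:
        raise IOError("server did not report a size for %s" % remote_name)

    # Resume from however many bytes are already on disk.
    local_size = os.path.getsize(local_path) if os.path.exists(local_path) else 0

    with open(local_path, "ab") as out:
        # ftplib sends "REST <offset>" before RETR when rest is not None.
        ftp.retrbinary("RETR " + remote_name, out.write,
                       rest=local_size or None)

    # If the server restarted at the wrong offset, the sizes will differ.
    final_size = os.path.getsize(local_path)
    if final_size != remote_size:
        raise IOError("size mismatch after resume: local %d, remote %d"
                      % (final_size, remote_size))
    return final_size
```

With a real connection, `ftp` would be created with `ftplib.FTP(host)` followed by `ftp.login()`. A size check only catches the extra data; comparing a checksum, where the server publishes one, would be a stronger test.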

Putting that graphically for the hard of thought.
Assume that the file is 2.5GB and that each of the letters a-j represents 0.25GB of information. The file on the server is

abcdefghij

If we transfer 2.25GB then on the client we have

abcdefghi

We ask for the rest of the file, starting at the point between 'i' and 'j'.

The server says 'OK' but actually sends us 'ij'

We now have on the client

abcdefghiij

which is wrong.
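
The same scenario as a toy model in Python, for anyone who wants to poke at it. The clamping rule in buggy_rest is only an assumption that reproduces the behaviour seen from the client; the actual cause inside the server is not known. All names are made up for the illustration.

```python
# Each letter stands for 0.25GB, so the 2GB point is after 8 letters.
SERVER_FILE = "abcdefghij"
TWO_GB_POINT = 8


def buggy_rest(offset):
    """Return what the server sends after 'REST offset' + 'RETR'.

    Assumed model: any offset beyond the 2GB point is silently
    replaced by the 2GB point itself.
    """
    start = min(offset, TWO_GB_POINT)
    return SERVER_FILE[start:]


client = SERVER_FILE[:9]            # 2.25GB arrived before the failure
client += buggy_rest(len(client))   # ask to resume between 'i' and 'j'

print(client)                       # abcdefghiij
print(len(client) == len(SERVER_FILE))  # False
```

The resumed file is one block too long, so a plain length comparison against the server copy is enough to spot the problem in this case.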

I have emailed info@ncbi.nlm.nih.gov. Is there anyone I should mail directly about this?


