On Thu, Aug 04, 2005 at 03:43:25PM +0800, Darwin Sadeli wrote:
I encountered a problem when trying to download the English Wikipedia
image dumps. First of all, the web page states that the size of the
dump is 16.7 GB. However, when I tried to download it using wget, only
about 700 MB was downloaded. When I tried to extract the tar file, the
extraction stopped abruptly with an error message saying that the end
of the file is corrupted.

So, I am wondering whether the image dumps are actually corrupted, or
whether my method of downloading is wrong and certain steps are
required to download the image dump. Please help! Thank you in advance
for your kind assistance.
IIRC, the stable wget has a large-file problem: it sees 16.7 GB
(4 * 4 GB + 700 MB) as 700 MB, because that is all that fits in a
32-bit value. Try some other program (or maybe the unstable wget).
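A back-of-the-envelope sketch of that truncation in Python, assuming
the 16.7 GB figure really means GiB (units of 2^30 bytes):

    size = int(16.7 * 2**30)   # full dump, roughly 17.9e9 bytes
    low32 = size % 2**32       # what survives in a 32-bit length field
    print(low32 / 2**20)       # ~717 MiB, close to the 700 MB observed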
Recent versions of curl have been tested and are able to download
large files without problems. Note that curl needs an explicit option
("-C -", I believe) to continue from a partially downloaded file if
the transfer was interrupted (which can happen during a download that
takes several hours); otherwise it will start from scratch.
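For example, something like the following should resume after an
interruption (the URL here is only a placeholder, not the real dump
location):

    curl -C - -O http://example.org/path/to/image-dump.tar

Running the same command again after a dropped connection makes curl
pick up where the partial file ends instead of starting over.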
Alfio