I am downloading an archive with wget. How can I use wget to redownload that file only when it is newer on the server or its size has changed?
I'm aware of the -N flag, but it doesn't work.
TL;DR There is a critical bug introduced in or around wget 1.17 that broke this feature.
In older wget, you need to do wget -N https://example.com/file.zip
In newer wget, you need to do wget -N --no-if-modified-since https://example.com/file.zip
The server must support HEAD requests and provide both the timestamp (Last-Modified) and the size (Content-Length).
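If the same script has to run on machines with both old and new wget, one option is to add --no-if-modified-since only when the installed wget advertises it. This is a minimal bash sketch: the function name wget_timestamped is hypothetical, and it assumes that wget builds supporting the option list it in `wget --help` output, while older builds (which would reject it) do not.

```bash
# Hypothetical helper: run "wget -N" with the HEAD + size check on any wget version.
# Assumption: builds that support --no-if-modified-since list it in --help;
# older builds do not know the option, so it must not be passed to them.
wget_timestamped() {
    local help
    local -a flags=(-N)
    help=$(wget --help 2>/dev/null)
    if [[ $help == *--no-if-modified-since* ]]; then
        # Newer wget: disable the conditional GET so -N compares date and size again
        flags+=(--no-if-modified-since)
    fi
    wget "${flags[@]}" "$@"
}
```

Usage: `wget_timestamped https://example.com/file.zip`. Checking the help text rather than parsing the version number avoids guessing exactly which release introduced the option.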
Use the -d flag to display the request and response headers for debugging.
wget --version
wget -N -d https://example.com/file.zip
truncate --size 1 file.zip
wget -N -d https://example.com/file.zip
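The manual check above (download, truncate to 1 byte, download again) can be wrapped in a small function that reports whether wget restored the full file. A bash sketch only: the function name check_size_redownload is hypothetical, it assumes GNU coreutils (stat -c, truncate) and a URL whose last path segment is the saved file name, and any extra arguments (such as --no-if-modified-since) are passed through to wget.

```bash
# Hypothetical helper automating the truncate test.
# Assumptions: GNU coreutils (stat -c, truncate); the URL's last path segment
# is the local file name; extra arguments are passed straight to wget.
# Returns 0 if the full file was restored, 1 if not, 2 if wget itself failed.
check_size_redownload() {
    local url=$1; shift
    local file=${url##*/}
    local before after

    wget -N "$@" "$url" || return 2
    before=$(stat -c %s "$file")

    # Simulate an interrupted download: keep only the first byte
    truncate --size 1 "$file"

    wget -N "$@" "$url" || return 2
    after=$(stat -c %s "$file")

    if (( after == before )); then
        echo "restored: $file is $after bytes again"
    else
        echo "NOT restored: $file is $after bytes, expected $before"
        return 1
    fi
}
```

For example, `check_size_redownload https://example.com/file.zip --no-if-modified-since` on a newer wget should report the file as restored, while the same call without the extra flag shows the broken behavior.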
In older versions, where it used to work, wget sends a HEAD request to obtain the last-modified time and the file size; if either has changed, wget then sends a GET request (without If-Modified-Since) to download the file.
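The HEAD-then-compare check described above can also be done by hand, for example where the installed wget misbehaves. The bash sketch below is an illustration, not wget's actual code: the function names remote_changed and header_value are hypothetical, it uses curl for the HEAD request, assumes GNU date and stat, and treats missing headers as "download to be safe".

```bash
# Hypothetical re-creation of the old "wget -N" decision using curl -I (a HEAD request).
# This is an illustration, not wget's source. Assumptions: GNU date and stat, and a
# server that answers HEAD with Last-Modified and Content-Length.

# Print the value of header $1 (given in lowercase) from raw headers on stdin.
# The last match wins, so after redirects the final response is used.
header_value() {
    tr -d '\r' | awk -v name="$1" '
        { i = index($0, ":")
          if (i > 0 && tolower(substr($0, 1, i - 1)) == name) {
              v = substr($0, i + 1); sub(/^[ \t]+/, "", v)
          } }
        END { print v }'
}

# Returns 0 if the file should be (re)downloaded, 1 if the local copy looks current.
remote_changed() {
    local url=$1 file=$2 headers size modified
    [[ -e $file ]] || return 0                  # no local copy yet
    headers=$(curl -sSIL "$url") || return 0    # HEAD failed: let the download report it
    size=$(header_value content-length <<<"$headers")
    modified=$(header_value last-modified <<<"$headers")
    [[ -n $size && -n $modified ]] || return 0  # cannot compare: download to be safe

    if (( size != $(stat -c %s "$file") )); then
        return 0                                # size differs, e.g. a partial download
    fi
    if (( $(date -d "$modified" +%s) > $(stat -c %Y "$file") )); then
        return 0                                # server copy is newer
    fi
    return 1
}
```

A possible pairing is `remote_changed "$url" file.zip && curl -fsSL -R -o file.zip "$url"`; the -R (--remote-time) flag sets the local mtime from Last-Modified, so the date comparison stays meaningful on the next run, and an interrupted download leaves a short file that the size check catches.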
In newer versions, where it's broken, wget sends a single GET request (with If-Modified-Since) so that the file is downloaded only if its date has changed. Unfortunately, that doesn't work.
The change in behavior is broken by design: it simply cannot detect changes in file size, and as a side effect wget will never recover from a partial, interrupted download.
When wget sends an HTTP GET request with a timestamp, the server can respond with a 304 Not Modified code, with no content and no file size. The 304 decision is based only on the last-modification time provided by the client. Unfortunately, this leaves wget no way to ever learn the file size or to redownload the file.
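To observe this from the client side, one can send the same kind of conditional request, built from the local file's modification time, and print the status and Content-Length that come back. A bash sketch: the function name probe_conditional is hypothetical, and it assumes curl, GNU date, and a server that implements If-Modified-Since. A 304 reply often carries no Content-Length at all, as in the debug log.

```bash
# Hypothetical probe: send the kind of conditional GET that newer "wget -N" sends.
# Assumptions: curl and GNU date; $2 is the existing local copy of the file.
probe_conditional() {
    local url=$1 file=$2 since headers status length
    # HTTP-date built from the local mtime; LC_ALL=C keeps day/month names in English
    since=$(LC_ALL=C date -u -r "$file" '+%a, %d %b %Y %H:%M:%S GMT')
    # -D - dumps the response headers to stdout; any body is discarded
    headers=$(curl -s -o /dev/null -D - -H "If-Modified-Since: $since" "$url" | tr -d '\r')
    status=$(awk 'NR == 1 { print $2 }' <<<"$headers")
    length=$(awk 'tolower($1) == "content-length:" { print $2 }' <<<"$headers")
    echo "If-Modified-Since: $since -> status ${status:-?}, Content-Length: ${length:-none}"
}
```

Running it right after `truncate --size 1 file.zip` is instructive: truncating bumps the local mtime to "now", so the server typically answers 304 and the size mismatch never surfaces.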
# wget 1.21 on Ubuntu 22, broken
wget -N https://example.com/file.zip -d
truncate --size 1 file.zip
wget -N https://example.com/file.zip -d
---request begin---
GET /file.zip HTTP/1.1
Host: example.com
If-Modified-Since: Thu, 31 Aug 2023 18:22:20 GMT
User-Agent: Wget/1.21.2
Accept: */*
Accept-Encoding: identity
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 304 Not Modified
Date: Wed, 06 Sep 2023 09:10:16 GMT
Connection: keep-alive
Last-Modified: Thu, 31 Aug 2023 18:22:20 GMT
ETag: f37ffefc58f99f0b996a38154d87820344d86d41
Accept-Ranges: bytes
Content-Disposition: attachment; filename="file.zip"; filename*=UTF-8''file.zip
---response end---
304 Not Modified
Registered socket 3 for persistent reuse.
File ‘file.zip’ not modified on server. Omitting download.
Web browsers do not suffer from this caching issue because they store the ETag header from the initial response, a unique ID representing one specific version of the file, and send it back (as If-None-Match) when revalidating. Apache and nginx generate the ETag automatically when serving static files, based on the last-modification time and the file size.
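The browser approach can be imitated with curl by storing the ETag and sending it back in If-None-Match. This is a bash sketch only: the function name fetch_if_changed and the `<file>.etag` side file are hypothetical choices, and it assumes the server returns a stable ETag and honours If-None-Match. Recent curl versions also provide --etag-save and --etag-compare for the same purpose.

```bash
# Hypothetical helper: browser-style revalidation with an ETag, using curl.
# Assumptions: the ETag is stored next to the file as "<file>.etag"; the server
# honours If-None-Match. Returns 0 if downloaded, 1 if not modified, 2 on error.
fetch_if_changed() {
    local url=$1 file=$2 tmp hdrs code etag
    local -a cond=()
    if [[ -e $file && -s $file.etag ]]; then
        cond=(-H "If-None-Match: $(<"$file.etag")")
    fi
    tmp=$(mktemp) && hdrs=$(mktemp) || return 2
    # Body goes to a temp file so a 304 or a failed transfer never touches $file;
    # ${cond[@]+...} expands to nothing when no ETag is stored (safe under set -u)
    code=$(curl -sSL ${cond[@]+"${cond[@]}"} -D "$hdrs" -o "$tmp" -w '%{http_code}' "$url") || code=000
    case $code in
        200)
            mv "$tmp" "$file"
            # Keep the ETag of the final response (last one wins after redirects)
            etag=$(tr -d '\r' <"$hdrs" | awk 'tolower($1) == "etag:" { v = $2 } END { print v }')
            if [[ -n $etag ]]; then echo "$etag" >"$file.etag"; else rm -f "$file.etag"; fi
            rm -f "$hdrs"; return 0 ;;
        304)
            rm -f "$tmp" "$hdrs"; return 1 ;;
        *)
            rm -f "$tmp" "$hdrs"; return 2 ;;
    esac
}
```

Unlike If-Modified-Since, the ETag changes whenever the server's copy changes, and a truncated local file is never written in the first place, because the body only replaces the file after a complete 200 response.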