Is there a way to check, ahead of time, the size of a file I might download via wget? I know that the --spider option tells me whether a file exists or not, but I'm interested in finding the size of that file as well.
Hmm, for me --spider does display the size:
$ wget --spider http://henning.makholm.net/
Spider mode enabled. Check if remote file exists.
--2011-08-08 19:39:48-- http://henning.makholm.net/
Resolving henning.makholm.net (henning.makholm.net)... 85.81.19.235
Connecting to henning.makholm.net (henning.makholm.net)|85.81.19.235|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9535 (9.3K) [text/html] <-------------------------
Remote file exists and could contain further links,
but recursion is disabled -- not retrieving.
$
(But beware that not all web servers will inform clients of the length of the data except by closing the connection when it's all been sent.)
If you're concerned about wget changing the format it reports the length in, you might use wget --spider --server-response and look for a Content-Length header in the output.
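As a sketch of that extraction (the response text below is a canned sample standing in for real wget output, so this runs without network access; in real use it would come from `wget --spider --server-response "$url" 2>&1`):

```shell
# Canned sample of wget's --server-response output (assumed shape);
# wget indents the server's headers with leading spaces.
response='  HTTP/1.1 200 OK
  Content-Type: text/html
  Content-Length: 9535'

# Strip the label and keep only the numeric value; tail -n 1 keeps
# the last response in case of redirects.
size_bytes=$(printf '%s\n' "$response" \
    | sed -n 's/^ *Content-Length: *\([0-9]*\).*/\1/p' | tail -n 1)
echo "$size_bytes"
```

For this sample the command prints 9535.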
--> SIZE filename.ext 213 ######## – Anti
Length: unspecified [application/zip] @Henning Makholm – Donar
wget --method=HEAD works the same as --spider – Espionage
curl --head URL
Look for "Content-Length:" in the output.
And thanks to Henning Makholm's comment:
wget --spider URL
and look for "Length:" in the output.
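A minimal sketch of the curl variant (the headers below are a canned sample; in real use they would come from `curl -sI "$url"`):

```shell
# Canned sample of `curl --head` output (assumed shape).
headers='HTTP/1.1 200 OK
Content-Type: application/zip
Content-Length: 1048576'

# Real HTTP headers end in \r\n, so strip carriage returns first,
# then match the header name case-insensitively.
size=$(printf '%s\n' "$headers" | tr -d '\r' \
    | awk 'tolower($1) == "content-length:" { print $2 }')
echo "$size"
```

For this sample the command prints 1048576.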
Doing this with wget would be more pleasing (-: – Streptococcus
wget -S (wget --server-response) shows the same header information, but then it goes on to download the file, so that's not useful for this question. I don't see an option for wget to show the headers without fetching the file. – Paradigm
The wget option to do only a HEAD request is spelled --spider. – Nubbly
You can get wget -S to work; see my answer. – Merridie
What if Content-Length does not exist? – Donar
Nothing requires the server to send a Content-Length header. Consider that the thing you're looking at might not be a file with a defined size; it might be the output of some program. In that case, the only way to know the size is to download the data and count the bytes. – Paradigm
If wget actively downloads it, you could add a download limit to wget, so that the download stops once it exceeds the limit. – Donar
I was actually looking for the size of a directory, and Google got me here. While there is no direct answer, the accepted answer helped me build the following command on top of it:
wget --spider -m -np URL-to-dir 2>&1 | sed -n -e /unspecified/d -e '/^Length: /{s///;s/ .*//;p}' | paste -s -d+ | bc
The above runs wget in spider mode over the entire directory, which logs the length of each file it finds. That output is piped to sed to extract the byte sizes, and the last two components of the pipe sum them up to give the total in bytes.
This should work:
size_bytes=$(wget -S "${url}" --start-pos=500G 2>&1 | grep Content-Length | cut -d: -f2)
wget http://example.com --spider --server-response -O - 2>&1 | sed -ne '/Content-Length/{s/.*: //;p}'
– Conney