Get the size of a file before downloading it with wget?

Asked 8/8, 2011 at 17:30 Answered 4/4, 2023 at 17:10

Is there a way to check, ahead of time, the size of a file I might download with wget? I know that the --spider option tells me whether a file exists, but I'd also like to find out its size.

Yttria asked 8/8, 2011 at 17:30 Comment(0)

Hmm... for me, --spider does display the size:

$ wget --spider http://henning.makholm.net/
Spider mode enabled. Check if remote file exists.
--2011-08-08 19:39:48--  http://henning.makholm.net/
Resolving henning.makholm.net (henning.makholm.net)... 85.81.19.235
Connecting to henning.makholm.net (henning.makholm.net)|85.81.19.235|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9535 (9.3K) [text/html]     <-------------------------
Remote file exists and could contain further links,
but recursion is disabled -- not retrieving.

$

(But beware that not all web servers tell clients the length of the data in advance; some signal the end only by closing the connection once everything has been sent.)
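If only the number is wanted, the Length: line from the example output can be picked out of the log. This is a minimal sketch, assuming the log format shown in that output; the function name spider_length is hypothetical and not part of wget. It prints nothing when the server reports Length: unspecified.

```shell
# spider_length (hypothetical helper): read the log of `wget --spider URL`
# on stdin and print the byte count from its "Length:" line, e.g.
# "Length: 9535 (9.3K) [text/html]" -> 9535.
# Prints nothing if the last such line says "Length: unspecified".
spider_length() {
    awk '
        /^Length: [0-9]/       { len = $2 }   # numeric length: remember it
        /^Length: unspecified/ { len = "" }   # server did not say
        END { if (len != "") print len }
    '
}

# Usage (wget writes its log to stderr, hence 2>&1):
#   wget --spider "$url" 2>&1 | spider_length
```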

If you're concerned about wget changing the format it reports the length in, you might use wget --spider --server-response and look for a Content-Length header in the output.
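One wrinkle with --server-response is that a redirect produces several responses, each with its own headers, so the first Content-Length seen may belong to the redirect rather than to the file. The sketch below keeps only the value from the final response, assuming wget logs each response's headers under an indented HTTP/x.y status line as it does with -S; the name final_content_length is made up for illustration.

```shell
# final_content_length (hypothetical helper): read the output of
# `wget --spider --server-response URL` on stdin and print the
# Content-Length of the last response only. Each "HTTP/x.y NNN" status
# line starts a new response, so a redirect's Content-Length is dropped.
final_content_length() {
    awk '
        { sub(/\r$/, "") }                          # tolerate CRLF endings
        /^[ \t]*HTTP\/[0-9.]+ / { len = "" }        # new response: reset
        tolower($1) == "content-length:" { len = $2 }
        END { if (len != "") print len }            # empty if final lacked it
    '
}

# Usage:
#   wget --spider --server-response "$url" 2>&1 | final_content_length
```

Resetting on every status line means that if the final response has no Content-Length, nothing is printed, instead of silently reporting the redirect's body size.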

Nubbly answered 8/8, 2011 at 17:37 Comment(7)

Just for completeness, here's how to extract only the size: wget http://example.com --spider --server-response -O - 2>&1 | sed -ne '/Content-Length/{s/.*: //;p}' – Conney 8/8, 2011 at 18:8

For FTP, look for this in the output: --> SIZE filename.ext 213 ######## – Anti 17/6, 2013 at 20:25
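A sketch of that FTP check, assuming the layout quoted in the comment: wget echoes the command as --> SIZE filename.ext, and the server answers with 213 followed by the size in bytes (the SIZE reply defined in RFC 3659). The function name ftp_size is hypothetical.

```shell
# ftp_size (hypothetical helper): read `wget --spider -S ftp://...` output
# on stdin and print the size from the reply to the SIZE command.
# Assumes the command is echoed as "--> SIZE name" and that the first
# "NNN text" reply line after it is the server's answer.
ftp_size() {
    awk '
        { sub(/\r$/, "") }
        /^--> SIZE / { want = 1; next }           # SIZE command was sent
        want && /^[0-9][0-9][0-9] / {             # first reply line after it
            if ($1 == "213") print $2             # 213 <bytes> = success
            exit                                  # any other code: no size
        }
    '
}

# Usage:
#   wget --spider -S "ftp://host/path/file.ext" 2>&1 | ftp_size
```

Any reply code other than 213 (for example a 550 error) yields no output rather than a misleading number.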

What if the length is reported as unspecified, e.g. Length: unspecified [application/zip]? @Henning Makholm – Donar 21/11, 2018 at 11:59

@alper: In that case the server at the other end will not tell you how long the file will be unless you download all of it. – Nubbly 21/11, 2018 at 14:4

What about aria2? – Bohs 21/7, 2019 at 7:18

A solution that works without -S (--server-response) and just parses Length: unix.stackexchange.com/a/16499/162125 – Espionage 2/10, 2021 at 23:53

BTW the command wget --method=HEAD works the same as --spider – Espionage 2/10, 2021 at 23:55

curl --head URL

Look for "Content-Length:" in the output.
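curl prints the raw header lines, which end in CR LF, and header names are case-insensitive (servers speaking HTTP/2 send them in lower case), so a plain grep for "Content-Length:" can miss the header or leave a stray carriage return in the result. A small sketch that copes with both; the function name head_content_length is hypothetical.

```shell
# head_content_length (hypothetical helper): read `curl --head URL` output
# on stdin and print the Content-Length value. curl passes header lines
# through with their CR LF endings, and header names are case-insensitive
# (HTTP/2 sends them in lower case), so both are normalised first.
head_content_length() {
    tr -d '\r' |
        awk -F': *' '
            tolower($1) == "content-length" { len = $2 }   # last one wins
            END { if (len != "") print len }
        '
}

# Usage:
#   curl -sI "$url" | head_content_length
```

With -L, curl follows redirects and prints every response's headers; this sketch keeps the last Content-Length it sees, which is not necessarily from the final response if that response carries none.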

And thanks to Henning Makholm's comment:

wget --spider URL

and look for "Length:" in the output.

Paradigm answered 8/8, 2011 at 17:50 Comment(9)

Although doing it with wget would be more pleasing (-: – Streptococcus 8/8, 2011 at 19:22

wget -S (wget --server-response) shows the same header information, but then it goes on to download the file, so that's not useful for the question. I don't see an option for wget to show the headers without fetching the file. For example, `tries=0` means infinite retries. – Paradigm 8/8, 2011 at 19:32

For some reason the wget option to do only HEAD is spelled --spider. – Nubbly 10/8, 2011 at 11:46

There's a workaround that allows wget -S to work; see my answer. – Merridie 4/4, 2023 at 17:10

What if Content-Length does not exist? – Donar 6/5, 2023 at 23:19

@Donar Then the server isn't telling you the size. – Paradigm 7/5, 2023 at 4:16

@KeithThompson :-( Is there any workaround/hack to capture the size from the server? – Donar 8/5, 2023 at 6:19

@Donar Probably not. If the server wanted to give you the size, it would use Content-Length to do so. Consider that the thing you're looking at might not be a file with a defined size; it might be the output of some program. In that case, the only way to know the size is to download the data and count the bytes. – Paradigm 8/5, 2023 at 8:13

Thanks. I was looking for a way to predict the size of the file before downloading it, so that I could halt the download if the data is too large. Maybe I can instead check the downloaded size while wget is running, i.e. give wget a download limit and stop the download once it is exceeded. – Donar 9/5, 2023 at 8:24
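The first half of that idea (refuse to download when the advertised size is too large, or missing altogether) could look like the sketch below. The helper name size_ok and the 100 MB limit in the usage lines are illustrative assumptions; the sketch does not cover stopping a transfer that is already running.

```shell
# size_ok (hypothetical helper): succeed only if $1 is a known byte count
# no larger than the limit in $2. An empty or non-numeric size (for
# example when wget reported "Length: unspecified") counts as not ok.
size_ok() {
    case $1 in
        '' | *[!0-9]*) return 1 ;;   # unknown size: refuse
    esac
    [ "$1" -le "$2" ]
}

# Usage (100 MB limit chosen arbitrarily):
#   size=$(wget --spider "$url" 2>&1 | awk '/^Length: [0-9]/ { n = $2 } END { print n }')
#   if size_ok "$size" 100000000; then wget "$url"; else echo "skipping $url" >&2; fi
```

Treating an unknown size as "too large" is a deliberate choice here; a caller who prefers to download anyway can test for an empty size separately.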

I was actually looking for the size of a directory, and Google brought me here. While there is no direct answer here, the accepted answer helped me build the following command on top of it:

wget --spider -m -np URL-to-dir 2>&1 | sed -n -e /unspecified/d -e '/^Length: /{s///;s/ .*//;p}' | paste -s -d+ | bc

The above runs wget in spider mode over the entire directory, which logs the length of each file in it. The output is then piped to sed to extract the sequence of numbers (byte sizes), and the last two stages of the pipeline sum them to give the total in bytes.
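The same total can be computed with a single awk instead of sed, paste and bc; it also prints 0 rather than nothing when no lengths were logged. A sketch, with the hypothetical function name sum_spider_lengths. As with the sed version, the directory-index pages that wget fetches while recursing may contribute their own Length: lines to the total.

```shell
# sum_spider_lengths (hypothetical helper): read a recursive
# `wget --spider` log on stdin and print the sum of all numeric
# "Length: N ..." lines; "Length: unspecified" lines are ignored.
sum_spider_lengths() {
    awk '
        /^Length: [0-9]/ { total += $2 }
        END { printf "%.0f\n", total }   # print as a plain integer
    '
}

# Usage:
#   wget --spider -m -np "$url_to_dir" 2>&1 | sum_spider_lengths
```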

Puca answered 2/3, 2022 at 13:14 Comment(0)

This should work:

size_bytes=$(wget -S "${url}" --start-pos=500G 2>&1 | grep Content-Length | cut -d: -f2)
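Depending on the server, the total size may also (or only) be available in a different header. A range that starts past the end of the file normally gets a 416 response, and RFC 7233 says such a response SHOULD carry Content-Range: bytes */<complete-length>, whereas the Content-Length of a 416 response normally describes that response's own (error) body. A sketch that reads the total from Content-Range instead, assuming wget -S logs that header; range_total is a hypothetical name.

```shell
# range_total (hypothetical helper): read `wget -S` output on stdin and
# print the complete length from a "Content-Range: bytes */N" (416) or
# "Content-Range: bytes A-B/N" (206) header, if one was logged.
range_total() {
    awk '
        tolower($0) ~ /^[ \t]*content-range:/ {
            n = $0
            sub(/.*\//, "", n)              # keep what follows the last slash
            sub(/[ \t\r]+$/, "", n)         # drop trailing blanks or CR
            if (n ~ /^[0-9]+$/) total = n   # "*" would mean unknown
        }
        END { if (total != "") print total }
    '
}

# Usage:
#   wget -S --start-pos=500G "$url" 2>&1 | range_total
```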

Merridie answered 4/4, 2023 at 17:10 Comment(0)
