get file size of a file to wget before wget-ing it?
Y

4

71

I'm wondering if there is a way to check ahead of time the size of a file I might download via wget? I know that using the --spider option tells me if a file exists or not, but I'm interested in finding the size of that file as well.

Yttria answered 8/8, 2011 at 17:30 Comment(0)
N
92

Hmm.. for me --spider does display the size:

$ wget --spider http://henning.makholm.net/
Spider mode enabled. Check if remote file exists.
--2011-08-08 19:39:48--  http://henning.makholm.net/
Resolving henning.makholm.net (henning.makholm.net)... 85.81.19.235
Connecting to henning.makholm.net (henning.makholm.net)|85.81.19.235|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9535 (9.3K) [text/html]     <-------------------------
Remote file exists and could contain further links,
but recursion is disabled -- not retrieving.

$ 

(But beware that not all web servers will inform clients of the length of the data except by closing the connection when it's all been sent.)

If you're concerned about wget changing the format it reports the length in, you might use wget --spider --server-response and look for a Content-Length header in the output.
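For what it's worth, here is a minimal sketch of that last suggestion. The function name content_length_from_headers is made up for illustration. It only parses text on stdin, so it assumes you pipe the combined output of wget --spider --server-response into it, and it takes the last Content-Length it sees, since redirects can produce several responses.

```shell
# Hypothetical helper (not part of wget): read `wget --spider --server-response`
# output on stdin and print the value of the last Content-Length header.
# Exits non-zero if no Content-Length header was seen.
content_length_from_headers() {
    # Strip carriage returns first, then match the header name case-insensitively.
    tr -d '\r' | awk '
        tolower($1) == "content-length:" { size = $2 }
        END { if (size != "") print size; else exit 1 }
    '
}

# Usage (needs network access; the URL is a placeholder):
# wget --spider --server-response "http://example.com/" 2>&1 | content_length_from_headers
```

If the server sends no Content-Length at all, the function prints nothing and returns a non-zero status, which matches the caveat above.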

Nubbly answered 8/8, 2011 at 17:37 Comment(7)
Just for completeness, here's extraction of the size only: wget http://example.com --spider --server-response -O - 2>&1 | sed -ne '/Content-Length/{s/.*: //;p}' - Conney
For FTP, look for this in the output: --> SIZE filename.ext 213 ######## - Anti
What if the length is reported as unspecified, e.g. Length: unspecified [application/zip]? @Henning Makholm - Donar
@alper: In that case the server at the other end will not tell you how long the file will be unless you download all of it. - Nubbly
What about aria2? - Bohs
A solution that works without -S (--server-response) and just parses Length: unix.stackexchange.com/a/16499/162125 - Espionage
BTW the command wget --method=HEAD works the same as --spider - Espionage
P
41
curl --head URL

Look for "Content-Length:" in the output.

And thanks to Henning Makholm's comment:

wget --spider URL

and look for "Length:" in the output.

Paradigm answered 8/8, 2011 at 17:50 Comment(9)
Although doing it with wget would be more pleasing (-: - Streptococcus
wget -S (wget --server-response) shows the same header information, but then it goes on to download the file, so that's not useful for the question. I don't see an option for wget to show the headers without fetching the file. For example, `tries=0` means infinite retries. - Paradigm
For some reason the wget option to do only HEAD is spelled --spider. - Nubbly
There's a workaround that allows wget -S to work; see my answer. - Merridie
What if Content-Length does not exist? - Donar
@Donar Then the server isn't telling you the size. - Paradigm
@KeithThompson :-( Is there any workaround/hack to capture the size from the server? - Donar
@Donar Probably not. If the server wanted to give you the size, it would use Content-Length to do so. Consider that the thing you're looking at might not be a file with a defined size; it might be the output of some program. In that case, the only way to know the size is to download the data and count the bytes. - Paradigm
Thanks. I was looking for a way to predict the size of the file before downloading it, so that I could halt the download if the data is large. Maybe I can check the downloaded size while wget is actively downloading and stop the process if it exceeds a limit. - Donar
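A rough sketch of the "check the size first, then decide" idea from the comments above. The function name size_within_limit and its return codes are assumptions made up for this example. It only compares a size you have already extracted against a limit, and it treats a missing or "unspecified" size as unknown rather than guessing.

```shell
# Hypothetical helper: decide whether a reported size is acceptable.
#   $1 = size in bytes as reported by the server (may be empty or "unspecified")
#   $2 = maximum number of bytes we are willing to download
# Returns 0 if the size is known and within the limit,
#         1 if the size is known and too large,
#         2 if the size is unknown (empty or not a plain number).
size_within_limit() {
    case "$1" in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$1" -le "$2" ]
}

# Usage (the size would come from the Length:/Content-Length output shown above):
# if size_within_limit "$size" 100000000; then wget "$url"; fi
```

When the function returns 2, the server has not told you the size, so the only options are to skip the download or to start it and watch how much arrives.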
P
0

I was actually looking for the size of a directory and Google got me here. While there is no direct answer here, the accepted answer helped me build the following command on top of it:

wget --spider -m -np URL-to-dir 2>&1 | sed -n -e /unspecified/d -e '/^Length: /{s///;s/ .*//;p}' | paste -s -d+ | bc

The above runs wget in spider mode for the entire directory, which logs the length of each file in that directory. The output is then piped to sed to extract a sequence of numbers (byte sizes). The last two components of the pipe sum them up to get the total in bytes.
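The summing step can also be sketched with a single awk call instead of sed, paste and bc. The function name sum_lengths is hypothetical. It expects the same wget spider log on stdin and, like the pipeline above, ignores entries whose length is unspecified.

```shell
# Hypothetical helper: sum the byte counts from "Length: N (...)" lines in
# wget spider output read on stdin. Lines such as "Length: unspecified [...]"
# are skipped because their second field is not a plain number.
sum_lengths() {
    awk '
        $1 == "Length:" && $2 ~ /^[0-9]+$/ { total += $2 }
        END { printf "%.0f\n", total }
    '
}

# Usage (needs network access; URL-to-dir is a placeholder as in the command above):
# wget --spider -m -np URL-to-dir 2>&1 | sum_lengths
```

The total only covers files whose size the server reports, so it is a lower bound whenever some entries come back as unspecified.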

Puca answered 2/3, 2022 at 13:14 Comment(0)
M
0

This should work:

size_bytes=$(wget -S "${url}" --start-pos=500G 2>&1 | grep Content-Length | cut -d: -f2)
Merridie answered 4/4, 2023 at 17:10 Comment(0)
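A possible complement to the command above, offered as a sketch rather than a tested recipe. When a range request starts past the end of the file, servers that support ranges usually answer 416 and may include a header of the form Content-Range: bytes */SIZE, where SIZE is the full length. The function name total_from_content_range is made up. It assumes that header is present in the wget -S output and prints nothing when it is absent or when the total is given as "*".

```shell
# Hypothetical helper: read `wget -S` output on stdin and print the total size
# taken from the last Content-Range header, e.g. "Content-Range: bytes */9535"
# or "Content-Range: bytes 0-99/9535". Prints nothing if no such header exists.
total_from_content_range() {
    tr -d '\r' |
        sed -n 's|^[[:space:]]*[Cc]ontent-[Rr]ange:[[:space:]]*bytes[[:space:]]*[^/]*/\([0-9][0-9]*\)[[:space:]]*$|\1|p' |
        tail -n 1
}

# Usage (needs network access; ${url} is whatever you were going to download):
# wget -S "${url}" --start-pos=500G 2>&1 | total_from_content_range
```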
