How to get remote file size from a shell script?

14

90

Is there a way to get the size of a remote file like

http://api.twitter.com/1/statuses/public_timeline.json

in a shell script?

Diatomic answered 21/12, 2010 at 9:21 Comment(2)
A few examples are on this page; here is one for a Windows shell script (it can double as a bash script with a few modifications): superuser.com/a/1007898/429721 – Uhland
How about wget --spider? – Boiling
142

You can download the file and get its size. But we can do better.

Use curl to get only the response header using the -I option.

In the response headers, look for Content-Length:, which will be followed by the size of the file in bytes.

$ URL="http://api.twitter.com/1/statuses/public_timeline.json"
$ curl -sI $URL | grep -i Content-Length
Content-Length: 134

To get the size use a filter to extract the numeric part from the output above:

$ curl -sI $URL | grep -i Content-Length | awk '{print $2}'
134
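If you need this repeatedly, a small wrapper can help; this is just an illustrative sketch (the function name remote_size is made up here) that also follows redirects, matches the header case-insensitively, and strips the trailing carriage return mentioned in the comments below:

remote_size() {
  # -s silent, -I headers only, -L follow redirects; keep the last match, drop the \r
  curl -sIL "$1" | grep -i '^content-length:' | tail -1 | awk '{print $2}' | tr -d '\r'
}

$ remote_size "$URL"
134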
Levison answered 21/12, 2010 at 9:25 Comment(6)
Used this and wanted to send the result to a function to format the bytes to KB or MB, but it has a hidden carriage return; pipe the result through tr -d '\r' to remove it. – Holmgren
To make the match case-insensitive you have to use -i with grep: curl -sI $URL | grep -i content-length – Countercharge
Not working for me: curl -sI https://code.jquery.com/jquery-3.1.1.min.js | grep -i content-length – Oasis
Use cut -d' ' -f2 instead of awk. awk is bigger and slower than cut. And to be clear, that's a space between single quotes. Otherwise, this answer works for me. – Pewter
@Prisoner13: LOL, thanks for the laugh. I tested cutting the first 7 columns of a 12-million-row, 1.85 GB file I already have that's filled with Unicode, using = as the column delimiter. It took BSD cut some 31.44 s, GNU cut 5.485 s, and mawk2 5.437 s. That's basically a tie. – Cheung
This only works if the server bothers to send a Content-Length header with its response. They don't always do that. – Holohedral
32

Two caveats to the other answers:

  1. Some servers don't return the correct Content-Length for a HEAD request, so you might need to do the full download.
  2. You'll likely get an unrealistically large response (compared to a modern browser) unless you specify gzip/deflate headers.

Also, you can do this without grep/awk or piping:

curl 'http://api.twitter.com/1/statuses/public_timeline.json' --location --silent --write-out 'size_download=%{size_download}\n' --output /dev/null

And the same request with compression:

curl 'http://api.twitter.com/1/statuses/public_timeline.json' --location --silent  -H 'Accept-Encoding: gzip,deflate' --write-out 'size_download=%{size_download}\n' --output /dev/null
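Combining the two ideas, a hedged sketch (not from the original answer) could try a HEAD request first and fall back to a full, discarded download only when no Content-Length comes back:

url='http://api.twitter.com/1/statuses/public_timeline.json'
# Try the headers first; keep the last Content-Length in case of redirects.
size=$(curl -sIL "$url" | tr -d '\r' | awk 'tolower($1) == "content-length:" {len=$2} END {print len}')
if [ -n "$size" ]; then
  echo "$size"
else
  # Fall back: download the body, throw it away, report the transferred size.
  curl -sL "$url" --output /dev/null --write-out '%{size_download}\n'
fi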
Contrarily answered 29/5, 2015 at 22:48 Comment(3)
This doesn't seem to work with redirects. Doesn't this download the whole file as well? – Crankle
@TomHale I think you can just add -L to the command to follow redirects (I don't have a handy redirecting URL to test). And, yes, it downloads the whole file. – Contrarily
If you can depend on the web server you're querying to return an accurate Content-Length for a HEAD request, you don't need to download the whole file. Just add -I to the example above to see how it returns zero (at least it does on 2-25-2019). My solution is more generalized. – Contrarily
10

Similar to codaddict's answer, but without the call to grep:

curl -sI http://api.twitter.com/1/statuses/public_timeline.json | awk '/Content-Length/ { print $2 }'
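If the server sends the header name in lower case (as the comment below notes), a case-insensitive variant that still avoids grep is possible with awk's tolower(), for example:

curl -sI http://api.twitter.com/1/statuses/public_timeline.json | awk 'tolower($1) == "content-length:" {print $2}'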
Boatel answered 21/12, 2010 at 10:10 Comment(2)
Ironically, the example URL you chose returns a lower-case header name, content-length, which breaks your command. There are lots of ways to ignore case in awk, but this is the most bulletproof: curl -sI http://api.twitter.com/1/statuses/public_timeline.json | awk '/[Cc]ontent-[Ll]ength/ { print $2 }' ...of course grep is also nice ;) – Lunde
I guess the headers changed in the four years between my answer and this comment :) – Boatel
6

I think the easiest way to do this would be to:

  1. use cURL in silent mode (-s),

  2. pull only the headers with -I (so as to avoid downloading the whole file),

  3. then do a case-insensitive grep (-i),

  4. and print the second field with awk ({print $2}).

  5. The output is returned in bytes.

Examples:

curl -sI http://api.twitter.com/1/statuses/public_timeline.json | grep -i content-length | awk '{print $2}'

//output: 52

or

curl -sI https://code.jquery.com/jquery-3.1.1.min.js | grep -i content-length | awk '{print $2}'

//output: 86709

or

curl -sI http://download.thinkbroadband.com/1GB.zip | grep -i content-length | awk '{print $2}'

//output: 1073741824

Show as Kilobytes/Megabytes

If you would like to show the size in Kilobytes then change the awk to:

awk '{print $2/1024}'

or Megabytes

awk '{print $2/1024/1024}'
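Alternatively, if GNU coreutils is available, numfmt can do the human-readable formatting (a sketch; the carriage return has to be stripped first, and on macOS/BSD the tool may be installed as gnumfmt):

curl -sI https://code.jquery.com/jquery-3.1.1.min.js | grep -i content-length | awk '{print $2}' | tr -d '\r' | numfmt --to=iec-i --suffix=B

//output (approximately): 85KiB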
Cadman answered 25/3, 2017 at 23:11 Comment(0)
5

The preceding answers won't work when there are redirections. For example, if you want the size of the Debian ISO DVD, you must use the --location option; otherwise, the reported size may be that of the 302 Moved Temporarily response body, not that of the real file.
Suppose you have the following url:

$ url=http://cdimage.debian.org/debian-cd/8.1.0/amd64/iso-dvd/debian-8.1.0-amd64-DVD-1.iso

With curl, you could obtain:

$ curl --head --location ${url}
HTTP/1.0 302 Moved Temporarily
...
Content-Type: text/html; charset=iso-8859-1
...

HTTP/1.0 200 OK
...
Content-Length: 3994091520
...
Content-Type: application/x-iso9660-image
...

That's why I prefer using HEAD, which is an alias for the lwp-request command from the libwww-perl package (on Debian). Another advantage it has is that it strips the extra \r characters, which eases subsequent string processing.

So to retrieve the size of the debian iso DVD, one could do for example:

$ size=$(HEAD ${url})
$ size=${size##*Content-Length: }
$ size=${size%%[[:space:]]*}

Please note that:

  • this method will require launching only one process
  • it will work only with bash, because of the special expansion syntax used

For other shells, you may have to resort to sed, awk, grep et al., for example:
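A roughly equivalent POSIX-shell sketch (an illustration, not part of the original answer) that uses sed instead of the bash-only expansions:

size=$(HEAD "${url}" | sed -n 's/^Content-Length: //p')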

Pudens answered 16/6, 2015 at 8:59 Comment(3)
Nice answer. Would it be possible to do it in a one-liner? – Astrogate
size=$(HEAD ${url} | grep "Content-Length:" | sed 's/.*: //') – Pudens
Sorry, I don't know how to edit my previous comment, which I posted too quickly. The one-liner solution I just posted will work, but at the expense of creating two extra processes. On the other hand, it should be compatible with more shells. – Pudens
3

The accepted solution was not working for me; this one does:

curl -s https://code.jquery.com/jquery-3.1.1.min.js | wc -c
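Note that this downloads the entire body just to count its bytes. If the URL redirects, adding -L may also be needed, for example:

curl -sL https://code.jquery.com/jquery-3.1.1.min.js | wc -c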
Oasis answered 7/12, 2016 at 17:55 Comment(2)
@Oasis Don't you think it's better to get the data from the headers? This actually downloads the whole file just to pipe it to wc. – Cadman
@0x616f you are right, this information is also in the headers. Can you propose a solution and notify me? I will vote it up ;) – Oasis
1

Combining all of the above, this works for me:

URL="http://cdimage.debian.org/debian-cd/current/i386/iso-dvd/debian-9.5.0-i386-DVD-1.iso"
curl --head --silent --location "$URL" | grep -i "content-length:" | tr -d " \t" | cut -d ':' -f 2

This will return just the content length in bytes:

3767500800
Aranda answered 8/10, 2015 at 15:27 Comment(0)
1

This will show you detailed info about the ongoing download.

You just need to specify a URL, as in the example below.

$ curl -O -w 'We downloaded %{size_download} bytes\n' \
    https://cmake.org/files/v3.8/cmake-3.8.2.tar.gz

output

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7328k  100 7328k    0     0   244k      0  0:00:29  0:00:29 --:--:--  365k
We downloaded 7504706 bytes

For automated purposes you'll just need to add the command to your script file.
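If you only want the byte count and don't need to keep the file, a variant that discards the body (an illustration, not from the original answer) is:

$ curl -s -o /dev/null -w '%{size_download}\n' https://cmake.org/files/v3.8/cmake-3.8.2.tar.gz
7504706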

Dictation answered 11/7, 2017 at 21:33 Comment(0)
1

I have a shell function, based on codaddict's answer, which gives a remote file's size in a human-readable format thusly:

remote_file_size () {
  printf "%q" "$*"           |
    xargs curl -sI           |
    grep Content-Length      |
    awk '{print $2}'         |
    tr -d '\040\011\012\015' |
    gnumfmt --to=iec-i --suffix=B # the `g' prefix on `numfmt' is only for systems
  # ^                             # that lack the GNU coreutils by default, i.e.,
  # |                             # non-Linux systems
  # |
  # |                             # in other words, if you're on Linux, remove this
  # |                             # letter `g'; if you're on BSD or Mac, install the GNU coreutils
} # |                                        |
  # +----------------------------------------+
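Usage might look like this (the exact output depends on the file and on numfmt's rounding):

$ remote_file_size 'http://api.twitter.com/1/statuses/public_timeline.json'
134B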
Vauntcourier answered 31/8, 2017 at 6:0 Comment(0)
0

The question is old and has been sufficiently answered, but let me expand upon the existing answers. If you want to automate this task (to check the file sizes of multiple files), then here's a one-liner.

First, write the URLs of the files into a file:

cat url_of_files.txt

https://stpubdata-jwst.stsci.edu/ero/jw02734/jw02734002001/jw02734002001_04101_00001-seg002_nis_x1dints.fits
https://stpubdata-jwst.stsci.edu/ero/jw02734/jw02734002001/jw02734002001_04101_00001-seg003_nis_calints.fits
https://stpubdata-jwst.stsci.edu/ero/jw02734/jw02734002001/jw02734002001_04102_00001-seg001_nis_calints.fits
https://stpubdata-jwst.stsci.edu/ero/jw02734/jw02734002001/jw02734002001_02101_00002-seg001_nis_cal.fits
... 

Then, from the command line (in the same directory as your url_of_files.txt):

eval $(sed -rn '/^https/s/(https.*$)/curl -sI \1/p' url_of_files.txt) | awk '/[Cc]ontent-[Ll]ength/{kb=$2/1024;mb=kb/1024;gb=mb/1024;print ( $2>1024 ? ( kb>1024 ? ( mb>1024 ?  gb " G" : mb " M") : kb " K" ) : $2 " B" ) }'


This handles file sizes ranging from bytes to GBs. I use this line to check the FITS data files being made available by the JWST team.

It checks each file's size and, depending on its magnitude, roughly converts it to an appropriate number with a B, K, M, or G suffix denoting bytes, kilobytes, megabytes, or gigabytes.

result:

...
177.188 K
177.188 K
236.429 M
177.188 K
5.95184 M
1.83608 G
1.20326 G
130.059 M
1.20326 G
...
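If you prefer something easier to read than the eval/sed one-liner, a plain loop over the same file is a possible alternative (a sketch, assuming GNU numfmt is installed; it prints one human-readable size per URL):

while read -r url; do
  [ -z "$url" ] && continue                      # skip blank lines
  curl -sI "$url" |
    awk 'tolower($1) == "content-length:" {print $2}' |
    tr -d '\r' |
    numfmt --to=iec --suffix=B
done < url_of_files.txt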
Nitriding answered 13/7, 2022 at 20:22 Comment(0)
0

You can do it roughly like this, including auto-following 301/302 redirections:

curl -ILs 'https://twitter.com/i/csp_report?a=ORTGK%3D%3D%3D&ro=fals' |
mawk 'NF*=!_<NF' \
      OFS=   FS='^[Cc][Oo][Nn][Tt][Ee][Nn][Tt]-[Ll][Ee][Nn][Gg][Tt][Hh]: '

 1  41

It's very brute force, but it gets the job done. Note that this is whatever raw value the server reports, so you may have to make adjustments to it as you see fit.

You may also have to add the -g flag so it can automatically handle the switchover from vanilla http to https:

curl -gILs 'http://apple.com' |
mawk 'NF *= !_<NF' OFS= \
        FS='^[Cc][Oo][Nn][Tt][Ee][Nn][Tt]-[Ll][Ee][Nn][Gg][Tt][Hh]: '

 1  304
 2  106049

(I'm guessing the second item might be the main site, and the first was the redirection page?)
Cheung answered 14/7, 2022 at 1:24 Comment(0)
0

My solution uses awk's END block to make sure only the last Content-Length is printed:

function curl2contentlength() {
    # Quote "$1" so URLs containing spaces or shell metacharacters survive.
    curl -sI -L -H 'Accept-Encoding: gzip,deflate' "$1" | grep -i Content-Length | awk 'END{print $2}'
}
curl2contentlength "$@"

./curl2contentlength.sh "https://chrt.fm/track/B63133/stitcher.simplecastaudio.com/ec74d48c-cbf1-4764-923e-7d584dce50fa/episodes/a85954a3-24c3-48ed-bced-ef0607b7149a/audio/128/default.mp3?aid=rss_feed&awCollectionId=ec74d48c-cbf1-4764-923e-7d584dce50fa&awEpisodeId=a85954a3-24c3-48ed-bced-ef0607b7149a&feed=qm_9xx0g"

10806508

In fact, without it, the output would have been:

0
0
10806508
Hebraic answered 8/9, 2022 at 19:7 Comment(0)
-1

I use it like this ([Cc]ontent-[Ll]ength:), because my server returns multiple header lines mentioning Content-Length in the response:

curl -sI "http://someserver.com/hls/125454.ts" | grep '[Cc]ontent-[Ll]ength:' | awk '{ print $2 }'

Accept-Ranges: bytes
Access-Control-Expose-Headers: Date, Server, Content-Type, Content-Length
Server: WowzaStreamingEngine/4.5.0
Cache-Control: no-cache
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: OPTIONS, GET, POST, HEAD
Access-Control-Allow-Headers: Content-Type, User-Agent, If-Modified-Since, Cache-Control, Range
Date: Tue, 10 Jan 2017 01:56:08 GMT
Content-Type: video/MP2T
Content-Length: 666460

Lossa answered 10/1, 2017 at 2:4 Comment(0)
-5

A different solution:

ssh userName@IP ls -s PATH | grep FILENAME | awk '{print$1}'

This gives you the size in KB (ls -s reports the allocated size in 1K blocks on GNU systems).
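If you need the exact size in bytes rather than blocks, stat over ssh is another option (a sketch; stat -c %s is the GNU form, while BSD/macOS stat uses -f %z):

ssh userName@IP stat -c %s PATH/FILENAME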

Tantalus answered 7/7, 2016 at 13:4 Comment(1)
This works only if we have an ssh account on the same server where the url content is hosted, which is quite a strong constraint. – Jaime
