Is there a way to get the size of a remote file like
http://api.twitter.com/1/statuses/public_timeline.json
in a shell script?
You can download the file and get its size, but we can do better.
Use curl to get only the response header, using the -I option.
In the response header, look for Content-Length:, which is followed by the size of the file in bytes.
$ URL="http://api.twitter.com/1/statuses/public_timeline.json"
$ curl -sI $URL | grep -i Content-Length
Content-Length: 134
To get the size, use a filter to extract the numeric part from the output above:
$ curl -sI $URL | grep -i Content-Length | awk '{print $2}'
134
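The extracted value keeps the header line's trailing \r, and a plain grep can also hit other headers that merely mention Content-Length (for example Access-Control-Expose-Headers). A minimal sketch of a stricter filter, assuming curl -I style headers on stdin; the function name content_length is hypothetical:

```shell
# content_length (hypothetical helper): read HTTP response headers on stdin
# and print the value of each Content-Length header found.
content_length() {
  # Compare only the first field, case-insensitively, so a line such as
  # "Access-Control-Expose-Headers: Date, Content-Length" is not matched;
  # then strip the trailing \r that HTTP header lines carry.
  awk 'tolower($1) == "content-length:" { sub(/\r$/, "", $2); print $2 }'
}
```

Usage: curl -sI "$URL" | content_length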
…use tr -d '\r' to remove the trailing \r characters. – Holmgren
curl -sI $URL | grep -i content-length (to avoid case sensitivity, you have to use -i with grep). – Countercharge
curl -sI https://code.jquery.com/jquery-3.1.1.min.js | grep -i content-length – Oasis
…1.85 GB file I already have that's filled with Unicode and uses = as the column delimiter: BSD cut took some 31.44 s, GNU cut 5.485 s, and mawk2 5.437 s. That's basically a tie. – Cheung
…content-length header with its response. They don't always do that. – Holohedral
Two caveats to the other answers:
Also, you can do this without grep/awk or piping:
curl 'http://api.twitter.com/1/statuses/public_timeline.json' --location --silent --write-out 'size_download=%{size_download}\n' --output /dev/null
And the same request with compression:
curl 'http://api.twitter.com/1/statuses/public_timeline.json' --location --silent -H 'Accept-Encoding: gzip,deflate' --write-out 'size_download=%{size_download}\n' --output /dev/null
…-L to the command to follow redirects (I don't have a handy redirecting URL to test). And, yes, it downloads the whole file. – Contrarily
…Content-Length for a HEAD request, you don't need to download the whole file. Just add -I to the example above to see how it returns zero (at least it did on 2-25-2019). My solution is more generalized. – Contrarily
Similar to codaddict's answer, but without the call to grep:
curl -sI http://api.twitter.com/1/statuses/public_timeline.json | awk '/Content-Length/ { print $2 }'
…content-length, which breaks your command. There are lots of ways to ignore case in awk, but this is the most bulletproof:
curl -sI http://api.twitter.com/1/statuses/public_timeline.json | awk '/[Cc]ontent-[Ll]ength/ { print $2 }'
…of course grep is also nice ;) – Lunde
Use cURL in silent mode (-s), pull only the headers with -I (so as to avoid downloading the whole file), then do a case-insensitive grep -i and return the second field using awk $2.
The output is in bytes:
curl -sI http://api.twitter.com/1/statuses/public_timeline.json | grep -i content-length | awk '{print $2}'
# output: 52
or
curl -sI https://code.jquery.com/jquery-3.1.1.min.js | grep -i content-length | awk '{print $2}'
# output: 86709
or
curl -sI http://download.thinkbroadband.com/1GB.zip | grep -i content-length | awk '{print $2}'
# output: 1073741824
If you would like to show the size in kilobytes, change the awk to:
awk '{print $2/1024}'
or, for megabytes:
awk '{print $2/1024/1024}'
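Dividing by a fixed 1024 or 1024*1024 gives awkward numbers for very small or very large files. A minimal sketch that picks the unit automatically, assuming a plain integer byte count with any trailing \r already removed; the name human_size is hypothetical, and it uses binary (1024-based) units like the awk lines above:

```shell
# human_size (hypothetical helper): format a byte count with a binary unit
# (B, KiB, MiB, GiB, TiB), dividing by 1024 while the value is 1024 or more.
human_size() {
  awk -v b="$1" 'BEGIN {
    split("B KiB MiB GiB TiB", unit, " ")
    i = 1
    # Step up one unit per division until the value drops below 1024.
    while (b >= 1024 && i < 5) { b /= 1024; i++ }
    # Whole bytes need no decimals; larger units get two.
    fmt = (i == 1) ? "%d %s\n" : "%.2f %s\n"
    printf fmt, b, unit[i]
  }'
}
```

For example, human_size 86709 prints 84.68 KiB and human_size 1073741824 prints 1.00 GiB.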
The preceding answers won't work when there are redirections. For example, if one wants the size of the Debian DVD ISO, one must use the --location option; otherwise, the reported size may be that of the 302 Moved Temporarily response body, not that of the real file.
Suppose you have the following url:
$ url=http://cdimage.debian.org/debian-cd/8.1.0/amd64/iso-dvd/debian-8.1.0-amd64-DVD-1.iso
With curl, you could obtain:
$ curl --head --location ${url}
HTTP/1.0 302 Moved Temporarily
...
Content-Type: text/html; charset=iso-8859-1
...
HTTP/1.0 200 OK
...
Content-Length: 3994091520
...
Content-Type: application/x-iso9660-image
...
That's why I prefer using HEAD, which is an alias for the lwp-request command from the libwww-perl package (on Debian). Another advantage is that it strips the extra \r characters, which eases subsequent string processing.
So to retrieve the size of the debian iso DVD, one could do for example:
$ size=$(HEAD ${url})
$ size=${size##*Content-Length: }
$ size=${size%%[[:space:]]*}
Please note that:
For other shells, you may have to resort to sed, awk, grep, et al.
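If libwww-perl is not installed, curl --location prints the same header blocks, but the extraction then has to skip the earlier redirect responses and the \r characters. A minimal sketch of such a filter, assuming curl -sIL output on stdin; the name final_content_length is hypothetical. Unlike taking the last matching line, it prints nothing when the final response carries no Content-Length, rather than reporting the size of a redirect page:

```shell
# final_content_length (hypothetical helper): read the header blocks printed by
# `curl -sIL URL` on stdin and print the Content-Length of the final response only.
final_content_length() {
  tr -d '\r' | awk '
    /^HTTP\// { len = "" }                          # a new response begins
    tolower($1) == "content-length:" { len = $2 }  # case-insensitive header name
    END { if (len != "") print len }               # empty if the final response has none
  '
}
```

For example: size=$(curl -sIL "$url" | final_content_length)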
The accepted solution was not working for me; this one does:
curl -s https://code.jquery.com/jquery-3.1.1.min.js | wc -c
…wc. – Cadman
To combine all the above, this works for me:
URL="http://cdimage.debian.org/debian-cd/current/i386/iso-dvd/debian-9.5.0-i386-DVD-1.iso"
curl --head --silent --location "$URL" | grep -i "content-length:" | tr -d " \t\r" | cut -d ':' -f 2
This will return just the content length in bytes:
3767500800
$ curl -O -w 'We downloaded %{size_download} bytes\n' https://cmake.org/files/v3.8/cmake-3.8.2.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 7328k 100 7328k 0 0 244k 0 0:00:29 0:00:29 --:--:-- 365k
We downloaded 7504706 bytes
For automated purposes you'll just need to add the command to your script file.
I have a shell function, based on codaddict's answer, which gives a remote file's size in a human-readable format thusly:
remote_file_size () {
  printf "%q" "$*" |
    xargs curl -sI |
    grep -i '^content-length:' |
    awk '{print $2}' |
    tr -d '\040\011\012\015' |
    gnumfmt --to=iec-i --suffix=B
  # The `g' prefix on `numfmt' is only for systems that lack the GNU coreutils
  # by default, i.e., non-Linux systems. In other words, if you're on Linux,
  # remove the letter `g'; if you're on BSD or Mac, install the GNU coreutils.
}
The question is old and has been sufficiently answered, but let me expand upon the existing answers. If you want to automate this task (checking the sizes of multiple files), here's a one-liner.
First, write the URLs of the files into a file:
cat url_of_files.txt
https://stpubdata-jwst.stsci.edu/ero/jw02734/jw02734002001/jw02734002001_04101_00001-seg002_nis_x1dints.fits
https://stpubdata-jwst.stsci.edu/ero/jw02734/jw02734002001/jw02734002001_04101_00001-seg003_nis_calints.fits
https://stpubdata-jwst.stsci.edu/ero/jw02734/jw02734002001/jw02734002001_04102_00001-seg001_nis_calints.fits
https://stpubdata-jwst.stsci.edu/ero/jw02734/jw02734002001/jw02734002001_02101_00002-seg001_nis_cal.fits
...
Then, from the command line (in the same directory as your url_of_files.txt):
eval $(sed -rn '/^https/s/(https.*$)/curl -sI \1/p' url_of_files.txt) | awk '/[Cc]ontent-[Ll]ength/{kb=$2/1024;mb=kb/1024;gb=mb/1024;print ( $2>1024 ? ( kb>1024 ? ( mb>1024 ? gb " G" : mb " M") : kb " K" ) : $2 " B" ) }'
This handles file sizes ranging from bytes to GBs. I use this line to check the FITS data files being made available by the JWST team.
It checks each file's size and, depending on how large it is, roughly converts it to an appropriate number with a B, K, M, or G suffix denoting bytes, kilobytes, megabytes, and gigabytes.
result:
...
177.188 K
177.188 K
236.429 M
177.188 K
5.95184 M
1.83608 G
1.20326 G
130.059 M
1.20326 G
...
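Passing sed output to eval runs whatever the URL file contains as shell code. A read loop avoids that and keeps each size next to its URL. A minimal sketch, assuming curl 7.84.0 or newer for the %header{...} write-out variable; the name sizes_from_list is hypothetical:

```shell
# sizes_from_list (hypothetical helper): read one URL per line from the file
# named in $1 and print "<size><TAB><url>" for each line starting with http.
# Assumes curl 7.84.0+ for %header{...}, which reports the header from the
# most recent response, i.e. the final one after redirects.
sizes_from_list() {
  # "|| [ -n ... ]" also processes a last line that lacks a trailing newline.
  while IFS= read -r url || [ -n "$url" ]; do
    case $url in
      http*) ;;        # keep http:// and https:// lines
      *) continue ;;   # skip blank lines, comments and "..." placeholders
    esac
    size=$(curl -sIL -o /dev/null -w '%header{content-length}' "$url")
    printf '%s\t%s\n' "${size:-unknown}" "$url"
  done < "$1"
}
```

Usage: sizes_from_list url_of_files.txt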
You can kind of do it like this, including auto-following 301/302 redirections:
curl -ILs 'https://twitter.com/i/csp_report?a=ORTGK%3D%3D%3D&ro=fals' | mawk 'NF*=!_<NF' \
    OFS= FS='^[Cc][Oo][Nn][Tt][Ee][Nn][Tt]-[Ll][Ee][Nn][Gg][Tt][Hh]: '
1 41
It's very brute force, but it gets the job done. Keep in mind that it prints whatever raw value the server reports, so you may have to adjust it as you see fit.
You may also want to add the -g flag (--globoff, which stops curl from interpreting [] and {} in the URL); the switchover from vanilla http to https itself is handled by -L following the redirect:
curl -gILs 'http://apple.com' | mawk 'NF *= !_<NF' OFS= \
    FS='^[Cc][Oo][Nn][Tt][Ee][Nn][Tt]-[Ll][Ee][Nn][Gg][Tt][Hh]: '
1 304
2 106049
(I'm guessing this might be the main site, and the first item was the redirection page?)
My solution uses awk's END block to print only the last Content-Length:
#!/bin/bash
function curl2contentlength() {
  curl -sI -L -H 'Accept-Encoding: gzip,deflate' "$1" | grep -i Content-Length | awk 'END{print $2}'
}
curl2contentlength "$@"
./curl2contentlength.sh "https://chrt.fm/track/B63133/stitcher.simplecastaudio.com/ec74d48c-cbf1-4764-923e-7d584dce50fa/episodes/a85954a3-24c3-48ed-bced-ef0607b7149a/audio/128/default.mp3?aid=rss_feed&awCollectionId=ec74d48c-cbf1-4764-923e-7d584dce50fa&awEpisodeId=a85954a3-24c3-48ed-bced-ef0607b7149a&feed=qm_9xx0g"
10806508
Without END, it would have been:
0
0
10806508
I use the pattern [Cc]ontent-[Ll]ength: (including the colon) because one server I use returns the string Content-Length more than once in its response headers:
curl -sI "http://someserver.com/hls/125454.ts" | grep '[Cc]ontent-[Ll]ength:' | awk '{ print $2 }'
The response headers look like this:
Accept-Ranges: bytes
Access-Control-Expose-Headers: Date, Server, Content-Type, Content-Length
Server: WowzaStreamingEngine/4.5.0
Cache-Control: no-cache
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: OPTIONS, GET, POST, HEAD
Access-Control-Allow-Headers: Content-Type, User-Agent, If-Modified-Since, Cache-Control, Range
Date: Tue, 10 Jan 2017 01:56:08 GMT
Content-Type: video/MP2T
Content-Length: 666460
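A different solution, if you have SSH access to the server:
ssh userName@IP ls -s PATH | grep FILENAME | awk '{print$1}'
This gives you the allocated size in blocks (1 KiB blocks by default with GNU ls), which approximates the size in KB.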
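ls -s reports allocated blocks rather than the exact length. With the same SSH access, wc -c gives the exact byte count. A minimal sketch; the helper name remote_byte_size is hypothetical, and it assumes the remote login shell is POSIX-compatible and the path contains no single quote:

```shell
# remote_byte_size (hypothetical helper): print the exact size in bytes of a
# file on a remote host. $1 = SSH destination (e.g. userName@IP), $2 = remote path.
# Assumes the path contains no single quote (it is wrapped in '...' for the remote shell).
remote_byte_size() {
  # wc -c reading from stdin prints only the count; BSD wc pads it with spaces.
  ssh "$1" "wc -c < '$2'" | tr -d ' '
}
```

Usage: remote_byte_size userName@IP /path/to/FILENAME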
wget --spider? – Boiling