How do I download and save a binary file over HTTP using Ruby?
The URL is http://somedomain.net/flv/sample/sample.flv.
I am on the Windows platform and I would prefer not to run any external program.
The simplest way is the platform-specific solution:
#!/usr/bin/env ruby
`wget http://somedomain.net/flv/sample/sample.flv`
You are probably looking for:
require 'net/http'

# The host must be "somedomain.net", not "somedomain.net/";
# with the trailing slash it will throw an exception.
Net::HTTP.start("somedomain.net") do |http|
  resp = http.get("/flv/sample/sample.flv")
  File.open("sample.flv", "wb") do |file|
    file.write(resp.body)
  end
end
puts "Done."
Edit: Changed. Thank You.
Edit2: A solution that saves parts of the file while downloading (a complete, self-contained version follows below):
# instead of http.get
f = File.open('sample.flv', 'wb')
begin
  http.request_get('/sample.flv') do |resp|
    resp.read_body do |segment|
      f.write(segment)
    end
  end
ensure
  f.close
end
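For reference, here is that streaming approach as one self-contained script (a sketch assembled from the snippets above, using the question's host and path; untested, adjust as needed):
require 'net/http'

# Stream the body to disk chunk by chunk instead of buffering
# the whole file in memory.
Net::HTTP.start('somedomain.net') do |http|
  File.open('sample.flv', 'wb') do |f|
    http.request_get('/flv/sample/sample.flv') do |resp|
      resp.read_body { |segment| f.write(segment) }
    end
  end
end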
a platform-specific solution. – Bewitch

Regarding wget: OS X provides curl (curl http://oh.no/its/pbjellytime.flv --output secretlylove.flv). Windows has a PowerShell equivalent: (new-object System.Net.WebClient).DownloadFile('http://oh.no/its/pbjellytime.flv','C:\tmp\secretlylove.flv'). Binaries for wget and curl exist for all operating systems via download as well. I still highly recommend using the standard library unless you're writing code solely for your own lovin'. – Katelynnkaterina

Invoke-WebRequest: iwr $url -OutFile $path – Peri

Using Net::HTTP, I receive part of the file but get response Net::HTTPOK. Is there any way to ensure we downloaded the file completely? – Assess

open is flagged as potentially unsafe. Better use File.open instead. See: rubocop.readthedocs.io/en/latest/cops_security/#securityopen – Padlock
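As an aside, a minimal sketch of the distinction (assuming Ruby 2.5+, where open-uri also provides URI.open; the file name is illustrative):
require 'open-uri'

# Kernel#open dispatches on its argument -- a string beginning with "|"
# spawns a shell command -- so never pass it untrusted input.
File.open('sample.flv', 'rb')                                   # only ever opens files
URI.open('http://somedomain.net/flv/sample/sample.flv', 'rb')   # only ever opens URLs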
I know that this is an old question, but Google threw me here and I think I found a simpler answer.
In Railscasts #179, Ryan Bates used the Ruby standard class OpenURI to do much of what was asked, like this:
(Warning: untested code. You might need to change/tweak it.)
require 'open-uri'

File.open("/my/local/path/sample.flv", "wb") do |saved_file|
  # the following "open" is provided by open-uri
  open("http://somedomain.net/flv/sample/sample.flv", "rb") do |read_file|
    saved_file.write(read_file.read)
  end
end
open("http://somedomain.net/flv/sample/sample.flv", 'rb')
will open the URL in binary mode. –
Kalidasa HTTP
=> HTTPS
redirection, and found out how to solve it using open_uri_redirections
Gem –
Choke :content_length_proc
and :progress_proc
as well, though. (ruby-doc.org/stdlib-2.2.2/libdoc/open-uri/rdoc/OpenURI/…) –
Stadium open
with a new ability that the calling code might not anticipate. You shouldn't be trusting user input passed to open
anyway, but you need to be doubly careful now. –
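A rough sketch of those open-uri hooks (untested; the URL is the question's example and the lambdas are illustrative):
require 'open-uri'

# :content_length_proc is called once with the Content-Length (or nil);
# :progress_proc is called repeatedly with the bytes transferred so far.
open('http://somedomain.net/flv/sample/sample.flv', 'rb',
     content_length_proc: ->(total) { puts "Total: #{total.inspect} bytes" },
     progress_proc:       ->(done)  { print "\rFetched #{done} bytes" }) do |remote|
  File.open('sample.flv', 'wb') { |f| f.write(remote.read) }
end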
Here is my Ruby HTTP-to-file download using open(name, *rest, &block).

require "open-uri"
require "fileutils"

def download(url, path)
  case io = open(url)
  when StringIO then File.open(path, 'w') { |f| f.write(io.read) }
  when Tempfile then io.close; FileUtils.mv(io.path, path)
  end
end
The main advantage here is that it is concise and simple, because open does much of the heavy lifting. And it does not read the whole response into memory.
The open method buffers responses larger than 10240 bytes (10 KiB) in a Tempfile, and smaller ones in a StringIO. We can exploit this knowledge to implement this lean download-to-file method.
See the OpenURI::Buffer implementation here.
Please be careful with user-provided input! open(name, *rest, &block) is unsafe if name comes from user input!
Use OpenURI::open_uri to avoid reading files from disk:
...
case io = OpenURI::open_uri(url)
...
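Usage is then a one-liner (the URL and target path are the question's example values):
download('http://somedomain.net/flv/sample/sample.flv', 'sample.flv')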
"w"
. Will it work on Windows or better put "wb"
instead? –
Anselmo open
actually does not read the response in memory, it reads it into a temporary file for any responses > 10240 bytes. So you were kind-a-right but not. The revised answer cleans up this misunderstanding and hopefully serves as a great example on the power of Ruby :) –
Halfsole EACCES: permission denied
error when changing the filename with mv
command its because you have to close the file first. Suggest changing that part to Tempfile then io.close;
–
Schwaben io.read
to the StringIO case: when StringIO then File.open(path, 'w') { |f| f.write(io.read) }
. Cheers. –
Example 3 in Ruby's net/http documentation shows how to download a document over HTTP; to save the file instead of just loading it into memory, substitute puts with a binary write to a file, e.g. as shown in Dejw's answer (see the sketch below).
More complex cases are shown further down in the same document.
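Roughly, that adaptation might look like this (a sketch; the documentation example prints the body, here it is written out in binary mode instead):
require 'net/http'
require 'uri'

uri = URI.parse('http://somedomain.net/flv/sample/sample.flv')
response = Net::HTTP.get_response(uri)
# Instead of `puts response.body`, write the body in binary mode:
File.open('sample.flv', 'wb') { |f| f.write(response.body) }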
The following solutions first read the whole content into memory before writing it to disk (for more I/O-efficient solutions, look at the other answers).
You can use open-uri, which is a one-liner:
require 'open-uri'
content = open('http://example.com').read
Or use net/http:
require 'net/http'
File.write("file_name", Net::HTTP.get(URI.parse("http://url.com")))
Combining the two (with url and file, respectively), using open-uri as in the first: File.write(file, open(url).read) ... Dead simple, for the trivial download case. – Aim

Expanding on Dejw's answer (edit2):
require 'net/http'
require 'uri'

File.open(filename, 'wb') { |f|
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port) { |http|
    http.request_get(uri.path) { |res|
      res.read_body { |seg|
        f << seg
        # hack -- adjust to suit:
        sleep 0.005
      }
    }
  }
}
where filename and url are strings.
The sleep command is a hack that can dramatically reduce CPU usage when the network is the limiting factor. Net::HTTP doesn't wait for the buffer (16 kB in v1.9.2) to fill before yielding, so the CPU busies itself moving small chunks around. Sleeping for a moment gives the buffer a chance to fill between writes, and CPU usage becomes comparable to a curl solution (a 4-5x difference in my application). A more robust solution might examine the progress of f.pos and adjust the timeout to target, say, 95% of the buffer size -- in fact that's how I got the 0.005 number in my example.
Sorry, but I don't know a more elegant way of having Ruby wait for the buffer to fill.
Edit:
This is a version that automatically adjusts itself to keep the buffer just at or below capacity. It's an inelegant solution, but it seems to be just as fast, and to use as little CPU time, as calling out to curl.
It works in three stages. A brief learning period with a deliberately long sleep time establishes the size of a full buffer. The drop period reduces the sleep time quickly with each iteration, dividing it by a larger factor, until it finds an under-filled buffer. Then, during the normal period, it adjusts up and down by a smaller factor.
My Ruby's a little rusty, so I'm sure this can be improved upon. First of all, there's no error handling. Also, maybe it could be separated into an object, away from the downloading itself, so that you'd just call autosleep.sleep(f.pos) in your loop? Even better, Net::HTTP could be changed to wait for a full buffer before yielding :-)
require 'net/http'
require 'uri'

def http_to_file(filename, url, opt = {})
  opt = {
    :init_pause => 0.1,   # start by waiting this long each time
                          # it's deliberately long so we can see
                          # what a full buffer looks like
    :learn_period => 0.3, # keep the initial pause for at least this many seconds
    :drop => 1.5,         # fast reducing factor to find roughly optimized pause time
    :adjust => 1.05       # during the normal period, adjust up or down by this factor
  }.merge(opt)
  pause = opt[:init_pause]
  learn = 1 + (opt[:learn_period] / pause).to_i
  drop_period = true
  delta = 0
  max_delta = 0
  last_pos = 0
  File.open(filename, 'wb') { |f|
    uri = URI.parse(url)
    Net::HTTP.start(uri.host, uri.port) { |http|
      http.request_get(uri.path) { |res|
        res.read_body { |seg|
          f << seg
          delta = f.pos - last_pos
          last_pos += delta
          if delta > max_delta then max_delta = delta end
          if learn > 0 then
            learn -= 1
          elsif delta == max_delta then
            if drop_period then
              pause /= opt[:drop]
            else
              pause /= opt[:adjust]
            end
          elsif delta < max_delta then
            drop_period = false
            pause *= opt[:adjust]
          end
          sleep(pause)
        }
      }
    }
  }
end
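Invocation might look like this (the file name and URL are the question's example values):
http_to_file('sample.flv', 'http://somedomain.net/flv/sample/sample.flv')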
There are more API-friendly libraries than Net::HTTP, for example httparty:
require "httparty"

File.open("/tmp/my_file.flv", "wb") do |f|
  f.write HTTParty.get("http://somedomain.net/flv/sample/sample.flv").parsed_response
end
I had problems if the file contained German umlauts (ä, ö, ü). I could solve the problem by using:
ec = Encoding::Converter.new('iso-8859-1', 'utf-8')
...
f << ec.convert(seg)
...
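In context, that might look like the following sketch (based on the streaming loop from Dejw's answer; it assumes the server really sends ISO-8859-1 text -- running genuinely binary data through a converter would corrupt it -- and the host and path are illustrative):
require 'net/http'

# ISO-8859-1 is single-byte, so transcoding chunk by chunk is safe here.
ec = Encoding::Converter.new('iso-8859-1', 'utf-8')
Net::HTTP.start('somedomain.net') do |http|
  File.open('sample.txt', 'w') do |f|
    http.request_get('/sample.txt') do |res|
      res.read_body do |seg|
        f << ec.convert(seg)
      end
    end
  end
end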
If you are looking for a way to download a temporary file, do your work, and then delete the file, try this gem: https://github.com/equivalent/pull_tempfile
require 'pull_tempfile'

PullTempfile.transaction(url: 'https://mycompany.org/stupid-csv-report.csv', original_filename: 'dont-care.csv') do |tmp_file|
  CSV.foreach(tmp_file.path) do |row|
    # ....
  end
end
The resp.body part is confusing me. I thought it would save only the 'body' part of the response, but I want to save the whole/binary file. I also found that rio.rubyforge.org could be helpful. Moreover, with my question nobody can say that such a question was not answered yet :-) – Cangue

The http.get('...') call sends a request and receives the response (the whole file). To download a file in chunks and save it simultaneously, see my edited answer above ;-) Resuming is not easy; maybe you count the bytes you saved and then skip them when you re-download the file (file.write(resp.body) returns the number of bytes written). – Bewitch
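A rough sketch of that byte-counting resume idea using an HTTP Range request (untested; it assumes the server honors Range and answers 206 Partial Content, and reuses the question's example host and path):
require 'net/http'

# Ask the server to send only the bytes we don't have yet.
existing = File.exist?('sample.flv') ? File.size('sample.flv') : 0
Net::HTTP.start('somedomain.net') do |http|
  req = Net::HTTP::Get.new('/flv/sample/sample.flv')
  req['Range'] = "bytes=#{existing}-" if existing > 0
  File.open('sample.flv', 'ab') do |f|  # append to the partial file
    http.request(req) do |resp|
      resp.read_body { |seg| f.write(seg) }
    end
  end
end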