Ruby on Rails 3: Streaming data through Rails to client

I am working on a Ruby on Rails app that communicates with RackSpace cloudfiles (similar to Amazon S3 but lacking some features).

Because per-object access permissions and query string authentication are not available, downloads to users have to be mediated through the application.

In Rails 2.3, it looks like you can dynamically build a response as follows:

# Streams about 180 MB of generated data to the browser.
render :text => proc { |response, output|
  10_000_000.times do |i|
    output.write("This is line #{i}\n")
  end
}

(from http://api.rubyonrails.org/classes/ActionController/Base.html#M000464)

Instead of 10_000_000.times... I could dump my cloudfiles stream generation code in there.

Trouble is, this is the output I get when I attempt to use this technique in Rails 3.

#<Proc:0x000000010989a6e8@/Users/jderiksen/lt/lt-uber/site/app/controllers/prospect_uploads_controller.rb:75>

Looks like maybe the proc object's call method is not being called? Any other ideas?

Germander answered 17/8, 2010 at 22:48 Comment(0)
C
16

It looks like this isn't available in Rails 3

https://rails.lighthouseapp.com/projects/8994/tickets/2546-render-text-proc

This appeared to work for me in my controller:

self.response_body = proc { |response, output|
  output.write "Hello world"
}
Ceres answered 4/10, 2010 at 16:1 Comment(0)
A
70

Assign to response_body an object that responds to #each:

class Streamer
  def each
    10_000_000.times do |i|
      yield "This is line #{i}\n"
    end
  end
end

self.response_body = Streamer.new

If you are using 1.9.x or the Backports gem, you can write this more compactly using Enumerator.new:

self.response_body = Enumerator.new do |y|
  10_000_000.times do |i|
    y << "This is line #{i}\n"
  end
end

Note that when and if the data is flushed depends on the Rack handler and underlying server being used. I have confirmed that Mongrel, for instance, will stream the data, but other users have reported that WEBrick, for instance, buffers it until the response is closed. There is no way to force the response to flush.
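
To make the flushing point concrete, here is a small sketch in plain Ruby (no Rails or Rack involved) showing that an Enumerator body produces each chunk only when the consumer asks for it. The trace_streaming method is a hypothetical name used for illustration; the block passed to body.each stands in for a server that writes every chunk as soon as it arrives.

```ruby
# Hypothetical helper: records the order in which chunks are produced and
# consumed, to show that the body is generated on demand.
def trace_streaming(lines)
  events = []

  body = Enumerator.new do |y|
    lines.times do |i|
      events << "produce #{i}"
      y << "This is line #{i}\n"
    end
  end

  # Stand-in for a streaming server: handle each chunk as soon as it arrives.
  body.each { |chunk| events << "write #{chunk.chomp}" }

  events
end

puts trace_streaming(3)
```

The "produce" and "write" entries alternate, so nothing in the body itself forces buffering. A server that buffers simply collects every chunk from #each before writing anything to the socket.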

In Rails 3.0.x, there are several additional gotchas:

  • In development mode, doing things such as accessing model classes from within the enumeration can be problematic due to bad interactions with class reloading. This is an open bug in Rails 3.0.x.
  • A bug in the interaction between Rack and Rails causes #each to be called twice for each request. This is another open bug. You can work around it with the following monkey patch:

    class Rack::Response
      def close
        @body.close if @body.respond_to?(:close)
      end
    end
    

Both problems are fixed in Rails 3.1, where HTTP streaming is a marquee feature.

Note that the other common suggestion, self.response_body = proc {|response, output| ...}, does work in Rails 3.0.x, but has been deprecated (and will no longer actually stream the data) in 3.1. Assigning an object that responds to #each works in all Rails 3 versions.
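
For the original use case of relaying a remote file through the app, the same idea can be sketched with a body that reads fixed-size chunks from an IO-like source. This is only a sketch: ChunkedIOStreamer is a hypothetical class, and it assumes the storage client can hand you an object that supports read(n). StringIO stands in for that object here.

```ruby
require 'stringio'

# Hypothetical helper (not part of Rails or any Cloud Files client): wraps an
# IO-like object that supports read(n) and yields it in fixed-size chunks.
class ChunkedIOStreamer
  def initialize(io, chunk_size = 16 * 1024)
    @io = io
    @chunk_size = chunk_size
  end

  # Body interface expected by response_body: yield one String per chunk.
  def each
    # read(n) returns nil at end of stream, which ends the loop.
    while (chunk = @io.read(@chunk_size))
      yield chunk
    end
  ensure
    @io.close if @io.respond_to?(:close)
  end
end

# StringIO stands in for the remote object stream.
source = StringIO.new("a" * 40_000)
ChunkedIOStreamer.new(source).each { |chunk| puts chunk.bytesize }
```

In a controller this would be assigned the same way as above, e.g. self.response_body = ChunkedIOStreamer.new(io), so that only one chunk is held in memory at a time.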

Artwork answered 1/12, 2010 at 1:17 Comment(7)
invaluable response, thank you. Used it to implement streaming templates for csv file: github.com/fawce/csv_builderPessimist
Thanks so much. Why are these methods deprecated and there is no official way of streaming data?!Hearse
unfortunately this solution is not working for me. I started a new discussion here linkNephew
John, we ran into a memory issue with the code above. When we stream a large amount of data, it seems to consume a lot of RAM and never release it. We are running under Passenger 3.0.19. Do you have this issue?Brasca
From your description it sounds like Passenger may be buffering the response in memory rather than streaming it to the client. I haven't used it so I can't say whether that's expected behavior or not.Artwork
Works with rails 4.0.0 too!Harold
If it still does not work (Rails 3.1.x), try adding the "Last-Modified" header (see the response from Exequiel)Cecil
U
24

Thanks to all the posts above, here is fully working code to stream large CSVs. This code:

  1. Does not require any additional gems.
  2. Uses Model.find_each() so as to not bloat memory with all matching objects.
  3. Has been tested on rails 3.2.5, ruby 1.9.3 and heroku using unicorn, with single dyno.
  4. Adds a GC.start at every 500 rows, so as not to blow the heroku dyno's allowed memory.
  5. You may need to adjust the GC.start depending on your Model's memory footprint. I have successfully used this to stream 105K models into a csv of 9.7MB without any problems.

Controller Method:

def csv_export
  respond_to do |format|
    format.csv {
      @filename = "responses-#{Date.today.to_s(:db)}.csv"
      self.response.headers["Content-Type"] ||= 'text/csv'
      self.response.headers["Content-Disposition"] = "attachment; filename=#{@filename}"
      self.response.headers['Last-Modified'] = Time.now.ctime.to_s

      self.response_body = Enumerator.new do |y|
        i = 0
        Model.find_each do |m|
          if i == 0
            y << Model.csv_header.to_csv
          end
          y << m.csv_array.to_csv
          i = i+1
          GC.start if i%500==0
        end
      end
    }
  end
end

config/unicorn.rb

# Set to 3 instead of 4 as per http://michaelvanrooijen.com/articles/2011/06/01-more-concurrency-on-a-single-heroku-dyno-with-the-new-celadon-cedar-stack/
worker_processes 3

# Change timeout to 120s to allow downloading of large streamed CSVs on slow networks
timeout 120

# Enable streaming
port = ENV["PORT"].to_i
listen port, :tcp_nopush => false

Model.rb

  def self.csv_header
    ["ID", "Route", "username"]
  end

  def csv_array
    [id, route, username]
  end
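
If you want to try the enumerator logic outside Rails, the following sketch reproduces it in plain Ruby. Row is a hypothetical stand-in for the ActiveRecord model, and csv_body is a hypothetical helper; in the real app, Model.find_each would supply the records. Note one small difference from the controller above: this version always emits the header, even when there are no records.

```ruby
require 'csv'

# Hypothetical stand-in for the ActiveRecord model; only the CSV-related
# methods from the answer are reproduced.
Row = Struct.new(:id, :route, :username) do
  def self.csv_header
    ["ID", "Route", "username"]
  end

  def csv_array
    [id, route, username]
  end
end

# Builds the response body: header first, then one CSV line per record.
# `records` is anything that responds to #each.
def csv_body(records)
  Enumerator.new do |y|
    y << Row.csv_header.to_csv
    records.each { |r| y << r.csv_array.to_csv }
  end
end

rows = [Row.new(1, "/home", "alice"), Row.new(2, "/about", "bob")]
csv_body(rows).each { |line| print line }
```
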
Unmannered answered 8/7, 2012 at 21:25 Comment(0)
E
9

If you assign response_body an object that responds to #each and the response is still buffered until it is closed, try setting this header in the controller action:

self.response.headers['Last-Modified'] = Time.now.to_s

Eslinger answered 20/4, 2012 at 20:2 Comment(2)
This was the solution for me! Although, I needed to format the time like so: Time.now.ctime.to_sKatharinekatharsis
I've searched a while to find this response. I don't understand why when you don't specify the header it doesn't stream... anyway, adding this line worked for me. txCecil
G
5

Just for the record, Rails >= 3.1 has an easy way to stream data: assign an object that responds to #each to the controller's response body.

Everything is explained here: http://blog.sparqcode.com/2012/02/04/streaming-data-with-rails-3-1-or-3-2/

Gallinaceous answered 14/3, 2012 at 10:0 Comment(0)
S
2

Yes, response_body is the Rails 3 way of doing this for the moment: https://rails.lighthouseapp.com/projects/8994/tickets/4554-render-text-proc-regression

Schoonover answered 8/10, 2010 at 1:12 Comment(0)
V
2

This solved my problem as well. I have gzipped CSV files that I want to send to the user as uncompressed CSV, so I read them a line at a time using a GzipReader.

These lines are also helpful if you're trying to deliver a big file as a download:

self.response.headers["Content-Type"] = "application/octet-stream"
self.response.headers["Content-Disposition"] = "attachment; filename=#{filename}"
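
The line-at-a-time GzipReader approach might look roughly like the sketch below. GzipLineStreamer and gzip_string are hypothetical names, and the gzip payload is built in memory only so the example is self-contained; in practice the source would be a File opened on the compressed CSV.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: streams a gzip-compressed source line by line,
# decompressing as it goes instead of inflating the whole file first.
class GzipLineStreamer
  def initialize(io)
    @io = io
  end

  def each
    gz = Zlib::GzipReader.new(@io)
    gz.each_line { |line| yield line }
  ensure
    gz.close if gz
  end
end

# Builds a small gzip payload in memory to stand in for a file on disk.
def gzip_string(text)
  buffer = StringIO.new
  buffer.binmode
  gz = Zlib::GzipWriter.new(buffer)
  gz.write(text)
  gz.finish # finishes the gzip stream without closing the StringIO
  buffer.string
end

compressed = StringIO.new(gzip_string("id,name\n1,alice\n2,bob\n"))
GzipLineStreamer.new(compressed).each { |line| print line }
```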

Viewable answered 30/8, 2011 at 23:28 Comment(0)
B
2

In addition, you will have to set the 'Content-Length' header yourself.

If you don't, Rack has to buffer the body in memory to determine its length, which defeats the streaming techniques described above.

In my case, I could determine the length in advance. If you can't, you need to make Rack start sending the body without a 'Content-Length' header. Try adding "use Rack::Chunked" to config.ru, after the 'require' line and before the 'run' line. (Thanks arkadiy)
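
For anyone wondering why a chunked response needs no 'Content-Length': with HTTP/1.1 chunked transfer encoding, every chunk carries its own length. The sketch below illustrates that wire format in plain Ruby. It is not Rack::Chunked's implementation, and chunked_frames is a hypothetical name.

```ruby
# Frames each body chunk the way HTTP/1.1 chunked transfer encoding does:
# hex byte length, CRLF, data, CRLF, then a zero-length chunk to finish.
def chunked_frames(body)
  Enumerator.new do |y|
    body.each do |chunk|
      next if chunk.empty? # a zero-length chunk would end the response early
      y << "#{chunk.bytesize.to_s(16)}\r\n#{chunk}\r\n"
    end
    y << "0\r\n\r\n"
  end
end

chunked_frames(["Hello", "world!"]).each { |frame| p frame }
```

Because the total size is never needed up front, the server can start sending as soon as the first chunk is available.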

Binnings answered 27/6, 2012 at 22:59 Comment(1)
If you do not know the length, you can try adding "use Rack::Chunked" to config.ru, after the 'require' and before the 'run'.Houseyhousey
D
1

I commented in the lighthouse ticket, just wanted to say the self.response_body = proc approach worked for me though I needed to use Mongrel instead of WEBrick to succeed.

Martin

Doxy answered 9/11, 2010 at 14:19 Comment(0)
S
1

Applying John's solution along with Exequiel's suggestion worked for me.

The statement

self.response.headers['Last-Modified'] = Time.now.to_s

marks the response as non-cacheable in rack.

After investigating further, I figured one could also use this:

headers['Cache-Control'] = 'no-cache'

This, to me, is slightly more intuitive. It conveys the intent to anyone else who may be reading my code. Also, if a future version of Rack stops checking for Last-Modified, a lot of code may break, and it may take a while for folks to figure out why.

Secretion answered 16/1, 2013 at 6:11 Comment(0)
