Rails raw query for csv format, to be returned via controller

I was using Active Record to fetch my stories and then generate a CSV the standard way, as shown in the RailsCast, but I have a lot of rows and it takes minutes. I think that if I could get PostgreSQL to do the CSV rendering, I could save some time.

Here's what I have right now:

query = "COPY stories TO STDOUT WITH CSV HEADER;"
results = ActiveRecord::Base.connection.execute(query)

But the results are empty for this query:

 => #<PG::Result:0x00000006ea0488 @connection=#<PG::Connection:0x00000006c62fb8 @socket_io=nil, @notice_receiver=nil, @notice_processor=nil>> 
2.0.0-p247 :053 > results.count
 => 0 

A better way to confirm it's empty:

2.0.0-p247 :059 >   results.to_json
 => "[]" 

I suspect my controller will look something like this:

format.csv { send_data raw_results }

This works for normal queries; I just can't figure out the SQL syntax to have the CSV results returned to Rails.

UPDATE

Got the CSV export down from 120,000 ms to 290 ms.

My model:

def self.to_csv(story_ids)
  csv  = []
  # ActiveRecord's connection wrapper doesn't expose copy_data,
  # so drop down to the raw PG connection.
  conn = ActiveRecord::Base.connection.raw_connection
  # story_ids come from pluck(:id), so they are integers and safe to interpolate.
  conn.copy_data("COPY (SELECT * FROM stories WHERE stories.id IN (#{story_ids.join(',')})) TO STDOUT WITH (FORMAT CSV, HEADER TRUE, FORCE_QUOTE *, ESCAPE E'\\\\');") do
    while row = conn.get_copy_data
      csv.push(row)
    end
  end
  csv.join("\r\n")
end

My controller:

send_data Story.to_csv(Story.order(:created_at).pluck(:id))
Garman answered 14/1, 2014 at 23:35

Is there any way to send_data straight from the DB? I mean, without saving it to the csv array? – Boylston

@FernandoFabreti It sounds like the copy_data function returns rows which need to be joined together into one file. I don't think there's any way of combining the rows without some sort of variable assignment. You could probably use a string from the beginning instead and append to it in the loop; I'd be interested in the performance difference. – Garman

I had to change csv.join("\r\n") to csv.join("\n") to get it to produce the rows correctly; it was originally adding an extra newline. Not sure if this will affect other non-*nix machines... – Shih

@Garman Worked like a charm for me too, thanks to your update! Two quick questions, though: 1. What about scenarios where a row is generated by multiple complex AR queries involving associations? How do we then build a single SQL query to pass in, as in the example above? 2. While it clearly improves performance in terms of time, does it also affect the memory used by the operation? – Kauslick

@FernandoFabreti I ended up wrapping the answer in an Enumerator that I pass to self.response_body, as used here. The linked example is incomplete and needs lines << "#{row.length.to_s(16)}\r\n" before yielding a row for a chunked response to work. – Cloth

AFAIK you need to use the copy_data method on the underlying PostgreSQL database connection for this:

conn.copy_data( sql ) {|sql_result| ... } -> PG::Result

Execute a copy process for transferring [sic] data to or from the server.

This issues the SQL COPY command via #exec. The response to this (if there is no error in the command) is a PG::Result object that is passed to the block, bearing a status code of PGRES_COPY_OUT or PGRES_COPY_IN (depending on the specified copy direction). The application should then use #put_copy_data or #get_copy_data to receive or transmit data rows and should return from the block when finished.

And there's even an example:

conn.copy_data "COPY my_table TO STDOUT CSV" do
  while row=conn.get_copy_data
    p row
  end
end
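
The same block-based pattern covers the PGRES_COPY_IN direction with put_copy_data. A minimal sketch, assuming a my_table with two matching columns (adapted from the pg gem's documentation, not from the answer above):

conn.copy_data "COPY my_table FROM STDIN CSV" do
  conn.put_copy_data "1,Hello\n"   # transmit one raw CSV row to the server
  conn.put_copy_data "2,World\n"
end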

ActiveRecord's wrapper for the raw database connection doesn't know what copy_data is, but you can use raw_connection to unwrap it:

conn = ActiveRecord::Base.connection.raw_connection
csv  = []
conn.copy_data('copy stories to stdout with csv header') do
  while row = conn.get_copy_data
    csv.push(row)
  end
end

That would leave you with an array of CSV strings in csv (one CSV row per array entry), and you could join them to get the final CSV data. Note that each row returned by get_copy_data already ends with a newline, so a plain csv.join avoids the doubled line breaks that csv.join("\r\n") would introduce.
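
For completeness, a minimal sketch of wiring this into a controller action; the export action name and the stories.csv filename are illustrative, not from the original post:

# Hypothetical action returning the COPY output as a CSV download.
def export
  conn = ActiveRecord::Base.connection.raw_connection
  csv = []
  conn.copy_data('copy stories to stdout with csv header') do
    while row = conn.get_copy_data
      csv.push(row)
    end
  end
  # Rows from get_copy_data keep their trailing newline, so join without a separator.
  send_data csv.join, type: 'text/csv', filename: 'stories.csv'
end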

Sacks answered 15/1, 2014 at 0:43

I ended up having to use a different query, one which escapes the data better: conn.copy_data("COPY stories TO STDOUT WITH (FORMAT CSV, HEADER TRUE, FORCE_QUOTE *, ESCAPE E'\\\\');"). Thanks for your help! – Garman

This answer builds on the answer provided by @mu-is-too-short, but streams the response instead of accumulating it in a temporary object.

headers['X-Accel-Buffering'] = 'no'  # disable proxy buffering (nginx) so chunks flush immediately
headers['Cache-Control'] = 'no-cache'
headers['Transfer-Encoding'] = 'chunked'
headers['Content-Type'] = 'text/csv; charset=utf-8'
headers['Content-Disposition'] = 'inline; filename="data.csv"'
headers.delete('Content-Length')
sql = "SELECT * FROM stories WHERE stories.id IN (#{story_ids.join(',')})"
self.response_body = Enumerator.new do |chunk|
  conn = ActiveRecord::Base.connection.raw_connection
  conn.copy_data("COPY (#{sql.chomp(';')}) TO STDOUT WITH (FORMAT CSV, HEADER TRUE, FORCE_QUOTE *, ESCAPE E'\\\\');") do
    while row = conn.get_copy_data
      # Each chunk of a chunked response is prefixed with its byte length in hex.
      chunk << "#{row.bytesize.to_s(16)}\r\n"
      chunk << row
      chunk << "\r\n"
    end
    chunk << "0\r\n\r\n" # zero-length chunk terminates the response
  end
end

You can also use gz = Zlib::GzipWriter.new(Stream.new(chunk)) and gz.write row, with a class akin to:

# IO-like adapter that turns each write into an HTTP chunked-encoding frame.
class Stream
  def initialize(block)
    @block = block
  end

  def write(row)
    @block << "#{row.bytesize.to_s(16)}\r\n" # chunk length is in bytes, hex-encoded
    @block << row
    @block << "\r\n"
  end
end

And remember to set headers['Content-Encoding'] = 'gzip'. See also this gist.
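
Putting the pieces together, a sketch of the gzipped variant under the same assumptions (sql and headers as above); note gz.finish rather than gz.close, so the gzip trailer is flushed without attempting to close the Stream wrapper:

require 'zlib'

headers['Content-Encoding'] = 'gzip'
self.response_body = Enumerator.new do |chunk|
  conn = ActiveRecord::Base.connection.raw_connection
  gz = Zlib::GzipWriter.new(Stream.new(chunk)) # compressed bytes pass through Stream#write
  conn.copy_data("COPY (#{sql.chomp(';')}) TO STDOUT WITH (FORMAT CSV, HEADER TRUE, FORCE_QUOTE *, ESCAPE E'\\\\');") do
    while row = conn.get_copy_data
      gz.write row
    end
  end
  gz.finish             # emit the gzip trailer through Stream#write without closing it
  chunk << "0\r\n\r\n"  # zero-length chunk terminates the chunked response
end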

Cloth answered 25/7, 2018 at 4:55
