How do I force a Rails query to return potentially two models per result?
F

6

9

I'm using Rails 5. I have this in my controller for loading a certain model subject to criteria ...

  @results = MyObjectTime.joins(:my_object,
                            "LEFT JOIN user_my_object_time_matches on my_object_times.id = user_my_object_time_matches.my_object_time_id #{additional_left_join_clause}")
                     .where( search_criteria.join(' and '), *search_values )
                     .limit(1000) 
                     .order("my_objects.day DESC, my_objects.name")
                     .paginate(:page => params[:page])

My question is, how do I rewrite the above so that the @results array contains both the "MyObjectTime" and any potential "UserMyObjectTimeMatch" it is linked to?

Futhark answered 21/2, 2017 at 3:27 Comment(4)
Your @results variable will contain an enumerable of MyObjectTime instances. There is not an object that is an instance of MyObjectTime and any associated instances of UserMyObjectTimeMatch. The performance problem of issuing 1000 + 1 queries is solved by eager loading the association, using #includes(:user_my_object_time_matches)Cantrell
Will "#includes(:user_my_object_time_matches)" load all associations eagerly? That's not what I want and it's not what my query does. The query only loads one association per MyObjectTime object, which is the one I would want to load eagerly.Futhark
Can you help with some example...that can help me to help you further :).ThanksIrinairis
Can you provide the code for your ActiveRecord models? Most likely, creating the correct combination of associations and scopes will solve your problem and make your code much more readable. Scopes on Associations: guides.rubyonrails.org/… Scopes: api.rubyonrails.org/classes/ActiveRecord/Scoping/Named/…Jostle
T
3

I would recommend looking for an alternative, but with the information you provided, and to avoid the potential 10001 queries mentioned in your comment: if you have the has_many :user_my_object_time_matches association set up, you could do:

@results = MyObjectTime.joins(:my_object,
                            "LEFT JOIN user_my_object_time_matches on my_object_times.id = user_my_object_time_matches.my_object_time_id #{additional_left_join_clause}")
        .where( search_criteria.join(' and '), *search_values )
        .limit(1000) 
        .order("my_objects.day DESC, my_objects.name")
        .paginate(:page => params[:page])
        .includes(:user_my_object_time_matches)
        .map{|t| [t, t.user_my_object_time_matches]}
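
On why the final map is there: includes preloads the matches, and map then turns each record into a two-element array, so every entry of @results carries both objects. A minimal plain-Ruby sketch of that shape follows. TimeRow and MatchRow are hypothetical Struct stand-ins for the real models, used only so the snippet runs without a database:

```ruby
# Stand-ins for the ActiveRecord models; hypothetical names for illustration only.
MatchRow = Struct.new(:id, :my_object_time_id)
TimeRow  = Struct.new(:id, :user_my_object_time_matches)

times = [
  TimeRow.new(1, [MatchRow.new(10, 1)]),
  TimeRow.new(2, [])                      # a LEFT JOIN keeps rows with no match
]

# Same shape as the .map in the answer: one [record, matches] pair per record.
pairs = times.map { |t| [t, t.user_my_object_time_matches] }

# Each pair destructures into its two parts when iterating.
summary = pairs.map { |time, matches| [time.id, matches.map(&:id)] }
puts summary.inspect
```

Keep in mind that map returns a plain Array rather than a relation, so anything that expects the paginated relation (a pagination view helper, for instance) would probably need the collection from before the map call.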
Tarazi answered 3/3, 2017 at 6:17 Comment(1)
FYi this line doesn't compile -- ".map|t| [t, t.user_my_object_time_matches]}". Why do I need the "map" line?Futhark
R
3

You cannot, at least not using ActiveRecord or Rails' default interface. ActiveRecord query methods are designed so that they only return objects of the calling model.

For example, if you query like

MyObjectTime.joins(:time).where(parent_id: 5)

it'll return only MyObjectTime objects. However, because of the join, the records from the time association might also be fetched, just not returned. You can take advantage of this: when you use includes in place of joins, the associated models are fetched and you can reach them through the associating record.

Explanation to build a result pair

This can be done easily by creating a hash with required results.

For example, consider a model Mark that has answer_sheet association.

You can fetch the marks with :answer_sheet using includes this way. I'm fetching 20 in the example.

marks = Mark.limit(20).includes(:answer_sheet)

This fetches answer_sheet, which can be retrieved via mark. So, build a hash this way:

h = {}
marks.each do |mark|
  h[mark.id] = {}
  h[mark.id][:mark] = mark
  h[mark.id][:answer_sheet] = mark.answer_sheet
end

Now, your hash has the mark and answer_sheet object ready via mark.id key.

This executes at most two queries on the first fetch, and the iteration won't trigger any further queries. On my system, the only two queries issued (using includes) are:

SELECT  "marks".* FROM "marks" LIMIT 20
  AnswerSheet Load (0.9ms)  SELECT "answer_sheets".* FROM "answer_sheets" WHERE "answer_sheets"."mark_id" IN (877, 3035, 3036, 878, 879, 880, 881, 561, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891, 892, 893)
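
To make the two-query pattern in that log concrete, here is a rough plain-Ruby simulation of it: load the parents, load all children whose foreign key is IN the collected ids, then stitch them together in memory. This is only a conceptual sketch with made-up in-memory data, not ActiveRecord's actual implementation:

```ruby
# In-memory tables standing in for the database (made-up sample data).
MARKS         = [{ id: 877, score: 70 }, { id: 878, score: 85 }]
ANSWER_SHEETS = [{ id: 1, mark_id: 877 }, { id: 2, mark_id: 878 }]

query_count = 0

# "Query" 1: load the parent rows.
marks = MARKS.first(20)
query_count += 1

# "Query" 2: load every child whose mark_id is IN the collected ids.
ids    = marks.map { |m| m[:id] }
sheets = ANSWER_SHEETS.select { |s| ids.include?(s[:mark_id]) }
query_count += 1

# Stitch in memory: index children by foreign key, then attach to parents.
sheet_by_mark_id = sheets.each_with_object({}) { |s, h| h[s[:mark_id]] = s }
loaded = marks.map { |m| m.merge(answer_sheet: sheet_by_mark_id[m[:id]]) }

puts query_count
puts loaded.first[:answer_sheet][:id]
```

The number of simulated queries stays at two no matter how many marks are loaded, which is the point of eager loading.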

You can even use the mark object itself as the key. Then the building process becomes simpler:

h = {}
marks.each do |mark|
  h[mark] = mark.answer_sheet
end

Now, whenever you want to access the answer_sheet associated with a mark, you just need to use h[mark] to fetch it.
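
Both hash shapes can also be built in a single expression. The sketch below uses hypothetical Struct stand-ins (MarkRow and Sheet) in place of the real Mark and AnswerSheet models so that it runs without Rails:

```ruby
# Hypothetical stand-ins for Mark and AnswerSheet records.
Sheet   = Struct.new(:id)
MarkRow = Struct.new(:id, :answer_sheet)

marks = [MarkRow.new(877, Sheet.new(1)), MarkRow.new(878, nil)]

# One-step equivalent of the each loop: mark => answer_sheet.
by_mark = marks.map { |mark| [mark, mark.answer_sheet] }.to_h

# Keyed by id instead, if ids are more convenient to carry around.
by_id = marks.each_with_object({}) do |mark, h|
  h[mark.id] = { mark: mark, answer_sheet: mark.answer_sheet }
end

puts by_mark[marks.first].id
puts by_id[878][:answer_sheet].inspect
```

A mark without an answer sheet simply maps to nil, so callers should be ready for that case.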

Rozella answered 21/2, 2017 at 18:43 Comment(10)
It's not a requirement to use ActiveRecord, I just included it in my code because I couldn't think of any other way to do it. If there is another way that doesn't involve ActiveRecord, that's fine by me.Futhark
Rails' query interface is built upon activeRecord and that doesn't support your requirement. But as I said, You can use the result to build pairsRozella
Regarding, "You can use the result to build pairs", I don't understand how to do this without generating the same amount of queries that I was previously generating.Futhark
@Futhark I've tried to add an example. The iteration won't generate queries with includesRozella
The problem with the second query is that it uses an IN clause. Most(if not all) relational databases have a limit on the number of values that can be used in the IN clause.Nowicki
@Nowicki Where is IN?Rozella
See the output query of AnswerSheet Load (0.9ms) ... in your postNowicki
@Nowicki Thanks. I don't know specifically about the postgres limit, do you? Also, you can use eager_load instead which will create a complex query to load all association.Rozella
@Nowicki I found this answer https://mcmap.net/q/125324/-postgresql-max-number-of-parameters-in-quot-in-quot-clause and it seems postgres doesn't enforce a limitRozella
@Anwar: Fair enough, good to know! The database in the question was never specified, so I was just throwing it out there because I know Oracle and SQL Server do have limits. I think MySQL is limited by the max_allowed_packet value, which can be variable.Nowicki
N
2

You could just execute a raw SQL query using the ActiveRecord connection, which allows you to include columns from as many tables as you want. You can get everything in one query. You need to make sure to alias ambiguous column names (as I did for the name column in my example).

I don't know what your models look like, but here is a simple parent/sibling example to demonstrate:

Create the Models and migration

# testmigration.rb
class Testtables < ActiveRecord::Migration[5.0]
  def change
    create_table :parents do |t|
      t.string :name
      t.timestamps
    end

    create_table :siblings do |t|
      t.string :name
      t.references :parent
      t.timestamps
    end
  end
end

# parent.rb
class Parent < ApplicationRecord
  has_many :siblings
end

# sibling.rb
class Sibling < ApplicationRecord
end

Create test data

> rails c
> Parent.new(name: "Parent A").save!
> Parent.new(name: "Parent B").save!
> Sibling.new(name: "Sibling 1 - Parent A", parent_id: 1).save!
> Sibling.new(name: "Sibling 2 - Parent A", parent_id: 1).save!
> Sibling.new(name: "Sibling 1 - Parent B", parent_id: 2).save!
> Sibling.new(name: "Sibling 2 - Parent B", parent_id: 2).save!
> Sibling.new(name: "Sibling 3 - Parent B", parent_id: 2).save!

Run the custom query, which includes columns from both the Parent and Sibling models (name and created_at)

> sql_query = "SELECT p.name as parent_name, p.created_at as parent_created, s.name as sibling_name, s.created_at as sibling_created FROM public.parents p INNER JOIN public.siblings s on s.parent_id = p.id;"
> result = ActiveRecord::Base.connection.execute(sql_query)

Inspect the results

> result[0]['parent_name']
  => "Parent A" 
> result[0]['sibling_name']
  => "Sibling 1 - Parent A"
> result[1]['parent_created']
  => "2017-03-04 18:31:54.661714"
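
Since a join returns one flat row per parent/sibling combination, you will usually want to regroup the rows afterwards. A minimal sketch, assuming each row behaves like a hash with string keys (as in the inspection above); the rows are hard-coded sample values here so the snippet runs without a database:

```ruby
# Rows shaped like the result of the query above (sample values only).
rows = [
  { "parent_name" => "Parent A", "sibling_name" => "Sibling 1 - Parent A" },
  { "parent_name" => "Parent A", "sibling_name" => "Sibling 2 - Parent A" },
  { "parent_name" => "Parent B", "sibling_name" => "Sibling 1 - Parent B" }
]

# Group the flat join rows by parent, keeping only the sibling names.
siblings_by_parent = rows
  .group_by { |row| row["parent_name"] }
  .transform_values { |group| group.map { |row| row["sibling_name"] } }

puts siblings_by_parent["Parent A"].inspect
```

Grouping by parent_name is only safe here because the sample names are unique; with real data you would likely also select and group by the parent's id.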
Nowicki answered 4/3, 2017 at 16:53 Comment(4)
Do the result have both the models? It would be helpful if you could include an exampleRozella
Updated answer with exampleNowicki
Nice! But why do you think it's better doing it with AR? accessing them with AR is still easier!Rozella
I don't. This is not "the rails way" but simply pointing this out as an alternative, in cases where you need to run a custom query, and ActiveRecord might not meet the needs.Nowicki
J
2

Without full information about the models or the database, I'm making a few assumptions below. Most of the time, with the proper associations and/or scopes in your AR models, you can get by without writing any raw sql:

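# Assumed associations: each MyObjectTime belongs to one my_object
# (the question joins on :my_object) and has many matches.
class MyObjectTime < ApplicationRecord
  belongs_to :my_object
  has_many :user_my_object_time_matches

  scope :top_1000, -> { limit(1000) }
  # Join my_objects so its columns can be used for ordering.
  scope :order_by_my_objects, -> { joins(:my_object).order("my_objects.day DESC, my_objects.name ASC") }
end

class UserMyObjectTimeMatch < ApplicationRecord
  belongs_to :my_object_time
end

# search_args is assumed to be a hash of already-permitted search conditions.
MyObjectTime.where(search_args)
  .order_by_my_objects.top_1000
  .includes(:user_my_object_time_matches).paginate(page: params[:page])

If I had the full code, I could set up the models and test, so this code is not tested and will likely need to be tweaked.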

Jostle answered 6/3, 2017 at 23:26 Comment(0)
K
1

You can use eager loading here instead of merging the whole result into one array, like:

@results = MyObjectTime.joins(:my_object,
                            "LEFT JOIN user_my_object_time_matches on my_object_times.id = user_my_object_time_matches.my_object_time_id #{additional_left_join_clause}")
                     .where( search_criteria.join(' and '), *search_values )
                     .limit(1000) 
                     .order("my_objects.day DESC, my_objects.name")
                     .paginate(:page => params[:page]).includes(:user_my_object_time_matches)

Once you use includes, accessing the association doesn't fire extra queries.

@first_my_object_time = @results.first
@user_my_object_time_matches = @first_my_object_time.user_my_object_time_matches

If you want it in the same array, you can select directly from SQL by using the ActiveRecord select method:

@results = MyObjectTime.joins(:my_object,
                        "LEFT JOIN user_my_object_time_matches on my_object_times.id = user_my_object_time_matches.my_object_time_id #{additional_left_join_clause}")
                 .where( search_criteria.join(' and '), *search_values )
                 .limit(1000) 
                 .order("my_objects.day DESC, my_objects.name")
                 .paginate(:page => params[:page]).select("my_object_times.*, user_my_object_time_matches.*").as_json
Klotz answered 6/3, 2017 at 5:31 Comment(0)
B
0

Cannot comment so gonna ask this way.

The question is, why would you ever wanna do that?

Why exactly do you think using relation would not do for you?

# I already have some results pulled the way you did and I wanna use them
  @results.each do |r|
    self.foo(r, r.user_my_object_time_match);
  end

#or if I really wanted a double array, then I could do
  @results = MyObjectTime.<clauses>().collect { |mot| [mot, mot.user_my_object_time_match] }

If these don't work for you, then please specify what the problem is, because at this moment this question looks like it has a XY problem (https://meta.stackoverflow.com/tags/xy-problem/info)

Barmecidal answered 21/2, 2017 at 9:7 Comment(3)
The problem with the two approaches you have is a new SQL query is generated for each iteration of the loop. So if @results has 10000 rows, what you're doing causes 10001 queries against the db and I would prefer to limit that just to one and get all the data at once.Futhark
That makes sense, though I don't think there is a standardized way to do this. You could of course fill the models on your own, with the data you get from the query you have already written, but I am not really sure if it would be more efficient.Barmecidal
When you say "the query you have already written," do you mean the "@results = " line I included in my question?Futhark
