ActiveRecord Query Union
Asked Answered
G

17

109

I've written a couple of complex queries (at least to me) with Ruby on Rail's query interface:

watched_news_posts = Post.joins(:news => :watched).where(:watched => {:user_id => id})
watched_topic_posts = Post.joins(:post_topic_relationships => {:topic => :watched}).where(:watched => {:user_id => id})

Both of these queries work fine by themselves. Both return Post objects. I would like to combine these posts into a single ActiveRelation. Since there could be hundreds of thousands of posts at some point, this needs to be done at the database level. If it were a MySQL query, I could simply user the UNION operator. Does anybody know if I can do something similar with RoR's query interface?

Grubb answered 13/7, 2011 at 23:34 Comment(6)
You should be able to use scope. Create 2 scopes and then call them both like Post.watched_news_posts.watched_topic_posts. You may need to send in params to the scopes for things like the :user_id and :topic.Prodrome
Thanks for the suggestion. According to the docs, "A scope represents a narrowing of a database query". In my case, I'm not looking for posts that are in both watched_news_posts and watched_topic_posts. Rather, I'm looking for posts that are in watched_news_posts or watched_topic_posts, with no duplicates allowed. Is this still possible to accomplish with scopes?Grubb
Not really possible out-of-the-box. There is a plugin on github called union but it uses old-school syntax (class method and hash-style query params), if that's cool with you I'd say go with it... otherwise write it out the long way in a find_by_sql in your scope.Mews
I agree with jenjenut233, and I would think you could do something like find_by_sql("#{watched_news_posts.to_sql} UNION #{watched_topic_posts.to_sql}"). I haven't tested that out, so let me know how it goes if you try it. Also, there's probably some ARel functionality that would work.Holmen
Well I rewrote the queries as SQL queries. They work now, but unfortunately find_by_sql can't be used with other chainable queries, which means I now have to rewrite my will_paginate filters and queries as well. Why doesn't ActiveRecord support a union operation?Grubb
You should be able to accomplish this with a left outer joinDeport
L
109

Here's a quick little module I wrote that allows you to UNION multiple scopes. It also returns the results as an instance of ActiveRecord::Relation.

module ActiveRecord::UnionScope
  def self.included(base)
    base.send :extend, ClassMethods
  end

  module ClassMethods
    def union_scope(*scopes)
      id_column = "#{table_name}.id"
      sub_query = scopes.map { |s| s.select(id_column).to_sql }.join(" UNION ")
      where "#{id_column} IN (#{sub_query})"
    end
  end
end

Here's the gist: https://gist.github.com/tlowrimore/5162327

Edit:

As requested, here's an example of how UnionScope works:

class Property < ActiveRecord::Base
  include ActiveRecord::UnionScope

  # some silly, contrived scopes
  scope :active_nearby,     -> { where(active: true).where('distance <= 25') }
  scope :inactive_distant,  -> { where(active: false).where('distance >= 200') }

  # A union of the aforementioned scopes
  scope :active_near_and_inactive_distant, -> { union_scope(active_nearby, inactive_distant) }
end
Leathaleather answered 14/3, 2013 at 15:35 Comment(7)
This is really a way more complete answer the other others listed above. Works great!Bailar
Example of usage would be nice.Broughton
As requested, I've added an example.Leathaleather
I get an exception with SQLite: only a single result allowed for a SELECT that is part of an expression.Halliehallman
@SarahVessels That error seems to imply that the nested select statements are querying more than one column; however, the union_scope definition explicitly selects the id column and nothing more. I'd be curious to see what the resulting SQL is for your scenario. Have you tried calling to_sql on your scope to see what the generated SQL looks like?Leathaleather
The solution is "almost" correct and I gave it a +1, but I ran into a problem that I fixed here: gist.github.com/lsiden/260167a4d3574a580d97Hemidemisemiquaver
Quick warning: this method is highly problematic from a performance perspective with MySQL, since the subquery will be counted as dependent and executed for each record in the table (see percona.com/blog/2010/10/25/mysql-limitations-part-3-subqueries).Pacificism
R
87

I also have encountered this problem, and now my go-to strategy is to generate SQL (by hand or using to_sql on an existing scope) and then stick it in the from clause. I can't guarantee it's any more efficient than your accepted method, but it's relatively easy on the eyes and gives you a normal ARel object back.

watched_news_posts = Post.joins(:news => :watched).where(:watched => {:user_id => id})
watched_topic_posts = Post.joins(:post_topic_relationships => {:topic => :watched}).where(:watched => {:user_id => id})

Post.from("(#{watched_news_posts.to_sql} UNION #{watched_topic_posts.to_sql}) AS posts")

You can do this with two different models as well, but you need to make sure they both "look the same" inside the UNION -- you can use select on both queries to make sure they will produce the same columns.

topics = Topic.select('user_id AS author_id, description AS body, created_at')
comments = Comment.select('author_id, body, created_at')

Comment.from("(#{comments.to_sql} UNION #{topics.to_sql}) AS comments")
Rub answered 1/6, 2013 at 3:19 Comment(4)
suppose if we have two different model then please let me know what will be the query for unoin.Meaning
Very helpful answer. For future readers, remember the final "AS comments" portion because activerecord constructs the query as 'SELECT "comments"."*" FROM"... if you don't specify the name of the unioned set OR specify a different name like "AS foo", the final sql execution will fail.Tumult
This was exactly what I was looking for. I extended ActiveRecord::Relation to support #or in my Rails 4 project. Assuming same model: klass.from("(#{to_sql} union #{other_relation.to_sql}) as #{table_name}")Nad
Would be possible to have a third model (without a real table) to work like an uniform "interface" for the first two models? Eg. Wine and Soda, and a thrid model called Drink, which will union the first two for query only purpose.Blois
G
11

Based on Olives' answer, I did come up with another solution to this problem. It feels a little bit like a hack, but it returns an instance of ActiveRelation, which is what I was after in the first place.

Post.where('posts.id IN 
      (
        SELECT post_topic_relationships.post_id FROM post_topic_relationships
          INNER JOIN "watched" ON "watched"."watched_item_id" = "post_topic_relationships"."topic_id" AND "watched"."watched_item_type" = "Topic" WHERE "watched"."user_id" = ?
      )
      OR posts.id IN
      (
        SELECT "posts"."id" FROM "posts" INNER JOIN "news" ON "news"."id" = "posts"."news_id" 
        INNER JOIN "watched" ON "watched"."watched_item_id" = "news"."id" AND "watched"."watched_item_type" = "News" WHERE "watched"."user_id" = ?
      )', id, id)

I'd still appreciate it if anybody has any suggestions to optimize this or improve the performance, because it's essentially executing three queries and feels a little redundant.

Grubb answered 18/7, 2011 at 23:49 Comment(1)
How could I do the same thing with this: gist.github.com/2241307 So that it creates an AR::Relation class rather than an Array class?Hoogh
B
11

You could also use Brian Hempel's active_record_union gem that extends ActiveRecord with an union method for scopes.

Your query would be like this:

Post.joins(:news => :watched).
  where(:watched => {:user_id => id}).
  union(Post.joins(:post_topic_relationships => {:topic => :watched}
    .where(:watched => {:user_id => id}))

Hopefully this will be eventually merged into ActiveRecord some day.

Breakfront answered 1/3, 2015 at 18:5 Comment(0)
L
6

Could you use an OR instead of a UNION?

Then you could do something like:

Post.joins(:news => :watched, :post_topic_relationships => {:topic => :watched})
.where("watched.user_id = :id OR topic_watched.user_id = :id", :id => id)

(Since you are joins the watched table twice I'm not too sure what the names of the tables will be for the query)

Since there are a lot of joins, it might also be quite heavy on the database, but it might be able to be optimized.

Lasandralasater answered 15/7, 2011 at 17:51 Comment(2)
Sorry to get back to you so late, but I've been on vacation for the last couple days. The problem I had when I tried your answer was the joins method was causing both tables to be joined, rather than two separate queries which could then be compared. However, your idea was sound and did give me another idea. Thanks for the help.Grubb
select using OR is slower than than UNION, wondering any solution for UNION insteadRecalesce
P
6

Arguably, this improves readability, but not necessarily performance:

def my_posts
  Post.where <<-SQL, self.id, self.id
    posts.id IN 
    (SELECT post_topic_relationships.post_id FROM post_topic_relationships
    INNER JOIN watched ON watched.watched_item_id = post_topic_relationships.topic_id 
    AND watched.watched_item_type = "Topic" 
    AND watched.user_id = ?
    UNION
    SELECT posts.id FROM posts 
    INNER JOIN news ON news.id = posts.news_id 
    INNER JOIN watched ON watched.watched_item_id = news.id 
    AND watched.watched_item_type = "News" 
    AND watched.user_id = ?)
  SQL
end

This method returns an ActiveRecord::Relation, so you could call it like this:

my_posts.order("watched_item_type, post.id DESC")
Pentheas answered 10/8, 2011 at 5:43 Comment(4)
where are you getting posts.id from?Udometer
There are two self.id parameters because self.id is referenced twice in the SQL -- see the two question marks.Pentheas
This was a useful example of how to do a a UNION query and get back an ActiveRecord::Relation. Thanks.Catenate
do you have a tool to generate these types of SDL queries - how did you do it without spelling mistakes etc.?Converge
S
6

How about...

def union(scope1, scope2)
  ids = scope1.pluck(:id) + scope2.pluck(:id)
  where(id: ids.uniq)
end
Sinotibetan answered 12/3, 2014 at 1:17 Comment(5)
Be warned that this will perform three queries rather than one, as each pluck call is a query in itself.Herringbone
This is a really good solution, becouse it doesn't return an array, so then you can use .order or .paginate methods... It keeps the orm classesSyzygy
Useful if the scopes are of the same model, but this would generate two queries because of the plucks.Oasis
Likewise this worked great for me, much simpler than the other approaches, and I can chain more scopes on!Foxtail
how pagination will work in this?Costate
C
3

There is an active_record_union gem. Might be helpful

https://github.com/brianhempel/active_record_union

With ActiveRecordUnion, we can do:

the current user's (draft) posts and all published posts from anyone current_user.posts.union(Post.published) Which is equivalent to the following SQL:

SELECT "posts".* FROM (
  SELECT "posts".* FROM "posts"  WHERE "posts"."user_id" = 1
  UNION
  SELECT "posts".* FROM "posts"  WHERE (published_at < '2014-07-19 16:04:21.918366')
) posts
Cowrie answered 28/1, 2016 at 18:13 Comment(0)
F
2

If you don't want to use SQL syntax inside your code, here's solution with arel

watched_news_posts = Post.joins(:news => :watched).where(:watched => {:user_id => id}).arel
watched_topic_posts = Post.joins(:post_topic_relationships => {:topic => :watched}).where(:watched => {:user_id => id}).arel

results = Arel::Nodes::Union.new(watched_news_posts, watched_topic_posts)
from(Post.arel_table.create_table_alias(results, :posts))
Fallible answered 28/11, 2022 at 15:41 Comment(0)
S
1

In a similar case I summed two arrays and used Kaminari:paginate_array(). Very nice and working solution. I was unable to use where(), because I need to sum two results with different order() on the same table.

Syst answered 11/1, 2013 at 14:31 Comment(0)
L
1

Heres how I joined SQL queries using UNION on my own ruby on rails application.

You can use the below as inspiration on your own code.

class Preference < ApplicationRecord
  scope :for, ->(object) { where(preferenceable: object) }
end

Below is the UNION where i joined the scopes together.

  def zone_preferences
    zone = Zone.find params[:zone_id]
    zone_sql = Preference.for(zone).to_sql
    region_sql = Preference.for(zone.region).to_sql
    operator_sql = Preference.for(Operator.current).to_sql

    Preference.from("(#{zone_sql} UNION #{region_sql} UNION #{operator_sql}) AS preferences")
  end
Labellum answered 29/5, 2019 at 6:20 Comment(0)
T
1

Less problems and easier to follow:

    def union_scope(*scopes)
      scopes[1..-1].inject(where(id: scopes.first)) { |all, scope| all.or(where(id: scope)) }
    end

So in the end:

union_scope(watched_news_posts, watched_topic_posts)
Tufthunter answered 17/12, 2019 at 12:40 Comment(1)
I changed it slightly to: scopes.drop(1).reduce(where(id: scopes.first)) { |query, scope| query.or(where(id: scope)) } Thx!Sleight
S
1
gem 'active_record_extended' 

Also has a set of union helpers among many others.

Squamulose answered 13/7, 2022 at 6:59 Comment(0)
T
0

I would just run the two queries you need and combine the arrays of records that are returned:

@posts = watched_news_posts + watched_topics_posts

Or, at the least test it out. Do you think the array combination in ruby will be far too slow? Looking at the suggested queries to get around the problem, I'm not convinced that there will be that significant of a performance difference.

Theis answered 9/8, 2012 at 5:16 Comment(4)
Actually doing @posts=watched_news_posts & watched_topics_posts might be better as it is an intersection and will avoid dupes.Theis
I was under the impression ActiveRelation loads its records lazily. Wouldn't you lose that if you intersected the arrays in Ruby?Grubb
Apparently a union which returns a relation is under dev in rails, but I don't know what version it will be in.Theis
this return array instead, its two different query results merge.Cartan
M
0

Elliot Nelson answered good, except the case where some of the relations are empty. I would do something like that:

def union_2_relations(relation1,relation2)
sql = ""
if relation1.any? && relation2.any?
  sql = "(#{relation1.to_sql}) UNION (#{relation2.to_sql}) as #{relation1.klass.table_name}"
elsif relation1.any?
  sql = relation1.to_sql
elsif relation2.any?
  sql = relation2.to_sql
end
relation1.klass.from(sql)

end

Mineralogist answered 21/2, 2018 at 11:56 Comment(0)
S
0

When we add UNION to the scopes, it breaks at time due to order_by clause added before the UNION.

So I changed it in a way to give it a UNION effect.

module UnionScope
  def self.included(base)
    base.send(:extend, ClassMethods)
  end

  module ClassMethods
    def union_scope(*scopes)
      id_column = "#{table_name}.id"
      sub_query = scopes.map { |s| s.pluck(:id) }.flatten
      where("#{id_column} IN (?)", sub_query)
    end
  end
end

And then use it like this in any model

class Model
  include UnionScope
  scope :union_of_scopeA_scopeB, -> { union_scope(scopeA, scopeB) }
end
Spreadeagle answered 8/2, 2022 at 12:2 Comment(0)
I
0

Tim's answer is great. It uses the ids of the scopes in the WHERE clause. As shosti reports, this method is problematic in terms of performance because all ids need to be generated during query execution. This is why, I prefer joeyk16 answer. Here a generalized module:

module ActiveRecord::UnionScope
  def self.included(base)
    base.send :extend, ClassMethods
  end

  module ClassMethods
    def self.union(*scopes)
      self.from("(#{scopes.map(&:to_sql).join(' UNION ')}) AS #{self.table_name}")
    end 
  end
end
Inly answered 7/9, 2022 at 7:38 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.