Want to find records with no associated records in Rails
Asked Answered
G

9

229

Consider a simple association...

class Person
   has_many :friends
end

class Friend
   belongs_to :person
end

What is the cleanest way to get all persons that have NO friends in ARel and/or meta_where?

And then what about a has_many :through version

class Person
   has_many :contacts
   has_many :friends, :through => :contacts, :uniq => true
end

class Friend
   has_many :contacts
   has_many :people, :through => :contacts, :uniq => true
end

class Contact
   belongs_to :friend
   belongs_to :person
end

I really don't want to use counter_cache - and I from what I've read it doesn't work with has_many :through

I don't want to pull all the person.friends records and loop through them in Ruby - I want to have a query/scope that I can use with the meta_search gem

I don't mind the performance cost of the queries

And the farther away from actual SQL the better...

Gravamen answered 15/3, 2011 at 23:47 Comment(0)
A
125

This is still pretty close to SQL, but it should get everyone with no friends in the first case:

Person.where('id NOT IN (SELECT DISTINCT(person_id) FROM friends)')
Arnst answered 16/3, 2011 at 0:33 Comment(3)
Just imagine you have 10000000 records in friends table. What about performance in that case?Ourselves
@Ourselves Depending on your duplicate frequency, you can probably drop the DISTINCT. Otherwise, I think you'd want to normalize out the data and index in that case. I might do that by creating a friend_ids hstore or serialized column. Then you could say Person.where(friend_ids: nil)Arnst
If you're going to use sql, it's probably better to use not exists (select person_id from friends where person_id = person.id) (Or perhaps people.id or persons.id, depending on what your table is.) Not sure what the fastest is in a particular situation, but in the past this has worked well for me when I was not trying to use ActiveRecord.Dryly
A
583

Update 4 - Rails 6.1

Thanks to Tim Park for pointing out that in the upcoming 6.1 you can do this:

Person.where.missing(:contacts)

Thanks to the post he linked to too.

Update 3 - Rails 5

Thanks to @Anson for the excellent Rails 5 solution (give him some +1s for his answer below), you can use left_outer_joins to avoid loading the association:

Person.left_outer_joins(:contacts).where(contacts: { id: nil })

I've included it here so people will find it, but he deserves the +1s for this. Great addition!

Update 2

Someone asked about the inverse, friends with no people. As I commented below, this actually made me realize that the last field (above: the :person_id) doesn't actually have to be related to the model you're returning, it just has to be a field in the join table. They're all going to be nil so it can be any of them. This leads to a simpler solution to the above:

Person.includes(:contacts).where(contacts: { id: nil })

And then switching this to return the friends with no people becomes even simpler, you change only the class at the front:

Friend.includes(:contacts).where(contacts: { id: nil })

Update

Got a question about has_one in the comments, so just updating. The trick here is that includes() expects the name of the association but the where expects the name of the table. For a has_one the association will generally be expressed in the singular, so that changes, but the where() part stays as it is. So if a Person only has_one :contact then your statement would be:

Person.includes(:contact).where(contacts: { person_id: nil })

Original

Better:

Person.includes(:friends).where(friends: { person_id: nil })

For the hmt it's basically the same thing, you rely on the fact that a person with no friends will also have no contacts:

Person.includes(:contacts).where(contacts: { person_id: nil })
Anaesthetize answered 6/4, 2011 at 17:5 Comment(25)
You can incorporate this into a scope which would be much cleaner.Warga
Far better answer, not sure why is the other one rated as accepted.Spacious
@Anaesthetize this does not work for a has_one association.Voyeur
Yes it does, just that assuming you have a singular name for your has_one association you need to change the name of the association in the includes call. So assuming it was has_one :contact inside Person then your code would be Person.includes(:contact).where( :contacts => { :person_id => nil } )Anaesthetize
I just found that you cannot use this approach along with pluck, which requires joins instead of includes. But joining will not retrieve the proper records. Any idea?Advance
Using pluck works fine for me. Ie. just adding to the end of the code above does the right thing for a field in PersonAnaesthetize
How would you make this work for the reverse side of the association? Starting with the model where the belongs_to association lives, i.e. find a friend with no person.Nason
Friend.where :person => nilAnaesthetize
Sorry, brain fade. The answer for the case of the through table is the same as from person to friend: Friend.includes(:contacts).where( :contacts => { :friend_id => nil })Anaesthetize
Actually, the truth is that the last field in the join table doesn't need to be related to the model being searched for, it just has to be a field in that table - so actually you can just use id, so the last one becomes: Friend.includes(:contacts).where( :contacts => { :id => nil }) and then to switch that to return Person just change the Friend to PersonAnaesthetize
Can someone explain me how to adapt this in a scope? Right now i have scope :without_company, -> { includes(:company).where( :companies => { :user_id => nil } } and is returning me syntax error, unexpected '}', expecting ')'Cleanser
Because you have an open paren after your where that you never closeAnaesthetize
What about if you want to add a condition? E.g. People without any friends named Sam? (including anyone without any friends).Jinnah
@Jinnah I bet that if you guess at how to solve that, you'll be right.Anaesthetize
If you're using a custom table name in your Friend model ( self.table_name = "custom_friends_table_name"), then use Person.includes(:friends).where(:custom_friends_table_name => {:id => nil}) .Shabbygenteel
Any idea how to get this working if the has_one or has_many association relies on class inheritance and STI? e.g. with this example Person.includes(:contacts).where( :contacts => { :id => nil } ) would fail if the table for contacts was using an STI table name that's different from its model name (e.g. like entities if the rails model contact.rb was: contact < entity)Skewness
Right, so as with the custom table name example in the comment before yours, you need to use the table name in the where clause, eg. ...where( entities: { id: nil } )Anaesthetize
Could you update with my answer? I know OP asked for solution without SQL here, but I think it's worth having one with optimal query on db server.Arcuate
@Anaesthetize A nice update in Rails 6.1 adds a missing method to do exactly this!Illogicality
It is, just in the edge docs because 6.1 isn't out yet: edgeapi.rubyonrails.org/classes/ActiveRecord/QueryMethods/…Anaesthetize
What is the opposite of missing so that i can get the NON orphan records in an associationrelation as well ? I have tried subtracting from Person.all, but i get back an array instead.Smokestack
Just Person.joins(:contacts)Anaesthetize
I absolutely cannot wait to upgrade to rails 6.1 to use .missing()Childbirth
Question: What about when you want Persons who are not friends with "user X". It can be done with a subquery, but can i use missing for that?Detonation
No, missing only applies to the whole association, but no need for a subquery, you'd just use: Person.joins(:friends).where.not(friends: { id: user_x.id })Anaesthetize
D
191

smathy has a good Rails 3 answer.

For Rails 5, you can use left_outer_joins to avoid loading the association.

Person.left_outer_joins(:contacts).where( contacts: { id: nil } )

Check out the api docs. It was introduced in pull request #12071.

Deficient answered 9/11, 2016 at 16:19 Comment(9)
Are there any downsides to this? I checked and it loaded 0.1 ms faster then .includesFivepenny
Not loading the association is a downside if you actually access it later, but a benefit if you don't access it. For my sites, a 0.1ms hit is pretty negligible, so .includes extra cost in load time wouldn't be something I would worry much about optimizing. Your use case may be different.Deficient
And if you don't yet have Rails 5, you can do this: Person.joins('LEFT JOIN contacts ON contacts.person_id = persons.id').where('contacts.id IS NULL') It works just fine as a scope as well. I do this all the time in my Rails projects.Lhary
The big advantage to this method is memory savings. When you do an includes, all of those AR objects are loaded into memory, which can be a bad things as tables get larger and larger. If you don't need access to the contact record, the left_outer_joins does not load the contact into memory. The SQL request speed is the same, but the overall app benefit is much larger.Urrutia
This is really good! Thanks! Now if the rails gods could perhaps implement it as a simple Person.where(contacts: nil) or Person.with(contact: contact) if using where encroaches too far into 'properness' - but given that contact: is already being parsed and identified as an association, it seems logical that arel could easily work out what's required...Trebuchet
if I'm not mistaken, Person.joins() would do the exact same thing as Person.left_outer_joins() as I believe a LEFT JOIN and LEFT OUTER JOIN will return the same results? I believe it's just sematics. I don't think you need to use the word "outer" anywhere in the code as it's redundant for this type of join. The important thing is to use .joins() instead of .includes() -- have a look hereAthens
If you like to get only those Persons what has Contacts then do Person.left_outer_joins(:contacts).where.not( contacts: { id: nil } ).distinctGerlachovka
@Athens you're misstaken. .joins creates an INNER JOIN (which is the default for almost every db i know of). Thats why .left_outer_joins was introduced in the first place.Sophistry
@Sophistry that's right I meant to say that .left_joins is an alias for .left_outer_joins - both of these create the same LEFT OUTER JOINS sqlAthens
A
125

This is still pretty close to SQL, but it should get everyone with no friends in the first case:

Person.where('id NOT IN (SELECT DISTINCT(person_id) FROM friends)')
Arnst answered 16/3, 2011 at 0:33 Comment(3)
Just imagine you have 10000000 records in friends table. What about performance in that case?Ourselves
@Ourselves Depending on your duplicate frequency, you can probably drop the DISTINCT. Otherwise, I think you'd want to normalize out the data and index in that case. I might do that by creating a friend_ids hstore or serialized column. Then you could say Person.where(friend_ids: nil)Arnst
If you're going to use sql, it's probably better to use not exists (select person_id from friends where person_id = person.id) (Or perhaps people.id or persons.id, depending on what your table is.) Not sure what the fastest is in a particular situation, but in the past this has worked well for me when I was not trying to use ActiveRecord.Dryly
I
14

Persons that have no friends

Person.includes(:friends).where("friends.person_id IS NULL")

Or that have at least one friend

Person.includes(:friends).where("friends.person_id IS NOT NULL")

You can do this with Arel by setting up scopes on Friend

class Friend
  belongs_to :person

  scope :to_somebody, ->{ where arel_table[:person_id].not_eq(nil) }
  scope :to_nobody,   ->{ where arel_table[:person_id].eq(nil) }
end

And then, Persons who have at least one friend:

Person.includes(:friends).merge(Friend.to_somebody)

The friendless:

Person.includes(:friends).merge(Friend.to_nobody)
Inert answered 29/9, 2013 at 16:5 Comment(2)
I think you can also do: Person.includes(:friends).where(friends: {person: nil})Krueger
Note: The merge strategy can sometimes yield a warning like DEPRECATION WARNING: It looks like you are eager loading table(s) Currently, Active Record recognizes the table in the string, and knows to JOIN the comments table to the query, rather than loading comments in a separate query. However, doing this without writing a full-blown SQL parser is inherently flawed. Since we don't want to write an SQL parser, we are removing this functionality. From now on, you must explicitly tell Active Record when you are referencing a table from a stringUndo
G
12

Both the answers from dmarkow and Unixmonkey get me what I need - Thank You!

I tried both out in my real app and got timings for them - Here are the two scopes:

class Person
  has_many :contacts
  has_many :friends, :through => :contacts, :uniq => true
  scope :without_friends_v1, -> { where("(select count(*) from contacts where person_id=people.id) = 0") }
  scope :without_friends_v2, -> { where("id NOT IN (SELECT DISTINCT(person_id) FROM contacts)") }
end

Ran this with a real app - small table with ~700 'Person' records - average of 5 runs

Unixmonkey's approach (:without_friends_v1) 813ms / query

dmarkow's approach (:without_friends_v2) 891ms / query (~ 10% slower)

But then it occurred to me that I don't need the call to DISTINCT()... I'm looking for Person records with NO Contacts - so they just need to be NOT IN the list of contact person_ids. So I tried this scope:

  scope :without_friends_v3, -> { where("id NOT IN (SELECT person_id FROM contacts)") }

That gets the same result but with an average of 425 ms/call - nearly half the time...

Now you might need the DISTINCT in other similar queries - but for my case this seems to work fine.

Thanks for your help

Gravamen answered 16/3, 2011 at 15:17 Comment(0)
L
6

Unfortunately, you're probably looking at a solution involving SQL, but you could set it in a scope and then just use that scope:

class Person
  has_many :contacts
  has_many :friends, :through => :contacts, :uniq => true
  scope :without_friends, where("(select count(*) from contacts where person_id=people.id) = 0")
end

Then to get them, you can just do Person.without_friends, and you can also chain this with other Arel methods: Person.without_friends.order("name").limit(10)

Laminated answered 16/3, 2011 at 0:29 Comment(0)
G
1

A NOT EXISTS correlated subquery ought to be fast, particularly as the row count and ratio of child to parent records increases.

scope :without_friends, where("NOT EXISTS (SELECT null FROM contacts where contacts.person_id = people.id)")
Galan answered 16/9, 2013 at 8:40 Comment(0)
L
1

Also, to filter out by one friend for instance:

Friend.where.not(id: other_friend.friends.pluck(:id))
Ledda answered 1/6, 2017 at 23:53 Comment(1)
This will result in 2 queries rather than a subquery.Parkerparkhurst
A
1

Here is an option using a subquery:

# Scenario #1 - person <-> friend
people = Person.where.not(id: Friend.select(:person_id))

# Scenario #2 - person <-> contact <-> friend
people = Person.where.not(id: Contact.select(:person_id))

The above expressions should generate the following SQL:

-- Scenario #1 - person <-> friend
SELECT people.*
FROM people 
WHERE people.id NOT IN (
  SELECT friends.person_id
  FROM friends
)

-- Scenario #2 - person <-> contact <-> friend
SELECT people.*
FROM people 
WHERE people.id NOT IN (
  SELECT contacts.person_id
  FROM contacts
)
Ardellearden answered 16/10, 2020 at 14:7 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.