How to find records that have duplicate data using Active Record
Asked Answered
D

6

31

What is the best way to find records with duplicate values in a column using ruby and the new Activerecord?

Decalcify answered 24/2, 2011 at 14:42 Comment(2)
Duplicated values of just 1 column or more than 1? Is/Are these string/ints/text fields?Balch
just 1 column - strings.Decalcify
K
51

Translating @TuteC into ActiveRecord:

sql = 'SELECT id, 
         COUNT(id) as quantity 
         FROM types 
         GROUP BY name 
       HAVING quantity > 1'
#=>
Type.select("id, count(id) as quantity")
  .group(:name)
  .having("quantity > 1")
Kalliekallista answered 24/2, 2011 at 18:10 Comment(8)
Is this code supposed to work for PostgreSQL as well? It returns the error PGError: ERROR: column "quantity" does not existClothbound
@Marc, I am not sure. But you can try Type.select("id, count(id) as quantity").group(:name).having("count(id) > 1")Kalliekallista
this will only find exact duplicates.Ri
@holaSenor - What is a non-exact duplicate, exactly? (pun intended)Psychotechnics
Good question, I'm sure it made sense at the time. I suspect there was another comment, since deleted, that I was responding too. Probably a comment with some code that compared all fields in the row rather than one column, but who knows. Duplicate needs to be qualified by some criteria. By exact I think I meant the proposed code compared all attributes and they would have to be the same, OP wanted to use one column.Ri
How can I go about having the other columns of the row to work with as well?Regelation
re fl00rs answer: Postgresql - ActiveRecord::StatementInvalid: PG::GroupingError: ERROR: column "types.id" must appear in the GROUP BY clause or be used in an aggregate function. But add it to the group-by (no error), and then you get zero hits, since no two records have the same id-value. I've tried several of these solutions, and all either don't return the ID in the result (so who knows where the dups were), or fail to find the desired records when the ID is included.Woodborer
Best workaround is here: #33362169 Adapted to this would be: type_dups = Type.group(:name).having("count(name) > 1").count.keys; @results = Type.where(:name => type_dups)Woodborer
M
28

Here's how I solved it with the AREL helpers, and no custom SQL:

Person.select("COUNT(last_name) as total, last_name")
  .group(:last_name)
  .having("COUNT(last_name) > 1")
  .order(:last_name)
  .map{|p| {p.last_name => p.total} }

Really, it's just a nicer way to write the SQL. This finds all records that have duplicate last_name values, and tells you how many and what the last names are in a nice hash.

Marriott answered 12/7, 2012 at 5:20 Comment(0)
W
21

I was beating my head against this problem with a 2016 stack (Rails 4.2, Ruby 2.2), and got what I wanted with this:

> Model.select([:thing]).group(:thing).having("count(thing) > 1").all.size
 => {"name1"=>5, "name2"=>4, "name3"=>3, "name4"=>2, "name5"=>2}
Witness answered 11/8, 2016 at 8:36 Comment(1)
this was exactly what I neededTelamon
C
11

With custom SQL, this finds types with same values for name:

sql = 'SELECT id, COUNT(id) as quantity FROM types
         GROUP BY name HAVING quantity > 1'
repeated = ActiveRecord::Base.connection.execute(sql)
Cautionary answered 24/2, 2011 at 14:58 Comment(0)
T
6

In Rails 2.x, select is a private method of AR class. Just use find():

klass.find(:all, 
  :select => "id, count(the_col) as num", 
  :conditions => ["extra conditions here"], 
  :group => 'the_col', 
  :having => "num > 1")
Teutonism answered 4/4, 2011 at 17:48 Comment(2)
Totally off-topic, grave-digging comment here... but WHY IS THIS BETTER THAN SQL?Manhour
@JohnCromartie It's not. But it's fun. ;)Gildus
T
2

Here is a solution that extends the other answers to show how to find and iterate through the records grouped by the duplicate field:

duplicate_values = Model.group(:field).having(Model.arel_table[:field].count.gt(1)).count.keys
Model.where(field: duplicate_values).group_by(&:field).each do |value, records|
  puts "The records with ids #{records.map(&:id).to_sentence} have field set to #{value}"
end

It seems a shame this has to be done with two queries but this answer confirms this approach.

Termless answered 10/7, 2018 at 20:23 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.