Return similar elements of array in Ruby

Say I have such an array:

arr = ['footballs_jumba_10', 'footballs_jumba_11', 'footballs_jumba_12',
       'footballs_jumba_14', 'alpha_romeo_11', 'alpha_romeo_12',
       'alpha_juliet_10', 'alpha_juliet_11']

If I wanted to return duplicates (assuming some of the strings in the array were exactly identical), I would just

return arr.detect{ |a| arr.count(a) > 1 }

But what if I wanted only the duplicated first 10 characters of the elements, without knowing the variations beforehand? Like this:

['footballs_', 'alpha_rome', 'alpha_juli']
Endomorphism answered 19/11, 2015 at 0:53 Comment(1)
Your example would have been better had you included a string whose first 10 characters were unique, as it would not have been returned in the desired result. (Too late now to change it.) - Generally
G
1

This is quite straightforward with the method Array#difference that I proposed in my answer here:

arr << "Let's add a string that appears just once"
  #=> ["footballs_jumba_10", "footballs_jumba_11", "footballs_jumba_12",
  #    "footballs_jumba_14", "alpha_romeo_11", "alpha_romeo_12",
  #    "alpha_juliet_10", "alpha_juliet_11", "Let's add a string that appears just once"]

a = arr.map { |s| s[0,10] }
  #=> ["footballs_", "footballs_", "footballs_", "footballs_", "alpha_rome",
  #    "alpha_rome", "alpha_juli", "alpha_juli", "Let's add "] 
b = a.difference(a.uniq)
  #=> ["footballs_", "footballs_", "footballs_", "alpha_rome", "alpha_juli"] 
b.uniq
  #=> ["footballs_", "alpha_rome", "alpha_juli"] 
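For readers without the linked answer at hand, here is a minimal sketch of what such a method might look like. It is an assumption about the behaviour the snippet above relies on (remove one occurrence from the receiver for each occurrence in the argument), not necessarily the exact code from that answer. The name multiset_difference is hypothetical, and it is written as a standalone method because, since Ruby 2.6, the built-in Array#difference behaves like Array#- and removes every occurrence, so a.difference(a.uniq) would return [] there.

```ruby
# Hypothetical helper (not part of Ruby's core library): for each element of
# `other`, remove one matching occurrence from `array`, keeping the rest.
def multiset_difference(array, other)
  # Count how many copies of each element still have to be removed.
  remaining = other.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }

  array.reject do |e|
    if remaining[e] > 0
      remaining[e] -= 1
      true   # drop this occurrence
    else
      false  # nothing left to remove for this element, keep it
    end
  end
end

arr = ['footballs_jumba_10', 'footballs_jumba_11', 'footballs_jumba_12',
       'footballs_jumba_14', 'alpha_romeo_11', 'alpha_romeo_12',
       'alpha_juliet_10', 'alpha_juliet_11',
       "Let's add a string that appears just once"]

prefixes = arr.map { |s| s[0, 10] }
repeated = multiset_difference(prefixes, prefixes.uniq)
result   = repeated.uniq
  #=> ["footballs_", "alpha_rome", "alpha_juli"]
```

Removing one copy of every distinct prefix leaves only the prefixes that occurred more than once, which is why the string that appears just once drops out.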
Generally answered 19/11, 2015 at 1:23 Comment(2)
Simple, straight-forward, and worked with a slice and a bit of regex. Thanks mate! - Endomorphism
You might also consider a twig of parsley and a dash of oregano. - Generally
O
1

Use Array#uniq:

arr.map {|e| e[0..9]}.uniq
# => ["footballs_", "alpha_rome", "alpha_juli"]
Outpoint answered 19/11, 2015 at 1:18 Comment(1)
With arr << "add a unique string", arr.map {|e| e[0..9]}.uniq #=> ["footballs_", "alpha_rome", "alpha_juli", "add a uniq"], but only duplicates are wanted. - Generally
G
0

You could do something like this:

def partial_duplicates(elements)
  unique = {}
  duplicates = {}

  elements.each do |e|
    partial = e[0..9]

    # If the element is in the hash, it is a duplicate.
    if first_element = unique[partial]
      duplicates[first_element] = true
      duplicates[e] = true
    else
      # include the element as unique
      unique[partial] = e
    end
  end

  duplicates.keys
end

This returns each duplicate only once. If you want every occurrence, you can collect them in an Array instead of a Hash.

Also, this returns the full string of each duplicate rather than just its prefix, as that seems more useful and is probably what you want:

partial_duplicates(arr)
=> ["footballs_jumba_10", "footballs_jumba_11", "footballs_jumba_12", "footballs_jumba_14", "alpha_romeo_11", "alpha_romeo_12", "alpha_juliet_10", "alpha_juliet_11"]

If you want only the partial duplicates you can change the condition to:

if unique[partial]
  duplicates[partial] = true
else
  unique[partial] = true
end

then:

partial_duplicates(arr)
=> ["footballs_", "alpha_rome", "alpha_juli"]
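
Put together, the modified version might look like the sketch below. The method name duplicated_prefixes and the length parameter are hypothetical additions for illustration, and the extra string appended to arr is only there to show that a prefix occurring once is left out.

```ruby
# Hypothetical name and `length` parameter; the body follows the modified
# condition shown above.
def duplicated_prefixes(elements, length = 10)
  unique = {}      # prefixes seen at least once
  duplicates = {}  # prefixes seen more than once

  elements.each do |e|
    partial = e[0, length]

    if unique[partial]
      duplicates[partial] = true
    else
      unique[partial] = true
    end
  end

  duplicates.keys
end

arr = ['footballs_jumba_10', 'footballs_jumba_11', 'footballs_jumba_12',
       'footballs_jumba_14', 'alpha_romeo_11', 'alpha_romeo_12',
       'alpha_juliet_10', 'alpha_juliet_11', 'add a unique string']

result = duplicated_prefixes(arr)
# => ["footballs_", "alpha_rome", "alpha_juli"]
```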
Gorgonian answered 19/11, 2015 at 1:34 Comment(4)
Since e is a string and duplicates is a hash, you can't write duplicates << e, but you're on the right track. I suggest you make unique and duplicates sets (require 'set'; unique = Set.new). You could write duplicates.add(e) (or its alias duplicates << e) to add e to the set, but Set#add? would be better, as it both adds the element if it's not already in the set and tells you if it was added. Your last line is different and gives the wrong answer when arr contains a unique element. Test! - Generally
Thanks for the tips, Cary. duplicates << e was a mistake indeed. But since Set actually uses hashes in its implementation, I decided to keep the hash, thus not needing the extra require. As for the last suggestion, I removed it entirely. - Gorgonian
I don't think avoiding a require is a good reason to use phoney hashes rather than sets. By that argument you would never use sets, but we have them for just this type of situation. It's analogous to replacing sets in mathematics with functions having certain properties. It would make the math harder to follow with no advantage. (It probably would also spark a rash of suicides among mathematicians. Fortunately, most Rubyists are not that passionate.) I would be interested in the views of other readers about this. - Generally
Cary, my decision to use a hash in this case is simply one of simplifying the code in this example. I don't see how a Set makes this example simpler or more useful. And the phoney hash, as you called it, is basically how Set is implemented. - Gorgonian
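
For comparison, the Set-based variant suggested in the comments might be sketched as follows. This is an illustration under assumptions, not code posted by either commenter: the method name duplicated_prefixes_with_sets and the length parameter are hypothetical. It relies on Set#add?, which returns nil when the element is already in the set.

```ruby
require 'set'

# Hypothetical name; returns each prefix that occurs more than once.
def duplicated_prefixes_with_sets(elements, length = 10)
  seen = Set.new
  duplicates = Set.new

  elements.each do |e|
    partial = e[0, length]
    # add? returns nil if `partial` was already in `seen`, meaning this
    # prefix has occurred before and is therefore a duplicate.
    duplicates << partial unless seen.add?(partial)
  end

  duplicates.to_a
end

arr = ['footballs_jumba_10', 'footballs_jumba_11', 'footballs_jumba_12',
       'footballs_jumba_14', 'alpha_romeo_11', 'alpha_romeo_12',
       'alpha_juliet_10', 'alpha_juliet_11', 'add a unique string']

set_result = duplicated_prefixes_with_sets(arr)
# => ["footballs_", "alpha_rome", "alpha_juli"]
```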
