How to make Ruby test factories with random unique data, in Factory Girl or Minifacture?

I am testing a typical Rails model with a typical factory:

# My model uses a 3-letter uppercase airport code,
# such as "ATL" for Atlanta, "BOS" for Boston, etc.

class Airport < ActiveRecord::Base
  validates :code, uniqueness: true
end

Factory.define :airport do |f|
  f.code { random_airport_code }  # Get a 3-letter uppercase code
end

I am adding more tests and starting to see collisions in the airport code: for example, the factory creates an airport with code "XYZ", and then a subsequent call to the factory tries to create an airport with the same code.
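The factory calls random_airport_code, whose implementation the question does not show. Here is a minimal sketch of such a helper. The body is an assumption, not the author's code. It draws three independent uppercase letters, and because nothing remembers earlier results, two calls can return the same code:

```ruby
# Hypothetical helper: the question does not show its implementation.
# Each call draws three letters independently, so two calls may collide;
# nothing here remembers codes that were already handed out.
def random_airport_code
  letters = ("A".."Z").to_a
  Array.new(3) { letters.sample }.join
end
```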

A sequence is one way to tackle this: for example, a Factory Girl sequence, an ordered list, a pre-calculated enumeration, or some similar way of maintaining the state of the next available code.
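For contrast, the sequence approach described above can be as small as a function from a counter to a code. This is a minimal sketch. The helper name sequential_airport_code is hypothetical, and the mapping is plain base-26 counting, so every counter value in range yields a distinct code:

```ruby
# Hypothetical helper: map a counter n (0...17_576) to a distinct code,
# 0 -> "AAA", 1 -> "AAB", 26 -> "ABA", 17_575 -> "ZZZ" (base-26 counting).
def sequential_airport_code(n)
  raise ArgumentError, "n must be in 0...#{26**3}" unless (0...26**3).cover?(n)

  high, rest = n.divmod(26 * 26)
  mid, low = rest.divmod(26)
  [high, mid, low].map { |digit| (65 + digit).chr }.join # 65 is "A"
end
```

Inside a factory this would presumably be wired up as something like f.sequence(:code) { |n| sequential_airport_code(n - 1) }, because Factory Girl's counter starts at 1 by default. That wiring is untested here. The function raises once all 17,576 codes are used.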

My question is: what are non-sequence ways to tackle this? I want to use random data, and not a sequence.

Here are a few ideas I'm trying because they're pragmatic; any insight on these is much appreciated.

Example idea to use optimistic locking

airport = loop do
  candidate = Factory.build :airport
  break candidate if candidate.save  # on failure, retry with a fresh random code
end

Pros: fast in practice because collisions are rare; local state.

Cons: awkward syntax; non-local to the factory; the save might fail for reasons other than the collision.
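The optimistic idea above can also be written as a small helper. It re-rolls on failure but gives up after a fixed number of attempts, so a save that fails for some reason other than a collision cannot loop forever. This is a minimal sketch under stated assumptions. FakeAirportTable is a hypothetical in-memory stand-in (backed by a Set) for the database's unique constraint. create_with_reroll and max_attempts are illustrative names, not Factory Girl or Minifacture APIs:

```ruby
require "set"

# Hypothetical stand-in for the airports table's unique constraint on code;
# in a real suite this would be building the model and calling save.
class FakeAirportTable
  def initialize
    @codes = Set.new
  end

  # Returns the code if it was stored, or nil on a uniqueness collision.
  def insert(code)
    @codes.add?(code) ? code : nil
  end
end

# Re-roll the random code until the insert succeeds, but stop after
# max_attempts so a failure unrelated to collisions cannot loop forever.
def create_with_reroll(table, max_attempts: 10)
  letters = ("A".."Z").to_a
  max_attempts.times do
    code = Array.new(3) { letters.sample }.join
    stored = table.insert(code)
    return stored if stored
  end
  raise "no unique code after #{max_attempts} attempts"
end
```

With a real model, the insert call would become a build plus save. Ideally, the helper would retry only when the model's validation errors show a uniqueness failure on code.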

Example idea to use a transaction

Airport.transaction do
  loop do
    x = random_airport_code
    next if Airport.exists?(code: x)  # code already taken: roll again
    Factory :airport, code: x
    break
  end
end

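Pros: this is the closest to what I want; local state; checks for a collision before inserting (though without a unique index or a lock, a concurrent process could still insert the same code between the check and the insert).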

Cons: long awkward syntax.

Bounty

Does Factory Girl or Minifacture have any kind of syntax that is more amenable to random data, and not a sequence?

Or perhaps some kind of pattern to automatically re-roll the dice if there's a save collision?

Some overhead is fine with me. In practice, a collision happens about once per day on a continuous integration setup with thousands of tests. If the test suite must re-roll the dice a few times, or probe the database for existing values, that's fine.
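The expected collision rate follows from the birthday problem. There are 26³ = 17,576 possible codes. If one test draws n codes uniformly at random, the chance that at least two match is 1 − ∏_{i=0}^{n−1} (17,576 − i)/17,576. For small n this is roughly n(n − 1)/2 / 17,576, or about 0.06% for 5 codes. Here is a short sketch of that calculation. collision_probability is a hypothetical helper name:

```ruby
# Probability that at least two of n codes, each drawn uniformly and
# independently from `space` possible values, are equal (birthday problem).
def collision_probability(n, space: 26**3)
  all_distinct = (0...n).reduce(1.0) { |p, i| p * (space - i) / space }
  1.0 - all_distinct
end

[5, 10, 50].each do |n|
  printf("%3d codes per test: %.5f\n", n, collision_probability(n))
end
```

Summed over thousands of tests per CI run, even a fraction of a percent per test makes occasional collisions expected rather than surprising.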

The comments ask why random data instead of a sequence. I prefer random data because my experience is that random data leads to better tests, better long-term maintainability, and better semantics with the test goal. Also, I use Faker and Forgery instead of fixtures, in case that's helpful to know.

To earn the bounty, the answer must generate random data on the fly, not use a sequence. (For example, the solution I'm seeking will likely use #sample and/or an unordered set, and will likely not use #shuffle and/or an ordered set.)

Board asked 29/4, 2014 at 2:12. Comments (8):
What is the advantage of random data over a sequence? Random data in tests can lead to failures that are intermittent and difficult to diagnose or reproduce. - Beseem
I'd like to know the reason for random data as well. With a sequence, you can be 100% confident of getting unique data from AAA to ZZZ without collisions. With random data, you either need a random generator that will not collide (is that possible or easy?), or you have to check against already generated data (slow). - Commie
Perhaps you want to take a look at Faker (github.com/stympy/faker): you could use Faker::Lorem.characters(3) and convert the result to uppercase for your tests. - Richierichlad
Thanks George & Peter & George; I added more detail, and yes, I'm using Faker. - Board
I have no idea what you are trying to achieve. If you want "random" data, Faker should be good enough. If you want no collisions, use a sequence; that should be good enough. At worst, you could combine Faker and a sequence to always generate "random" data. - Ballast
@Ballast I am trying to achieve a clean syntax that uses random data without maintaining sequence state and that, in the rare case of a collision, does something reasonable such as retrying with new random data. - Board
Faster way to generate codes: 1.upto(3).inject("") { |m, e| m << (rand(26) + 65).chr } - Ezar
@Ezar Thanks, I'll try benchmarking that. - Board
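The speed claim in the last two comments can be checked with Ruby's standard-library Benchmark module. This minimal sketch compares the range-based pick used in the answers below with the per-letter approach from the comment. The iteration count is arbitrary, and timings depend on the machine, so no numbers are claimed here. The range version builds all 17,576 codes on every call, which is the likely reason it is slower:

```ruby
require "benchmark"

ITERATIONS = 200 # arbitrary; raise it for steadier timings

# Build every code from "AAA" to "ZZZ", then pick one at random.
def code_via_range
  ("AAA".."ZZZ").to_a.sample
end

# Build three random uppercase letters directly (65 is "A").
# +"" gives a mutable string, which << appends to.
def code_via_chars
  1.upto(3).inject(+"") { |memo, _| memo << (rand(26) + 65).chr }
end

Benchmark.bm(12) do |x|
  x.report("range.sample") { ITERATIONS.times { code_via_range } }
  x.report("three chars") { ITERATIONS.times { code_via_chars } }
end
```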

You could use a callback. Something like:

FactoryGirl.define do
  factory :airport do
    after(:build) do |airport|
      # Draw random codes until one is not already in the database.
      airport.code = loop do
        code = ('AAA'..'ZZZ').to_a.sample
        break code unless Airport.exists?(code: code)
      end
    end
  end
end

You may want to change after(:build) to before(:create), it depends on how you want to use the factory.

Reiterate answered 1/5, 2014 at 23:36. Comments (1):
Thanks Gergo, your answer is the closest to my goal of truly random data with a retry in the rare case that there's a collision. Much appreciated! - Board
B
3

This should work, but it only allows 17,576 models to be created (one per code from AAA to ZZZ):

CODES = ("AAA".."ZZZ").to_a.shuffle
Factory.define :airport do |f|
  f.code { CODES.pop }
end

Ballast answered 7/5, 2014 at 0:21. Comments (5):
Thanks phoet. I'm seeking a solution that doesn't use a sequence; in other words, one that doesn't need the CODES array, which is non-local state. - Board
Is a member local enough? { (@codes ||= ("AAA".."ZZZ").to_a.shuffle).pop } I really like how you blow up such a trivial problem. - Ballast
A member as in your comment would still be storing a sequence, i.e. an ordered set that knows each next element and maintains its state over time. My goal is to use a random pick each time, not a sequence. - Board
In the limited space of possible values from AAA to ZZZ, there is very little randomness available. How do you want to achieve collision detection without overhead? - Ballast
Overhead's fine with me; I'll add that to the question statement. Each test clears the database and needs only a handful of airports to run correctly, so the values actually used cover a tiny part of the space of possible values, i.e. I'm picking 5 of 17,576. - Board

Yes, FactoryGirl has a feature that should allow you to do this. See the end of the sequences documentation (https://github.com/thoughtbot/factory_girl/blob/master/GETTING_STARTED.md#sequences): you can set a sequence's initial value to any object that returns an incremented version of itself when #next is called on it. So you could write a class that knows how to return unique random data and implements #next, e.g.:

class AirportCode
  ALL = %w(AAA BBB CCC).shuffle

  attr_reader :index

  def initialize(index = rand(ALL.length))
    @index = index
  end

  def value
    ALL[@index]
  end

  def to_s
    value
  end

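  # Delegate other String methods (e.g. upcase, length) to the underlying value.
  def method_missing(method, *args, &block)
    value.respond_to?(method) ? value.public_send(method, *args, &block) : super
  end

  def respond_to_missing?(method, include_private = false)
    value.respond_to?(method, include_private) || super
  end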

  def next
    AirportCode.new((index + 1) % ALL.length)
  end

end

(This one has only three unique values, but it's just to make the point.) Then create a FactoryGirl sequence and set its initial value to an instance of that class. I didn't try the FactoryGirl part, so please report back if it works :)

Mckellar answered 29/4, 2014 at 21:24. Comments (3):
Thanks Dave. I'm looking for a solution that doesn't use a sequence. - Board
It's still not clear why factory_girl's sequence construct shouldn't be part of a solution. I get that you don't want sequential data, but if you can trick a factory_girl sequence into returning non-sequential data, as above, why not? - Mckellar
Because the sequence in your answer is still a sequence, i.e. it still requires maintaining the list order and tracking the index. This would be essentially the same as the "pre-calculate a sequence" example in the question. What I'm seeking is a solution that doesn't need to track state in code. (For example, an ideal solution will work on distributed setups, where the factories run on different machines, all talking to the same DB.) - Board

Similar to @GergoErdosi's answer, I was able to get this working:

CODES = ("AAA".."ZZZ").to_a.shuffle

factory :airport do
  after(:build) do |airport|
    if Airport.exists?(code: airport.code)
      # Replace a taken code with a random one (the new code is not re-checked).
      new_code = ('AAA'..'ZZZ').to_a.sample
      airport.code = new_code
    end
  end
  code { CODES.rotate!.first }
  # ... other stuff for building Airports
end

Stutter answered 6/5, 2014 at 22:19. Comments (1):
Thanks for the reply. I believe the core value of your answer is the hybrid of the after(:build) retry from GergoErdosi and the #rotate from my example. - Board
