Should we be using Faker in Rails Factories?

Asked 23/1, 2016 at 11:35 Answered 7/9, 2017 at 13:2

Solved ruby-on-rails ruby factory-bot faker

I love Faker, I use it in my seeds.rb all the time to populate my dev environment with real-ish looking data.

I've also just started using Factory Girl, which also saves a lot of time, but when I sleuth around the web for code examples I don't see much evidence of people combining the two.

Q. Is there a good reason why people don't use faker in a factory?

My feeling is that by doing so I'd increase the robustness of my tests by seeding random - but predictable - data each time, which hopefully would increase the chances of a bug popping up.

But perhaps that's incorrect and there is either no benefit over hard coding a factory or I'm not seeing a potential pitfall. Is there a good reason why these two gems should or shouldn't be combined?

Mortonmortuary answered 23/1, 2016 at 11:35 Comment(5)

Why would you want to generate data dynamically every time you create a test model? It's just overhead – Sandpiper 23/1, 2016 at 11:37

Right so agreed, test performance would be impacted - but couldn't that be worth it on a complex app, especially one with loads of validation, to check that I've not written something stupid that allows firstName: Michal but not firstName: Huw, surely Faker's variety would lead to more robust testing? – Mortonmortuary 23/1, 2016 at 11:41

It's called edge case testing. Still no need for random data – Sandpiper 23/1, 2016 at 11:43

But then don't you need to know all edge cases you might want to test in advance? – Mortonmortuary 23/1, 2016 at 11:52

Of course you do! But look. Imagine you use random data for tests and want to check that the name assigned to a Person model instance is correct. If the name was generated by Faker, how would you do it? Compare the model with itself? That makes no sense. The whole point of unit testing is to compare code output with KNOWN values! – Sandpiper 23/1, 2016 at 11:57

Some people argue against it, as here.

DO NOT USE RANDOM ATTRIBUTE VALUES

One common pattern is to use a fake data library (like Faker or Forgery) to generate random values on the fly. This may seem attractive for names, email addresses or telephone numbers, but it serves no real purpose. Creating unique values is simple enough with sequences:
FactoryGirl.define do
  sequence(:title) { |n| "Example title #{n}" }

  factory :post do
    title
  end
end

FactoryGirl.create(:post).title # => 'Example title 1'

Your randomised data might at some stage trigger unexpected results in your tests, making your factories frustrating to work with. Any value that might affect your test outcome in some way would have to be overridden, meaning:

- Over time, you will discover new attributes that cause your test to fail sometimes. This is a frustrating process, since tests might fail only once in every ten or hundred runs – depending on how many attributes and possible values there are, and which combination triggers the bug.
- You will have to list every such random attribute in every test to override it, which is silly.
- So, you create non-random factories, thereby negating any benefit of the original randomness.

One might argue, as Henrik Nyh does, that random values help you discover bugs. While possible, that obviously means you have a bigger problem: holes in your test suite. In the worst case scenario the bug still goes undetected; in the best case scenario you get a cryptic error message that disappears the next time you run the test, making it hard to debug. True, a cryptic error is better than no error, but randomised factories remain a poor substitute for proper unit tests, code review and TDD to prevent these problems.

Randomised factories are therefore not only not worth the effort, they even give you false confidence in your tests, which is worse than having no tests at all.

But nothing is stopping you from doing it if you want to.

Oh, and there is an even easier way to inline a sequence in recent FactoryGirl; that quote was written for an older version.
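For illustration, here is a rough sketch of what the inline form can look like, assuming a FactoryGirl version that accepts a sequence declared directly inside a factory block. The :post factory and title attribute are carried over from the quote above. This belongs in a factory file loaded by FactoryGirl (with a Post model available), not in a standalone script.

```ruby
FactoryGirl.define do
  factory :post do
    # Inline sequence: n increments each time a post is built,
    # so every post gets a unique, predictable title.
    sequence(:title) { |n| "Example title #{n}" }
  end
end
```

The values stay deterministic, which is the point the quoted article makes in favour of sequences over random data.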

Carpetbag answered 23/1, 2016 at 18:39 Comment(2)

Thank you Jrochkind, much appreciate the link too, interesting that he lists a link to that Henrik Nyh post, I find myself deeply tempted by this approach and Aef's contribution above. But I think Arjan van der Gaag makes a good case that using faker is "a poor substitute for proper unit tests, code review and TDD to prevent these problems." Thanks all for the perspectives. – Mortonmortuary 23/1, 2016 at 21:57

At some point, growing projects have so much combinatorial complexity that you will always have gaps in the test coverage, even if it is 100% on paper. And as I said in my answer, building the factories usually costs no extra effort, because you might do it anyway if you want to present features to customers easily with some decent, non-personalized demo data. – Drucie 23/1, 2016 at 23:2

It's up to you.

In my opinion it is a very good idea to have random data in tests; it has always helped me discover bugs and corner cases I hadn't thought about.

I have never regretted having random data. All the points described by @jrochkind are valid (and you should read the other answer before reading this one), but you can (and should) add this to your spec_helper.rb:

config.before(:all)  { Faker::Config.random = Random.new(config.seed) }

This makes your tests repeatable, with repeatable data as well. If you don't do that, you have all the problems described in the other answer.
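To see why seeding helps, here is a minimal plain-Ruby sketch of the underlying idea. It does not use Faker itself: NAMES, fake_name and names_for are hypothetical stand-ins for a Faker-style generator, and the seed value 1234 is arbitrary (in a real suite it would be the seed RSpec reports for the run).

```ruby
# Hypothetical stand-in for a Faker-style generator (not Faker's API).
NAMES = %w[Michal Huw Arjan Henrik].freeze

# Picks a name using the supplied random number generator.
def fake_name(rng)
  NAMES[rng.rand(NAMES.size)]
end

# Builds `count` names from a fresh generator seeded with `seed`.
def names_for(seed, count = 5)
  rng = Random.new(seed)
  Array.new(count) { fake_name(rng) }
end

first_run  = names_for(1234)
second_run = names_for(1234)

# The same seed yields the same "random" data on every run.
puts first_run == second_run # prints true
```

Because the generator is rebuilt from the same seed, a failing run can be replayed with exactly the same data by reusing that seed.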

Polymerize answered 7/9, 2017 at 13:2 Comment(1)

I particularly like this answer: it gets me enough randomness initially to ensure that I can test my initial expectations, but repeatability when I need it. Thanks @Polymerize :) – Mortonmortuary 1/10, 2017 at 17:18

I like to use Faker and usually do so when working with larger code bases. I see the following advantages and disadvantages when using Faker with Factory Girl:

Possible disadvantages:

- It is a bit harder to reproduce the exact same test scenario (RSpec at least works around this by displaying the random number generator seed on every run, which lets you reproduce the exact same test with it).
- Generating data costs a bit of performance.

Possible advantages:

- The displayed data is usually easier for humans to comprehend. When creating test data manually, people tend to take all kinds of shortcuts to avoid the tedium.
- Building factories with Faker for tests also gives you the means to generate nice demo data for presentations.
- You might randomly discover edge-case bugs when running the tests a lot.

Drucie answered 23/1, 2016 at 15:31 Comment(3)

Thanks Aef, that's a nice summary. Can you speak to Michal's point above? Do you think this violates the principle of testing with known values? Also, when you say it gives nice test data for demos, I guess you're talking about seed files there? How does that factor into using factories for tests? Are you using factories in your seed file? – Mortonmortuary 23/1, 2016 at 16:51

What I find very often when writing tests is that the actual value doesn't matter. What matters is that the value given somewhere passes through some code and is returned either exactly the same or modified in a specific way. Here, it doesn't matter if the value is fixed or random. The Faker data is further useful because it tries to be closer to real data, so humans can better comprehend it when debugging tests. When the content of the value matters for the test, you usually don't use Faker for it anyway. – Drucie 23/1, 2016 at 17:22

I usually define the factories as (maybe optional) part of my main code base instead of having it bundled with the test code. Then I can create demo data everywhere, be it in a seed file, or in a live console, or really in any other context I wish. – Drucie 23/1, 2016 at 17:24
