WARNING: after working with sequelize for over a year, I've come to realize that my suggestion is a very bad practice. I'll explain at the bottom.
tl;dr:
- never use seeders, only use migrations
- never use your sequelize models in migrations, only write explicit SQL
My other suggestion still holds up that you use some "configuration" to drive the generation of seed data. (But that seed data should be inserted via migration.)
vv DO NOT DO THIS vv
Here's another pattern, which I prefer, because I believe it is more flexible and more readily understood. I offer it here as an alternative to the accepted answer (which seems fine to me, btw), in case others find it a better fit for their circumstances.
The strategy is to leverage the sqlz models you've already defined to fetch data that was created by other seeders, use that data to generate whatever new associations you want, and then use bulkInsert
to insert the new rows.
In this example, I'm tracking a set of people and the cars they own. My models/tables:
Driver
: a real person, who may own one or more real cars
Car
: not a specific car, but a type of car that could be owned by someone (i.e. make
+ model
)
DriverCar
: a real car owned by a real person, with a color and a year they bought it
We will assume a previous seeder has stocked the database with all known Car
types: that information is already available and we don't want to burden users with unnecessary data entry when we can bundle that data in the system. We will also assume there are already Driver
rows in there, either through seeding or because the system is in-use.
The goal is to generate a whole bunch of fake-but-plausible DriverCar
relationships from those two data sources, in an automated way.
const {
Driver,
Car
} = require('models')
module.exports = {
up: async (queryInterface, Sequelize) => {
// fetch base entities that were created by previous seeders
// these will be used to create seed relationships
const [ drivers , cars ] = await Promise.all([
Driver.findAll({ /* limit ? */ order: Sequelize.fn( 'RANDOM' ) }),
Car.findAll({ /* limit ? */ order: Sequelize.fn( 'RANDOM' ) })
])
const fakeDriverCars = Array(30).fill().map((_, i) => {
// create new tuples that reference drivers & cars,
// and which reflect the schema of the DriverCar table
})
return queryInterface.bulkInsert( 'DriverCar', fakeDriverCars );
},
down: (queryInterface, Sequelize) => {
return queryInterface.bulkDelete('DriverCar');
}
}
That's a partial implementation. However, it omits some key details, because there are a million ways to skin that cat. Those pieces can all be gathered under the heading "configuration," and we should talk about it now.
When you generate seed data, you usually have requirements like:
- I want to create at least a hundred of them, or
- I want their properties determined randomly from an acceptable set, or
- I want to create a web of relationships shaped exactly like this
You could try to hard-code that stuff into your algorithm, but that's the hard way. What I like to do is declare "configuration" at the top of the seeder, to capture the skeleton of the desired seed data. Then, within the tuple-generation function, I use that config to procedurally generate real rows. That configuration can obviously be expressed however you like. I try to put it all into a single CONFIG
object so it all stays together and so I can easily locate all the references within the seeder implementation.
Your configuration will probably imply reasonable limit
values for your findAll
calls. It will also probably specify all the factors that should be used to calculate the number of seed rows to generate (either by explicitly stating quantity: 30
, or through a combinatoric algorithm).
As food for thought, here is an example of a very simple config that I used with this DriverCar system to ensure that I had 2 drivers who each owned one overlapping car (with the specific cars to be chosen randomly at runtime):
const CONFIG = {
ownership: [
[ 'a', 'b', 'c', 'd' ], // driver 1 linked to cars a, b, c, and d
[ 'b' ], // driver 2 linked to car b
[ 'b', 'b' ] // driver 3 has two of the same kind of car
]
};
I actually used those letters, too. At runtime, the seeder implementation would determine that only 3 unique Driver
rows and 4 unique Car
rows were needed, and apply limit: 3
to Driver.findAll
, and limit: 4
to Car.findAll
. Then it would assign a real, randomly-chosen Car
instance to each unique string. Finally, when generating association tuples, it uses the string to look up the chosen Car
from which to pull foreign keys and other values.
There are undoubtedly fancier ways of specifying a template for seed data. Skin that cat however you like. Hopefully this makes it clear how you'd marry your chosen algorithm to your actual sqlz implementation to generate coherent seed data.
Why the above is bad
If you use your sequelize models in migration or seeder files, you will inevitably create a situation in which the application will not build successfully from a clean slate.
How to avoid madness:
- Never use seeders, only use migrations
(Anything you can do in a seeder, you can do in a migration. Bear that in mind as I enumerate the problems with seeders, because that means none of these problems gain you anything.)
By default, sequelize does not keep records of which seeders have been run. Yes, you can configure it to keep records, but if the app has already been deployed without that setting, then when you deploy your app with the new setting, it'll still re-run all your seeders one last time. If that's not safe, your app will blow up. My experience is that seed data can't and shouldn't be duplicated: if it doesn't immediately violate uniqueness constraints, it'll create duplicate rows.
Running seeders is a separate command, which you then need to integrate into your startup scripts. It's easy for that to lead to a proliferation of npm scripts that make app startup harder to follow. In one project, I converted the only 2 seeders into migrations, and reduced the number of startup-related npm scripts from 13 to 5.
It's been hard to pin down, but it can be hard to make sense of the order in which seeders are run. Remember also that the commands are separate for running migrations and seeders, which means you can't interleave them efficiently. You'll have to run all migrations first, then run all seeders. As the database changes over time, you'll run into the problem I describe next:
- Never use your sequelize models in your migrations
When you use a sequelize model to fetch records, it explicitly fetches every column it knows about. So, imagine a migration sequence like this:
- M1: create tables Car & Driver
- M2: use Car & Driver models to generate seed data
That will work. Fast-forward to a date when you add a new column to Car (say, isElectric
). That involves: (1) creating a migraiton to add the column, and (2) declaring the new column on the sequelize model. Now your migration process looks like this:
- M1: create tables Car & Driver
- M2: use Car & Driver models to generate seed data
- M3: add
isElectric
to Car
The problem is that your sequelize models always reflect the final schema, without acknowledging the fact that the actual database is built by ordered accretion of mutations. So, in our example, M2 will fail because any built-in selection method (e.g. Car.findOne
) will execute a SQL query like:
SELECT
"Car"."make" AS "Car.make",
"Car"."isElectric" AS "Car.isElectric"
FROM
"Car"
Your DB will throw because Car doesn't have an isElectric
column when M2 executes.
The problem won't occur in environments that are only one migration behind, but you're boned if you hire a new developer or nuke the database on your local workstation and build the app from scratch.