Loading initial data with Django 1.7+ and data migrations

I recently switched from Django 1.6 to 1.7, and I began using migrations (I never used South).

Before 1.7, I used to load initial data with a fixture/initial_data.json file, which was loaded with the python manage.py syncdb command (when creating the database).

Now that I've started using migrations, this behavior is deprecated:

If an application uses migrations, there is no automatic loading of fixtures. Since migrations will be required for applications in Django 2.0, this behavior is considered deprecated. If you want to load initial data for an app, consider doing it in a data migration. (https://docs.djangoproject.com/en/1.7/howto/initial-data/#automatically-loading-initial-data-fixtures)

The official documentation does not have a clear example of how to do it, so my question is:

What is the best way to import such initial data using data migrations:

  1. Write Python code with multiple calls to mymodel.create(...),
  2. Use or write a Django function (like calling loaddata) to load data from a JSON fixture file.

I prefer the second option.

I don't want to use South, as Django seems to be able to do it natively now.

Waugh answered 21/9, 2014 at 15:37 Comment(5)
Also, I want to add another question to the OP's original question: how should we do data migrations for data not belonging to our applications? For instance, if somebody is using the sites framework, they need to have a fixture with the sites data. Since the sites framework is not related to our applications, where should we put that data migration? Thanks!Bennettbenni
An important point that has not been addressed by anyone here yet is what happens when you need to add data defined in a data migration to a database that you have faked migrations on. Since the migrations were faked, your data migration will not run and you must do it by hand. At this point you may as well just call loaddata on a fixture file.Aoristic
Another interesting scenario is what happens if you have a data migration to create auth.Group instances for example and later on you have a new Group you want to create as seed data. You'll need to create a new data migration. This can be annoying because your Group seed data will be in multiple files. Also in the event you want to reset migrations, you'll have to look through to find the data migrations that set up seed data and port them as well.Aoristic
@Bennettbenni The question "Where to put the initial data for a third party app" does not change if you use a data migration instead of fixtures, since you only change the way the data gets loaded. I use a small custom app for things like this. If the third-party app is called "foo", I call my simple app containing the data migration/fixture "foo_integration".Fernandafernande
@Fernandafernande yes, probably using an extra application is the best way to do it!Bennettbenni

Update: See @GwynBleidD's comment below for the problems this solution can cause, and see @Rockallite's answer below for an approach that's more durable to future model changes.


Assuming you have a fixture file in <yourapp>/fixtures/initial_data.json

  1. Create your empty migration:

    In Django 1.7:

    python manage.py makemigrations --empty <yourapp>
    

    In Django 1.8+, you can provide a name:

    python manage.py makemigrations --empty <yourapp> --name load_initial_data
    
  2. Edit your migration file <yourapp>/migrations/0002_auto_xxx.py

    2.1. Custom implementation, inspired by Django's loaddata (initial answer):

    import os

    from django.core import serializers
    from django.db import migrations
    
    fixture_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '../fixtures'))
    fixture_filename = 'initial_data.json'
    
    def load_fixture(apps, schema_editor):
        fixture_file = os.path.join(fixture_dir, fixture_filename)
    
        with open(fixture_file, 'rb') as fixture:
            objects = serializers.deserialize('json', fixture, ignorenonexistent=True)
            for obj in objects:
                obj.save()
    
    def unload_fixture(apps, schema_editor):
        "Brutally deleting all entries for this model..."
    
        MyModel = apps.get_model("yourapp", "ModelName")
        MyModel.objects.all().delete()
    
    class Migration(migrations.Migration):  
    
        dependencies = [
            ('yourapp', '0001_initial'),
        ]
    
        operations = [
            migrations.RunPython(load_fixture, reverse_code=unload_fixture),
        ]
    

    2.2. A simpler solution for load_fixture (per @juliocesar's suggestion):

    import os

    from django.core.management import call_command
    
    fixture_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '../fixtures'))
    fixture_filename = 'initial_data.json'
    
    def load_fixture(apps, schema_editor):
        fixture_file = os.path.join(fixture_dir, fixture_filename)
        call_command('loaddata', fixture_file) 
    

    Useful if you want to use a custom directory.

    2.3. Simplest: calling loaddata with app_label will load fixtures from <yourapp>'s fixtures directory automatically:

    from django.core.management import call_command
    
    fixture = 'initial_data'
    
    def load_fixture(apps, schema_editor):
        call_command('loaddata', fixture, app_label='yourapp') 
    

    If you don't specify app_label, loaddata will try to load the fixture filename from every app's fixtures directory (which you probably don't want).

  3. Run it

    python manage.py migrate <yourapp>
    
Badtempered answered 22/9, 2014 at 19:38 Comment(13)
Code in load_fixture function can be enhanced by just calling call_command('loaddata', fixture_file)Bowne
Thanks for the comment. I don't recall exactly why I did it this way (by mimicking loaddata), instead of calling loaddata directly. I remember encountering problems with loaddata not working "as expected" when running full migrations. python manage.py migrate <yourapp> would work perfectly, and so would python manage.py loaddata, but python manage.py migrate wouldn't be able to locate the fixture file. Maybe I was doing something wrong.Badtempered
Also, I think call_command('loaddata', fixture_file) would load any fixtures with that name (initial_data.json), in any model. Would have to call it with option --app=<yourapp>?Badtempered
since you have used os.path.dirname(__file__) to create fixture_file, loaddata will use the initial_data.json of the current app, so --app isn't neededBowne
ok, you're right... Also, calling call_command('loaddata', fixture_filename, app_label='<yourapp>') will go directly to the app's fixtures dir (hence no need to build the fixture's full path)Badtempered
Using that method, the serializer will work on the model state from the current models.py files, which can have extra fields or other changes. If changes were made after creating the migration, it will fail (so we can't even create schema migrations after that migration). To fix that we can temporarily change the apps registry that the serializer works on to the registry provided to the migration function as its first parameter. The registry to patch is located at django.core.serializers.python.apps.Featheredge
All works fine, but I think it is not a good idea to call the fixture file "initial_data.[xml/yaml/json]". Automatic loading is deprecated, but I guess it still happens until Django 1.9. This means the file gets loaded twice in Django 1.7 and Django 1.8: once by automatic loading, once in your migration.Fernandafernande
Why are we doing this? Why is it that Django becomes more and more difficult to run and maintain? I don't want to go through this, I want a simple command-line interface that solves this problem for me, i.e. like it used to be with fixtures. Django is supposed to make this stuff easier, not harder :(Kaltman
@Featheredge This is a very important point you are making, and I think it should appear in this accepted answer. It is the same remark that appears as a comment in the data migration code example of the documentation. Do you know another way to use serializers with the provided app registry, without changing a global variable (which could cause problems in a hypothetical future with parallel database migrations)?Helminth
@AdN Alternatively, you can monkey patch the deserializer to look at the provided app registry.Butterworth
This answer being upvoted so heavily, along with its acceptance, is exactly why I recommend to folks not to use Stack Overflow. Even now, with the comments & anecdotes, I still have folks in #django referring to this.Gonococcus
After having used this solution some months ago, I ended up having issues after adding new fields to my models. When I tried to run all migrations from scratch I got ProgrammingError: column "field" of relation "app_model" does not exist. After trying different things, using @Rockallite's solution (https://mcmap.net/q/204333/-loading-initial-data-with-django-1-7-and-data-migrations) was what solved the issue for me.Knotweed

Short version

You should NOT use the loaddata management command directly in a data migration.

# Bad example for a data migration
from django.db import migrations
from django.core.management import call_command


def load_fixture(apps, schema_editor):
    # No, it's wrong. DON'T DO THIS!
    call_command('loaddata', 'your_data.json', app_label='yourapp')


class Migration(migrations.Migration):
    dependencies = [
        # Dependencies to other migrations
    ]

    operations = [
        migrations.RunPython(load_fixture),
    ]

Long version

loaddata utilizes django.core.serializers.python.Deserializer, which uses the most up-to-date models to deserialize historical data in a migration. That's incorrect behavior.

For example, suppose that there is a data migration which utilizes the loaddata management command to load data from a fixture, and it's already applied in your development environment.

Later, you decide to add a new required field to the corresponding model, so you do it and make a new migration against your updated model (and possibly provide a one-off value to the new field when ./manage.py makemigrations prompts you).

You run the next migration, and all is well.

Finally, you're done developing your Django application, and you deploy it on the production server. Now it's time for you to run all the migrations from scratch in the production environment.

However, the data migration fails. That's because the deserialized model from the loaddata command, which represents the current code, can't be saved with empty data for the new required field you added. The original fixture lacks the necessary data for it!

But even if you update the fixture with the required data for the new field, the data migration still fails. While the data migration is running, the next migration, which adds the corresponding column to the database, has not been applied yet. You can't save data to a column which does not exist!

Conclusion: in a data migration, the loaddata command introduces potential inconsistency between the model and the database. You should definitely NOT use it directly in a data migration.

The Solution

The loaddata command relies on the django.core.serializers.python._get_model function to get the corresponding model from a fixture, which will return the most up-to-date version of a model. We need to monkey-patch it so that it gets the historical model.

(The following code works for Django 1.8.x)

# Good example for a data migration
from django.db import migrations
from django.core.serializers import base, python
from django.core.management import call_command


def load_fixture(apps, schema_editor):
    # Save the old _get_model() function
    old_get_model = python._get_model

    # Define new _get_model() function here, which utilizes the apps argument to
    # get the historical version of a model. This piece of code is directly stolen
    # from django.core.serializers.python._get_model, unchanged. However, here it
    # has a different context, specifically, the apps variable.
    def _get_model(model_identifier):
        try:
            return apps.get_model(model_identifier)
        except (LookupError, TypeError):
            raise base.DeserializationError("Invalid model identifier: '%s'" % model_identifier)

    # Replace the _get_model() function on the module, so loaddata can utilize it.
    python._get_model = _get_model

    try:
        # Call loaddata command
        call_command('loaddata', 'your_data.json', app_label='yourapp')
    finally:
        # Restore old _get_model() function
        python._get_model = old_get_model


class Migration(migrations.Migration):
    dependencies = [
        # Dependencies to other migrations
    ]

    operations = [
        migrations.RunPython(load_fixture),
    ]
Jenny answered 28/9, 2016 at 9:38 Comment(11)
Rockallite, you make a very strong point. Your answer left me wondering though, would solution 2.1 from @n__o/@mlissner's answer which relies on objects = serializers.deserialize('json', fixture, ignorenonexistent=True) suffer from the same issue as loaddata? Or does ignorenonexistent=True cover all possible issues?Knotweed
If you look at the source, you'll find that the ignorenonexistent=True argument has two effects: 1) it ignores models of a fixture which are not in the most current model definitions, 2) it ignores fields of a model of a fixture which are not in the most current corresponding model definition. None of them handles the new-required-field-in-the-model situation. So, yes, I think it suffers the same issue as plain loaddata.Jenny
This worked great once I figured out that my old JSON had models referencing other models using a natural_key(), which this method doesn't seem to support - I just replaced the natural_key value with the actual id of the referenced model.Tropaeolin
This answer would probably be more helpful as the accepted answer, because when running test cases a new database is created and all migrations are applied from scratch. This solution fixes problems that a project with unit tests will face if _get_model is not replaced in the data migration. TnxDeboradeborah
Thanks for the update and explanations, @Rockallite. My initial answer was posted a few weeks after migrations were introduced in Django 1.7, and documentation on how to proceed was unclear (and still is, last time I checked). Hopefully Django will update their loaddata / migration mechanism to take into account model history some day.Badtempered
@Jenny thanks for the answer. I was under the impression that we shouldn't put loaddata in migrations but couldn't remember why. Your answer makes a lot of sense. The only question I have is how _get_model() gets the historical model when the code is exactly the same as django.core.serializers.python._get_model, which gets the most up-to-date model?Ose
This is a phenomenal answer and saved my butt today. Thanks!Toleration
Actually, as it turns out if you're not wedded to using a JSON file & loaddata there's a better way, using apps.get_model, see docs.djangoproject.com/en/1.11/topics/migrations/…Toleration
@Ose I updated the comment for the _get_model function. Have a look.Jenny
@AdamParkin Thanks. I think many are wedded to JSON files because Django has a convenient management command dumpdata ;)Jenny
This solution is great, but, as mentioned by @dsummersl, it does not support fixtures that rely on natural keys. If you do need to deal with natural keys, have a look at this adaptation.Yep

Inspired by some of the comments (namely n__o's) and the fact that I have a lot of initial_data.* files spread out over multiple apps, I decided to create a Django app that would facilitate the creation of these data migrations.

Using django-migration-fixture you can simply run the following management command and it will search through all your INSTALLED_APPS for initial_data.* files and turn them into data migrations.

./manage.py create_initial_data_fixtures
Migrations for 'eggs':
  0002_auto_20150107_0817.py:
Migrations for 'sausage':
  Ignoring 'initial_data.yaml' - migration already exists.
Migrations for 'foo':
  Ignoring 'initial_data.yaml' - not migrated.

See django-migration-fixture for install/usage instructions.

Felic answered 6/1, 2015 at 21:27 Comment(0)

To give your database some initial data, write a data migration. In the data migration, use the RunPython operation to load your data.

Don't write any loaddata call for this, as that approach is deprecated.

Your data migrations will be run only once. The migrations are an ordered sequence; when the 0003_xxxx.py migration is run, Django records in the database that this app is migrated up to that one (0003), and will run only the following migrations.
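
For illustration, a minimal sketch of such a data migration (the app, model, and field names are made up, not taken from the question):

from django.db import migrations


def create_initial_countries(apps, schema_editor):
    # Use the historical model from the apps registry, not a direct import
    Country = apps.get_model('yourapp', 'Country')
    Country.objects.bulk_create([
        Country(name='France'),
        Country(name='Germany'),
        Country(name='Spain'),
    ])


def remove_initial_countries(apps, schema_editor):
    # Reverse operation, so the migration can be unapplied
    Country = apps.get_model('yourapp', 'Country')
    Country.objects.filter(name__in=['France', 'Germany', 'Spain']).delete()


class Migration(migrations.Migration):

    dependencies = [
        ('yourapp', '0001_initial'),
    ]

    operations = [
        migrations.RunPython(create_initial_countries, remove_initial_countries),
    ]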

Ballesteros answered 22/9, 2014 at 7:17 Comment(2)
So you encourage me to repeat calls to myModel.create(...) (or use a loop) in the RunPython function?Quiteria
Pretty much, yeah. Transactional databases will handle it perfectly :)Ballesteros

Unfortunately, the solutions presented above didn't work for me. I found that every time I changed my models I had to update my fixtures. Ideally I would instead write data migrations to modify created data and fixture-loaded data similarly.

To facilitate this I wrote a quick function which looks in the fixtures directory of the current app and loads a fixture. Put this function into a migration at the point in the model history that matches the fields in the migration.
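
The original helper isn't reproduced here, so the following is only a rough sketch of that idea, assuming a JSON fixture shipped in the app's fixtures/ directory (the function and argument names are made up). It still deserializes against the current models, with the caveats described in @Rockallite's answer:

import os

from django.apps import apps as global_apps
from django.core import serializers


def load_app_fixture(fixture_filename, app_label):
    """Build a RunPython callable that loads <app>/fixtures/<fixture_filename>."""
    def load(apps, schema_editor):
        app_path = global_apps.get_app_config(app_label).path
        fixture_path = os.path.join(app_path, 'fixtures', fixture_filename)
        with open(fixture_path, 'rb') as fixture:
            # ignorenonexistent=True skips models/fields missing from the current models
            for obj in serializers.deserialize('json', fixture, ignorenonexistent=True):
                obj.save()
    return load


# In a migration, something like:
#     operations = [migrations.RunPython(load_app_fixture('0002_initial_data.json', 'yourapp'))]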

Depredation answered 10/10, 2015 at 21:7 Comment(1)
Thanks for this! I wrote a version that works with Python 3 (and passes our strict Pylint). You can use it as a factory with RunPython(load_fixture('badger', 'stoat')). gist.github.com/danni/1b2a0078e998ac080111Indiscreet

On Django 2.1, I wanted to load some models (like country names, for example) with initial data.

But I wanted this to happen automatically right after the execution of initial migrations.

So I thought that it would be great to have a sql/ folder inside each application that required initial data to be loaded.

Then within that sql/ folder I would have .sql files with the required DMLs to load the initial data into the corresponding models, for example:

INSERT INTO appName_modelName(fieldName)
VALUES
    ('country 1'),
    ('country 2'),
    ('country 3'),
    ('country 4');

To be more descriptive, this is how an app containing a sql/ folder would look: [screenshot of the app's directory tree, showing a sql/ folder with numbered .sql files]

Also I found some cases where I needed the sql scripts to be executed in a specific order. So I decided to prefix the file names with a consecutive number as seen in the image above.

Then I needed a way to load any SQL scripts available inside any application folder automatically when running python manage.py migrate.

So I created another application named initial_data_migrations and added it to the list of INSTALLED_APPS in the settings.py file. Then I created a migrations folder inside it and added a file called run_sql_scripts.py (which is actually a custom migration), as seen in the image below:

[screenshot of the initial_data_migrations app with its migrations/run_sql_scripts.py file]

I created run_sql_scripts.py so that it takes care of running all SQL scripts available within each application. It is fired when someone runs python manage.py migrate. This custom migration also adds the involved applications as dependencies, so that it attempts to run the SQL statements only after the required applications have executed their 0001_initial.py migrations (we don't want to run a SQL statement against a non-existent table).

Here is the source of that script:

import os
import itertools

from django.db import migrations
from YourDjangoProjectName.settings import BASE_DIR, INSTALLED_APPS

SQL_FOLDER = "/sql/"

# (sql_folder_path, app) pairs for every installed app that ships a sql/ folder
APP_SQL_FOLDERS = [
    (os.path.join(BASE_DIR, app + SQL_FOLDER), app) for app in INSTALLED_APPS
    if os.path.isdir(os.path.join(BASE_DIR, app + SQL_FOLDER))
]

# Per-app lists of .sql files, sorted so the numeric prefixes control execution order
SQL_FILES = [
    sorted([path + file for file in os.listdir(path) if file.lower().endswith('.sql')])
    for path, app in APP_SQL_FOLDERS
]


def load_file(path):
    with open(path, 'r') as f:
        return f.read()


class Migration(migrations.Migration):

    dependencies = [
        (app, '__first__') for path, app in APP_SQL_FOLDERS
    ]

    operations = [
        migrations.RunSQL(load_file(f)) for f in list(itertools.chain.from_iterable(SQL_FILES))
    ]

I hope someone finds this helpful; it worked just fine for me! If you have any questions, please let me know.

NOTE: This might not be the best solution since I'm just getting started with Django; however, I still wanted to share this how-to, since I didn't find much information about it while googling.

Unhouse answered 24/1, 2019 at 21:26 Comment(1)
Maybe I'm wrong... but if you modify a .sql file, or even add a new .sql file, python manage.py migrate load_initial_data will NOT detect any changes. So this is useful for REALLY static initial data, with no changes allowed. Though, it's an improvement over the accepted answerSubauricular

In my opinion, fixtures are a bit bad. If your database changes frequently, keeping them up to date will soon become a nightmare. Actually, it's not only my opinion; it's explained much better in the book "Two Scoops of Django".

Instead, I write a Python file to provide the initial setup. If you need something more, I suggest you look at Factory Boy.

If you need to migrate some data you should use data migrations.

There's also "Burn Your Fixtures, Use Model Factories" about using fixtures.
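
For illustration, a minimal Factory Boy sketch (the app and Country model are hypothetical, not from the original answer):

import factory
from factory.django import DjangoModelFactory

from yourapp.models import Country  # hypothetical app and model


class CountryFactory(DjangoModelFactory):
    class Meta:
        model = Country

    # Each call produces a distinct name: "Country 0", "Country 1", ...
    name = factory.Sequence(lambda n: 'Country %d' % n)


# e.g. to seed a few rows in a setup script or test:
# CountryFactory.create_batch(5)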

Doily answered 21/9, 2014 at 16:52 Comment(2)
I agree on your point "hard to maintain if frequent changes", but here the fixture only aims to provide initial (and minimal) data when installing the project...Quiteria
This is for a one-time load of data, which, if it is done within the context of migrations, makes sense. If it is within a migration, one should not have to make changes to the JSON data. Any schema changes that require changes to the data further down the road should be handled via another migration (at that point other data may be in the database that will also need to be modified).Adali

Although @rockallite's answer is excellent, it does not explain how to handle fixtures that rely on natural keys instead of integer pk values.

Simplified version

First, note that @rockallite's solution can be simplified by using unittest.mock.patch as a context manager, and by patching apps instead of _get_model:

...
from unittest.mock import patch
...

def load_fixture(apps, schema_editor):
    with patch('django.core.serializers.python.apps', apps):
        call_command('loaddata', 'your_data.json', ...)

...

This works well, as long as your fixtures do not rely on natural keys.

If they do, you're likely to see a DeserializationError: ... value must be an integer....

The problem with natural keys

Under the hood, loaddata uses django.core.serializers.deserialize() to load your fixture objects.

The deserialization of fixtures based on natural keys relies on two things:

  1. the presence of a get_by_natural_key() method on the model's default manager
  2. the presence of a natural_key() method on the model itself

The get_by_natural_key() method is necessary for the deserializer to know how to interpret the natural key, instead of an integer pk value.

Both methods are necessary for the deserializer to get existing objects from the database by natural key, as also explained here.

However, the apps registry which is available in your migrations uses historical models, and these do not have access to custom managers or custom methods such as natural_key().

Possible solution: step 1

The problem of the missing get_by_natural_key() method from our custom model manager is relatively easy to solve: Just set use_in_migrations=True on your custom manager, as described in the documentation.

This ensures that your historical models can access the current get_by_natural_key() during migrations, and fixture loading should now succeed.
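
A minimal sketch of what step 1 looks like in models.py (the manager, model, and field names are illustrative, not from the original answer):

from django.db import models


class CountryManager(models.Manager):
    # Make this manager (and get_by_natural_key) available on historical models
    use_in_migrations = True

    def get_by_natural_key(self, code):
        return self.get(code=code)


class Country(models.Model):
    code = models.CharField(max_length=2, unique=True)
    name = models.CharField(max_length=100)

    objects = CountryManager()

    # natural_key() lives on the model class, so it is NOT available on
    # historical models during migrations -- that is the problem step 2 addresses.
    def natural_key(self):
        return (self.code,)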

However, your historical models still don't have a natural_key() method. As a result, your fixtures will be treated as new objects, even if they are already present in the database. This may lead to a variety of errors if the data-migration is ever re-applied, such as:

  • unique-constraint violations (if your models have unique-constraints)
  • duplicate fixture objects (if your models do not have unique-constraints)
  • "get returned multiple objects" errors (due to duplicate fixture objects created previously)

So, effectively, you're still missing out on a kind of get_or_create-like behavior during deserialization.

To experience this, just apply a data-migration as described above (in a test environment), then roll back the same data-migration (without removing the data), then re-apply the data-migration.

Possible solution: step 2

The problem of the missing natural_key() method from the model itself is a bit more difficult to solve. One solution would be to assign the natural_key() method from the current model to the historical model, for example:

...
from unittest.mock import patch

from django.apps import apps as current_apps
from django.core.management import call_command
...


def load_fixture(apps, schema_editor):
    def _get_model_patch(app_label):
        """ add natural_key method from current model to historical model """
        historical_model = apps.get_model(app_label=app_label)
        current_model = current_apps.get_model(app_label=app_label)
        historical_model.natural_key = current_model.natural_key
        return historical_model

    with patch('django.core.serializers.python._get_model', _get_model_patch):
        call_command('loaddata', 'your_data.json', ...)

...

Notes:

  • For clarity, I omitted things like error handling and attribute checking from the example. You should implement those where necessary.
  • This solution uses the current model's natural_key method, which may still lead to trouble in certain scenarios, but the same goes for Django's use_in_migrations option for model managers.
Yep answered 15/9, 2022 at 14:52 Comment(0)
