Django backup strategy with dumpdata and migrations
As in this question, I set up a dumpdata-based backup system for my database. The setup is akin to a cron script that calls dumpdata and moves the backup to a remote server, with the aim of simply using loaddata to recover the database. However, I'm not sure this plays well with migrations. loaddata now has an --ignorenonexistent switch to deal with deleted models/fields, but it is not able to resolve cases where columns were added with one-off defaults, or to apply RunPython code.

The way I see it, there are two sub-problems to address:

  • Tag each dumpdata output file with the current version of each app
  • Splice the fixtures into the migration path

I'm stumped about how to tackle the first problem without introducing a ton of overhead. Would it be enough to save an extra file per backup that contained an {app_name: migration_number} mapping?
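As a sketch of that idea: the per-app version map can be derived from the rows Django keeps in its django_migrations table (app, name, applied). The helper below is hypothetical, not part of Django; it assumes you pass it those rows, e.g. pulled from MigrationRecorder.Migration.objects.values_list("app", "name", "applied") in a real project:

```python
import json
from datetime import datetime

def latest_migrations(rows):
    """Reduce (app, name, applied) rows from django_migrations to an
    {app_name: migration_name} map of the newest applied migration per app."""
    latest = {}
    for app, name, applied in sorted(rows, key=lambda r: r[2]):
        latest[app] = name  # later 'applied' timestamps overwrite earlier ones
    return latest

# Illustrative rows, as they might come out of django_migrations:
rows = [
    ("blog", "0001_initial", datetime(2016, 1, 1)),
    ("blog", "0002_add_slug", datetime(2016, 1, 10)),
    ("auth", "0001_initial", datetime(2016, 1, 1)),
]
print(json.dumps(latest_migrations(rows), indent=2, sort_keys=True))
```

Writing this JSON next to each dumpdata fixture would add only one small file per backup, which seems like acceptable overhead.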

The second problem, I think, is easier once the first is solved, since the process is roughly:

  1. Create a new database
  2. Run migrations forward to the appropriate point for each app
  3. Call loaddata with the given fixture file
  4. Run the rest of the migrations

There's some code in this question (linked from the bug report) that I think could be adapted for this purpose.

Since these are fairly regular/large snapshots of the database, I don't want to keep them as data migrations cluttering up the migrations directory.

Hydrops answered 16/1, 2016 at 0:11 Comment(3)
This doesn't answer your question, but I've found using database dumps to be a significantly simpler backup solution. I use pg_dump for Postgres and mysqldump for MySQL databases. – Inainability
How is it simpler? Don't you end up with the same problem if you apply migrations after a backup? – Hydrops
Your database dump contains a full copy of your database. This includes the django_migrations table. Migrations created before the database dump will not be run again; only migrations newer than the database dump will be run. – Inainability
I take the following steps to back up, restore, or transfer my PostgreSQL database between instances of my project:

The idea is to keep as few migrations as possible, as if manage.py makemigrations had been run for the first time against an empty database.

Let's assume we have a working database in our development environment: a current copy of the production database, which should not be open to any changes. We have added models, altered attributes, etc., and those actions have generated additional migrations.

Now the database is ready to be migrated to production, which (as stated before) is not open to the public, so it is not altered in any way. To achieve this:

  • I perform the normal procedure in the development environment.
  • I copy the project to the production environment.
  • I perform the normal procedure in the production environment.

We make the changes in our development environment. No changes should happen in the production database, because they will be overwritten.

Normal Procedure

Before anything else, I have a backup of the project directory (which includes a requirements.txt file), a backup of the database and -of course- git is a friend of mine.

  1. I take a dumpdata backup in case I need it. However, dumpdata has some serious limitations regarding content types, permissions, or other cases where natural foreign keys should be used:

    ./manage.py dumpdata --exclude auth.permission --exclude contenttypes  --exclude admin.LogEntry --exclude sessions --indent 2 > db.json
    
  2. I take a pg_dump backup to use:

    pg_dump -U $user -Fc $database --exclude-table=django_migrations > path/to/backup-dir/db.dump
    
  3. Only if I want to merge the existing migrations into one, I delete all migrations from every application.

    In my case the migrations folder is a symlink, so I use the following script:

    #!/bin/bash
    # empty every migrations directory (-L follows the symlinks)
    for dir in $(find -L . -name "migrations" -type d)
    do
      rm -Rf "$dir"/*
    done
    
  4. I delete and recreate the database:

    For example, a bash script can include the following commands:

    su -l postgres -c "PGPASSWORD=$password psql -c 'drop database $database ;'"
    su -l postgres -c "createdb --owner $username $database"
    su -l postgres -c "PGPASSWORD=$password psql $database -U $username -c 'CREATE EXTENSION $extension ;'"
    
  5. I restore the database from the dump:

    pg_restore -Fc -U $username -d $database path/to/backup-dir/db.dump
    
  6. If migrations were deleted in step 3, I recreate them in the following way:

    ./manage.py makemigrations <app1> <app2> ... <appn>
    

    ... or, collecting the app names automatically, by using the following script:

    #!/bin/bash
    apps=()
    # collect top-level app directories, skipping the project folder and hidden dirs
    for app in $(find ./ -maxdepth 1 -type d ! -path "./<project-folder>" ! -path "./.*" ! -path "./")
    do
      apps+=("${app#??}")  # strip the leading "./"
    done
    all_apps=$(printf "%s " "${apps[@]}")

    ./manage.py makemigrations $all_apps
    
  7. I apply the migrations with --fake, since the restored database already contains the schema:

    ./manage.py migrate --fake
    

In case something has gone completely wrong and everything is broken (this can indeed happen), I can use the backup to revert everything to its previous working state. If I want to use the db.json file from step 1, it goes like this:

When pg_dump or pg_restore fails

I perform the steps:

  • 3 (delete migrations)
  • 4 (delete and recreate the database)
  • 6 (makemigrations)

and then:

  • Apply the migrations:

    ./manage.py migrate
    
  • Load the data from db.json:

    ./manage.py loaddata path/to/db.json
    

Then I try to find out why my previous effort was not successful.

When the steps are performed successfully, I copy the project to the server and perform the same steps on that box.

This way, I always keep the fewest migrations, and I am able to use pg_dump and pg_restore on any box that shares the same project.

Karinkarina answered 19/1, 2017 at 1:28 Comment(8)
First, thank you for this full explanation! I wonder how you would handle multiple migrations: a) install db, b) migration1, c) dump db, d) migration2. I think you can't import your dump into a); you have to run migration1 first. But then, how do you prevent the system from applying the second migration? – Birck
@JulienGreard This is a cannibalistic but "soooothing" approach, given that migrations are very subtle and many times cannot be automatically reproduced. However, the idea is that after applying all migrations, you back up the database, delete everything, and reinstall. Something like squashing on steroids... not very useful when the project is already deployed, but sexy when you are angry! – Karinkarina
I'm sorry, I don't understand your answer. I don't understand what I should do if my backup matches neither the "raw" db (with no migrations applied) nor the most recent db (with all migrations applied). If my backup was made between two migrations, I think I'm stuck. I would have to run the first N migrations, then loaddata my backup, and then apply the other M migrations. What I don't know with this approach is how to tell Django to only apply the first N migrations. I don't know if I made myself clear; tell me if this doesn't make sense ;-) – Birck
@JulienGreard pg_restore should be performed with the code that reflects the database structure as it was when pg_dump was run. Likewise, ./manage.py loaddata should meet an empty database on which ./manage.py migrate has been run with the code that reflects the database structure as it was when ./manage.py dumpdata was performed. However, it is better to open a new question for a detailed breakdown of the situation you are facing. – Karinkarina
Why don't you use the --natural-foreign flag in step 1? – Pledge
@Pledge Actually, because I have not implemented it. However, I mention it in the answer. I use step 1 only in cases where a small amount of data from an app should be collected. When the data gets bigger, I mainly use step 2. It is faster and less prone to errors. – Karinkarina
@Karinkarina Isn't the natural key for content types implemented by default? – Pledge
@Pledge I have not delved into it, but when restoring a backup of content types where --natural-foreign was not set during its creation, I had to manually relink the data in the relevant tables, based on a pattern I discovered while digging through them to find where the error was. – Karinkarina