How to handle multiple db alter scripts coming from different Git feature branches?

Asked 9/7, 2014 at 15:56 Answered 2/8, 2014 at 4:2

A bit complex to describe, but I'll do my best. Basically we're using the Git workflow, meaning we have the following branches:

production, which is the live branch. Everything is production is running in the live web environment.
integration, in which all new functionality is integrated. This branch is merged to production every week.
one or more feature branches, in which developers or development teams develop new functionality. After this is done, developers merge their feature branch to integration.

So, nothing really complex here. But, since our application is a web application running against a MySQL database, new functionality often requires changes to the database scheme. To automate this, we're using dbdeploy, which allows us to create alter scripts, given a number. E.g. 00001.sql, 00002.sql, etc. Upon merging to the integration branch, dbdeploy will check which alter scripts have a higher number than the latest executed one on that specific database, and will execute those.

Now assume the following. - integration has alter scripts up until 00200.sql. All of these are executed on the integration database. - developer John has a feature branch featureX, which was created when integration still had 00199.sql as the highest alter script.

John creates 00200.sql because of some required db schema changes.

Now, at some point John will merge his modifications back to the integration branch. John will get a merge conflict and will see that his 00200.sql already exists in integration. This means he needs to open the conflicting file, extract his contents, reset that file back to 'mine' (the original state as in integration) and put his own contents in a new file.

Now, since we're working with ten developers, we get this situation daily. And while we do understand the reasons behind this, it's sometimes very cumbersome. John renames his script, does a merge commit to integration, pushes the changes to the upstream only to see that somebody else already created a 00201.sql, requiring John to do the proces again.

Surely there must be more teams using the Git workflow and using a database change management tool for automating database schema changes?

So, in short, my questions are:

How to automate database schema changes, when working on different feature branches, that operate on different instances of the same db?
How to prevent merge conflicts all the time, while still having the option to have a fixed order in the executed alter scripts? E.g. 00199.sql must be executed before 00200.sql, because 00200.sql might be depending on something done in 00199.sql.

Any other tips are most welcome ofcourse.

Absently answered 9/7, 2014 at 15:56 Comment(1)

A useful and tricky discussion, for sure. Software developers out there, take note. Tracking schema changes as the OP has pointed out really depends on what kind of coverage you expect to get from the effort in the first place. Want a daily build? Versioning? Recovery points? Defensive coding? The solutions will vary as a result. – Peplum 22/7, 2014 at 20:54

Rails used to do this, with exactly the problems you describe. They changed to the following scheme: the files (rails calls them migrations) are labelled with a utc timestamp of when the file was created, eg

20140723069701_add_foo_to_bar

(The second part of the name doesn't contribute to the ordering).

Rails records the timestamps of all the migrations that have been run. When you ask it to run pending migrations it selects all the migration files whose timestamp isn't in the list of already run migrations and runs them in numerical order.

You'll no longer get merge conflicts unless two people create one at exactly the same point in time.

Files still get executed in the order you wrote them, but possibly interleaved with someone else's work. In theory you can still have problems - eg developer a decides to rename a table that I had decided to add a column too. That is much less common than 2 developers both making any changes to the db and you would have problems even not considering the schema changes presumably I have just written code that queries a no longer existant table - at some point developers working on related stuff will have to talk to each other!

Lazybones answered 23/7, 2014 at 6:18 Comment(0)

A few suggestions:

1 - have a look at Liquibase, each version gets a file that references the changes that need to happen, then the change files can be named using a meaningful string rather than by number.

2 - have a central location for getting the next available number, then people use the latest number.

I've used Liquibase in the past, pretty successfully, and we didn't have the problem you describe.

Dogtired answered 23/7, 2014 at 18:33 Comment(2)

But did you use the GIT workflow with feature branches for each developer too? Because, especially with option 2 I would assume you still have problems with guaranteed order of execution on all branches. E.g. John would take number 3, because I took number 2. But John will merge his changes to the integration branch before me. That way John's changes will be executed before mine on the integration branch. Later when merging to production, my changes will be executed before Johns. – Absently 24/7, 2014 at 8:50

I've never used option 2 at all, it's the best I could come up with for a schema management tool that is using numbered script names. I have used feature branches with Liquibase, both with a grouped release and with continuous deployment and it worked fine. Liquibase tracks which change set have been applied and applies those that are missing, it also supports dependencies between change sets allowing for enforced ordering where necessary. – Dogtired 24/7, 2014 at 16:48

As Frederick Cheung suggested, use timestamps rather than a serial number. Applying schema changes by order of datestamp should work, because schema changes can only depend on changes of a prior date.

In addition, include the name of the developer in the name of the alter script. This will prevent merge conflicts 100%.

Your merge hook should just look for newly added alter scripts (present in the merged branch but not in the upstream branch) and execute them by order of timestamp.

Warford answered 26/7, 2014 at 12:53 Comment(0)

I've used two different approaches to overcome your problem in the past.

The first is to use a n ORM which can handle the schema updates.

The other approach is to create a script, which incrementally builds the database schema. This way if a developer needs to an additional row in a table, he should add the appropriate sql statement after the table is create. Likewise if he needs a new table, he should add the sql statement for that. Then merging becomes a question of making sure things happen in the correct order. This is basically what the database update process in an ORM does. Such a script needs to be coded very defensively, and each statement should check if its perquisites exists.

Folkestone answered 22/7, 2014 at 12:4 Comment(0)

For the dbvc commandline tool, I use git log to determine the order of the update scripts.

git log -c --no-merges --pretty="format:" --name-status -p dev/db/updates/ | \
  grep '^A' | awk '{print $2}' | tac

In this case the way the order of your commits will determine the sequence in which the updates are run. Which is most likely what you want.

If you run git merge b, the updates from master will be run first and than from B.
If you run git rebase b, the update from B will run first and than from master.

Lettie answered 2/8, 2014 at 4:2 Comment(0)

Recommended topics

Hot tags