Using a Repo in an Ecto migration
I've got an Ecto migration where I'd like to modify some columns but also migrate some data. For example:

defmodule MyApp.Repo.Migrations.AddStatus do
  use Ecto.Migration
  import Ecto.Query

  def change do
    alter table(:foo) do
      add(:status, :text)
    end

    foos =
      from(f in MyApp.Foo, where: ...)
      |> MyApp.Repo.all()

    Enum.each(foos, fn foo ->
      # There's then some complex logic here to work
      # out how to set the status based on other attributes of `foo`
    end)
  end
end

Now, the problem here is that calling MyApp.Repo.all seemingly makes the migration use a separate database connection from the one the alter table ... statement used (EDIT: this assumption is false, see the accepted answer). Consequently, the status column doesn't exist yet, so the whole migration blows up! Note that we're using a Postgres database, so DDL statements are transactional.

I could do this as two separate migrations, or leave the migration for the schema change only and set the data in a mix task, but I'd prefer not to, in order to ensure data consistency.

Any thoughts on how to use the same database connection for MyApp.Repo queries in this manner?

EDIT: Note, I'm working on a small set of data and downtime is acceptable in my use case. See José's response below for some good advice when this is not the case.

Calumniation asked 22/1, 2018 at 12:24

Comments:

Does adding a call to flush() after the alter work? – Bayreuth

Yes, flush() works! Why does that work? Am I wrong about there being two different connections here, and it simply hasn't run the steps yet? – Calumniation

Yes, flush() should work. The whole migration is wrapped in a transaction, which means any DDL statement as well as Repo expressions work on the same connection/transaction. The issue is that we group the DDL statements together until the end of the transaction or until a flush. @Bayreuth should submit an answer. :) I will edit my answer to remove the wrong advice. – Otilia

Thanks again José! @Bayreuth, if you add an answer I'll accept it :) – Calumniation

So will the flush() commit the transaction? What if there's some exception raised during the rest of that migration? Will the changes before the flush() be rolled back too? – Andalusite

No, flush() does not commit the transaction, it just executes the statements that have been queued up before that point. Think of each statement in the migration as getting pushed onto a queue, then all run once everything's been queued up. flush() essentially lets you execute what's already been queued before adding subsequent statements. The entire migration is wrapped in a transaction, so in the event of an exception everything that had been executed (regardless of flush()) would get rolled back. – Calumniation

For added context @ŠtefanĽupták, the way things are added to the queue is through the Ecto.Migration DSL functions like alter() or create(). The reason I needed a flush() is that in the example above I'm doing other stuff that assumes everything above has already run. So the migration was failing because I'd put things into the queue but not actually run them against the database before trying to use them in the Enum.each(). – Calumniation

Thanks @seddy. I had the same problem but was afraid of having fragile migrations because of the flush(). Now you have cleared that up for me. 👍🏻 – Andalusite
You can execute the current pending changes in a migration by calling Ecto.Migration.flush/0. Any code after that will have the status field available.

defmodule MyApp.Repo.Migrations.AddStatus do
  use Ecto.Migration
  import Ecto.Query

  def change do
    alter table(:foo) do
      add(:status, :text)
    end

    # Executes the queued ALTER TABLE, still inside the
    # migration's transaction, so :status now exists.
    flush()

    foos =
      from(f in MyApp.Foo, where: ...)
      |> MyApp.Repo.all()

    ...
  end
end
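
For completeness, the data step after flush() might then look something like the sketch below. This is only a sketch: the is_nil condition, the active attribute, and the status values are hypothetical stand-ins for the logic elided in the question, and it assumes the MyApp.Foo schema already defines the status field.

foos = MyApp.Repo.all(from(f in MyApp.Foo, where: is_nil(f.status)))

Enum.each(foos, fn foo ->
  # Hypothetical rule: derive the status from other attributes of `foo`.
  status = if foo.active, do: "active", else: "archived"

  foo
  |> Ecto.Changeset.change(status: status)
  |> MyApp.Repo.update!()
end)

Since the whole migration runs in a single transaction on Postgres, an exception raised anywhere in this block rolls back the column addition and the data changes together.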
Bayreuth answered 22/1, 2018 at 14:02
Generally speaking, doing a data migration and changing the DDL at the same time is bad practice. If this is a live system and the migration takes a long time, you may generate a lot of contention for a long period.

If your application is still processing requests, new entries can be added while the existing data is being migrated, and those new entries won't be migrated!

There are different ways to tackle this, depending on the use case, but they usually require a step-by-step approach (see the sketch after this list). For example, if you are adding a new column:

  • the first step is to introduce the new column in the database and make sure that the new column is populated when new entries are created. This step is only to guarantee that all future entries will be correctly populated.

  • then a second step is to migrate the old data

  • finally, you can put the new data live, as you can now assume all of the entries are properly populated
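
As a rough sketch of that phased approach (the module names, the 'unknown' placeholder value, and the SQL are illustrative assumptions, not from the answer; the backfill uses plain SQL via Ecto.Migration.execute/1 so it doesn't depend on application schemas):

# Step 1: add the column, and deploy application code that
# populates :status for all newly created rows.
defmodule MyApp.Repo.Migrations.AddStatusColumn do
  use Ecto.Migration

  def change do
    alter table(:foo) do
      add(:status, :text)
    end
  end
end

# Step 2, in a later deploy: backfill the old rows.
# 'unknown' is a hypothetical placeholder value.
defmodule MyApp.Repo.Migrations.BackfillStatus do
  use Ecto.Migration

  def up do
    execute("UPDATE foo SET status = 'unknown' WHERE status IS NULL")
  end

  def down do
    execute("UPDATE foo SET status = NULL WHERE status = 'unknown'")
  end
end

Once the backfill has run everywhere, a third step can start relying on the column, for example by adding a NOT NULL constraint.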

Monocyclic answered 22/1, 2018 at 12:53

Comments:

Thanks José! I definitely agree with this: "If this is a live system and the migration takes long, you may generate a lot of contention for a long period of time." However, I should caveat it with the fact that I'm only looking at ~100 rows for an internal app that we have a small out-of-hours maintenance window for. – Calumniation

I may just do this in a second migration if there's no way to access the connection (or it's sufficiently difficult to access). We're already doing a phased approach to rollout, but the maintenance window gives us the flexibility to write slightly less code and skip a step or two is all :) – Calumniation

I've added an EDIT to my main question to point to this post as I think it's important, but I think Dogbert has potentially offered an alternative by calling flush(), which appears to work. Can you explain why that works, please? Am I wrong about there being different connections, and it's just that the alter statement hasn't actually run yet? – Calumniation