Using SQLAlchemy to migrate databases (sqlite to Postgres CloudSQL)
Asked Answered
M

2

5

Trying to migrate the database for an early project of mine to the cloud. Although the code that built everything is janky, the db structure and data itself are fairly sound. I could probably figure out a dump method of migrating everything (pgdump, etc), but I still have much to learn about this stuff so I'd rather get the experience by doing it step-by-step.

Source: A ~1gb sqlite database file

Destination: Google CloudSQL running Postgres v9.6

Already created the tables in the cloud db, using the same schema & table names as the sqlite db. Not worried about schema enforcement errors because I haven't yet defined foreign keys in the cloud.

The Plan: Create distinct, concurrent SQLAlchemy connections to each database, then read sqlite --> write to CloudSQL. Went back and defined data structures for each table using SQLAlchemy. Snippet from models.py:

from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base, declared_attr

Base = declarative_base()

class PublicMixin(object):
    @declared_attr
    def __tablename__(cls):
        return cls.__name__.lower()

class Category(PublicMixin, Base):
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Player(PublicMixin, Base):
    id = Column(Integer, primary_key=True)
    name = Column(String)
    username = Column(String)
    notes = Column(String)
[...]

I then duplicated this file as models_lite.py so I could import each model without potential for interference. Here's the migration.py file I tried to run as a proof of concept:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from models import Base, Category, Player
from models_sqlite import Base as Base_lite, Category as Category_lite, Player as Player_lite

# SQLite db
engine_lite = create_engine('sqlite:///mydb.sqlite')
Base_lite.metadata.bind = engine_lite
LiteSession = sessionmaker()
LiteSession.bind = engine_lite
session_lite = LiteSession()

# CloudSQL, via local proxy
engine_cloud = create_engine('postgresql+psycopg2://USER:PW@/DBNAME?host=/cloudsql/INSTANCE')
Base.metadata.bind = engine_cloud
CloudSession = sessionmaker()
CloudSession.bind = engine_cloud
session_cloud = CloudSession()

category_lite = session_lite.query(Category_lite).all()
category_cloud = Category()

for c in category_lite:
    category_cloud = c
    session_cloud.add(category_cloud)

session_cloud.commit()

Running this yields the following error:

File "postgres migration.py", line 68, in <module>
    session_cloud.add(category_cloud)
[...]
sqlalchemy.exc.InvalidRequestError: Object '<Category at 0x11141b908>' is already attached to session '1' (this is '2')

Explicitly setting each column inside the for loop works (i.e.: category_cloud.id = c.id), but there must be a way to avoid having to do this for every column in every table. How should I approach this?

Mutazilite answered 19/4, 2019 at 4:10 Comment(1)
In Python assignment does not copy, but just alters the binding. category_cloud = c means that both names now reference the same object.Clomb
J
14

This kind of data transfer operation is much more easily achieved with sqlalchemy core rather than the orm. There is no benefit to mapping database data to objects here if they are only going to be immediately written to another database, it just adds complexity and slows things down. The following code will iterate through every table in Base, select all the columns in the sqlite database and write them one table at a time to the cloud database.

from sqlalchemy import create_engine, select
from models import Base

engine_lite = create_engine('sqlite:///mydb.sqlite')
engine_cloud = create_engine('postgresql+psycopg2://USER:PW@/DBNAME?host=/cloudsql/INSTANCE')

with engine_lite.connect() as conn_lite:
    with engine_cloud.connect() as conn_cloud:
        for table in Base.metadata.sorted_tables:
            data = [dict(row) for row in conn_lite.execute(select(table.c))]
            conn_cloud.execute(table.insert().values(data))

Junia answered 19/4, 2019 at 21:30 Comment(3)
For posterity: This solution worked for most of my tables. The two tables with the most data (both in terms of raw characters and in number of rows) needed to be broken up into smaller chunks. Attributes of those two tables: #1) 1.1mm rows w/7 columns, brief data per row. #2) 5500 rows, 6 columns -- 3 of which contain large binary data of 50k to 100k characters. Breaking each into 5 equal-sized chunks worked well.Mutazilite
The models module in Pypi was last updated in 2010. I did not find this answer very usefulMithridate
@CorinaRoca the models module is not supposed to come from PyPi. It's the module where the application implements its database model.Saltpeter
F
0

SQLAlchemy 1.4 and greater:

with engine_lite.connect() as conn_lite:
    with engine_cloud.connect() as conn_cloud:
        for table in Base.metadata.sorted_tables:
            for row in conn_lite.execute(select(table.c)):
                conn_cloud.execute(table.insert().values(row._mapping))
Ftlb answered 28/12, 2023 at 15:32 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.