peewee vs sqlalchemy performance
I have two simple scripts:

from sqlalchemy import create_engine, ForeignKey, Table
from sqlalchemy import Column, Date, Integer, String, DateTime, BigInteger, event
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.engine import Engine
from sqlalchemy.orm import relationship, backref, sessionmaker, scoped_session, Session

class Test(declarative_base()):
    __tablename__ = "Test"
    def __init__(self, *args, **kwargs):
        args = args[0]
        for key in args:
            setattr(self, key, args[key] )
    key = Column(String, primary_key=True)

data = []
for a in range(0,10000):
    data.append({ "key" : "key%s" % a})


engine = create_engine("sqlite:///testn", echo=False)
with engine.connect() as connection:
    Test.metadata.create_all(engine)
    session = Session(engine)
    list(map(lambda x: session.merge(Test(x)), data))
    session.commit()

result:

real    0m15.300s
user    0m14.920s
sys     0m0.351s

second script:

from peewee import *

class Test(Model):
    key = TextField(primary_key=True,null=False)

dbname = "test"
db = SqliteDatabase(dbname)
Test._meta.database = db
data = []
for a in range(0,10000):
    data.append({ "key" : "key%s" % a })

if not Test.table_exists():
    db.create_tables([Test])
with db.atomic() as tr:
    Test.insert_many(data).upsert().execute()

result:

real    0m3.253s
user    0m2.620s
sys     0m0.571s

Why?

Tonina answered 29/4, 2016 at 20:24 Comment(2)
Those 10000 merges will make 10000 trips to the db. – Reannareap
These are not comparable at all. You're comparing oranges to skyscrapers. – Sienkiewicz

This comparison is not entirely valid, as issuing an upsert-style query is very different from what SQLAlchemy's Session.merge() does:

Session.merge() examines the primary key attributes of the source instance, and attempts to reconcile it with an instance of the same primary key in the session. If not found locally, it attempts to load the object from the database based on primary key, and if none can be located, creates a new instance.

In this test case this will result in 10,000 load attempts against the database, which is expensive.
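A small sketch (not from the question) makes this visible: an event listener counting the SELECT statements the engine emits shows one per merged instance. This assumes a recent SQLAlchemy (1.4+); the model and key values mirror the question, but the listener and counter are illustrative additions:

```python
from sqlalchemy import create_engine, event, Column, String
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Test(Base):
    __tablename__ = "Test"
    key = Column(String, primary_key=True)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

selects = []

@event.listens_for(engine, "before_cursor_execute")
def count_selects(conn, cursor, statement, parameters, context, executemany):
    # record every SELECT the engine sends to the database
    if statement.lstrip().upper().startswith("SELECT"):
        selects.append(statement)

session = Session(engine)
for i in range(100):
    # each merge of an instance not already in the session triggers
    # a primary-key lookup against the database before the INSERT
    session.merge(Test(key="key%s" % i))
session.commit()

print(len(selects))  # one SELECT per merge: 100
```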

On the other hand when using peewee with sqlite the combination of insert_many(data) and upsert() can result in a single query:

INSERT OR REPLACE INTO Test (key) VALUES ('key0'), ('key1'), ...

There's no session state to reconcile, since peewee is a very different kind of ORM from SQLAlchemy; at a quick glance it looks closer to SQLAlchemy's Core and Table constructs.

In SQLAlchemy, instead of list(map(lambda x: session.merge(Test(x)), data)), you could drop down to Core:

session.execute(Test.__table__.insert(prefixes=['OR REPLACE']).values(data))

A major drawback is that you have to hand-write a database-vendor-specific prefix for the INSERT. This also subverts the Session, which will have no knowledge of the newly added rows.
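For what it's worth, later SQLAlchemy versions (1.4+) grew a dialect-level upsert construct for SQLite, so the vendor-specific clause is rendered by the compiler rather than hand-written. A sketch under that assumption:

```python
from sqlalchemy import (Column, MetaData, String, Table, create_engine,
                        func, select)
from sqlalchemy.dialects.sqlite import insert

metadata = MetaData()
test = Table("Test", metadata, Column("key", String, primary_key=True))

engine = create_engine("sqlite://")
metadata.create_all(engine)

data = [{"key": "key%s" % a} for a in range(10000)]

# the dialect renders ON CONFLICT(key) DO NOTHING itself --
# no hand-written OR REPLACE prefix needed
stmt = insert(test).on_conflict_do_nothing(index_elements=["key"])

with engine.begin() as conn:
    conn.execute(stmt, data)  # executemany over one prepared statement
    count = conn.execute(select(func.count()).select_from(test)).scalar()

print(count)  # 10000
```

This keeps the statement portable across the dialects that implement on_conflict support, though the Session is still bypassed just as with the prefix approach.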

Bulk insertions using model objects are a little more involved in SQLAlchemy. Simply put, using an ORM is a trade-off between ease of use and speed:

ORMs are basically not intended for high-performance bulk inserts - this is the whole reason SQLAlchemy offers the Core in addition to the ORM as a first-class component.
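The middle ground between full unit-of-work semantics and raw Core is the ORM's bulk helpers (available since SQLAlchemy 1.0), which skip identity-map bookkeeping and funnel rows into an executemany INSERT. A sketch, reusing the question's model shape:

```python
from sqlalchemy import create_engine, Column, String
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Test(Base):
    __tablename__ = "Test"
    key = Column(String, primary_key=True)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

data = [{"key": "key%s" % a} for a in range(10000)]

session = Session(engine)
# bulk_insert_mappings skips object construction and session
# reconciliation entirely; the dicts go straight into an INSERT
session.bulk_insert_mappings(Test, data)
session.commit()

print(session.query(Test).count())  # 10000
```

Like the Core route, this does plain INSERTs (no merge/upsert semantics), which is exactly why it avoids the 10,000 primary-key lookups.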

Offhand answered 2/5, 2016 at 8:13 Comment(0)
