SQLAlchemy models vs Pydantic models
Asked Answered
D

2

28

I'm following this tutorial to adapt it to my needs, in this case, to perform a sql module where I need to record the data collected by a webhook from the gitlab issues.

For the database module I'm using SQLAlchemy library and PostgreSQL as database engine.

So, I would like to solve some doubts, I have regarding the use of the Pydantic library, in particular with this example

From what I've read, Pydantic is a library that is used for data validation using classes with attributes.

But I don't quite understand some things...is the integration of Pydantic strictly necessary? The purpose of using Pydantic I understand, but the integration of using Pydantic with SQLAlchemy models I don't understand.

In the tutorial, models.py has the following content:

from sqlalchemy import Boolean, Column, ForeignKey, Integer, String
from sqlalchemy.orm import relationship

from .database import Base


class User(Base):
    __tablename__ = "users"

    id = Column(Integer, primary_key=True, index=True)
    email = Column(String, unique=True, index=True)
    hashed_password = Column(String)
    is_active = Column(Boolean, default=True)

    items = relationship("Item", back_populates="owner")


class Item(Base):
    __tablename__ = "items"

    id = Column(Integer, primary_key=True, index=True)
    title = Column(String, index=True)
    description = Column(String, index=True)
    owner_id = Column(Integer, ForeignKey("users.id"))

    owner = relationship("User", back_populates="items")

And schemas.py has the following content:

from typing import Optional

from pydantic import BaseModel


class ItemBase(BaseModel):
    title: str
    description: Optional[str] = None


class ItemCreate(ItemBase):
    pass


class Item(ItemBase):
    id: int
    owner_id: int

    class Config:
        orm_mode = True


class UserBase(BaseModel):
    email: str


class UserCreate(UserBase):
    password: str


class User(UserBase):
    id: int
    is_active: bool
    items: list[Item] = []

    class Config:
        orm_mode = True

I know that the primary means of defining objects in Pydantic is via models and also I know that models are simply classes which inherit from BaseModel.

Why does it create ItemBase, ItemCreate and Item that inherits from ItemBase?

In ItemBase it passes the fields that are strictly necessary in Item table? and defines its type?

The ItemCreate class I have seen that it is used latter in crud.py to create a user, in my case I would have to do the same with the incidents? I mean, I would have to create a clase like this:

class IssueCreate(BaseModel):
    pass

There are my examples trying to follow the same workflow:

models.py

import sqlalchemy

from sqlalchemy import Column, Table
from sqlalchemy import Integer, String, Datetime, TIMESTAMP

from .database import Base


class Issues(Base):
    __tablename__ = 'issues'

    id = Column(Integer, primary_key=True)
    gl_assignee_id = Column(Integer, nullable=True)
    gl_id_user = Column(Integer, nullable=False)
    current_title = Column(String, nullable=False)
    previous_title = Column(String, nullable=True)
    created_at = Column(TIMESTAMP(timezone=False), nullable=False)
    updated_at = Column(TIMESTAMP(timezone=False), nullable=True)
    closed_at = Column(TIMESTAMP(timezone=False), nullable=True)
    action = Column(String, nullable=False)

And schemas.py

from pydantic import BaseModel

class IssueBase(BaseModel):
    updated_at: None
    closed_at: None
    previous_title: None

class Issue(IssueBase):
    id: int
    gl_task_id: int
    gl_assignee_id: int
    gl_id_user: int
    current_title: str
    action: str

    class Config:
        orm_mode = True

But I don't know if I'm right doing it in this way, any suggestions are welcome.

Disenthral answered 22/3, 2022 at 10:38 Comment(2)
This may be interesting: github.com/tiangolo/pydantic-sqlalchemySweeps
Just a quick note that the creator of FastAPI has created SQLModel (sqlmodel.tiangolo.com) to deal with the complexity described in the above. With SQLModel you basically declare a single model that does everything, rather than having to make one model for SQLAlchemy and three for Pydantic. Documentation should be updated soon.Halliard
C
44

The tutorial you mentioned is about FastAPI. Pydantic by itself has nothing to do with SQL, SQLAlchemy or relational databases. It is FastAPI that is showing you a way to use a relational database.


is the integration of pydantic strictly necessary [when using FastAPI]?

Yes. Pydantic is a requirement according to the documentation:

Requirements

Python 3.6+

FastAPI stands on the shoulders of giants:


Why does it create ItemBase, ItemCreate and Item that inherits from ItemBase?

Pydantic models are the way FastAPI uses to define the schemas of the data that it receives (requests) and returns (responses). ItemCreate represent the data required to create an item. Item represents the data that is returned when the items are queried. The fields that are common to ItemCreate and Item are placed in ItemBase to avoid duplication.


In ItemBase it passes the fields that are strictly necessary in Item table? and defines its type?

ItemBase has the fields that are common to ItemCreate and Item. It has nothing to do with a table. It is just a way to avoid duplication. Every field of a pydantic model must have a type, there is nothing unusual there.


in my case I would have to do the same with the incidents?

If you have a similar scenario where the schemas of the data that you receive (request) and the data that you return (response) have common fields (same name and type), you could define a model with those fields and have other models inherit from it to avoid duplication.


This could be a (probably simplistic) way of understanding FastAPI and pydantic:

FastAPI transforms requests to pydantic models. Those pydantic models are your input data and are also known as schemas (maybe to avoid confusion with other uses of the word model). You can do whatever you want with those schemas, including using them to create relational database models and persisting them.

Whatever data you want to return as a response needs to be transformed by FastAPI to a pydantic model (schema). It just happens that pydantic supports an orm_mode option that allows it to parse arbitrary objects with attributes instead of dicts. Using that option you can return a relational database model and FastAPI will transform it to the corresponding schema (using pydantic).

FastAPI uses the parsing and validation features of pydantic, but you have to follow a simple rule: the data that you receive must comply with the input schema and the data that you want to return must comply with the output schema. You are in charge of deciding whatever happens in between.

Carnivorous answered 23/3, 2022 at 3:44 Comment(0)
S
2

In addition to @hernán-alarcón answer, I would like to highlight the conceptual difference and responsibilities of both Pydantic and SQLAlchemy.

SQLAlchemy is responsible for db models (should reflect the structure of your database). Pydantic should be responsible for schemas (basically defining input and output formats) and DTOs (used to transfer data between different layers of an app).

This is important because it directly affects the design of your application. For example, we can implement a layer that should be responsible for all interactions with the database - a repository. Repository is the one who should mostly operate with SQLAlchemy models, so a model should be an output of repository methods.

# Models
class User(Base):
    __tablename__ = "users"

    id = Column(Integer, primary_key=True, index=True)
    email = Column(String, unique=True, index=True)
    hashed_password = Column(String)
    is_active = Column(Boolean, default=True)

    items = relationship("Item", back_populates="owner")


# Repositories
class UserRepository:
    """User repository."""

    def __init__(self, db: DatabaseClient):
        """."""

        self.db = db

    async def get_by_id(self, user_id: UUID4) -> User:
        """Get user by id."""

        async with self.db.get_async_session() as session:
            return (
                await session.execute(
                    select(User).where(User.id == user_id)
                )
            ).scalar()

The layer that aggregates and manages data from various repositories, clients etc - is controller. The output of the controller will most likely be the output of your endpoint, so it should be a schema.

# Schemas
class UserSchema(BaseModel):
    model_config = ConfigDict(from_attributes=True)

    id: int
    is_active: bool
    items: list[UserSchema]


# Controllers
class UserController:
    """User controller."""

    def __init__(self, user_repository: UserRepository):
        """."""

        self.user_repository = user_repository

    async def get_user(self, user_id: int) -> UserSchema:
        """Get user."""

        # Check permissions
        user = await self.user_repository.get_by_id(user_id=user_id)  # Get model
        
        # Send analytics
        # Do more things
        return UserSchema.model_validate(obj=user)  # Convert model to schema

In the example above, the schema and the model have the same attribute names, so you can use model_validate to convert one to the other, but this is not always the case. Sometimes they can have completely different attributes since you don't have to reflect your database structure in the API response.

In this case, it would be a good idea to implement your own method for this purpose:

# Assume you would like to return item ids instead of items.
class UserSchema(BaseModel):
    id: int
    is_active: bool
    item_ids: list[int]

    @classmethod
    def from_db_model(cls, user: User) -> "UserSchema":
        """Build schema based of db model provided."""

        return cls(
            id=user.id,
            is_active=user.is_active,
            item_ids=[item.id for item in user.items]
        )

UserSchema.from_db_model(user=user)
Skinhead answered 8/7, 2024 at 15:10 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.