Validate Pydantic dynamic float enum by name with OpenAPI description

Following on from this question and this discussion I am now trying to create a Pydantic BaseModel that has a field with a float Enum that is created dynamically and is validated by name. (Down the track I will probably want to use Decimal but for now I'm dealing with float.)

The discussion provides a solution to convert all Enums to validate by name, but I'm looking for how to do this for one or more individual fields, not a universal change to all Enums.

I consider this to be a common use case. The model uses an Enum which hides implementation details from the caller. The valid field values that a caller can supply are a limited list of names. These names are associated with internal values (in this case float) that the back-end wants to operate on, without requiring the caller to know them.

The Enum valid names and values do change dynamically and are loaded at run time but for the sake of clarity this would result in an Enum something like the following. Note that the Sex enum needs to be treated normally and validated and encoded by value, but the Factor enum needs to be validated by name:

from enum import Enum
from pydantic import BaseModel

class Sex(str, Enum):
    MALE = "M"
    FEMALE = "F"

class Factor(Enum):
    single = 1.0
    half = 0.4
    quarter = 0.1

class Model(BaseModel):
    sex: Sex
    factor: Factor
    class Config:
        json_encoders = {Factor: lambda field: field.name}

model = Model(sex="M", factor="half")
# Error: only accepts e.g. Model(sex="M", factor=0.4)

This is what I want, but it doesn't work: the normal Pydantic Enum behaviour requires Model(factor=0.4), and my caller doesn't know which float is currently in use for this factor. It can and should only provide "half". The code that manipulates the model internally always wants the float, so I expect it to have to use model.factor.value.

It's fairly simple to create the Enum dynamically, but that doesn't provide any Pydantic support for validating on name. It's all automatically validated by value. So I think this is where most of the work is:

Factor = Enum("Factor", {"single": 1.0, "half": 0.4, "quarter": 0.1})
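
For reference, a minimal standard-library sketch of the two lookup styles involved: calling the Enum looks a member up by value, while indexing it looks a member up by name. The member names and floats are the ones from the question; nothing here involves Pydantic.

```python
from enum import Enum

# Functional API: the mapping could be loaded from anywhere at run time.
Factor = Enum("Factor", {"single": 1.0, "half": 0.4, "quarter": 0.1})

by_value = Factor(0.4)     # lookup by value
by_name = Factor["half"]   # lookup by name, which is what the caller should use

print(by_name.name, by_name.value)

# A member name is not a valid *value*, so calling the Enum with it fails.
try:
    Factor("half")
    name_as_value_ok = True
except ValueError:
    name_as_value_ok = False
```

Both lookups return the same member object, so internal code can keep using .value regardless of how the member was obtained.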

The standard way for Pydantic to customise serialization is with the json_encoders Config attribute. I've included that in the sample static Enum. That doesn't seem to be problematic.

Finally, there needs to be support to provide the right description to the OpenAPI schema.

Actually, in my use-case I only need the Enum name/values to be dynamically established. So an implementation that modifies a declared Enum would work, as well as an implementation that creates the Enum type.

Turbinate answered 28/2, 2023 at 2:49 Comment(1)
See github.com/pydantic/pydantic/discussions/… for an up-to-date answer.Minutiae

Update (2023-03-03)

Class decorator solution

A convenient way to solve this is by creating a reusable decorator that adds both a __get_validators__ method and a __modify_schema__ method to any given Enum class. Both of these methods are documented here.

We can define a custom validator function that will be called for our decorated Enum classes, which will enforce that only names will be turned into members and actual members will pass validation.

The schema modifier will ensure that the JSON schema only shows the names as enum options.

from collections.abc import Callable, Iterator
from enum import EnumMeta
from typing import Any, Optional, TypeVar, cast

from pydantic.fields import ModelField

E = TypeVar("E", bound=EnumMeta)

def __modify_enum_schema__(
    field_schema: dict[str, Any],
    field: Optional[ModelField],
) -> None:
    if field is None:
        return
    field_schema["enum"] = list(cast(EnumMeta, field.type_).__members__.keys())

def __enum_name_validator__(v: Any, field: ModelField) -> Any:
    assert isinstance(field.type_, EnumMeta)
    if isinstance(v, field.type_):
        return v  # value is already an enum member
    try:
        return field.type_[v]  # get enum member by name
    except KeyError:
        raise ValueError(f"Invalid {field.type_.__name__} `{v}`")

def __get_enum_validators__() -> Iterator[Callable[..., Any]]:
    yield __enum_name_validator__

def validate_by_name(cls: E) -> E:
    setattr(cls, "__modify_schema__", __modify_enum_schema__)
    setattr(cls, "__get_validators__", __get_enum_validators__)
    return cls

Usage

from enum import Enum
from random import choices, random
from string import ascii_lowercase

from pydantic import BaseModel

# ... import validate_by_name


# Randomly generate an enum of floats:
_members = {
    name: round(random(), 1)
    for name in choices(ascii_lowercase, k=3)
}
Factor = Enum("Factor", _members)  # type: ignore[misc]
validate_by_name(Factor)
first_member = next(iter(Factor))
print("`Factor` members:", Factor.__members__)
print("First `Factor` member:", first_member)


class Foo(Enum):
    member_a = "a"
    member_b = "b"


@validate_by_name
class Bar(int, Enum):
    x = 1
    y = 2


class Model(BaseModel):
    factor: Factor
    foo: Foo
    bar: Bar

    class Config:
        json_encoders = {Factor: lambda field: field.name}


obj = Model.parse_obj({
    "factor": first_member.name,
    "foo": "a",
    "bar": "x",
})
print(obj.json(indent=4))
print(Model.schema_json(indent=4))

Example output:

`Factor` members: {'r': <Factor.r: 0.1>, 'j': <Factor.j: 0.9>, 'z': <Factor.z: 0.6>}
First `Factor` member: Factor.r
{
    "factor": "r",
    "foo": "a",
    "bar": 1
}
{
    "title": "Model",
    "type": "object",
    "properties": {
        "factor": {
            "$ref": "#/definitions/Factor"
        },
        "foo": {
            "$ref": "#/definitions/Foo"
        },
        "bar": {
            "$ref": "#/definitions/Bar"
        }
    },
    "required": [
        "factor",
        "foo",
        "bar"
    ],
    "definitions": {
        "Factor": {
            "title": "Factor",
            "description": "An enumeration.",
            "enum": [
                "r",
                "j",
                "z"
            ]
        },
        "Foo": {
            "title": "Foo",
            "description": "An enumeration.",
            "enum": [
                "a",
                "b"
            ]
        },
        "Bar": {
            "title": "Bar",
            "description": "An enumeration.",
            "enum": [
                "x",
                "y"
            ],
            "type": "integer"
        }
    }
}

This just demonstrates a few variations for this approach. As you can see, the Factor and Bar enums are both validated by name, whereas Foo is validated by value (as a regular Enum).

Since we defined a custom JSON Encoder for Factor, the factor value is exported/encoded as the name string, while both Foo and Bar are exported by value (as a regular Enum).

Both Factor and Bar display the enum names in their JSON schema, while Foo shows the enum values.

Note that the "type": "integer" in the JSON Schema of Bar is only present because I specified int as an explicit base class of Bar; it disappears if we remove that. To further ensure consistency, we could of course also simply set "type": "string" inside our __modify_enum_schema__ function.
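
As a rough sketch of that last suggestion, the schema modification can be tried on a plain dict. The helper below is hypothetical: it takes the Enum class directly instead of a ModelField, so its name and signature are illustrative only and are not the signature Pydantic expects for __modify_schema__.

```python
from enum import Enum
from typing import Any


def modify_enum_schema_by_name(field_schema: dict[str, Any], enum_cls: type[Enum]) -> None:
    """Hypothetical helper: list the member names and force a string type."""
    field_schema["enum"] = list(enum_cls.__members__.keys())
    field_schema["type"] = "string"


class Bar(int, Enum):
    x = 1
    y = 2


# Start from a schema shaped like a default, value-based one for `Bar`.
schema: dict[str, Any] = {"title": "Bar", "enum": [1, 2], "type": "integer"}
modify_enum_schema_by_name(schema, Bar)
print(schema)
```

With the type overwritten, the int base class of Bar no longer leaks into the schema.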

The only thing that is seemingly impossible right now is to also somehow register our custom way of encoding those enums inside our decorator, so that we do not need to set it in the Config or pass the encoder argument to json explicitly. That may be possible with a few changes to the BaseModel logic, but I think this would be overkill.


Original answer

Validating Enum by name

The parsing part of your problem can be solved fairly easily with a custom validator.

Since a validator method can take the ModelField as an argument and that has the type_ attribute pointing to the type of the field, we can use that to try to coerce any value to a member of the corresponding Enum.

We can actually write a more or less generalized implementation that applies to any arbitrary Enum subtype fields. If we use the "*" argument for the validator, it will apply to all fields, but we also need to set pre=True to perform our checks before the default validators kick in:

from enum import Enum
from typing import Any

from pydantic import BaseModel, validator
from pydantic.fields import ModelField


class CustomBaseModel(BaseModel):
    @validator("*", pre=True)
    def coerce_to_enum_member(cls, v: Any, field: ModelField) -> Any:
        """For any `Enum`-typed field, attempt to coerce the value to a member (by value, then by name)."""
        type_ = field.type_
        if not (isinstance(type_, type) and issubclass(type_, Enum)):
            return v  # field is not an enum type
        if isinstance(v, type_):
            return v  # value is already an enum member
        try:
            return type_(v)  # get enum member by value
        except ValueError:
            try:
                return type_[v]  # get enum member by name
            except KeyError:
                raise ValueError(f"Invalid {type_.__name__} `{v}`")

That validator is agnostic of the specific Enum subtype and it should work for all of them because it uses the common EnumType API, such as EnumType.__getitem__ to get the member by name.

The nice thing about this approach is that while valid Enum names will be turned into the correct Enum members, passing a valid Enum value still works as it did before. As does passing the member directly.
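
The lookup order described above can be checked without Pydantic. This is only a sketch: coerce_to_member is a hypothetical stand-alone function that mirrors the body of the validator, with the Enum class passed in explicitly instead of being read from field.type_.

```python
from enum import Enum
from typing import Any


def coerce_to_member(enum_cls: type[Enum], v: Any) -> Enum:
    """Hypothetical stand-alone version of the validator's lookup order."""
    if isinstance(v, enum_cls):
        return v  # value is already an enum member
    try:
        return enum_cls(v)  # get enum member by value
    except ValueError:
        try:
            return enum_cls[v]  # get enum member by name
        except KeyError:
            raise ValueError(f"Invalid {enum_cls.__name__} `{v}`") from None


Factor = Enum("Factor", {"single": 1.0, "half": 0.4, "quarter": 0.1})

# A name, a value, and a member all resolve to the same member.
results = [coerce_to_member(Factor, arg) for arg in ("half", 0.4, Factor.half)]
print(results)

try:
    coerce_to_member(Factor, "double")
    rejected = False
except ValueError:
    rejected = True
```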

Enum names in the JSON Schema

This is a bit more hacky, but not too bad.

Pydantic actually allows us to easily customize schema generation for specific fields. This is done by adding the __modify_schema__ classmethod to the type in question.

For Enum this turns out to be tricky, especially since you want it to be created dynamically (via the Functional API). We cannot simply subclass Enum and add our modifier method there due to some magic around the EnumType. What we can do is simply monkey-patch it into Enum (or alternatively do that to our specific Enum subclasses).

Either way, this method again gives us all we need to replace the default "enum" schema section with an array of names instead of values:

from enum import Enum
from typing import Any, Optional

from pydantic.fields import ModelField


def __modify_enum_schema__(
    field_schema: dict[str, Any],
    field: Optional[ModelField],
) -> None:
    if field is None:
        return
    enum_cls = field.type_
    assert isinstance(enum_cls, type) and issubclass(enum_cls, Enum)
    field_schema["enum"] = list(enum_cls.__members__.keys())


# Monkey-patch `Enum` to customize schema modification:
Enum.__modify_schema__ = __modify_enum_schema__  # type: ignore[attr-defined]

And that is all we need. (Mypy will complain about the monkey-patching of course.)

Full demo

from enum import Enum
from random import choices, random
from string import ascii_lowercase
from typing import Any, Optional

from pydantic import BaseModel, validator
from pydantic.fields import ModelField


def __modify_enum_schema__(
    field_schema: dict[str, Any],
    field: Optional[ModelField],
) -> None:
    if field is None:
        return
    enum_cls = field.type_
    assert isinstance(enum_cls, type) and issubclass(enum_cls, Enum)
    field_schema["enum"] = list(enum_cls.__members__.keys())


# Monkey-patch `Enum` to customize schema modification:
Enum.__modify_schema__ = __modify_enum_schema__  # type: ignore[attr-defined]


class CustomBaseModel(BaseModel):
    @validator("*", pre=True)
    def coerce_to_enum_member(cls, v: Any, field: ModelField) -> Any:
        """For any `Enum`-typed field, attempt to coerce the value to a member (by value, then by name)."""
        type_ = field.type_
        if not (isinstance(type_, type) and issubclass(type_, Enum)):
            return v  # field is not an enum type
        if isinstance(v, type_):
            return v  # value is already an enum member
        try:
            return type_(v)  # get enum member by value
        except ValueError:
            try:
                return type_[v]  # get enum member by name
            except KeyError:
                raise ValueError(f"Invalid {type_.__name__} `{v}`")


# Randomly generate an enum of floats:
_members = {
    name: round(random(), 1)
    for name in choices(ascii_lowercase, k=3)
}
Factor = Enum("Factor", _members)  # type: ignore[misc]
first_member_name = next(iter(Factor)).name
print("Random `Factor` members:", Factor.__members__)
print("First member:", first_member_name)


class Model(CustomBaseModel):
    factor: Factor
    foo: str
    bar: int

    class Config:
        json_encoders = {Factor: lambda field: field.name}


obj = Model.parse_obj({
    "factor": first_member_name,
    "foo": "spam",
    "bar": -1,
})
print(obj.json(indent=4))
print(Model.schema_json(indent=4))

Output:

Random `Factor` members: {'a': <Factor.a: 0.9>, 'q': <Factor.q: 0.6>, 'e': <Factor.e: 0.8>}
First member: a
{
    "factor": "a",
    "foo": "spam",
    "bar": -1
}
{
    "title": "Model",
    "type": "object",
    "properties": {
        "factor": {
            "$ref": "#/definitions/Factor"
        },
        "foo": {
            "title": "Foo",
            "type": "string"
        },
        "bar": {
            "title": "Bar",
            "type": "integer"
        }
    },
    "required": [
        "factor",
        "foo",
        "bar"
    ],
    "definitions": {
        "Factor": {
            "title": "Factor",
            "description": "An enumeration.",
            "enum": [
                "a",
                "q",
                "e"
            ]
        }
    }
}

Notes

I chose this super weird way of randomly generating an Enum just for illustrative purposes. I wanted to show that both validation and schema generation still work fine in that case. But in practice I would assume that the names actually don't change that drastically every time the program is run. (At least I hope they don't for the sake of your users.)

The value of factor is still a regular Enum member, so obj.factor.value will still give us 0.9 (for this random example).

The validator will obviously prevent invalid names/values from being passed. You can make it more specific if you like, or restrict it to only deal with str arguments, assuming them to be Enum member names, and delegate the rest to Pydantic's default validator. As it is written right now, it essentially replaces that default Enum validator.

Any other schema modifications (such as the description) can be done according to the docs I linked as well.
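
The narrower variant mentioned in the notes above, where only str input is treated as a member name and everything else is passed on, might look roughly like this. It is a sketch under assumptions: coerce_names_only is a hypothetical plain function, and in a real model its body would sit inside a pre=True validator so that whatever it returns unchanged still goes through the default Enum validation.

```python
from enum import Enum
from typing import Any


def coerce_names_only(enum_cls: type[Enum], v: Any) -> Any:
    """Hypothetical pre-validation step: resolve strings that are member names."""
    if isinstance(v, str) and v in enum_cls.__members__:
        return enum_cls[v]
    return v  # not a known name: hand it on unchanged


Factor = Enum("Factor", {"single": 1.0, "half": 0.4, "quarter": 0.1})

resolved = coerce_names_only(Factor, "half")    # name is turned into a member
passed_on = coerce_names_only(Factor, 0.4)      # value is left for the next validator
unknown = coerce_names_only(Factor, "double")   # unknown name is left as-is too
print(resolved, passed_on, unknown)
```

Note that this variant still lets values through to the default validator, so it does not by itself enforce name-only input.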

Urmia answered 28/2, 2023 at 20:34 Comment(11)
Firstly really glad to get what seems to be a comprehensive answer so quickly, thanks. I'm looking through it now but, no, the names of the enum don't really change. That's the primary characteristic related to why we need to take an incoming name rather than a value, but it is part of an automated system and the name/value pairs are loaded on each run, so the possibility needs to be managed.Turbinate
I'm still looking but I think you've taken the generic approach all the way through. Which means all enums in the BaseModel will be validated by name. I need to apply name validation only to specific enums. Also, the OpenAPI spec changes need to apply to only those specific enums. This is what I've been finding difficult. Also, it's not actually "nice" that values are also accepted. Values are not a valid part of the client-side enum. That's a violation too. I'm going to add an additional enum to the demo code to highlight the requirement to treat enums individually.Turbinate
I really appreciate your answer @DaniilFajnberg, but I can't accept it because it doesn't meet the requirements to only handle one or more specific enums (not all enums - there are still some ordinary value-validated enums in there as well - I've added one to the example for clarity). This has been the tricky part for me. It would be great if you could consider that and modify your answer. Meanwhile I'm working from your answer to see if I can use it to solve the problem.Turbinate
You did not explicitly state those requirements in the initial version of your question, leaving that open for interpretation. That being said, I thought it was obvious that you can modify what fields the validator applies to by changing "*" to whatever fields you want. As for __modify_schema__, I mentioned that you can instead monkeypatch your specific Enum subtypes in my answer. And if you don't want the Enum values to be valid, you can just omit that try section from the validator method. But I'll add a "PS" to my post, if I find the time later today.Urmia
Thanks for coming back to comment, Daniil. I did emphatically include those requirements in my original question. This is the journey I've been on with this one. My links in the question refer to other places where a generic approach has worked. Generic is not good enough for me. The modification that I made to the question was only to add an additional normal Enum to make the requirement stick a bit more in your face when you try to code it. I hear what you're saying about targeting your approach. I may have to try to go that way but whether it will work remains to be seen.Turbinate
Hi @DaniilFajnberg I finally got my approach working except somehow json_encoders is not being called. I also modified your answer as suggested and that's working (with json_encoders too). I now feel your approach looks better. It requires less enum custom dunder method hacking. I'm going to accept your answer and add another answer to show your approach cut down to only work on the one individual enum. If you add that to your answer, I'll delete the extra answer. But please add it on, don't delete your generalised solution. That's good value for others that we shouldn't lose.Turbinate
@Turbinate I came up with a somewhat more elegant approach IMO. Using a decorator, we can selectively apply this special treatment to our Enum subclasses. Dynamic or not, it works as expected and hides a lot of the ugly monkey-patching from the user. And neither we nor the user have to define custom validator methods because each decorated Enum will have it already.Urmia
Once we finally get an intersection type in Python, it will even be possible to annotate the decorator in a way that adds the information about the methods it attaches to the class, which can then be picked up by static type checkers. But for now, the methods will be "hidden" of course.Urmia
Oh no! And I only just got it working! Lol! This question has certainly been nailed, anyway. Thanks very much for your multiple contributions... I will be looking at the decorator method too, though, sigh ;-)Turbinate
Note @DaniilFajnberg, the schema should have a type entry too, set to string, in addition to the enum entry with the member names.Turbinate
Wow @DaniilFajnberg that's beautiful - everything (except JSON encoding as you say) is packed away inside the decorator and that can be put in a separate module or library. Then all you need to do is decorate the Enum and add a JSON encoder! It's fantastic. Compared to the mess I was originally coding you've really packed everything into a nice little gift box. You really really earned this one. Thanks heaps. I'll be implementing the decorator of course.Turbinate

I've managed to almost complete my own answer to this question, using methods attached to the dynamic Enum to handle schema generation and validation, but there is still apparently a problem with JSON encoding.

I preferred to attach the custom processing to the type (Factor) because that is its logical home, given that the modifications all relate to the type, not the model. This also keeps it DRY if the type is used in other models too. But the Pydantic model still needs to call the custom methods on the type; they don't function on their own. So the point is a little moot, although this design still avoids code duplication.

The following code should run as-is, and accomplishes everything that is in the question, except Pydantic doesn't seem to be respecting the json_encoders config with this set-up.

import types
from enum import Enum
from pydantic import BaseModel, ValidationError
import pytest


class Sex(str, Enum):
    """Normal Enum validated by value."""
    MALE = "M"
    FEMALE = "F"


def __modify_schema__(cls, schema):
    """Specify Enum names for schema for Factor enum."""
    schema["enum"] = list(cls.__members__.keys())
    schema["type"] = "string"


def __get_validators__(cls):
    """Validators for Factor enum."""
    yield cls._validate


def _validate(cls, value):
    """Validation for Factor enum by name, not value."""
    names = list(cls.__members__.keys())
    if value in names:
        return cls.__members__[value]
    raise ValueError(f"{value} is not a valid enumeration member for {cls.__name__}; permitted: {names}")


members = {"single": 1.0, "half": 0.4, "quarter": 0.1}
"""Change these members to create dynamic enum Factor."""
Factor = Enum("Factor", members, type=float)
Factor.__modify_schema__ = types.MethodType(__modify_schema__, Factor)
Factor._validate = types.MethodType(_validate, Factor)
Factor.__get_validators__ = types.MethodType(__get_validators__, Factor)


class Model(BaseModel):
    sex: Sex
    factor: Factor

    class Config:
        json_encoders = {Factor: lambda field: field.name}
        """Apparently the JSON encoder is not being called."""

model = Model(sex="M", factor="half")

# broken: assert model.json() == '{"sex": "M", "factor": "half"}'

assert model.schema() == {
    "title": "Model",
    "type": "object",
    "properties": {"sex": {"$ref": "#/definitions/Sex"}, "factor": {"$ref": "#/definitions/Factor"}},
    "required": ["sex", "factor"],
    "definitions": {
        "Sex": {"title": "Sex", "description": "An enumeration.", "enum": ["M", "F"], "type": "string"},
        "Factor": {
            "title": "Factor",
            "description": "An enumeration.",
            "enum": ["single", "half", "quarter"],
            "type": "string",
        },
    },
}


with pytest.raises(ValidationError) as excinfo:
    model = Model(sex="M", factor=1.0)
assert excinfo.value.errors()[0]["msg"].startswith("1.0 is not a valid enumeration member for Factor;")

with pytest.raises(ValidationError) as excinfo:
    model = Model(sex="MALE", factor="half")
assert excinfo.value.errors()[0]["msg"].startswith("value is not a valid enumeration member; permitted: 'M', 'F'")

I did try to subclass the dynamically created type Factor, in order to add the altered behaviour in a normally defined class, but it seems the dynamically created Enum doesn't like that. Python says TypeError: ReFactor: cannot extend enumeration 'Factor' when attempting class ReFactor(Factor):.
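
That TypeError is general Enum behaviour rather than something specific to dynamic creation: an Enum that already has members cannot be subclassed, whereas a base without members can, and the Functional API can be called on such a base. A small standard-library sketch, where ByName and from_name are hypothetical names used only for illustration:

```python
from enum import Enum

Factor = Enum("Factor", {"single": 1.0, "half": 0.4, "quarter": 0.1})

error = None
try:
    class ReFactor(Factor):  # type: ignore[misc]
        pass
except TypeError as exc:
    error = exc  # an Enum with members cannot be extended
    print(exc)


class ByName(Enum):
    """Hypothetical base class: it defines no members, so it can be extended."""

    @classmethod
    def from_name(cls, name: str) -> "ByName":
        return cls[name]


# The Functional API also works on a member-less subclass of Enum.
NamedFactor = ByName("NamedFactor", {"single": 1.0, "half": 0.4, "quarter": 0.1})
member = NamedFactor.from_name("half")
print(member, member.value)
```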

As Daniil-Fajnberg says, there is also probably a solution by making the generic validator and other magic methods look for the specific enum, but I feel it's a bit uglier to "zoom out" to the generic case and then have to check for the individual enum, rather than just implement it on the specific enum itself. Although I'm wondering now if at least that method will work with the json_encoders.

It took me some time to find out how to apply these magic methods to the dynamically created Enum, but now that they're working, the json_encoders call isn't. I've stepped through the call to model.json() but I can't see where json_encoders is consulted. That's the only part of this solution missing. If anyone can tell me why json_encoders has stopped working I'd be grateful.
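
One plausible explanation, offered as an assumption about Pydantic v1 internals rather than something confirmed in this thread: if the configured encoders end up as the default callback of json.dumps, they are only consulted for objects that json cannot serialize natively. Because Factor is created here with type=float, its members are floats as far as json is concerned, so they are written as numbers and the callback never runs. The standard-library sketch below shows that effect in isolation; FloatFactor, PlainFactor and encode_by_name are illustrative names.

```python
import json
from enum import Enum

members = {"single": 1.0, "half": 0.4, "quarter": 0.1}
FloatFactor = Enum("FloatFactor", members, type=float)  # members are also floats
PlainFactor = Enum("PlainFactor", members)              # members are plain Enum members


def encode_by_name(obj):
    """Fallback encoder: only called for objects json cannot serialize itself."""
    if isinstance(obj, Enum):
        return obj.name
    raise TypeError(f"Cannot serialize {obj!r}")


float_json = json.dumps({"factor": FloatFactor.half}, default=encode_by_name)
plain_json = json.dumps({"factor": PlainFactor.half}, default=encode_by_name)
print(float_json)
print(plain_json)
```

If that is indeed the cause, creating the Enum without type=float should let the encoder run again, at the cost of members no longer being float instances.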

Turbinate answered 1/3, 2023 at 9:28 Comment(0)

As you can see from my other answer, I tried a slightly different approach from @DaniilFajnberg's answer but it's not quite complete.

I thought that my approach would be "nicer" because it focused the custom functions on the actual Enum type that was going to use them. However, as it turned out, it's not that good. There are multiple custom dunder methods that have to be assigned manually to the dynamic Enum, and I think it's even less compact than @Daniil's. @Daniil's method, by contrast, sticks to better documented features intended for the purpose (even though it's spread out over the BaseModel as well as the dynamic Enum), and it reads better.

Also, fairly crucially, the json_encoders call is still working in @Daniil's set up whereas with my solution, for some unknown reason, the json_encoders call appears to have stopped working somewhere along the line; so the JSON is wrong.

I therefore accepted @Daniil's answer. But @Daniil implemented his answer for the general case, that is, all and every Enum in the app will be "reversed" (validated by name instead of value). Whereas my requirement is for only the specific individual Enum to be customised. So I'm just going to show here a version of @Daniil's answer that is cut back to work on just the one Enum, for the benefit of others (and myself), but I will still accept the original answer from @Daniil since he did that work.

from enum import Enum
from typing import Any, Optional

from pydantic import BaseModel, validator, ValidationError
from pydantic.fields import ModelField
import pytest


class Sex(str, Enum):
    MALE = "M"
    FEMALE = "F"


_members = {"single": 1.0, "half": 0.4, "quarter": 0.1}
"""Any dict of str, float pairs can be loaded from wherever at run time."""
Factor = Enum("Factor", _members)  # type: ignore[misc]
"""The Factor Enum is created dynamically."""


def __modify_factor_schema__(schema: dict[str, Any], field: Optional[ModelField]) -> None:
    """Schema modification is applied only to the specific Enum being customised."""
    if field is None:
        return  # no field information available, leave the schema untouched
    schema["enum"] = list(field.type_.__members__.keys())
    schema["type"] = "string"

Factor.__modify_schema__ = __modify_factor_schema__  # type: ignore[attr-defined]



class Model(BaseModel):
    sex: Sex
    factor: Factor

    @validator("factor", pre=True)
    def validate_by_name(cls, value: Any, field: ModelField) -> Any:
        """Return Enum member by name instead of member value."""
        members = field.type_.__members__
        if value in members:
            return members[value]
        members = list(members.keys())
        raise ValueError(f"value is not a valid enumeration member for {field.type_.__name__}; permitted: {members}")

    class Config:
        json_encoders = {Factor: lambda field: field.name}


model = Model(sex="M", factor="half")

assert model.json() == '{"sex": "M", "factor": "half"}'

assert model.schema() == {
    "title": "Model",
    "type": "object",
    "properties": {"sex": {"$ref": "#/definitions/Sex"}, "factor": {"$ref": "#/definitions/Factor"}},
    "required": ["sex", "factor"],
    "definitions": {
        "Sex": {"title": "Sex", "description": "An enumeration.", "enum": ["M", "F"], "type": "string"},
        "Factor": {
            "title": "Factor",
            "description": "An enumeration.",
            "enum": ["single", "half", "quarter"],
            "type": "string",
        },
    },
}

with pytest.raises(ValidationError) as excinfo:
    model = Model(sex="M", factor=1.0)
assert excinfo.value.errors()[0]["msg"].startswith("value is not a valid enumeration member for Factor")

with pytest.raises(ValidationError) as excinfo:
    model = Model(sex="MALE", factor="half")
assert excinfo.value.errors()[0]["msg"].startswith("value is not a valid enumeration member; permitted: 'M', 'F'")

Turbinate answered 3/3, 2023 at 12:54 Comment(0)
