Validate Python TypedDict at runtime

6

22

I'm working in a Python 3.8+ Django/Rest-Framework environment enforcing types in new code but built on a lot of untyped legacy code and data. We are using TypedDicts extensively for ensuring that data we are generating passes to our TypeScript front-end with the proper data type.

MyPy/PyCharm/etc. do a great job of checking that our new code spits out data that conforms, but we want to test that the output of our many RestSerializers/ModelSerializers fits the TypedDict. If I have a serializer and typed dict like:

class PersonSerializer(ModelSerializer):
    class Meta:
        model = Person
        fields = ['first', 'last']

class PersonData(TypedDict):
    first: str
    last: str
    email: str

and then run code like:

person_dict: PersonData = PersonSerializer(Person.objects.first()).data

Static type checkers won't be able to figure out that person_dict is missing the required email key, because (by design of PEP 589) it is just a normal dict. But I can write something like:

annotations = PersonData.__annotations__
for k in annotations:
    assert k in person_dict  # or something more complex.
    assert isinstance(person_dict[k], annotations[k])

and it will find that email is missing from the data of the serializer. This is well and good in this case, where I don't have any changes introduced by from __future__ import annotations (not sure if this would break it), and all my type annotations are bare types. But if PersonData were defined like:

class PersonData(TypedDict):
    email: Optional[str]
    affiliations: Union[List[str], Dict[int, str]]

then isinstance is not good enough to check if the data passes (since "Subscripted generics cannot be used with class and instance checks").
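
To make the failure concrete, here is a minimal sketch (the helper name try_isinstance is made up for illustration) showing that isinstance accepts a bare class but rejects a subscripted generic such as List[str]:

```python
from typing import List


def try_isinstance(value, annotation):
    """Return isinstance's result, or the error text if the check is rejected."""
    try:
        return isinstance(value, annotation)
    except TypeError as exc:
        return f"TypeError: {exc}"


# A bare class works as an isinstance target.
print(try_isinstance("a@b.c", str))

# A subscripted generic is rejected with a TypeError instead of being checked.
print(try_isinstance(["x"], List[str]))
```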

What I'm wondering is if there already exists a callable function/method (in mypy or another checker) that would allow me to validate a TypedDict (or even a single variable, since I can iterate a dict myself) against an annotation and see if it validates?

I'm not concerned about speed, etc., since the point of this is to check all our data/methods/functions once and then remove the checks later once we're happy that our current data validates.

Nicki answered 17/3, 2021 at 0:18 Comment(4)
PEP 589 links to one possible solution: typing_inspect. You could also map the input to dataclasses. – Pareira
Unfortunately, typing_inspect is focused on looking at the types themselves, not whether data conforms to a type. – Nicki
+1 to using dataclasses - there are already some powerful validation libraries for converting dicts to dataclasses and vice versa (I'm a big fan of dacite for simple deserialization, and you can also check out marshmallow if you want to do heavier validation). – Indicatory
Thank you @sara -- I appreciate it. – Nicki
12

The simplest solution I found works using pydantic.

Solution for pydantic v2

from pydantic import TypeAdapter, ValidationError
from typing_extensions import TypedDict  # Required by pydantic for Python < 3.12

class SomeDict(TypedDict):
    val: int
    name: str
    
SomeDictValidator = TypeAdapter(SomeDict)

# this could be a valid/invalid declaration
obj: SomeDict = {
    'val': 12,
    'name': 'John',
}

# validate with pydantic
try:
    obj = SomeDictValidator.validate_python(obj)
except ValidationError as exc: 
    print(f"ERROR: Invalid schema: {exc}")

See the TypeAdapter documentation for more information.

Solution for pydantic v1

from typing import cast, TypedDict 

import pydantic

class SomeDict(TypedDict):
    val: int
    name: str

# this could be a valid/invalid declaration
obj: SomeDict = {
    'val': 12,
    'name': 'John',
}

# validate with pydantic
try:
    obj = cast(SomeDict, pydantic.create_model_from_typeddict(SomeDict)(**obj).dict())
except pydantic.ValidationError as exc: 
    print(f"ERROR: Invalid schema: {exc}")

EDIT: Static type checkers currently report an error on this code, but it works as expected at runtime. See here: https://github.com/samuelcolvin/pydantic/issues/3008

Geosphere answered 22/7, 2021 at 11:1 Comment(5)
Have you got this working with nested TypedDicts? I.e., let's say name is defined like name: SomeOtherDict, where SomeOtherDict is also a TypedDict? – Maskanonge
Actually, pydantic will not throw an error with obj {'val': '12', 'name': 'John'} (see github.com/pydantic/pydantic/issues/578) except for when you use pydantic's own strict types. – Shearin
Adding to @MoritzMakowski's information: fixing the check above requires changing the class definition to:

class SomeDict(TypedDict):
    val: pydantic.StrictInt
    name: pydantic.StrictStr

– Amboise
pydantic.create_model_from_typeddict has been removed in pydantic v2. – Hexateuch
What if you want to throw in the case there's an extra key? Ex: {'val': 12, 'name': 'John', 'unwanted_key': 'should throw!'}. I'm asking here too. – Sonnie
3

You may want to have a look at https://pypi.org/project/strongtyping/. This may help.

In the docs you can find this example:

from typing import List, TypedDict

from strongtyping.strong_typing import match_class_typing


@match_class_typing
class SalesSummary(TypedDict):
    sales: int
    country: str
    product_codes: List[str]

# works like expected
SalesSummary({"sales": 10, "country": "Foo", "product_codes": ["1", "2", "3"]})

# will raise a TypeMisMatch
SalesSummary({"sales": "Foo", "country": 10, "product_codes": [1, 2, 3]})
Intuitivism answered 26/7, 2021 at 8:34 Comment(0)
1

A little bit of a hack, but you can check whether one type is assignable to another using mypy's command-line -c option. Just wrap it in a Python function:

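import subprocess

def _type_source(tp) -> str:
    # A plain class formats as "<class 'bool'>", which is not valid source code,
    # so use its name instead; typing constructs already format as valid source.
    # Only builtins and typing constructs can be resolved by the generated snippet.
    text = str(tp)
    return tp.__qualname__ if text.startswith("<class ") else text

def is_assignable(type_to, type_from) -> bool:
    """
    Returns true if `type_from` can be assigned to `type_to`,
    e.g. type_to := type_from

    Example:
    >>> is_assignable(bool, str)
    False
    >>> from typing import *
    >>> is_assignable(Union[List[str], Dict[int, str]], List[str])
    True
    """
    # Build a tiny program and let mypy decide whether the assignment type-checks.
    code = "\n".join((
        "import typing",
        f"type_to: {_type_source(type_to)}",
        f"type_from: {_type_source(type_from)}",
        "type_to = type_from",
    ))
    # mypy exits with status 0 only when it finds no errors.
    return subprocess.call(("mypy", "-c", code)) == 0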
Appointed answered 9/4, 2021 at 21:33 Comment(3)
Thank you! It IS a hack, but it is a hack that works, so worth the bounty! How to get the types from actual data is still a bit of a mystery, but I think I should be able to do something like repr(data) and try to make the assignment. – Nicki
I was not, unfortunately, able to get this to work with actual data, where I had instance data as one of the types. – Nicki
... But this doesn't solve the problem, does it? Isn't OP asking about checking whether the model matches the class at runtime, e.g. has all required fields and possibly no additional fields? Doesn't this just check two types? And launching a process to do it? – Tabanid
0

You could do something like this:

from typing import Any, TypedDict

def validate(typ: Any, instance: Any) -> bool:
    for property_name, property_type in typ.__annotations__.items():
        value = instance.get(property_name, None)
        if value is None:
            # Check for missing keys
            print(f"Missing key: {property_name}")
            return False
        elif property_type not in (int, float, bool, str):
            # check if property_type is object (e.g. not a primitive)
            result = validate(property_type, value)
            if result is False:
                return False
        elif not isinstance(value, property_type):
            # Check for type equality
            print(f"Wrong type: {property_name}. Expected {property_type}, got {type(value)}")
            return False
    return True

And then test some object, e.g. one that was passed to your REST endpoint:

class MySubModel(TypedDict):
    subfield: bool


class MyModel(TypedDict):
    first: str
    last: str
    email: str
    sub: MySubModel

m = {
    'email': 'JohnDoeAtDoeishDotCom',
    'first': 'John'
}
assert validate(MyModel, m) is False

This version prints the first error and returns a bool; you could change it to raise exceptions instead, possibly listing all the missing keys. You could also extend it to fail on keys beyond those defined by the model.
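
As a rough sketch of that extension, the function below reports every missing and unexpected key in one exception, and it tests key membership rather than comparing against None, so a key whose value is None still counts as present. It assumes Python 3.9+, where TypedDict classes expose __required_keys__ and __optional_keys__; the names check_keys and KeyMismatch are invented for this example.

```python
from typing import Any, List, Mapping, TypedDict


class KeyMismatch(Exception):
    """Hypothetical exception type, named here for illustration only."""


def check_keys(typ: Any, instance: Mapping[str, Any]) -> None:
    """Raise KeyMismatch listing all missing and unexpected keys."""
    # Assumption: Python 3.9+, where TypedDict classes carry these attributes.
    required = set(typ.__required_keys__)
    allowed = required | set(typ.__optional_keys__)
    present = set(instance)

    problems: List[str] = []
    problems += [f"missing key: {key}" for key in sorted(required - present)]
    problems += [f"unexpected key: {key}" for key in sorted(present - allowed)]
    if problems:
        raise KeyMismatch("; ".join(problems))


class Contact(TypedDict):
    first: str
    email: str


# Passes: both keys are present, even though one value is None.
check_keys(Contact, {"first": "John", "email": None})
```

This only looks at keys; value types would still need a separate check.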

Tabanid answered 30/6, 2022 at 4:29 Comment(1)
Probably need a sentinel instead of None, since None can be a valid value. – Nicki
0

I like your solution! To spare users from fixing errors one at a time, I added some code to it so that all errors are collected and reported together :D

from typing import Any

def validate_custom_typed_dict(instance: Any, custom_typed_dict: Any) -> bool:
    key_errors = []
    type_errors = []
    for property_name, type_ in custom_typed_dict.__annotations__.items():
        value = instance.get(property_name, None)
        if value is None:
            # Check for missing keys
            key_errors.append(f"\t- Missing property: '{property_name}' \n")
        elif type_ not in (int, float, bool, str):
            # Not a primitive: validate it as a nested TypedDict
            # (a failure inside raises, so there is no return value to check)
            validate_custom_typed_dict(value, type_)
        elif not isinstance(value, type_):
            # Check for type equality
            type_errors.append(f"\t- '{property_name}' expected {type_}, got {type(value)}\n")

    if len(key_errors) > 0 or len(type_errors) > 0:
        error_message = f'\n{"".join(key_errors)}{"".join(type_errors)}'
        raise Exception(error_message)

    return True

some console output:

Exception: 
        - Missing property: 'Combined_cycle' 
        - Missing property: 'Solar_PV' 
        - Missing property: 'Hydro' 
        - 'timestamp' expected <class 'str'>, got <class 'int'>
        - 'Diesel_engines' expected <class 'float'>, got <class 'int'>
Stulin answered 22/7, 2022 at 23:30 Comment(1)
What are you referring to with "I like your solution"? Please hyperlink. – Peart
0

I'd use the typing.get_type_hints function, which returns a dict of field names to types from a TypedDict (tested under Python 3.8):

from typing import TypedDict, get_type_hints

def checkdict(value: object, typedict: type) -> None:
    """
    Raise a TypeError if value does not match the TypedDict.
    :param value: the value to check
    :param typedict: the TypedDict type
    """
    if not isinstance(value, dict):
        raise TypeError(f'Value must be a dict not a: {type(value).__name__}')
    d = get_type_hints(typedict)
    diff = d.keys() ^ value.keys()
    if diff: # must have the same fields
        raise TypeError(f"Invalid dict fields: {' '.join(diff)}")
    for k, v in get_type_hints(typedict).items():
        if not isinstance(value[k], v): # must have same types
            raise TypeError(
                f"Invalid type: '{k}' should be {v.__name__} "
                f"but is {type(value[k]).__name__}"
            )

class TargetDict(TypedDict):
    name: str
    integer: int

obj: dict = {
    'name': 'John',
    'integer': '3',
}

checkdict(
    obj, TargetDict
)  # TypeError: Invalid type: 'integer' should be int but is str
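
The isinstance call above only handles bare classes, so annotations like Optional[str] or Union[List[str], Dict[int, str]] from the question would still fail. Below is a minimal sketch of one way to extend the idea, assuming Python 3.8+ for typing.get_origin and typing.get_args. The function name conforms is made up for this example, and it covers only Union/Optional, List, Dict, nested TypedDicts, and plain classes. As a simplification it requires exactly the declared keys, ignoring total=False. Because it reads fields through get_type_hints, string annotations (for example under from __future__ import annotations) are resolved to real types first.

```python
from typing import (
    Any, Dict, List, Optional, TypedDict, Union,
    get_args, get_origin, get_type_hints,
)


def _is_typeddict(tp: Any) -> bool:
    # TypedDict classes are dict subclasses that carry a __total__ attribute.
    return isinstance(tp, type) and issubclass(tp, dict) and hasattr(tp, "__total__")


def conforms(value: Any, annotation: Any) -> bool:
    """Best-effort check that value matches annotation (not exhaustive)."""
    if annotation is Any:
        return True
    origin = get_origin(annotation)
    args = get_args(annotation)
    if origin is Union:  # also covers Optional[X], i.e. Union[X, None]
        return any(conforms(value, arg) for arg in args)
    if origin is list:
        item_type = args[0] if args else Any
        return isinstance(value, list) and all(conforms(v, item_type) for v in value)
    if origin is dict:
        key_type, val_type = args if args else (Any, Any)
        return isinstance(value, dict) and all(
            conforms(k, key_type) and conforms(v, val_type) for k, v in value.items()
        )
    if origin is not None:
        raise NotImplementedError(f"unsupported annotation: {annotation!r}")
    if _is_typeddict(annotation):
        hints = get_type_hints(annotation)  # resolves string annotations
        return (
            isinstance(value, dict)
            and value.keys() == hints.keys()  # simplification: exact key match
            and all(conforms(value[k], t) for k, t in hints.items())
        )
    return isinstance(value, annotation)


class PersonData(TypedDict):
    email: Optional[str]
    affiliations: Union[List[str], Dict[int, str]]


print(conforms({"email": None, "affiliations": ["a", "b"]}, PersonData))
print(conforms({"email": 3, "affiliations": []}, PersonData))
```

Note that bool values pass an int annotation here, since bool is a subclass of int.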
Maloney answered 13/5, 2023 at 9:57 Comment(0)
