Force type conversion in python dataclass __init__ method

I have the following very simple dataclass:

import dataclasses

@dataclasses.dataclass
class Test:
    value: int

I create an instance of the class but instead of an integer I use a string:

>>> test = Test('1')
>>> type(test.value)
<class 'str'>

What I actually want is a forced conversion to the data type I defined in the class definition:

>>> test = Test('1')
>>> type(test.value)
<class 'int'>

Do I have to write the __init__ method manually or is there a simple way to achieve this?

Brambling asked 25/2, 2019 at 9:52

Python never enforces or checks the type hints of dataclass attributes. That job is expected to be done by static type checkers such as mypy; Python itself does not check annotations at runtime.

If you want to add manual type checking code, do so in the __post_init__ method:

import dataclasses

@dataclasses.dataclass
class Test:
    value: int

    def __post_init__(self):
        if not isinstance(self.value, int):
            raise ValueError('value not an int')
            # alternatively, convert instead of raising:
            # self.value = int(self.value)

You can use dataclasses.fields(self) to get a tuple of Field objects, each of which records a field's name and type, and loop over them to apply the check to every field automatically instead of writing it out for each one.

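def __post_init__(self):
    # Assumes every annotation is a plain class (int, str, ...): isinstance()
    # rejects typing generics such as list[int], and under
    # `from __future__ import annotations` field.type is a string.
    for field in dataclasses.fields(self):
        value = getattr(self, field.name)
        if not isinstance(value, field.type):
            raise ValueError(f'Expected {field.name} to be {field.type}, '
                             f'got {value!r}')
            # or setattr(self, field.name, field.type(value))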
Zoltai answered 25/2, 2019 at 10:06
thanks! this is not exactly what I wanted (exception instead of conversion), but thanks to your suggestion I was able to find a solution - Brambling
I would err on the side of exceptions instead of implicit conversion, but I did give you the conversion alternative in the comments there… - Zoltai
oops, didn't see that last comment! - Brambling
An alternative solution using a decorator instead of __post_init__. - Spatterdash
Don't do this with dataclasses; you should prefer attrs for these kinds of jobs... attrs.org/en/stable - Disinclination
This was a nice workaround for me until I started using from __future__ import annotations. Then field.type becomes a string, which prevents it from being used as an argument to isinstance or as a converter in field.type(). I will switch to attrs and their explicit converters now. - Varney
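The comments above mention a decorator-based alternative and the problem that field.type is a string under `from __future__ import annotations`. One way to handle both is to resolve annotations with typing.get_type_hints(), which evaluates string annotations, inside a class decorator. This is a minimal sketch: `convert_fields` is a hypothetical helper name, and it assumes every annotation resolves to a plain class that can be called as a converter (int, float, str, ...); Optional[int] or list[int] would need extra handling. The example quotes its annotations, which is what the `__future__` import does to every annotation.

```python
import dataclasses
import typing


def convert_fields(cls):
    """Hypothetical class decorator: turn cls into a dataclass whose
    __post_init__ calls each field's annotated type on its value."""
    original_post_init = getattr(cls, "__post_init__", None)

    def __post_init__(self):
        # get_type_hints() evaluates string annotations such as "int",
        # so this keeps working under `from __future__ import annotations`.
        hints = typing.get_type_hints(type(self))
        for field in dataclasses.fields(self):
            converter = hints[field.name]
            value = getattr(self, field.name)
            if not isinstance(value, converter):
                setattr(self, field.name, converter(value))
        # Chain to a __post_init__ the class defined itself, if any.
        if original_post_init is not None:
            original_post_init(self)

    cls.__post_init__ = __post_init__
    return dataclasses.dataclass(cls)


@convert_fields
class Test:
    value: "int"  # quoted, as `from __future__ import annotations` would make it
    ratio: "float" = 1.0


test = Test("1", "0.5")
print(test)  # Test(value=1, ratio=0.5)
```

Like the answer's loop, values that are already of the annotated type are left alone, and anything else is passed through the type, so Test("apple") still raises a ValueError from int().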

It's easy to achieve with pydantic.validate_arguments.

Just apply the validate_arguments decorator to your dataclass:

from dataclasses import dataclass
from pydantic import validate_arguments


@validate_arguments
@dataclass
class Test:
    value: int

Then try your example again; the string '1' is converted from str to int:

>>> test = Test('1')
>>> type(test.value)
<class 'int'>

If you pass a value that truly cannot be converted, it raises an exception:

>>> test = Test('apple')
Traceback (most recent call last):
...
pydantic.error_wrappers.ValidationError: 1 validation error for Test
value
  value is not a valid integer (type=type_error.integer)
Georgettegeorgi answered 3/2, 2021 at 9:49
It's in beta as of April 2021, but it looks amazing! - Hittite
Thanks for this answer. It worked as expected! - Decagon
One caveat with dataclass and validate_arguments: together they seem to break classmethod. That is, defining a @classmethod factory function raises TypeError: 'classmethod' object is not callable, but only when @validate_arguments is applied to the class. - Moravia

You could achieve this using the __post_init__ method:

import dataclasses

@dataclasses.dataclass
class Test:
    value: int

    def __post_init__(self):
        self.value = int(self.value)

This method is called at the end of the generated __init__ method, after the fields have been assigned:

https://docs.python.org/3/library/dataclasses.html#post-init-processing
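One caveat worth knowing: __post_init__ runs only once, during construction, so assigning to the attribute afterwards is not converted. A quick check with the same kind of class:

```python
import dataclasses


@dataclasses.dataclass
class Test:
    value: int

    def __post_init__(self):
        # Runs once, at the end of the generated __init__.
        self.value = int(self.value)


test = Test("1")
print(type(test.value))  # <class 'int'>

# Plain attribute assignment does not go through __post_init__,
# so no conversion happens here.
test.value = "2"
print(type(test.value))  # <class 'str'>
```

If later assignments also need converting, a data descriptor on the field (whose __set__ runs on every assignment, including the one in __init__) is one way to get that.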

Apt answered 25/2, 2019 at 10:08

With Python dataclasses, the alternative is to use the __post_init__ method, as pointed out in other answers:

import dataclasses

@dataclasses.dataclass
class Test:
    value: int

    def __post_init__(self):
        self.value = int(self.value)

>>> test = Test("42")
>>> type(test.value)
<class 'int'>

Or you can use the attrs package, which allows you to easily set converters:

import attr

@attr.define
class Test:
    value: int = attr.field(converter=int)

>>> test = Test("42")
>>> type(test.value)
<class 'int'>

If your data comes from a mapping instead, you can use the cattrs package, which performs conversion based on the type annotations of attrs classes and dataclasses:

import dataclasses
import cattrs

@dataclasses.dataclass
class Test:
    value: int

>>> test = cattrs.structure({"value": "42"}, Test)
>>> type(test.value)
<class 'int'>

Pydantic will automatically do conversion based on the types of the fields in the model:

import pydantic

class Test(pydantic.BaseModel):
    value: int

>>> test = Test(value="42")
>>> type(test.value)
<class 'int'>
Lara answered 1/3, 2022 at 17:29

Yeah, the easy answer is to just do the conversion yourself in your own __init__(). I do this because I want my objects to be frozen=True.

For type validation, Pydantic claims to do it, but I haven't tried it yet: https://pydantic-docs.helpmanual.io/

Favouritism answered 11/4, 2019 at 18:57
After the conversion, how can you assign the converted value, since it is frozen? - Outofdoor
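Regarding the question in the comment: with frozen=True, `self.value = ...` inside __post_init__ raises dataclasses.FrozenInstanceError. A common workaround, sketched below, is to call object.__setattr__ directly, which bypasses the __setattr__ that frozen dataclasses install; the dataclasses documentation notes that the generated __init__ of a frozen class has to assign fields the same way. This is a sketch, not code from the answer above.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class Test:
    value: int

    def __post_init__(self):
        # self.value = ... would raise FrozenInstanceError here, but
        # object.__setattr__ skips the frozen dataclass's __setattr__.
        object.__setattr__(self, "value", int(self.value))


test = Test("1")
print(type(test.value))  # <class 'int'>

# The instance is still frozen for ordinary assignments.
try:
    test.value = 2
except dataclasses.FrozenInstanceError as exc:
    print("still frozen:", exc)
```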

You could use a descriptor-typed field:

from dataclasses import dataclass


class IntConversionDescriptor:
    # Stores the converted value on the instance under "_<name>".

    def __set_name__(self, owner, name):
        self._name = "_" + name

    def __get__(self, instance, owner):
        return getattr(instance, self._name)

    def __set__(self, instance, value):
        # Runs for the assignment in the generated __init__ and for later ones.
        setattr(instance, self._name, int(value))


@dataclass
class Test:
    value: IntConversionDescriptor = IntConversionDescriptor()

>>> test = Test(value=1)
>>> type(test.value)
<class 'int'>

>>> test = Test(value="12")
>>> type(test.value)
<class 'int'>

>>> test.value = "145"
>>> type(test.value)
<class 'int'>

>>> test.value = 45.12
>>> type(test.value)
<class 'int'>
Beatrisbeatrisa answered 29/9, 2022 at 13:10
I believe the type annotation needs to be fixed. When run within an IDE, it looks like it gives me a warning when I try to pass something like value=1 into the constructor. - Newborn
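A variation on the descriptor above, following the "descriptor-typed fields" pattern in the dataclasses documentation: the converter is passed in explicitly, and __get__ reports a default when it is accessed on the class (dataclass() uses that value as the field default, or treats the field as required if AttributeError is raised). `Converted` is a hypothetical name. The field can now be annotated with the converted type, although static type checkers may still complain about assigning a descriptor object to a field annotated `int`, so this only partly addresses the comment.

```python
from dataclasses import dataclass
from typing import Any, Callable

_NO_DEFAULT = object()  # sentinel: "this field has no default"


class Converted:
    """Hypothetical data descriptor that runs `converter` on every assignment."""

    def __init__(self, converter: Callable[[Any], Any], default: Any = _NO_DEFAULT):
        self._converter = converter
        self._default = default

    def __set_name__(self, owner, name):
        self._name = "_" + name

    def __get__(self, instance, owner):
        if instance is None:
            # dataclass() reads the class attribute to find the field default;
            # raising AttributeError makes the field required.
            if self._default is _NO_DEFAULT:
                raise AttributeError(self._name)
            return self._default
        return getattr(instance, self._name)

    def __set__(self, instance, value):
        setattr(instance, self._name, self._converter(value))


@dataclass
class Test:
    value: int = Converted(int)                  # required
    ratio: float = Converted(float, default=0.5)  # optional


test = Test("12")
print(test)  # Test(value=12, ratio=0.5)
test.value = 45.12
print(test.value)  # 45
```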

You could use a generic type-conversion descriptor, declared in descriptors.py:

import sys


class TypeConv:

    __slots__ = (
        '_name',
        '_default_factory',
    )

    def __init__(self, default_factory=None):
        self._default_factory = default_factory

    def __set_name__(self, owner, name):
        self._name = "_" + name
        if self._default_factory is None:
            # determine default factory from the type annotation
            tp = owner.__annotations__[name]
            if isinstance(tp, str):
                # evaluate the forward reference
                base_globals = getattr(sys.modules.get(owner.__module__, None), '__dict__', {})
                idx_pipe = tp.find('|')
                if idx_pipe != -1:
                    tp = tp[:idx_pipe].rstrip()
                tp = eval(tp, base_globals)
            # use `__args__` to handle `Union` types
            self._default_factory = getattr(tp, '__args__', [tp])[0]

    def __get__(self, instance, owner):
        return getattr(instance, self._name)

    def __set__(self, instance, value):
        setattr(instance, self._name, self._default_factory(value))

Usage in main.py would look like this:

from __future__ import annotations
from dataclasses import dataclass
from descriptors import TypeConv


@dataclass
class Test:
    value: int | str = TypeConv()


test = Test(value=1)
print(test)

test = Test(value='12')
print(test)

# watch out: the following assignment raises a `ValueError`
try:
    test.value = '3.21'
except ValueError as e:
    print(e)

Output:

Test(value=1)
Test(value=12)
invalid literal for int() with base 10: '3.21'

Note that while this works for other simple types, it does not convert certain types, such as bool or datetime, the way you would normally expect.
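To illustrate the bool case: if the converter is bool itself (which is what TypeConv would derive from an annotation like `is_active: bool`), every non-empty string converts to True, including 'no' and 'False'. A sketch of a stricter converter follows; `str_to_bool` and the spellings it accepts are assumptions, not part of the answer above.

```python
def str_to_bool(value):
    """Hypothetical converter: keeps real bools, parses common spellings."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in {"1", "true", "yes", "y", "on"}:
        return True
    if text in {"0", "false", "no", "n", "off"}:
        return False
    raise ValueError(f"cannot interpret {value!r} as a bool")


print(bool("no"))         # True: any non-empty string is truthy
print(str_to_bool("no"))  # False
```

Since TypeConv.__init__ accepts a default_factory, a converter like this could be passed in explicitly, e.g. `is_active: bool = TypeConv(default_factory=str_to_bool)`, which skips the annotation lookup.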

If you are OK with using third-party libraries for this, I have come up with a (de)serialization library called dataclass-wizard that can perform type conversion as needed, but only when from_dict() is called:

from __future__ import annotations
from dataclasses import dataclass

from dataclass_wizard import JSONWizard


@dataclass
class Test(JSONWizard):
    value: int
    is_active: bool


test = Test.from_dict({'value': '123', 'is_active': 'no'})
print(repr(test))

assert test.value == 123
assert not test.is_active

test = Test.from_dict({'is_active': 'tRuE', 'value': '3.21'})
print(repr(test))

assert test.value == 3
assert test.is_active
Newborn answered 30/9, 2022 at 23:14

Why not use setattr?

from dataclasses import dataclass, fields

@dataclass()
class Test:
    value: int

    def __post_init__(self):
        for field in fields(self):
            setattr(self, field.name, field.type(getattr(self, field.name)))

Which yields the required result:

>>> test = Test('1')
>>> type(test.value)
<class 'int'>
Landlubber answered 16/11, 2022 at 7:17
Thanks for your effort, but that's pretty much exactly what deceze suggested almost 4 years ago. - Brambling

I had the problem of converting numpy arrays to lists and this did the job:

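import dataclasses

def fix_field_types(self):
    # Dataclass instances have no asdict() method, so iterate the fields
    # directly; convert any value that is not exactly its annotated type.
    for field in dataclasses.fields(self):
        value = getattr(self, field.name)
        if type(value) is not field.type:
            setattr(self, field.name, field.type(value))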
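One way to hook the function into construction is to call it from __post_init__. In this sketch the `Record` class is hypothetical and a tuple stands in for the numpy array so there is no third-party dependency; like the other field.type approaches on this page, it assumes the annotations are real classes rather than strings.

```python
import dataclasses


def fix_field_types(self):
    # Call each field's annotated type on values that are not exactly that type.
    for field in dataclasses.fields(self):
        value = getattr(self, field.name)
        if type(value) is not field.type:
            setattr(self, field.name, field.type(value))


@dataclasses.dataclass
class Record:
    items: list  # e.g. receives a tuple (or array) and stores a list
    count: int

    def __post_init__(self):
        fix_field_types(self)


record = Record((1, 2, 3), "3")
print(record)  # Record(items=[1, 2, 3], count=3)
```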
Imaginable answered 21/11, 2023 at 12:48
