How does one ignore extra arguments passed to a dataclass?
S

7

69

I'd like to create a config dataclass in order to simplify whitelisting of and access to specific environment variables (typing os.environ['VAR_NAME'] is tedious relative to config.VAR_NAME). I therefore need to ignore unused environment variables in my dataclass's __init__ function, but I don't know how to extract the default __init__ in order to wrap it with, e.g., a function that also includes *_ as one of the arguments.

import os
from dataclasses import dataclass

@dataclass
class Config:
    VAR_NAME_1: str
    VAR_NAME_2: str

config = Config(**os.environ)

Running this gives me TypeError: __init__() got an unexpected keyword argument 'SOME_DEFAULT_ENV_VAR'.

Salvatore answered 13/2, 2019 at 19:51 Comment(0)
T
36

I would just provide an explicit __init__ instead of using the autogenerated one. The body of the loop only sets recognized values, ignoring unexpected ones.

Note that this won't complain about missing values without defaults until later, though.

import dataclasses
from dataclasses import dataclass

@dataclass(init=False)
class Config:
    VAR_NAME_1: str
    VAR_NAME_2: str

    def __init__(self, **kwargs):
        # Only assign keys that correspond to declared fields.
        names = {f.name for f in dataclasses.fields(self)}
        for k, v in kwargs.items():
            if k in names:
                setattr(self, k, v)
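
A minimal sketch of the caveat above, using a hypothetical LaxConfig class that mirrors the explicit __init__ shown here: a missing required field is not reported at construction time and only surfaces as an AttributeError when the attribute is first read.

```python
import dataclasses
from dataclasses import dataclass


@dataclass(init=False)
class LaxConfig:  # hypothetical name; mirrors the Config class above
    VAR_NAME_1: str
    VAR_NAME_2: str

    def __init__(self, **kwargs):
        names = {f.name for f in dataclasses.fields(self)}
        for k, v in kwargs.items():
            if k in names:
                setattr(self, k, v)


# Construction succeeds even though VAR_NAME_2 was never supplied.
cfg = LaxConfig(VAR_NAME_1="a", UNRELATED="ignored")

# The omission only shows up when the attribute is accessed.
try:
    cfg.VAR_NAME_2
    missing_detected = False
except AttributeError:
    missing_detected = True

print(cfg.VAR_NAME_1, missing_detected)
```

Note that the generated __repr__ and __eq__ read every field, so they would hit the same AttributeError on such a partially initialized instance.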

Alternatively, you can pass a filtered environment to the default Config.__init__.

field_names = set(f.name for f in dataclasses.fields(Config))
c = Config(**{k:v for k,v in os.environ.items() if k in field_names})
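
As a rough illustration of why filtering keeps the checks of the generated __init__, the sketch below uses a plain dict in place of os.environ (the fake_env mapping and the filter_for helper are made up for this example). Unknown keys are dropped, while a missing required field still raises TypeError at construction time.

```python
import dataclasses
from dataclasses import dataclass


@dataclass
class Config:
    VAR_NAME_1: str
    VAR_NAME_2: str


def filter_for(cls, mapping):
    """Hypothetical helper: keep only the keys that name a field of cls."""
    field_names = {f.name for f in dataclasses.fields(cls)}
    return {k: v for k, v in mapping.items() if k in field_names}


# Stand-in for os.environ so the example is reproducible.
fake_env = {"VAR_NAME_1": "a", "VAR_NAME_2": "b", "PATH": "/usr/bin"}
c = Config(**filter_for(Config, fake_env))
print(c)

# A missing required field is still caught by the generated __init__.
try:
    Config(**filter_for(Config, {"VAR_NAME_1": "a"}))
    raised = False
except TypeError:
    raised = True
print(raised)
```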
Tobin answered 13/2, 2019 at 20:14 Comment(5)
Yeah that was my concern, it looked like the function was a little more complicated with some checks etc (but I only looked for a second). Is there any way to just rip out the autogenerated function and wrap it? I also don't really want the other environment variables in there.Salvatore
You don't want to wrap the autogenerated function; you want to replace it. That said, you can always filter the environment mapping before calling the default __init__: c = Config(**{k:v for k,v in kwargs.items() if k in set(f.name for f in dataclasses.fields(Config))})Tobin
Filtering the arguments before initializing the instance worked great! If you make that into a separate answer I'll accept it. Code I ended up with: from dataclasses import dataclass, fields ... config = Config(**{k:v for k,v in os.environ.items() if k in set(f.name for f in fields(Config))}).Salvatore
Following "favor composition over inheritance" you may want iterative calls to this as a helper function (e.g., when pulling an already joined query that might be a pain to splice) to properly separate base dataclasses.Abruzzi
you are losing all of the magic that dataclass is doing in 'init'. This is not a solution to this problem!Celestaceleste
U
72

Cleaning the argument list before passing it to the constructor is probably the best way to go about it. I'd advise against writing your own __init__ function though, since the dataclass's __init__ does a couple of other convenient things that you'll lose by overriding it.

Also, since the argument-cleaning logic is very tightly bound to the behavior of the class and returns an instance, it might make sense to put it into a classmethod:

from dataclasses import dataclass
import inspect

@dataclass
class Config:
    var_1: str
    var_2: str

    @classmethod
    def from_dict(cls, env):      
        return cls(**{
            k: v for k, v in env.items() 
            if k in inspect.signature(cls).parameters
        })


# usage:
params = {'var_1': 'a', 'var_2': 'b', 'var_3': 'c'}
c = Config.from_dict(params)   # works without raising a TypeError 
print(c)
# prints: Config(var_1='a', var_2='b')
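
To make the choice of inspect.signature() concrete, here is a small sketch (the Sample class and its field names are invented for illustration) comparing the names reported by dataclasses.fields() with the parameters reported by inspect.signature(). Only the latter matches what the generated __init__ accepts once InitVar, ClassVar, or init=False fields are involved.

```python
import inspect
from dataclasses import InitVar, dataclass, field, fields
from typing import ClassVar


@dataclass
class Sample:  # hypothetical class covering the awkward cases
    a: int
    seed: InitVar[int]                            # accepted by __init__, not a field
    derived: int = field(init=False, default=0)   # a field, not accepted by __init__
    kind: ClassVar[str] = "sample"                # neither

    def __post_init__(self, seed):
        self.derived = self.a + seed


field_names = {f.name for f in fields(Sample)}
init_params = set(inspect.signature(Sample).parameters)
print(sorted(field_names))
print(sorted(init_params))

data = {"a": 1, "seed": 2, "derived": 99, "kind": "x", "extra": 0}

# Filtering on the signature yields exactly the accepted keyword arguments.
s = Sample(**{k: v for k, v in data.items() if k in init_params})
print(s)

# Filtering on fields() passes 'derived' and drops 'seed', so __init__ rejects it.
try:
    Sample(**{k: v for k, v in data.items() if k in field_names})
    fields_filter_failed = False
except TypeError:
    fields_filter_failed = True
print(fields_filter_failed)
```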
Unreliable answered 11/3, 2019 at 7:25 Comment(14)
Don't use cls.__annotations__, use dataclasses.fields() so you can introspect their configuration (e.g. ignore init=False fields).Smaltite
But you'd want InitVars in this context, no? They also get skipped by dataclasses.fields(), so there might be a bit more I'll have to fix here.Unreliable
@MartijnPieters cls.__dataclass_fields__ works with InitVar inclusion and has access to the init field.Unreliable
Unfortunately that mapping also includes ClassVar fields and the init flag is not set to False for those..Smaltite
I don’t see a way to reliably achieve this without using the private API of the dataclasses module, actually :-/Smaltite
Unless you instead introspected the __init__ method.Smaltite
Updated with an _is_classvar check. I found no way to get it to work without it that didn't include essentially writing my own buggy version of it =( Introspecting __init__ sounds even riskier, or do you see a way that doesn't boil down to using regexes?Unreliable
here is an alternative with inspect.getsource, which sadly can't give me __init__. It's worse than the current one imo because typing types aliases are quite common in my experience.Unreliable
That's not what I meant. inspect.signature() will give you a Signature instance which will let you trivially create a set of acceptable parameter names.Smaltite
I wasn't aware of inspect.signature(), thanks for the hint. The version right now seems to just work for all my test cases, and it gets rid of all the private attribute/function accesses.Unreliable
I've applied this to my metaclass in the other post; it is a little more complex than just verifying that the argument exists as there may be positional-only arguments.Smaltite
If performance is a concern, it's much faster to check directly cls.__dataclass_fields__. Performance can be improved also by assigning inspect.signature(cls).parameters to a variable outside the dictionary comprehension.Juggins
@Juggins see revision 5 of my post, it's not easy to get right. You're right about the condition into a variable though.Unreliable
@Unreliable ah sweet, thanks for linking your revision!Juggins
V
7

I used a combination of both answers; setattr can be a performance killer. Naturally, if the dictionary may be missing some of the dataclass's fields, you'll need to give those fields defaults.

from __future__ import annotations
from dataclasses import field, fields, dataclass
from typing import Optional

@dataclass()
class Record:
    name: str
    address: str
    zip: Optional[str] = field(default=None)  # won't fail if dictionary doesn't have a zip key

    @classmethod
    def create_from_dict(cls, dict_) -> Record:
        class_fields = {f.name for f in fields(cls)}
        # Use cls rather than Record so that subclasses get instances of their own type.
        return cls(**{k: v for k, v in dict_.items() if k in class_fields})
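
A short usage sketch for the classmethod above, with made-up input data. The class is repeated so the snippet runs on its own; the extra "phone" key is dropped and the missing "zip" key falls back to its default.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Record:
    name: str
    address: str
    zip: Optional[str] = None  # default covers dictionaries without a zip key

    @classmethod
    def create_from_dict(cls, dict_: dict) -> "Record":
        class_fields = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in dict_.items() if k in class_fields})


# Illustrative input: one unknown key, one missing optional key.
row = {"name": "Ada", "address": "1 Main St", "phone": "555-0100"}
record = Record.create_from_dict(row)
print(record)
# prints: Record(name='Ada', address='1 Main St', zip=None)
```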
Viewless answered 25/7, 2019 at 18:38 Comment(0)
M
4

Using the dacite python library to populate a dataclass using a dictionary of values ignores extra arguments / values present in the dictionary (along with all the other benefits the library provides).

from dataclasses import dataclass
from dacite import from_dict


@dataclass
class User:
    name: str
    age: int
    is_active: bool


data = {
    'name': 'John',
    'age': 30,
    'is_active': True,
    "extra_1": 1000,
    "extra_2": "some value"
}

user = from_dict(data_class=User, data=data)
print(user)
# prints the following: User(name='John', age=30, is_active=True)
Malevolent answered 14/8, 2022 at 8:41 Comment(0)
S
0

I did this based on previous answers:

import functools
import inspect

@functools.cache
def get_dataclass_parameters(cls: type):
    return inspect.signature(cls).parameters


def instantiate_dataclass_from_dict(cls: type, dic: dict):
    parameters = get_dataclass_parameters(cls)
    dic = {k: v for k, v in dic.items() if k in parameters}
    return cls(**dic)

Since inspect.signature(cls).parameters takes much more time than the actual instantiation / initialization, I use functools.cache to cache the result for each class.
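
A quick way to see the cache at work, as a sketch: the Point class below is hypothetical, and cache_info() is the statistics method that functools.cache (Python 3.9+) attaches to the wrapped function. The second instantiation of the same class reuses the stored signature.

```python
import functools
import inspect
from dataclasses import dataclass


@functools.cache  # requires Python 3.9 or newer
def get_dataclass_parameters(cls: type):
    return inspect.signature(cls).parameters


def instantiate_dataclass_from_dict(cls: type, dic: dict):
    parameters = get_dataclass_parameters(cls)
    return cls(**{k: v for k, v in dic.items() if k in parameters})


@dataclass
class Point:  # hypothetical example class
    x: int
    y: int


p1 = instantiate_dataclass_from_dict(Point, {"x": 1, "y": 2, "z": 3})
p2 = instantiate_dataclass_from_dict(Point, {"x": 4, "y": 5})

# One miss for the first lookup, one hit for the second.
info = get_dataclass_parameters.cache_info()
print(p1, p2, info.hits, info.misses)
```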

Sharecropper answered 2/11, 2022 at 4:7 Comment(0)
C
0

It is possible with the dataclasses_json extension.

Simply add the @dataclass_json(undefined=Undefined.EXCLUDE) decorator. The Undefined.EXCLUDE will do the magic. Here is an example:

from dataclasses import dataclass
from dataclasses_json import dataclass_json, Undefined

@dataclass_json(undefined=Undefined.EXCLUDE)
@dataclass
class MyClass:
    var1: int
Convalescent answered 12/1, 2024 at 8:59 Comment(0)
C
0

You can use this as a decorator:

from dataclasses import dataclass, fields

def filter_unexpected_fields(cls):
    # Keep a reference to the __init__ generated by @dataclass.
    original_init = cls.__init__

    def new_init(self, *args, **kwargs):
        # Drop keyword arguments that do not name a dataclass field.
        expected_fields = {field.name for field in fields(cls)}
        cleaned_kwargs = {key: value for key, value in kwargs.items() if key in expected_fields}
        original_init(self, *args, **cleaned_kwargs)

    cls.__init__ = new_init
    return cls

@filter_unexpected_fields
@dataclass
class YourDataClass:
    field_a: str
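
A brief usage sketch of this decorator. The field_b field and the argument values are added only for illustration; positional arguments pass through unchanged and unknown keywords are silently dropped.

```python
from dataclasses import dataclass, fields


def filter_unexpected_fields(cls):
    original_init = cls.__init__

    def new_init(self, *args, **kwargs):
        expected_fields = {f.name for f in fields(cls)}
        cleaned_kwargs = {k: v for k, v in kwargs.items() if k in expected_fields}
        original_init(self, *args, **cleaned_kwargs)

    cls.__init__ = new_init
    return cls


@filter_unexpected_fields
@dataclass
class YourDataClass:
    field_a: str
    field_b: int = 0  # illustrative extra field with a default


# Unknown keyword 'unknown' is discarded instead of raising TypeError.
obj = YourDataClass(field_a="x", unknown=123)
# Positional arguments are forwarded to the original __init__ untouched.
obj2 = YourDataClass("y", field_b=2, other="dropped")
print(obj, obj2)
```

One design consequence worth keeping in mind: after decoration the class's __init__ is new_init(self, *args, **kwargs), so inspect.signature() on the class no longer lists the individual fields, and signature-based filtering as used in other answers would not work on a class wrapped this way.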
Charmaincharmaine answered 5/3, 2024 at 21:10 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.