Using Python class as a data container [closed]
U

12

74

Sometimes it makes sense to cluster related data together. I tend to do so with a dict, e.g.,

group = dict(a=1, b=2, c=3)
print(group['a'])

One of my colleagues prefers to create a class

class groupClass:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

group = groupClass(1, 2, 3)
print(group.a)

Note that we are not defining any class methods.

I like to use a dict because I like to minimize the number of lines of code. My colleague thinks the code is more readable if you use a class, and it makes it easier to add methods to the class in the future.

Which do you prefer and why?

Unhand answered 28/7, 2010 at 21:18 Comment(4)
minor nitpick: you would use print self.group['a']Pentheam
@Jesse: No he's not a new programmer (and he's a good programmer), but he does more Java than Python so he likes to add a lot of code (getters, etc) that I think is unnecessary. I also may not be representing his position well.Unhand
your colleague should find this useful then.Pentheam
Related post stackoverflow.com/questions/29290359/…Shum
M
57

If you're really never defining any class methods, a dict or a namedtuple make far more sense, in my opinion. Simple+builtin is good! To each his own, though.
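As a rough sketch of what the namedtuple route could look like for the question's three fields (the names Group and GroupWithTotal are made up for illustration):

```python
from collections import namedtuple

# Same three fields as the dict / class in the question.
Group = namedtuple("Group", ["a", "b", "c"])

group = Group(a=1, b=2, c=3)
print(group.a)      # attribute access, like the class version
print(group[0])     # still a tuple, so positional indexing works too

# Instances are immutable; _replace returns a modified copy instead.
updated = group._replace(a=10)

# If methods are needed later, subclass the generated type.
class GroupWithTotal(Group):
    __slots__ = ()  # keep instances free of a per-instance __dict__

    def total(self):
        return self.a + self.b + self.c

print(GroupWithTotal(1, 2, 3).total())
```

The definition stays a one-liner, which keeps it close to the dict version in length while still giving dotted access.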

Mehalick answered 28/7, 2010 at 21:23 Comment(5)
namedtuple's are great; just remember they're immutable. And don't forget if you do need to add class methods later, you can always just inherit from the result of namedtuple. E.g. class Point(namedtuple('Point', 'x y')): ...Indevout
Great answers all around. I picked this one because I like the suggestion to try a namedtuple.Unhand
An example of how to use namedtuple with the OP code may be nice.Melodize
In Python 3.7 there is the @dataclass decorator which solves some problems with namedtuples and other former alternatives.Josefina
A good answer if you do not need mutability or there is no memory limit when you need to create millions of objects.Tetreault
S
66

Background

A summary of alternative attribute-based data containers was presented by R. Hettinger at SF Python's 2017 Holiday meetup. See his tweet and his slide deck. He also gave a talk at PyCon 2018 on dataclasses.

Other data container types are mentioned in this article and predominantly in Python 3 documentation (see links below).

Here is a discussion on the python-ideas mailing list on adding recordclass to the standard library.

Options

Alternatives in the Standard Library

External options

  • records: mutable namedtuple (see also recordclass)
  • bunch: add attribute access to dicts (inspiration for SimpleNamespace; see also munch (py3))
  • box: wrap dicts with dot-style lookup functionality
  • attrdict: access elements from a mapping as keys or attributes
  • fields: remove boilerplate from container classes.
  • namedlist: mutable, tuple-like containers with defaults by E. Smith
  • attrs: similar to dataclasses, packed with features (validation, converters, __slots__, etc.). See also docs on cattrs.
  • misc.: posts on making your own custom struct, object, bunch, dict proxy, etc.
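Several of the options above add attribute access to a mapping. For comparison, here is a minimal sketch of the standard library's types.SimpleNamespace, which gives similar dotted access without a third-party package (the field names are taken from the question):

```python
from types import SimpleNamespace

# Build the container from keyword arguments, much like dict(a=1, b=2, c=3).
group = SimpleNamespace(a=1, b=2, c=3)
print(group.a)

# Attributes are mutable and new ones can be added freely.
group.d = 4

# vars() exposes the underlying attribute dict when a mapping is needed.
print(vars(group))
```

Unlike a dict, a SimpleNamespace is not iterable and has no item lookup, so it suits fixed, known field names better than arbitrary keys.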

Which one?

Deciding which option to use depends on the situation (see Examples below). Usually an old-fashioned mutable dictionary or immutable namedtuple is good enough. Dataclasses are the newest addition (Python 3.7a), offering both mutability and optional immutability, with the promise of reduced boilerplate as inspired by the attrs project.
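The "optional immutability" of dataclasses can be sketched with the frozen=True flag. The class name FrozenPerson is made up for illustration:

```python
import dataclasses as dc

@dc.dataclass(frozen=True)
class FrozenPerson:
    name: str
    age: int

p = FrozenPerson("bob", 30)

# Assigning to a field of a frozen instance raises FrozenInstanceError.
try:
    p.age = 31
except dc.FrozenInstanceError as exc:
    print("cannot assign:", exc)

# dataclasses.replace builds a new instance with some fields changed.
older = dc.replace(p, age=31)
print(older)
```

Because the class is frozen and compares by value, its instances are also hashable, so they can be used as dict keys or set members.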


Examples

import typing as typ
import collections as ct
import dataclasses as dc


# Problem: You want a simple container to hold personal data.
# Solution: Try a NamedTuple.
>>> class Person(typ.NamedTuple):
...     name: str
...     age: int
>>> a = Person("bob", 30)
>>> a
Person(name='bob', age=30)
# Problem: You need to change age each year, but namedtuples are immutable. 
# Solution: Use assignable attributes of a traditional class.
>>> class Person:
...     def __init__(self, name, age):
...         self.name = name
...         self.age = age
>>> b = Person("bob", 30)
>>> b.age = 31
>>> b
<__main__.Person at 0x4e27128>
# Problem: You lost the pretty repr and want to add comparison features.
# Solution: Use included repr and eq features from the new dataclasses.
>>> @dc.dataclass(eq=True)
... class Person:
...     name: str
...     age: int
>>> c = Person("bob", 30)
>>> c.age = 31
>>> c
Person(name='bob', age=31)
>>> d = Person("dan", 31)
>>> c != d
True
Shum answered 13/12, 2017 at 2:37 Comment(0)
C
11

By the way, I think @dataclass, added in Python 3.7, is the simplest and most efficient way to implement classes as data containers.

from dataclasses import dataclass

@dataclass
class Data:
    a: list
    b: str
    c: bool = False    # fields with defaults must come after fields without

def func():
    return Data(a=[1, 2], b="hello")

print(func())

The output would be: Data(a=[1, 2], b='hello', c=False)

It is very similar to Scala's case class and is the easiest way to use a class as a container.

Charil answered 20/7, 2018 at 10:22 Comment(1)
I don't know if "most efficient" is true as they have a fairly large memory footprint. They are arguably "more convenient" however.Shum
B
10

I prefer to follow YAGNI and use a dict.

Bougie answered 28/7, 2010 at 21:21 Comment(1)
I agree, except that I really enjoy the convenience of attribute access (as in JavaScript), so I prefer to use an AttributeDict.Unattended
C
9

There is a new proposal that aims to implement exactly what you are looking for, called data classes. Take a look at it.

Using a class over a dict is a matter of preference. Personally I prefer using a dict when the keys are not known a priori. (As a mapping container).

Using a class to hold data means you can provide documentation to the class attributes.
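As a small sketch of what documenting the attributes might look like, here is a data class with a docstring. The field meanings are invented purely for illustration:

```python
from dataclasses import dataclass

@dataclass
class Group:
    """Three related values that travel together.

    Attributes:
        a: first value (hypothetical meaning, for illustration only)
        b: second value
        c: third value, defaults to 3
    """
    a: int
    b: int
    c: int = 3

group = Group(a=1, b=2)
print(group)

# The documentation is available to help() and to most IDEs.
print(Group.__doc__)
```

A plain dict has no natural place for this kind of per-key documentation, which is one practical argument for the class.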

Personally, perhaps the biggest reason for me to use a class is to make use of the IDE's auto-complete feature! (technically a lame reason, but very useful in practice)

Cathartic answered 2/10, 2017 at 12:16 Comment(3)
The proposal is now official in Python 3.7Josefina
Does anyone know if there is a shim or something that allows usage of "data classes" in Python 3.6?Rearward
Just do pip install dataclassesCathartic
P
7

Your way is better. Don't try to anticipate the future too much as you are not likely to succeed.

However, it may make sense sometimes to use something like a C struct, for example if you want to identify different types rather than use dicts for everything.

Pentheam answered 28/7, 2010 at 21:41 Comment(0)
E
6

You can combine advantages of dict and class together, using some wrapper class inherited from dict. You do not need to write boilerplate code, and at the same time can use dot notation.

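class ObjDict(dict):
    def __getattr__(self, attr):
        # Raise AttributeError (not KeyError) so hasattr() and getattr() defaults work.
        try:
            return self[attr]
        except KeyError:
            raise AttributeError(attr) from None
    def __setattr__(self, attr, value):
        self[attr] = value

group = ObjDict(a=1, b=2, c=3)
print(group.a)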
Equivocate answered 16/12, 2016 at 7:54 Comment(0)
T
5

If one does not care about memory footprint, then dict, namedtuple, dataclass or just a class with __slots__ are good choices.

But if one has to create millions of objects with a few simple attributes in the context of limited memory, then there is a solution based on the recordclass library:

from recordclass import make_dataclass
C = make_dataclass("C", ('a', 'b', 'c'))
c = C(1, 2, 3)

Same with a class definition:

from recordclass import dataobject

class C(dataobject):
    a: int
    b: int
    c: int

c = C(1, 2, 3)

It has a minimal memory footprint: sizeof(PyObject_HEAD) + 3*sizeof(PyObject*) bytes.

For comparison, the __slots__-based variant requires sizeof(PyGC_Head) + sizeof(PyObject_HEAD) + 3*sizeof(PyObject*) bytes.
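For reference, the __slots__-based variant mentioned above could be sketched in plain Python as follows. This uses only the standard language feature, not recordclass, and the class name Group is made up:

```python
class Group:
    # Declaring __slots__ replaces the per-instance __dict__ with fixed slots.
    __slots__ = ("a", "b", "c")

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

group = Group(1, 2, 3)
print(group.a)

# No per-instance dictionary is allocated.
print(hasattr(group, "__dict__"))  # False

# Attributes outside __slots__ are rejected.
try:
    group.d = 4
except AttributeError as exc:
    print("rejected:", exc)
```

The trade-off is that instances can no longer grow new attributes at runtime, which is usually acceptable for a fixed-shape data container.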

Since version 0.15 there is a fast_new option for faster instance creation:

C = make_dataclass("C", ('a', 'b', 'c'), fast_new=True)

or

class C(dataobject, fast_new=True):
    a: int
    b: int
    c: int

This option makes instance creation twice as fast.

Tetreault answered 27/8, 2019 at 14:9 Comment(0)
T
3

I disagree that the code is more readable using a class with no methods. You usually expect functionality from a class, not only data.

So, I'd go for a dict until the need for functionality arises, and then the constructor of the class could receive a dict :-)
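A minimal sketch of that migration path, assuming the dict from the question and a made-up total method as the first piece of functionality:

```python
class Group:
    def __init__(self, data):
        # Copy the known fields out of the dict that callers already build.
        self.a = data["a"]
        self.b = data["b"]
        self.c = data["c"]

    def total(self):
        # Hypothetical behaviour that justified moving to a class.
        return self.a + self.b + self.c

group_dict = dict(a=1, b=2, c=3)   # existing code keeps producing dicts
group = Group(group_dict)          # new code wraps them when it needs methods
print(group.total())
```

Existing call sites that only read group_dict['a'] keep working, and only the code that needs the new behaviour has to change.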

Toughen answered 28/7, 2010 at 21:24 Comment(0)
C
2

What about Prodict:

from prodict import Prodict

group = Prodict(a=1, b=2, c=3)
group.d = 4

And if you want automatic type conversion and code completion (IntelliSense):

class Person(Prodict):
    name: str
    email: str
    rate: int

john = Person(name='John', email='[email protected]')
john.rate = 7
john.age = 35  # dynamic
Cofsky answered 2/3, 2018 at 2:10 Comment(0)
N
1

In a language which supports it, I would use a struct. A dictionary would be closest to a structure in Python, at least as far as I see it.

Not to mention, you could add a method to a dictionary anyway if you really wanted to ;)
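One way to read "add a method to a dictionary" is to subclass dict. This is only a sketch of that interpretation, with a made-up total method:

```python
class Group(dict):
    def total(self):
        # Works on whatever values the dictionary currently holds.
        return sum(self.values())

group = Group(a=1, b=2, c=3)
print(group["a"])      # still behaves like a normal dict
print(group.total())   # plus the extra method
```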

Natatorial answered 28/7, 2010 at 21:22 Comment(0)
P
1

A dict is obviously appropriate for that situation. It was designed specifically for that use case. Unless you are actually going to use the class as a class, there's no use in reinventing the wheel and incurring the additional overhead / wasting the space of a class that acts as a bad dictionary (no dictionary features).

Polytechnic answered 28/7, 2010 at 21:25 Comment(0)
