To understand the importance (or not) of initializing attributes in __init__, let's take a modified version of your class MyClass as an example. The purpose of the class is to compute the grade for a subject, given the student name and score. You may follow along in a Python interpreter.
>>> class MyClass:
...     def __init__(self, name, score):
...         self.name = name
...         self.score = score
...         self.grade = None
...
...     def results(self, subject=None):
...         if self.score >= 70:
...             self.grade = 'A'
...         elif 50 <= self.score < 70:
...             self.grade = 'B'
...         else:
...             self.grade = 'C'
...         return self.grade
This class requires two positional arguments, name and score. These arguments must be provided to initialize a class instance. Without them, the object x cannot be instantiated and a TypeError is raised:
>>> x = MyClass()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: __init__() missing 2 required positional arguments: 'name' and 'score'
At this point, we understand that we must provide, as a minimum, the name of the student and a score for a subject. The grade is not important right now because it will be computed later, in the results method. So, we just set self.grade = None and don't make it a parameter of __init__. Let's create a class instance (object):
>>> x = MyClass(name='John', score=70)
>>> x
<__main__.MyClass object at 0x000002491F0AE898>
The output <__main__.MyClass object at 0x000002491F0AE898> confirms that the object x was successfully created at the given memory location. Python provides some useful built-in ways to view the attributes of an object. One of them is the __dict__ attribute:
>>> x.__dict__
{'name': 'John', 'score': 70, 'grade': None}
This gives a dict view of all the initial attributes and their values. Notice that grade has the value None, as assigned in __init__.
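The built-in vars() function returns the same dictionary as __dict__ for ordinary instances, which some readers find a little tidier. A minimal sketch, reusing the class from above; the variable name attrs is only for illustration:

```python
class MyClass:
    def __init__(self, name, score):
        self.name = name
        self.score = score
        self.grade = None  # placeholder until results() is called


x = MyClass(name='John', score=70)

# vars(obj) returns obj.__dict__ for objects that have one
attrs = vars(x)
print(attrs)          # {'name': 'John', 'score': 70, 'grade': None}
print(sorted(attrs))  # attribute names only: ['grade', 'name', 'score']
```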
Let's take a moment to understand what __init__ does. There are many answers and online resources that explain this method, but I'll summarize:
Alongside __init__, Python has another special method called __new__(). When you create an object like x = MyClass(name='John', score=70), Python first calls __new__() to create a new instance of the class MyClass and then calls __init__ to initialize the attributes name and score. If Python does not find values for the required positional args during these internal calls, it raises an error, as we've seen above. In other words, __init__ initializes the attributes. You can even call it again on an existing object to assign new initial values for name and score, like this:
>>> x.__init__(name='Tim', score=50)
>>> x.__dict__
{'name': 'Tim', 'score': 50, 'grade': None}
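To see the order of the two calls for yourself, here is a small sketch. The class name Traced and the list calls are made up for this illustration and are not part of the example class:

```python
calls = []  # records the order in which the special methods run


class Traced:
    def __new__(cls, *args, **kwargs):
        calls.append('__new__')
        # object.__new__ takes only the class; the arguments go to __init__
        return super().__new__(cls)

    def __init__(self, name, score):
        calls.append('__init__')
        self.name = name
        self.score = score


t = Traced(name='John', score=70)
print(calls)  # ['__new__', '__init__']
```

In everyday code you rarely need to override __new__; it is shown here only to make the sequence visible.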
It is also possible to access individual attributes as shown below. Accessing grade prints nothing because its value is None, and the interpreter does not echo None.
>>> x.name
'Tim'
>>> x.score
50
>>> x.grade
>>>
In the results method, you will notice that subject is a parameter with a default value of None. Its scope is limited to this method. For the purposes of demonstration, I define subject only as a parameter of this method, but it could have been initialized in __init__ too. What happens if I try to access it through my object?
>>> x.subject
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'MyClass' object has no attribute 'subject'
Python raises an AttributeError when it cannot find an attribute on the object or its class. If you do not initialize attributes in __init__, you may encounter this error when you try to access a name that exists only as a local variable of a method. In this example, defining subject inside __init__ would have avoided the confusion, and it would have been perfectly normal to do so since it is not required for any computation either.
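The same error also shows up when an attribute is created only inside a method that has not run yet. A minimal sketch, assuming a hypothetical class Report with a hypothetical set_subject method (neither is part of the example class):

```python
class Report:
    def __init__(self, name):
        self.name = name            # always present after construction

    def set_subject(self, subject):
        self.subject = subject      # created only if this method runs


r = Report('Tim')

try:
    r.subject
except AttributeError as exc:
    message = str(exc)
    print(message)  # 'Report' object has no attribute 'subject'

# getattr with a default avoids the exception when absence is expected
print(getattr(r, 'subject', None))  # None

r.set_subject('Math')
print(r.subject)  # Math
```

Setting self.subject = None in __init__ would make the attribute available from the moment the object exists.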
Now, let's call results and see what we get:
>>> x.results()
'B'
>>> x.__dict__
{'name': 'Tim', 'score': 50, 'grade': 'B'}
This returns the grade for the score, and notice that when we view the attributes, grade has also been updated. Right from the start, we had a clear view of the initial attributes and how their values have changed.
But what about subject? If I want to know how much Tim scored in Math and what the grade was, I can easily access the score and the grade as we've seen before, but how do I know the subject? Since the subject variable is local to the results method, we could just return its value. Change the return statement in the results method:
def results(self, subject=None):
    #<---code--->
    return self.grade, subject
Let's call results() again. We get a tuple with the grade and subject, as expected.
>>> x.results(subject='Math')
('B', 'Math')
To access the values in the tuple, let's assign them to variables. In Python, it is possible to assign values from a collection to multiple variables in the same statement, provided that the number of variables is equal to the length of the collection. Here, the length is just two, so we can have two variables on the left-hand side of the assignment:
>>> grade, subject = x.results(subject='Math')
>>> subject
'Math'
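As a quick illustration of the length rule, the sketch below unpacks a stand-in tuple (the name result is only for this example) and then deliberately uses one name too many:

```python
result = ('B', 'Math')  # same shape as the tuple returned by results()

grade, subject = result
print(grade, subject)  # B Math

# A mismatch between the number of names and items raises ValueError
try:
    grade, subject, extra = result
except ValueError as exc:
    error = str(exc)
    print(error)
```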
So, there we have it, though it needed a few extra lines of code to get the subject. It would be more intuitive to access all of them with just the dot operator, as x.<attribute>, but this is only an example, and you could try it with subject initialized in __init__.
Next, suppose there are several students (say 3) and we want their names, scores and grades for Math. Except for the subject, each of these must be some sort of collection data type, such as a list, that can store all the names, scores and grades. We could just initialize like this:
>>> x = MyClass(name=['John', 'Tom', 'Sean'], score=[70, 55, 40])
>>> x.name
['John', 'Tom', 'Sean']
>>> x.score
[70, 55, 40]
This seems fine at first sight, but when you (or some other programmer) take another look at the initialization of name, score and grade in __init__, there is no way to tell that they need a collection data type. The variables also have singular names, which suggests that each holds just one value. Programmers should aim to make the intent as clear as possible, by way of descriptive variable naming, type declarations, code comments and so on. With this in mind, let's change the attribute declarations in __init__. Before we settle on a well-behaved, well-defined declaration, we must take care of how we declare default arguments.
Edit: Problems with mutable default arguments:
Now, there are some 'gotchas' that we must be aware of when declaring default args. Consider the following declaration, which initializes names and appends a random name on object creation. Recall that lists are mutable objects in Python.
#Not recommended
class MyClass:
    def __init__(self, names=[]):
        self.names = names
        self.names.append('Random_name')
Let's see what happens when we create objects from this class:
>>> x = MyClass()
>>> x.names
['Random_name']
>>> y = MyClass()
>>> y.names
['Random_name', 'Random_name']
The list continues to grow with every new object creation. The reason is that default values are evaluated only once, when the function is defined, not each time __init__ is called. Every call that omits names therefore receives the same list object, and each append adds to the names left there by previous calls. You can verify this yourself, as the id of the list remains the same for every object creation.
>>> id(x.names)
2513077313800
>>> id(y.names)
2513077313800
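Another way to see the sharing, as a sketch: function objects keep their default values in the __defaults__ attribute, so the single list can be inspected directly. The variable name shared is only for illustration:

```python
class MyClass:
    def __init__(self, names=[]):  # the list is created once, at definition time
        self.names = names
        self.names.append('Random_name')


x = MyClass()
y = MyClass()

# The single default list is stored on the function object itself
shared = MyClass.__init__.__defaults__[0]
print(shared)               # ['Random_name', 'Random_name']
print(x.names is y.names)   # True
print(x.names is shared)    # True
```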
So, what is the correct way of defining default args while also being explicit about the data type the attribute supports? The safest option is to set default args to None and initialize to an empty list when the arg value is None. The following is a recommended way to declare default args:
#Recommended
>>> class MyClass:
...     def __init__(self, names=None):
...         self.names = names if names else []
...         self.names.append('Random_name')
Let's examine the behavior:
>>> x = MyClass()
>>> x.names
['Random_name']
>>> y = MyClass()
>>> y.names
['Random_name']
Now, this behavior is what we are looking for. The object does not "carry over" old baggage and re-initializes to an empty list whenever no values are passed to names. If we pass some valid names (as a list, of course) to the names arg for the y object, Random_name will simply be appended to this list. And again, the values of the x object will not be affected:
>>> y = MyClass(names=['Viky','Sam'])
>>> y.names
['Viky', 'Sam', 'Random_name']
>>> x.names
['Random_name']
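One subtlety worth knowing: when a list is passed in, the pattern above stores that very list, so the append also changes the caller's copy. If that is not wanted, a stricter variant could look like the sketch below. The class name SafeClass is hypothetical, and copying the input is a design choice of this sketch rather than something the example requires:

```python
class SafeClass:
    def __init__(self, names=None):
        # "is None" distinguishes "nothing passed" from "empty list passed";
        # list(...) copies, so the caller's list is left untouched
        self.names = [] if names is None else list(names)
        self.names.append('Random_name')


original = ['Viky', 'Sam']
y = SafeClass(names=original)
print(y.names)    # ['Viky', 'Sam', 'Random_name']
print(original)   # ['Viky', 'Sam']
```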
Perhaps the simplest explanation of this concept can be found on the Effbot website. If you'd like to read some excellent answers, see “Least Astonishment” and the Mutable Default Argument.
Based on this brief discussion of default args, our class declaration will be modified to:
class MyClass:
    def __init__(self, names=None, scores=None):
        self.names = names if names else []
        self.scores = scores if scores else []
        self.grades = []
    #<---code------>
This makes more sense: all variables have plural names and are initialized to empty lists by default on object creation. We get similar results as before:
>>> x.names
['John', 'Tom', 'Sean']
>>> x.grades
[]
grades is an empty list, making it clear that the grades will be computed for multiple students when results() is called. Therefore, our results method should also be modified. The comparisons should now be between the grade thresholds (70, 50 etc.) and the items in the self.scores list, and as each comparison is made, the self.grades list should be updated with the individual grade. Change the results method to:
def results(self, subject=None):
    #Grade calculator
    self.grades = []  # start fresh so repeated calls don't duplicate grades
    for score in self.scores:
        if score >= 70:
            self.grades.append('A')
        elif 50 <= score < 70:
            self.grades.append('B')
        else:
            self.grades.append('C')
    return self.grades, subject
We should now get the grades as a list when we call results():
>>> x.results(subject='Math')
(['A', 'B', 'C'], 'Math')
>>> x.grades
['A', 'B', 'C']
>>> x.names
['John', 'Tom', 'Sean']
>>> x.scores
[70, 55, 40]
This looks good, but imagine the lists were large: figuring out whose score or grade belongs to whom would be an absolute nightmare. This is where it is important to initialize the attributes with the correct data type, one that can store all of these items in a way that keeps them easily accessible and clearly shows their relationships. The best choice here is a dictionary.
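To make the difference concrete, here is a small sketch that contrasts a lookup in parallel lists with a lookup in a dictionary. It uses the same sample data; building the dictionary with zip is just one convenient way to do it, and the names idx and names_scores are only for illustration:

```python
names = ['John', 'Tom', 'Sean']
scores = [70, 55, 40]
grades = ['A', 'B', 'C']

# zip pairs items by position; dict turns the (name, score) pairs into a mapping
names_scores = dict(zip(names, scores))
print(names_scores)  # {'John': 70, 'Tom': 55, 'Sean': 40}

# With parallel lists, a lookup needs an index search first
idx = names.index('Tom')
print(scores[idx], grades[idx])  # 55 B

# With a dictionary, the relationship is explicit
print(names_scores['Tom'])  # 55
```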
We can have a dictionary with names and scores defined initially, and the results function should put everything together into a new dictionary that has all the scores, grades etc. We should also comment the code properly and explicitly define args in the method wherever possible. Lastly, we may not require self.grades in __init__ anymore because, as you will see, the grades are not appended to a list but explicitly assigned. This depends entirely on the requirements of the problem.
The final code:
class MyClass:
    """A class that computes the final results for students"""

    def __init__(self, names_scores=None):
        """initialize student names and scores
        :param names_scores: accepts key/value pairs of names/scores
        E.g.: {'John': 70}"""
        self.names_scores = names_scores if names_scores else {}

    def results(self, subject=None):
        """Assign grades and collect final results into a dictionary.
        :param subject: name of the subject the scores belong to"""
        _final_results = {}  # internal variable, built fresh on every call
        for name, score in self.names_scores.items():
            if score >= 70:
                _final_results[name] = [score, subject, 'A']
            elif 50 <= score < 70:
                _final_results[name] = [score, subject, 'B']
            else:
                _final_results[name] = [score, subject, 'C']
        return _final_results
Please note that _final_results is just an internal variable that collects the results for each student. Building it as a new dictionary leaves self.names_scores unchanged, so results() can safely be called more than once. The purpose of the name is to return a meaningful variable from the function that clearly informs the intent. The _ at the beginning of the name indicates that it is an internal variable, as per convention.
Let's give this a final run:
>>> x = MyClass(names_scores={'John':70, 'Tom':50, 'Sean':40})
>>> x.results(subject='Math')
{'John': [70, 'Math', 'A'],
'Tom': [50, 'Math', 'B'],
'Sean': [40, 'Math', 'C']}
This gives a much clearer view of the results for each student. It is now easy to access the grades/scores for any student:
>>> y = x.results(subject='Math')
>>> y['John']
[70, 'Math', 'A']
Conclusion:
While the final code needed some extra work, it was worth it. The output is more precise and gives clear information about each student's results. The code is more readable and clearly informs the reader about the intent behind the class, methods and variables. The following are the key takeaways from this discussion:
- The variables (attributes) that are expected to be shared among class methods should be defined in __init__. In our example, names, scores and possibly subject were required by results(). These attributes could be shared by another method, say average, that computes the average of the scores.
- The attributes should be initialized with the appropriate data type. This should be decided beforehand, before venturing into a class-based design for a problem.
- Care must be taken when declaring attributes with default args. A mutable default is created once and shared by every call, so if __init__ mutates it, the changes accumulate across instances. It is safest to declare default args as None and initialize to an empty mutable collection inside __init__ whenever the value is None.
- The attribute names should be unambiguous and follow PEP8 guidelines.
- Some variables should be initialized within the scope of the class method only. These could be, for example, internal variables that are required for computations or variables that don't need to be shared with other methods.
- Another compelling reason to define variables in __init__ is to avoid possible AttributeErrors that may occur due to accessing unnamed/out-of-scope attributes. The __dict__ attribute provides a view of the attributes initialized here.
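As a sketch of the first takeaway, here is how an average method could reuse the attribute set up in __init__. The method is hypothetical and not part of the class developed above, and returning None for an empty dictionary is just one possible choice:

```python
class MyClass:
    def __init__(self, names_scores=None):
        self.names_scores = names_scores if names_scores else {}

    def average(self):
        """Hypothetical helper: mean of all scores, or None if there are none."""
        if not self.names_scores:
            return None
        return sum(self.names_scores.values()) / len(self.names_scores)


x = MyClass(names_scores={'John': 70, 'Tom': 50, 'Sean': 40})
print(x.average())  # roughly 53.33
```

Because names_scores lives on the instance, any method can read it without it being passed around as an argument.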
When assigning values to attributes (positional args) on class instantiation, the argument names should be given explicitly. For instance:
x = MyClass('John', 70) #not explicit
x = MyClass(name='John', score=70) #explicit
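If you want the interpreter to enforce the explicit style rather than rely on convention, one option is to make the parameters keyword-only with a bare * in the signature. This is an optional variation on the example class, shown here as a sketch; the flag rejected exists only for illustration:

```python
class MyClass:
    def __init__(self, *, name, score):  # names after * are keyword-only
        self.name = name
        self.score = score


x = MyClass(name='John', score=70)   # accepted

try:
    MyClass('John', 70)              # positional call is rejected
except TypeError as exc:
    rejected = True
    print(exc)
else:
    rejected = False
```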
Finally, the aim should be to communicate the intent as clearly as possible with comments. The class, its methods and attributes should be well commented. For all attributes, a short description along with an example is quite useful for a new programmer who encounters your class and its attributes for the first time.
__init__, even if None initially. It makes it clear what the instance data attributes are, and prevents AttributeErrors on self when using the instance (though of course other exceptions are still possible). – Taconite
__init__, you know (a.) it's all there and (b.) it's been initialized in the most sensible place, where you'd look first. – Employment
__init__ then this problem disappears. – Sirreverence
None is the normal choice for such a placeholder, but a better design would be to avoid the placeholder if possible. – Sirreverence