Django multi-table inheritance alternatives for basic data model pattern

tl;dr

Is there a simple alternative to multi-table inheritance for implementing the basic data-model pattern depicted below, in Django?

Premise

Please consider the very basic data-model pattern in the image below, based on e.g. Hay, 1996.

Simply put: Organizations and Persons are Parties, and all Parties have Addresses. A similar pattern may apply to many other situations.

The important point here is that the Address has an explicit relation with Party, rather than explicit relations with the individual sub-models Organization and Person.

[Image: diagram showing the basic data model]

Note that each sub-model introduces additional fields (not depicted here, but see code example below).

This specific example has several obvious shortcomings, but that is beside the point. For the sake of this discussion, suppose the pattern perfectly describes what we wish to achieve, so the only question that remains is how to implement the pattern in Django.

Implementation

The most obvious implementation, I believe, would use multi-table-inheritance:

from django.db import models


class Party(models.Model):
    """ Note this is a concrete model, not an abstract one. """
    name = models.CharField(max_length=20)


class Organization(Party):
    """ 
    Note that a one-to-one relation 'party_ptr' is automatically added, 
    and this is used as the primary key (the actual table has no 'id' 
    column). The same holds for Person.
    """
    type = models.CharField(max_length=20)


class Person(Party):
    favorite_color = models.CharField(max_length=20)


class Address(models.Model):
    """ 
    Note that, because Party is a concrete model, rather than an abstract
    one, we can reference it directly in a foreign key.

    Since the Person and Organization models have one-to-one relations 
    with Party which act as primary key, we can conveniently create 
    Address objects setting either party=party_instance,
    party=organization_instance, or party=person_instance.

    """
    party = models.ForeignKey(to=Party, on_delete=models.CASCADE)

This seems to match the pattern perfectly. It almost makes me believe this is what multi-table-inheritance was intended for in the first place.

However, multi-table-inheritance appears to be frowned upon, especially from a performance point of view, although this depends on the application. This scary, but ancient, post from one of Django's creators is particularly discouraging:

In nearly every case, abstract inheritance is a better approach for the long term. I’ve seen more than few sites crushed under the load introduced by concrete inheritance, so I’d strongly suggest that Django users approach any use of concrete inheritance with a large dose of skepticism.

Despite this scary warning, I guess the main point in that post is the following observation regarding multi-table inheritance:

These joins tend to be "hidden" — they’re created automatically — and mean that what look like simple queries often aren’t.

Disambiguation: The above post refers to Django's "multi-table inheritance" as "concrete inheritance", which should not be confused with Concrete Table Inheritance on the database level. The latter actually corresponds better with Django's notion of inheritance using abstract base classes.

I guess this SO question nicely illustrates the "hidden joins" issue.
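To make the "hidden join" concrete, here is a minimal sketch that uses Python's built-in sqlite3 module rather than Django itself. The table and column names (party, person, party_ptr_id) are assumptions that mimic the layout multi-table inheritance produces; a real Django project would prefix the table names with the app label. The point is only that reading a complete Person needs columns from two tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE party (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Child table in the multi-table-inheritance layout: its primary key
    -- is also a foreign key to the parent row.
    CREATE TABLE person (
        party_ptr_id   INTEGER PRIMARY KEY REFERENCES party (id),
        favorite_color TEXT NOT NULL
    );
    INSERT INTO party (id, name) VALUES (1, 'Alice');
    INSERT INTO person (party_ptr_id, favorite_color) VALUES (1, 'green');
""")

# Fetching a "whole" person needs columns from both tables, so even the
# simplest read of the child involves a join with the parent.
person_rows = conn.execute("""
    SELECT party.id, party.name, person.favorite_color
    FROM person
    INNER JOIN party ON person.party_ptr_id = party.id
""").fetchall()
print(person_rows)
```

With the ORM this join is generated for you whenever the child model is queried, which is why it is easy to overlook.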

Alternatives

Abstract inheritance does not seem like a viable alternative to me, because we cannot set a foreign key to an abstract model, which makes sense, because it has no table. I guess this implies that we would need a foreign key for every "child" model plus some extra logic to simulate this.
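As a rough illustration of what "a foreign key for every child model plus some extra logic" could mean at the database level, here is a sqlite3 sketch. All table and column names are hypothetical, and the CHECK constraint is just one possible form of the extra logic (it demands that exactly one of the two foreign keys is set); it is not something Django creates for you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- With an abstract parent there is no party table: each child table
    -- carries its own copy of the parent's fields.
    CREATE TABLE organization (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        type TEXT NOT NULL
    );
    CREATE TABLE person (
        id             INTEGER PRIMARY KEY,
        name           TEXT NOT NULL,
        favorite_color TEXT NOT NULL
    );
    -- One nullable foreign key per child; exactly one must be filled in.
    CREATE TABLE address (
        id              INTEGER PRIMARY KEY,
        organization_id INTEGER REFERENCES organization (id),
        person_id       INTEGER REFERENCES person (id),
        CHECK ((organization_id IS NULL) <> (person_id IS NULL))
    );
    INSERT INTO organization (id, name, type) VALUES (1, 'Acme', 'company');
    INSERT INTO person (id, name, favorite_color) VALUES (1, 'Alice', 'green');
    INSERT INTO address (organization_id, person_id) VALUES (1, NULL);
    INSERT INTO address (organization_id, person_id) VALUES (NULL, 1);
""")

# An address that points at both an organization and a person is rejected.
try:
    conn.execute("INSERT INTO address (organization_id, person_id) VALUES (1, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

address_count = conn.execute("SELECT COUNT(*) FROM address").fetchone()[0]
print(rejected, address_count)
```

Every additional child model would need another nullable column and a wider constraint, which is the main cost of this layout compared with a single relation to Party.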

Proxy inheritance does not seem like an option either, as the sub-models each introduce extra fields. EDIT: On second thought, proxy models could be an option if we use Single Table Inheritance on the database level, i.e. use a single table that includes all the fields from Party, Organization and Person.

GenericForeignKey relations may be an option in some specific cases, but to me they are the stuff of nightmares.

As another alternative, it is often suggested to use explicit one-to-one relations (eoto for short, here) instead of multi-table-inheritance (so Party, Person and Organization would all just be subclasses of models.Model).

Both approaches, multi-table-inheritance (mti) and explicit one-to-one relations (eoto), result in three database tables. So, depending on the type of query, of course, some form of JOIN is often inevitable when retrieving data.

By inspecting the resulting tables in the database, it becomes clear that the only difference between the mti and eoto approaches, on the database level, is that an eoto Person table has an id column as primary-key, and a separate foreign-key column to Party.id, whereas an mti Person table has no separate id column, but instead uses the foreign-key to Party.id as its primary-key.
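The difference can be reproduced outside Django with a small sqlite3 sketch. The table names person_mti and person_eoto are made up so that both layouts can live side by side, and the helper primary_key_columns is hypothetical as well; only the column layout follows the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE party (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- mti: the link to party doubles as the primary key.
    CREATE TABLE person_mti (
        party_ptr_id   INTEGER PRIMARY KEY REFERENCES party (id),
        favorite_color TEXT NOT NULL
    );
    -- eoto: a separate id primary key, plus a unique foreign key to party.
    CREATE TABLE person_eoto (
        id             INTEGER PRIMARY KEY,
        party_id       INTEGER NOT NULL UNIQUE REFERENCES party (id),
        favorite_color TEXT NOT NULL
    );
""")


def primary_key_columns(table):
    """Return the names of the primary-key columns of a table."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk);
    # a non-zero pk marks a primary-key column.
    return [name for _cid, name, _type, _notnull, _default, pk in rows if pk]


mti_pk = primary_key_columns("person_mti")
eoto_pk = primary_key_columns("person_eoto")
print(mti_pk, eoto_pk)
```

In both layouts a complete person still requires one row from party and one row from the person table.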

Question(s)

I don't think the behavior from the example (especially the single direct relation to the parent) can be achieved with abstract inheritance, can it? If it can, then how would you achieve that?

Is an explicit one-to-one relation really that much better than multi-table-inheritance, except for the fact that it forces us to make our queries more explicit? To me the convenience and clarity of the multi-table approach outweighs the explicitness argument.

Note that this SO question is very similar, but does not quite answer my questions. Moreover, the latest answer there is almost nine years old now, and Django has changed a lot since.

[1]: Hay 1996, Data Model Patterns

Taffeta answered 21/12, 2018 at 11:36 Comment(7)
Do Organization and Person also need extra fields in their tables? Otherwise, I believe you could use Proxy Models instead and have just a single database table plus several classes to implement different business logic. - Steddman
My suggestion is to think about this from a database design point of view and leave Django out of it. Conflating the two will lead to confusion. For example, the distinction you noted between mti and eoto isn't meaningful, it's just an artifact of Django's implementation. And abstract inheritance is just a code reuse pattern in Django, it doesn't have any meaning at the database level. Also, it will be much easier to find existing solutions to your (very common) problem if you look at databases broadly rather than searching for Django-specific answers. - Surovy
@KevinChristopherHenry: You are absolutely right (+1), I guess I was (still am) indeed suffering from tunnel vision here. A quick search on SO, inspired by your comment, already yields some interesting discussions, e.g. 1, 2, 3. Still, we are tied to Django, and want to stick to "standard" Django as much as possible, so we cannot leave Django out of it altogether. ;-) - Taffeta
I understand, I would just start with the database design and then figure out the corresponding Django code, rather than the other way around. - Surovy
@CesarCanassa: Actually, you do have a good point, even if Organization and Person need extra fields: we could include all fields from Party, Organization, and Person in a single table on the database level (known as Single Table Inheritance), then use proxy models in Django for Organization and Person, where necessary. - Taffeta
@KevinChristopherHenry: Thanks again. I followed your suggestion and tried to summarize my findings below. Any comments are much appreciated. - Taffeta
related: https://mcmap.net/q/746344/-extending-models-in-django, https://mcmap.net/q/203044/-django-abstract-models-versus-regular-inheritance - Taffeta

While awaiting a better one, here's my attempt at an answer.

As suggested by Kevin Christopher Henry in the comments above, it makes sense to approach the problem from the database side. As my experience with database design is limited, I have to rely on others for this part.

Please correct me if I'm wrong at any point.

Data-model vs (Object-Oriented) Application vs (Relational) Database

A lot can be said about the object/relational mismatch, or, more accurately, the data-model/object/relational mismatch.

In the present context I guess it is important to note that a direct translation between data-model, object-oriented implementation (Django), and relational database implementation, is not always possible or even desirable. A nice three-way Venn-diagram could probably illustrate this.

Data-model level

To me, a data-model as illustrated in the original post represents an attempt to capture the essence of a real world information system. It should be sufficiently detailed and flexible to enable us to reach our goal. It does not prescribe implementation details, but may limit our options nonetheless.

In this case, the inheritance poses a challenge mostly on the database implementation level.

Relational database level

Some SO answers dealing with database implementations of (single) inheritance are:

These all more or less follow the patterns described in Martin Fowler's book Patterns of Enterprise Application Architecture. Until a better answer comes along, I am inclined to trust these views. The inheritance section in chapter 3 (2011 edition) sums it up nicely:

For any inheritance structure there are basically three options. You can have one table for all the classes in the hierarchy: Single Table Inheritance (278) ...; one table for each concrete class: Concrete Table Inheritance (293) ...; or one table per class in the hierarchy: Class Table Inheritance (285) ...

and

The trade-offs are all between duplication of data structure and speed of access. ... There's no clearcut winner here. ... My first choice tends to be Single Table Inheritance ...

A summary of patterns from the book is found on martinfowler.com.

Application level

Django's object-relational mapping (ORM) API allows us to implement these three approaches, although the mapping is not strictly one-to-one.

The Django Model inheritance docs distinguish three "styles of inheritance", based on the type of model class used (concrete, abstract, proxy):

  1. abstract parent with concrete children (abstract base classes): The parent class has no database table. Instead each child class has its own database table with its own fields and duplicates of the parent fields. This sounds a lot like Concrete Table Inheritance in the database.

  2. concrete parent with concrete children (multi-table inheritance): The parent class has a database table with its own fields, and each child class has its own table with its own fields and a foreign-key (as primary-key) to the parent table. This looks like Class Table Inheritance in the database.

  3. concrete parent with proxy children (proxy models): The parent class has a database table, but the children do not. Instead, the child classes interact directly with the parent table. Now, if we add all the fields from the children (as defined in our data-model) to the parent class, this could be interpreted as an implementation of Single Table Inheritance. The proxy models provide a convenient way of dealing with the application side of the single large database table.

Conclusion

It seems to me that, for the present example, the combination of Single Table Inheritance with Django's proxy models may be a good solution that does not have the disadvantages of "hidden" joins.

Applied to the example from the original post, it would look something like this:

from django.db import models


class Party(models.Model):
    """ All the fields from the hierarchy are on this class """
    name = models.CharField(max_length=20)
    type = models.CharField(max_length=20)
    favorite_color = models.CharField(max_length=20)


class Organization(Party):
    class Meta:
        """ A proxy has no database table (it uses the parent's table) """
        proxy = True

    def __str__(self):
        """ We can do subclass-specific stuff on the proxies """
        return '{} is a {}'.format(self.name, self.type)


class Person(Party):
    class Meta:
        proxy = True

    def __str__(self):
        return '{} likes {}'.format(self.name, self.favorite_color)


class Address(models.Model):
    """ 
    As required, we can link to Party, but we can set the field using
    either party=person_instance, party=organization_instance, 
    or party=party_instance
    """
    party = models.ForeignKey(to=Party, on_delete=models.CASCADE)
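At the database level, the models above boil down to one party table plus the address table. The following sqlite3 sketch shows that layout under a few assumptions: the table names are simplified (no app-label prefix), and columns that do not apply to a row are stored as empty strings, which is a choice made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Single Table Inheritance: every field of the hierarchy in one table.
    CREATE TABLE party (
        id             INTEGER PRIMARY KEY,
        name           TEXT NOT NULL,
        type           TEXT NOT NULL DEFAULT '',
        favorite_color TEXT NOT NULL DEFAULT ''
    );
    CREATE TABLE address (
        id       INTEGER PRIMARY KEY,
        party_id INTEGER NOT NULL REFERENCES party (id)
    );
    INSERT INTO party (id, name, type) VALUES (1, 'Acme', 'company');
    INSERT INTO party (id, name, favorite_color) VALUES (2, 'Alice', 'green');
    INSERT INTO address (party_id) VALUES (1), (2);
""")

# All fields of any party come from one table, so no join is needed.
party_rows = conn.execute(
    "SELECT name, type, favorite_color FROM party ORDER BY id"
).fetchall()

# An address references party directly, whatever kind of party it is.
linked_names = conn.execute("""
    SELECT party.name
    FROM address
    JOIN party ON address.party_id = party.id
    ORDER BY party.id
""").fetchall()
print(party_rows, linked_names)
```

Note that nothing in this table records whether a row is an organization or a person; if that distinction matters, an extra discriminator column would have to be added.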

One caveat, from the Django proxy-model documentation:

There is no way to have Django return, say, a MyPerson object whenever you query for Person objects. A queryset for Person objects will return those types of objects.

A potential workaround is presented here.

Taffeta answered 22/12, 2018 at 15:15 Comment(0)
