How do you effectively model inheritance in a database?
Asked Answered
S

10

182

What are the best practices for modeling inheritance in databases?

What are the trade-offs (e.g. queriability)?

(I'm most interested in SQL Server and .NET, but I also want to understand how other platforms address this issue.)

Submultiple answered 10/10, 2008 at 6:3 Comment(5)
If you are interested in "best practice", most of the answers are simply incorrect. Best practice dictates that the RDb and the app are independent; they have completely different design criteria. Therefore "modelling inheritance" in a database (or modelling the RDb to suit a single app or app language) is a very bad practice, uninformed, and breaks basic RDb design rules, and cripples it.Shortsighted
possible duplicate of Something like inheritance in database designLithomarge
@Shortsighted So what is your suggestion to avoid inheritance in DB model? Let's say we have 50 different type of teachers, and that we want to connect that particular teacher with class. How would you achieve that without having inheritance?Scalariform
@svlada. That is straight-forward to implement in a RDb, so "inheritance" required. Ask a question, include the table defns and an example, and I will answer it in detail. If you do it in OO terms, it will be a royal mess.Shortsighted
Possible duplicate of How can you represent inheritance in a database?Claim
T
211

There are several ways to model inheritance in a database. Which you choose depends on your needs. Here are a few options:

Table-Per-Type (TPT)

Each class has its own table. The base class has all the base class elements in it, and each class which derives from it has its own table, with a primary key which is also a foreign key to the base class table; the derived table's class contains only the different elements.

So for example:

class Person {
    public int ID;
    public string FirstName;
    public string LastName;
}

class Employee : Person {
    public DateTime StartDate;
}

Would result in tables like:

table Person
------------
int id (PK)
string firstname
string lastname

table Employee
--------------
int id (PK, FK)
datetime startdate

Table-Per-Hierarchy (TPH)

There is a single table which represents all the inheritance hierarchy, which means several of the columns will probably be sparse. A discriminator column is added which tells the system what type of row this is.

Given the classes above, you end up with this table:

table Person
------------
int id (PK)
int rowtype (0 = "Person", 1 = "Employee")
string firstname
string lastname
datetime startdate

For any rows which are rowtype 0 (Person), the startdate will always be null.

Table-Per-Concrete (TPC)

Each class has its own fully formed table with no references off to any other tables.

Given the classes above, you end up with these tables:

table Person
------------
int id (PK)
string firstname
string lastname

table Employee
--------------
int id (PK)
string firstname
string lastname
datetime startdate
Tails answered 10/10, 2008 at 6:11 Comment(18)
'Which you choose depends on your needs' - please elaborate, as I think the reasons for choices form the core of the question.Clarineclarinet
As with most things in a database, you're trading off storage cost vs. performance. TPH stores everything in one table, so selects are fast. TPT is more compact than TPC, but more expensive because of the joins. Finding things in TPC means checking multiple tables. Each of these is a tradeoff, and your decision has to be based on what's most important to your application.Tails
See my comment on the question. Using funny new names for Rdb technical terms that have existed leads to confusion. "TPT" is supertype-subtype. "TPH" is Unnormalised, a gross error. "TPH" is even less Normalised, another gross error.Shortsighted
Only a DBA would presume that denormalization is always an error. :)Tails
@PerformanceDBA, why is denormalisation always an error? I really dislike denormalisation, but I'm not experienced with DBs, so I don't have the experience to really know.Pinette
While I will concede that denormalization results in performance gains in some cases this is entirely due to an incomplete (or nonexistent) separation between the logical and physical structure of data in the DBMS. Unfortunately the majority of commercial DBMS suffer from this problem. @Shortsighted is correct. Undernormalization is a judgement error, sacrificing data consistency for speed. Sadly, it's a choice a DBA or dev would never need to make if the DBMS were designed properly. For the record I am not a DBA.Indeterminate
No matter which appoach you choose, you can also add views to simplify it. Let's say that you choose to have only one table (TPH) but at the same time you want to be able to show only one type. In that case, you can create one view per type. The good thing about TPH is that you don't need to join tables and that you can order the results easily. The bad thing is that you have wide tables with multiples NULLs. You can check that depending on the type of entity, some columns shouldn't be null using check constraints (msdn.microsoft.com/en-us/library/ms179491%28v=sql.105%29.aspx).Liston
@Brad Wilson. Only a developer would denormalise, "for performance", or otherwise. Often, it isn't de-normalisation, the truth is it is unnormalised. That de-Normalisation or unnormalised is an error, is a fact, supported by theory, and experienced by millions, it is not a "presumption".Shortsighted
I think you should also add the possibility to use booleans to diferentiate between the inherited classes, especially when there's a chance of overlap between roles. For instance a Person can be a Customer and also an Employee, etc... So variables such as isCustomer and isEmployee are very handful, when adopting 'one table for all' approach, to create the views for instance. I think this reference is quite rich in this matter (agiledata.org/essays/mappingObjects.html#MappingInheritance).Sisyphus
@FelipeCosta this doesn't scale well. Each time there's a new Person type - you have to add a column. Also it becomes harder to index the table.Kirshbaum
@FelipeCosta I think the right way to do that, would be to have a "Customers" table and "Employee" table, which reference a PersonID.Lumbar
@Lumbar But then, in the case I have multiple types of user I would need multiple tables, so the comment above yours about how well the solution scales apply to your solution as well, but now instead of creating a single column I would need to create an entire table.Sisyphus
Yea, so? Entity framework for example creates all these tables automatically. Although I do agree, something smells if there are so many tables to do with one person. In this case, I think it depends on the objective. So you have 100 types of people, what is that defined upon? What are you trying to achieve. You MIGHT be trying to talk about Roles in which case, it's already a thing. We just have a table with the UserRoles Which defines what a user can do. Does that sound about right?Lumbar
User/Person has many Roles (UserRoles). Roles are named ("Employee", "Customer", etc) Roles have many Permissions Permissions allow the User to do specific things in the systemLumbar
@EvaldasBuinauskas It's not a bad solution at all. It will be very performant, and how hard is it to add a column - really. What are we trying to do here, optimize developer work time?Lumbar
@Shortsighted So according to your comment, the only valid solution among the 3 exposed here would be the first one, with a base/parent table, and child table for subclasses. Well that seems legit if you want to factor data shared by different child classes, but it doesn't help at all if you don't what what the child class is. Given a reference to the base table, how do you determine which child table is the associated one ?Isosceles
@Isosceles 1) The only valid [permanent] solution is Open Architecture standards, such that the RDB is totally independent of the [several] apps. This is science that has not changed since Codd. 2) But this Q & all the As are stuck in OO/ORM land, with the Record Filing System (not an RDB) that supposedly provides persistence for the almighty Object. And worse, now you worry about how to implement OO/ORM concepts [eg. inheritance] in a RFS, instead of the DB concept in an RDB, that simply does not have the problem. In this primitive universe, TPT is the best of the bad lot.Shortsighted
3) Jeffrey's answer is correct, upvoted. 4) how do you determine That is an SQL question, and an easy one, nothing to do with [RDB concept] Basetype-Subtype or [OO/ORM concept incorrectly deployed] TPT. (a) The particular SQL required depends on the context (need) which may have CASE etc, and (b) note that the Non-exclusive Basetype has more than 1 child. The generic answer is, JOIN the child file to the base file, or IF EXISTS ( SELECT 1 FROM child WHERE pk = base.pk )Shortsighted
I
184

Proper database design is nothing like proper object design.

If you are planning to use the database for anything other than simply serializing your objects (such as reports, querying, multi-application use, business intelligence, etc.) then I do not recommend any kind of a simple mapping from objects to tables.

Many people think of a row in a database table as an entity (I spent many years thinking in those terms), but a row is not an entity. It is a proposition. A database relation (i.e., table) represents some statement of fact about the world. The presence of the row indicates the fact is true (and conversely, its absence indicates the fact is false).

With this understanding, you can see that a single type in an object-oriented program may be stored across a dozen different relations. And a variety of types (united by inheritance, association, aggregation, or completely unaffiliated) may be partially stored in a single relation.

It is best to ask yourself, what facts do you want to store, what questions are you going to want answers to, what reports do you want to generate.

Once the proper DB design is created, then it is a simple matter to create queries/views that allow you to serialize your objects to those relations.

Example:

In a hotel booking system, you may need to store the fact that Jane Doe has a reservation for a room at the Seaview Inn for April 10-12. Is that an attribute of the customer entity? Is it an attribute of the hotel entity? Is it a reservation entity with properties that include customer and hotel? It could be any or all of those things in an object oriented system. In a database, it is none of those things. It is simply a bare fact.

To see the difference, consider the following two queries. (1) How many hotel reservations does Jane Doe have for next year? (2) How many rooms are booked for April 10 at the Seaview Inn?

In an object-oriented system, query (1) is an attribute of the customer entity, and query (2) is an attribute of the hotel entity. Those are the objects that would expose those properties in their APIs. (Though, obviously the internal mechanisms by which those values are obtained may involve references to other objects.)

In a relational database system, both queries would examine the reservation relation to get their numbers, and conceptually there is no need to bother with any other "entity".

Thus, it is by attempting to store facts about the world—rather than attempting to store entities with attributes—that a proper relational database is constructed. And once it is properly designed, then useful queries that were undreamt of during the design phase can be easily constructed, since all the facts needed to fulfill those queries are in their proper places.

Infectious answered 10/10, 2008 at 22:18 Comment(5)
Interesting answer. How would you suggest modelling the Person-Employee example in the accepted answer?Erotomania
@sevenforce-The DB design really depends on the requirements of the system, which are not given. There isn't nearly enough information provide to decide. In many cases something similar to the "table-per-type" design may be appropriate, if not slavishly followed. For example, start-date is probably a good property for an Employee object to have, but in the database it should really be a field in the Employment table, since a person could be hired multiple times with multiple start dates. This doesn't matter for the objects (which would use the most recent), but it is important in the database.Infectious
Sure, my question was mainly about the way to model inheritance. Sorry for not been clear enough. Thanks. As you mentioned, there most likely should be an Employment table , which collects all employments with their start-dates. So if knowing the current employment start-date of an Employer is important, that could be a proper use case for a View , which includes that property by querying? (note: seems because of the '-' right after my nick I didn't got any notification on your comment)Erotomania
I don't disagree with this answer because the author is just rewording entities into "prepositions". But the examples on OOP seem ill-designed to me. I realize that this is ten years old, but even back then, making DB queries properties of data objects would be considered ridiculous.Ciaracibber
While I agree with many things you are saying, the statement, "Proper database design is nothing like proper object design," just is not true. There are many commonalities between proper database design and proper object design. I realize you are being hyperbolic to make a point. In the modern ERA of ORM's many people use databases to just crap objects into. This is devalues the data and makes it less sound. When I first learned database, many years ago, we learned the normal forms, which I think are still very pertinent and a good litmus test when evaluating a data model.Murky
C
12

TPT, TPH and TPC patterns are the ways you go, as mentioned by Brad Wilson. But couple of notes:

  • child classes inheriting from a base class can be seen as weak-entities to the base class definition in the database, meaning they are dependent to their base-class and cannot exist without it. I've seen number of times, that unique IDs are stored for each and every child table while also keeping the FK to the parent table. One FK is just enough and its even better to have on-delete cascade enable for the FK-relation between the child and base tables.

  • In TPT, by only seeing the base table records, you're not able to find which child class the record is representing. This is sometimes needed, when you want to load a list of all records (without doing select on each and every child table). One way to handle this, is to have one column representing the type of the child class (similar to the rowType field in the TPH), so mixing the TPT and TPH somehow.

Say we want to design a database that holds the following shape class diagram:

public class Shape {
int id;
Color color;
Thickness thickness;
//other fields
}

public class Rectangle : Shape {
Point topLeft;
Point bottomRight;
}

public class Circle : Shape {
Point center;
int radius;
}

The database design for the above classes can be like this:

table Shape
-----------
int id; (PK)
int color;
int thichkness;
int rowType; (0 = Rectangle, 1 = Circle, 2 = ...)

table Rectangle
----------
int ShapeID; (FK on delete cascade)
int topLeftX;
int topLeftY;
int bottomRightX;
int bottomRightY;

table Circle
----------
int ShapeID; (FK on delete cascade)  
int centerX;
int center;
int radius;
Chuckwalla answered 22/8, 2014 at 17:37 Comment(0)
D
11

Short answer: you don't.

If you need to serialize your objects, use an ORM, or even better something like activerecord or prevaylence.

If you need to store data, store it in a relational manner (being careful about what you are storing, and paying attention to what Jeffrey L Whitledge just said), not one affected by your object design.

Diageotropism answered 10/10, 2008 at 22:26 Comment(0)
M
5

There are two main types of inheritance you can setup in a DB, table per entity and table per Hierarchy.

Table per entity is where you have a base entity table that has shared properties of all child classes. You then have per child class another table each with only properties applicable to that class. They are linked 1:1 by their PK's

alt text

Table per hierarchy is where all classes shared a table, and optional properties are nullable. Their is also a discriminator field which is a number that denotes the type that the record currently holds

alt text SessionTypeID is discriminator

Target per hierarchy is faster to query for as you do not need joins(only the discriminator value), whereas target per entity you need to do complex joins in order to detect what type something is as well as retreiuve all its data..

Edit: The images I show here are screen shots of a project I am working on. The Asset image is not complete, hence the emptyness of it, but it was mainly to show how its setup, not what to put inside your tables. That is up to you ;). The session table holds Virtual collaboration session information, and can be of several types of sessions depending on what type of collaboration is involved.

Meek answered 10/10, 2008 at 6:8 Comment(3)
I would also consider Target per Concrete class to not really model inheritance well and so i did not show.Meek
Could you add a reference where the illustration is from?Pythagoras
Where are the images you are talking about at the end of your answer?Inrush
A
2

With SQL databases, I would tend to agree with Jeffrey L Whitledge, luckily with have today new kind of paradigm allowing us to model database as we would do with object design.

TypeDB implements inheritance through the use of subtyping. In TypeDB, all types (e.g. entities, relations, and attributes) can inherit properties from their parent type.

define

platform sub attribute, value string;
id sub attribute, value string;
email sub id;
name sub id;

user sub entity,
    owns name,
    plays mentorship:trainee;
trainer sub user,
    owns email,
    plays mentorship:mentor;

mentorship sub relation,
    relates mentor,
    relates trainee;
online-mentorship sub mentorship,
    owns platform;

As you can see, the sub keyword lets us declare a new subtype. You can subtype either from one of the three root types (attribute, entity or relation) or from an already declared type.

In the previous example, both email and name are subtypes of id (itself a subtype of attribute). The same goes with trainer inheriting the user entity type and doing so, also owning the name attribute and playing the trainee role in the mentorship relation (both declared for user) on top of the email attribute and mentor role explicitly declared.

Finally, we are declaring a mentorship relation between a mentor and a trainee. We are also creating an online-mentorship inheriting the two roleplayers of mentorship but owning an attribute platform.

As you can see TypeDB lets you define inheritance in a natural way, as you would do in an Object-Oriented language. And you can retrieve data with the same ease.

# This retrieves the name of all users or trainers
match $user-or-trainer isa user,
   has name $name;

# This retrieves the name of all trainers
match $trainer isa trainer,
   has name $name;

# This retrieves all name and/or email from all users and/or trainers
match $user-or-trainer isa user,
   has id $name-or-email;

# This retrieves all users and trainers in a mentorship or online-mentorship relationship
match $allRelations (trainee: $user-trainer, mentor: $trainer) isa mentorship;

Disclaimer: TypeDB researcher here.

Aquilegia answered 10/10, 2023 at 10:53 Comment(0)
H
1

You would normalize of your database and that would actually mirror your inheritance. It might have performance degradance, but that's how it is with normalizing. You probably will have to use good common sense to find the balance.

Haller answered 10/10, 2008 at 6:6 Comment(3)
why do people believe that normalizing a database degrades performance? do people also think that the DRY principle degrades code performance? where does this misperception come from?Prue
Possibly because denormalising can improve performance, hence normalising degrades it, relatively speaking. Can't say I agree with it, but that's probably how it came about.Horal
At the start, normalisation might have a small effect on performance, but over time, as the number of rows increases, efficient JOINs will begin to outperform the bulkier tables. Of course, normalisation has other, greater benefits - consistency and lack of redundancy, etc.Rebeccarebecka
P
1

repeat of similar thread answer

in O-R mapping, inheritance maps to a parent table where the parent and child tables use the same identifier

for example

create table Object (
    Id int NOT NULL --primary key, auto-increment
    Name varchar(32)
)
create table SubObject (
    Id int NOT NULL  --primary key and also foreign key to Object
    Description varchar(32)
)

SubObject has a foreign-key relationship to Object. when you create a SubObject row, you must first create an Object row and use the Id in both rows

EDIT: if you're looking to model behavior also, you would need a Type table that listed the inheritance relationships between tables, and specified the assembly and class name that implemented each table's behavior

seems like overkill, but that all depends on what you want to use it for!

Prue answered 10/10, 2008 at 6:13 Comment(1)
That discussion ended up being about adding a couple columns to every table, not about modeling inheritance. I think the title of that discussion should be changed to better reflect the nature of the question and discussion.Submultiple
F
1

Using SQL ALchemy (Python ORM), you can do two types of inheritance.

The one I've had experience is using a singe-table, and having a discriminant column. For instances, a Sheep database (no joke!) stored all Sheep in the one table, and Rams and Ewes were handled using a gender column in that table.

Thus, you can query for all Sheep, and get all Sheep. Or you can query by Ram only, and it will only get Rams. You can also do things like have a relation that can only be a Ram (ie, the Sire of a Sheep), and so on.

Frogman answered 10/10, 2008 at 8:28 Comment(0)
H
1

Note that some database engines already provides inheritance mechanisms natively like Postgres. Look at the documentation.

For an example, you would query the Person/Employee system described in a response above like this:

  /* This shows the first name of all persons or employees */
  SELECT firstname FROM Person ; 

  /* This shows the start date of all employees only */
  SELECT startdate FROM Employee ;

In that is your database's choice, you don't need to be particularly smart !

Herrah answered 10/10, 2008 at 9:10 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.