Polymorphic relationships vs separate tables per type
Asked Answered
H

3

25

I am working on a database which has some types (e.g. User, Appointment, Task etc.) which can have zero or more Notes associated with each type.

The possible solutions I have come across for implementing these relationships are:

  1. Polymorphic relationship
  2. Separate table per type

Polymorphic Relationship

Suggested by many as being the easiest solution to implement and seemingly the most common implementation for frameworks that follow the Active Record pattern, I would add a table whose data is morphable:

notable

My notable_type would allow me to distinguish between the type (User, Appointment, Task) the Note relates to, whilst the notable_id would allow me to obtain the individual type record in the related type table.

PROS:

  • Easy to scale, more models can be easily associated with the polymorphic class
  • Limits table bloat
  • Results in one class that can be used by many other classes (DRY)

CONS

  • More types can make querying more difficult and expensive as the data grows
  • Cannot have a foreign key
  • Lack of data consistency

Separate Table per Type

Alternatively I could create a table for each type which is responsible for the Notes associated with that type only. The type_id foreign key would allow me to quickly obtain the individual type record.

user notes

Deemed by many online as a code smell, many articles advocate avoiding the polymorphic relationship in favour of an alternative (here and here for example).

PROS:

  • Allows us to use foreign keys effectively
  • Efficient data querying
  • Maintains data consistency

CONS:

  • Increases table bloat as each type requires a separate table
  • Results in multiple classes, each representing the separate type_notes table

Thoughts

The polymorphic relationship is certainly the simpler of the two options to implement, but the lack of foreign key constraints and therefore potential for consistency issues feels wrong.

A table per notes relationship (user_notes, task_notes etc.) with foreign keys seems the correct way (in keeping with design patterns) but could result in a lot of tables (addition of other types that can have notes or addition of types similar to notes [e.g. events]).

It feels like my choice is either simplified table structure but forgo foreign keys and increased query overhead, or increase the number of tables with the same structure but simplify queries and allow for foreign keys.

Given my scenario which of the above would be more appropriate, or is there an alternative I should consider?

Hobbledehoy answered 23/6, 2019 at 19:8 Comment(0)
S
39

What is "table bloat"? Are you concerned about having too many tables? Many real-world databases I've worked on have between 100 and 200 tables, because that's what it takes.

If you're concerned with adding multiple tables, then why do you have separate tables for User, Appointment, and Task? If you had a multi-valued attribute for User, for example for multiple phone numbers per user, would you create a separate table for phones, or would you try to combine them all into the user table somehow? Or have a polymorphic "things that belong to other things" table for user phones, appointment invitees, and task milestones?

Answer: No, you'd create a Phone table, and use it to reference only the User table. If Appointments have invitees, that gets its own table (probably a many-to-many between appointments and users). If tasks have milestones, that gets its own table too.

The correct thing to do is to model your database tables like you would model object types in your application. You might like to read a book like SQL and Relational Theory: How to Write Accurate SQL Code 3rd Edition by C. J. Date to learn more about how tables are analogous to types.

You already know instinctively that the fact that you can't create a foreign key is a red flag. A foreign key must reference exactly one parent table. This should be a clue that it's not valid relational database design to make a polymorphic foreign key. Once you start thinking of tables and their attributes as concrete types (like described in SQL and Relational Theory), this will become obvious.

If you must create one notes table, you could make it reference one table called "Notable" which is like a superclass of User, Appointment, and Task. Then each of those three tables would also reference a primary key of Notable. This mimics the object-oriented structure of polymorphism, where you can have a class Note have a reference to an object by its superclass type.

But IMHO, that's more complex than it needs to be. I would just create separate tables for UserNotes, AppointmentNotes, and TaskNotes. I'm not troubled by having three more tables, and it makes your code more clear and maintainable.

Saxtuba answered 23/6, 2019 at 19:48 Comment(2)
I suppose table bloat could be unnecessary/excessive tables? I've usually created separate tables in the past, but I've not been in this situation before where I've required a shared type so my "concern" was creating multiple tables which are structured the same, hold the same data but related to separate parent types. This feels like its breaking a normalization rule. I will definitely give that book a read, thanks for the recommendation.Hobbledehoy
If we follow the one note table path we also need a generic object table on top of notable table which is persist the source of id for all table, because maybe commentable is one facet of an object like post. Maybe it has some other facets like ratable ... Do you suggest this approach or you think this is too much complicated? (This cost's 1 more "one to one join" for each query and 2 more write with foreign key index during writes)Pickford
C
1

I think you should think about these two things, before you can make a decision.

  1. Performance. a lot of reads, a lot of writes ? Test which is better.
  2. Growth of your model. Can it easily be expanded ?
Coeval answered 23/6, 2019 at 19:17 Comment(0)
R
0

What you're running into here is an example of interface polymorphism, in which you want User, Appointment, and Task to all exhibit the behaviour of owning notes without sharing a common supertype. This is very distinct from inheritance polymorphism, in which the behaviour would be inherited from a common supertype, as described towards the end of Bill Karwin's answer:

If you must create one notes table, you could make it reference one table called "Notable" which is like a superclass of User, Appointment, and Task. Then each of those three tables would also reference a primary key of Notable. This mimics the object-oriented structure of polymorphism, where you can have a class Note have a reference to an object by its superclass type.

The problem with that approach is that types cannot have more than one supertype, so you will run into problems if you want any of your notables to have a proper supertype.

In OOP, this would be solved by having the three notable types implement a Notable interface. There are a number of databases that natively model inheritance structures, including some SQL dialects, but there are very few that natively model independent interface implementations. The class of databases that do are called polymorphic databases.

If you wanted to implement this in TypeDB, a polymorphic database, you could use the following schema:

define

User sub entity,
    owns email @key,
    plays NoteAttachment:Noteable;
Appointment sub entity,
    owns reference @key,
    plays NoteAttachment:Noteable;
Task sub entity,
    owns reference @key,
    plays NoteAttachment:Noteable;

Note sub entity,
    owns reference @key,
    owns body,
    owns created_at,
    owns updated_at;

NoteAttachment sub relation,
    relates Note,
    relates Noteable;

id sub attribute, value string;
email sub id;
reference sub id;
body sub attribute, value string;
created_at sub attribute, value datetime;
updated_at sub attribute, value datetime;

This schema would allow instances of User, Appointment, and Task to own notes. For example, to insert a user and a note:

insert
$kevin isa User,
    has email "[email protected]";
$note isa Note,
    has reference "note001",
    has body "Hello world!",
    has created_at 2023-10-25T18:45:00,
    has updated_at 2023-10-25T18:45:00;
(Note: $note, Noteable: $kevin) isa NoteAttachment;

To match an existing appointment and insert an attached note:

match
$appt isa Appointment,
    has reference "appt001";
insert
$note isa Note,
    has reference "note002",
    has body "Hello again!",
    has created_at 2023-10-25T18:48:00,
    has updated_at 2023-10-25T18:48:00;
(Note: $note, Noteable: $appt) isa NoteAttachment;

You could then query these notes polymorphically. For example, to retrieve any notes belonging to Kevin, you can use:

match
$kevin isa User, has email "[email protected];
$note isa Note;
(Note: $note, Noteable: $kevin) isa NoteAttachment;
fetch
$note: body;

To retrieve all tasks and attached notes, use:

match
$task isa Task;
$note isa Note;
(Note: $note, Noteable: $task) isa NoteAttachment;
fetch
$task: reference;
$note: body;

To retrieve all notes and the IDs of the Noteable, use:

match
$note isa Note;
(Note: $note, Noteable: $noteable) isa NoteAttachment;
# Here we do not specify the type of the notable
fetch
$noteable: id;
$note: body;

Disclaimer: I have worked on TypeDB.

Rockandroll answered 25/10, 2023 at 18:5 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.