Data Modeling: Supertype / Subtype

Asked 21/1, 2011 at 19:51 Answered 22/1, 2011 at 7:33

Solved database database-design relational-database database-schema datamodel

Looking to figure out the proper way to model the below requirements.

There are 3 types of “parties” to be concerned with, a Fan, a Band, and a BandMember.
That BandMember will always be associated with a Band and can also be a Fan of any band.
There are common attributes between a Fan, a Band, and a BandMember, but each of these 3 will also have their own unique attributes.
A Fan can be a fan of of any Band or none at all

This is a small part of a bigger thought but it is creating confusion in expanding the model. I believe it would have to be diagram 2 or some other option since I don't see how a BandMember can be associated with a Band in the first model.

I appreciate any input.

alt text

Pekan answered 21/1, 2011 at 19:51 Comment(0)

Caveat

First a couple of caveats for understanding the limitations. All the data being used or stored needs to be considered/modelled together. Eg. you have found that out anyway in your "creating confusion in expanding the model". From my side, not knowing how the Parties (subtypes) are related to other entities, limits me from provided a totally correct answer (that will not change).

Since you are providing the data in two tranches, therefore the answer will be in two tranches, and the second tranche will require changes to the first model. Not a complaint, just advising you beforehand, as that could be avoided if I saw all the data up front.
It is really great that you appreciate the need for (a) modelling the data and (b) going through the science (documented for over 30 years) of Conceptual, Logical, then physical. That is exactly what I do. You cannot imagine the time and effort that is saved by the formal process.
- However, that does not come across in my answers at SO, because it is a "question and answer" site, I have to answer at the level of the questioner. (I answer questions more fully than others, and even that, causes negative commentary !). Be assured that I go through the formal sequence.
It must be mentioned that Relation Modelling is the work of giants in the industry, Dr E F Codd and R Brown in the 1980's. The Methodology is based on their work. IDEF1X became a NIST Standard in 1993. When I answer Data Modelling questions, I am not supplying some personal method that I wrote a book about, I stand on the shoulders of giants.
- I adhere to the Relational Model by Dr E F Codd.
- I reject the work of Date; Darwen; Fagin; etc, because it contradicts the Relational Model, it is anti-Relational.
- ERD (P Chen) is pre-relational, pre-IDEF1X, and primitive in comparison. It has not concept of the Identifier, it cannot handle a Relational Key. It is staggering that ERD is still being taught as "relational", and IDEF1X is suppressed. For 30 years.

Data Modelling

Yes, a Supertype-Subtype structure is correct here. Unfortunately it is not common and therefore not commonly understood.

Subtypes existed long before the Relational Model, and continues to exist. IDEF1X, the only Standard for Relational Data Modelling, has specific symbols for it. ERD has nothing.

Separately, even where Subtypes are implemented, they are implemented in very limited fashion. Subtypes have the effect of changing the Role of the Supertype. The correct implementation of that is very rare indeed, I have not seen this anywhere (except of course my own database implementations for customers, and a few high-end suppliers).
Point is, that may look "complex" but it isn't, it is actually very simple.

This is where Ken Downs and Chris Behrens confuse modelled simplicity (highly extensible) with unmodelled implementation (incorrect and un-extensible), due to the simplistic approach advised by dwarves such as Martin Fowler. No offence, I understand that people are attached to, and will defend, what they know, however limited as that may be.
- notice that each Subtype is also a perfectly valid Entity (Table in the Physical, when we get to that stage) in its own right, and can stand on its own.
- for the lower levels or transaction or function tables, which have Relations to these Subtypes, the trick is to use the correct Subtype (Role). The common mistake is they use Party, and then the meaning of the Subtype or Role, and the correct Referential Integrity is lost.
- separately all the RoleNames are derived from Party, but that is not a valid reason to useParty instead of the correct Role.
- Here you understand the data really well, but (no one has taught you this and) you have confused Roles and Subtypes.
- BandMember and Fan are not Parties. They are Persons, first (and Person is a Party, second)
In order to provide clarity re those points, at this Conceptual level, we need to work with Entities and Identifiers (not Attributes), rather than just Entities. Therefore I have provided that as well.
- It appears you have ERwin (the best!); it allows you to view the single model at that level very conveniently. Do implement the Identifiers in the Entities, even at this abstract level.

Obstacle

I point these out before presenting the model, because you appear to be seriously interested in learning IDEF1X, the Standard methodolgy for modelling Relational Databases, with a view to easing your models in the future. SO or any website, is not a good medium for formal interactive education, but we will give it our best shot.

In model (1), Band cannot be Independent (square corners): since it is identified as being Dependent on Party; it is a Subtype of Party; and it has the same Identifier.
The missing Cardinality is critical. Putting it in will actually assist in resolving the model. I am not fussed re IDEF1X (circles) vs IEEE (crows feet), but I always put them in as I go, and keep changing them as the model progresses.
- your model does not show that a Band is made up of one-to-many Members. Et cetera.
  .
  While programming can progress incrementally (once the definition is stable)), modelling does not. Eg. you can't model the Entities but not the Relations; the Relations but not the Cardinality. That is why they are different sciences, programmers do not make good modellers, and vice versa.
  .
At this stage the Rules are also very important. Modelling is, in fact, modelling the Rules. Therefore correcting or modulating the Rules is part of the Modelling process.
- A Fan can be a fan of of any Band or none at all is not reasonable. If a Person is none at all then they are a member of the general public, and they have no Relation to any Band. An ordinary Person.
- A Fan has a Relation to at least one Band. In fact, having a Relation to a Band is what takes a Person out of that realm and causes storage of Fan details or specific fan-of-band details.
- If there are is such an Entity as Fan with no Band (ie. you are storing details of that, separate to Fan as per my model), please advise, and I will change the model (paper is cheap!).
Verb Phrases are also important at this stage; no less than my point re Rules and Cardinality above, it is a part of the modelling process, and it needs change/modulation as the model progresses. You will not believe how important it is to gett the Verb Phrases right. Putting them in may well have assisted you in clarifying Subtypes vs Roles. Here is a definition that every Data Modeller knows by heart.
- The Entities are the Nouns in the model
- The Relations are the Verbs, the actions that take place between the Nouns
- The Verb Phrases define those actions (that's why they are accurately called Verb Phrases, it is not a funny name).

As described in the IDEF1X Notation document, for Associative tables, read the Verb Phrase "through" them, to the parent on the other side of the association.

A Person makes one-to-many Bands, and is thus a Member
A Band is made up of one-to-many People, who are Members
A Person patronises a Band, which makes them a Fan (not merely a Person who has a row in the Fan table)
A Band depends on People, who are Fans

Coming up with the shortest, most meaningful, Verb Phrase; not using simplistic words ("comprises" is to be avoided"), is a challenge for Modellers. Feel free to ameliorate the Verb phrases I supplied.

Here is your Party Data Model at the Entity and Key levels in IDEF1X.

Readers who are unfamiliar with the Standard for Modelling Relational Databases may find my IDEF1X Notation useful.

Note

All I have done is resolve the Subtypes; Roles; the Cardinality of the Relations, as identified above.

The relevance of the Subtypes vs Roles will be more clear to you when you evaluate your second tranche or your transaction Entities (as you have stated, here we have only Identifying Entities).
Identifiers. This is worth spelling out, not only for the purpose of clarifying the model, also because it is a good example of IDEF1X Identifiers being used, and the power they deploy. You have already indicated in your model that you understand that (solid lines), I am merely giving it the full treatment.
- Person and Band are Subtypes of Party. They are also Roles of Party. Therefore we use PersonId and BandId from that point downward, not PartyId (even though it is PartyId).
- When a Person plays the Role of Member, we use MemberId (which is PersonId, which is PartyId).
- When a Person plays the Role of Fan, we use FanId (which is PersonId, which is PartyId).

Let's say you were listing the Fans of a Band, your query is centred on Fan. If you had those Id Surrogate Keys on every table, you would be forced to join Person, then join Party. But with the Relational Identifiers you have, you can go directly to Party:

SELECT  ...,
        Name  -- Party.Name
        ...
    FROM Party
    JOIN Fan
        ON PartyId = FanId

and skip the Band table in-between. Yes, the truth is, Normalised Relational Database require fewer joins, less resources (processing, cache, disk I/O), and that is one reason why they perform so much better. The myth has no scientific basis.

Please evaluate and ask specific questions.

Update

For a generic treatment of Subtypes, and implementation details for genuine SQL platforms, refer to my Subtype Document.

Bivalve answered 22/1, 2011 at 7:33 Comment(5)

thanks for the response here. I am new to databases and trying to learn by standards rather than only inducing my creative nature into a design. This is only a small part of a conceptual design that I am working on as a educational means for myself, inspired to try and take a database design through a more formal process after seeing some of the data models you have posted here on Stack. I understand your point about having to see all the information to give a truly accurate answer. (cont.) – Pekan 22/1, 2011 at 8:59

(cont.) I admit I am stubborn though, since I am learning I did not want to post my whole diagram up with the existing holes I know exist and ask for help. I am not looking for someone to just take my requirements and design something for me, I want the understanding myself. So thanks for the response, and although you don't see what I have done on my end I can say that your response has helped a lot and I will be applying the knowledge learned back into my conceptual model. (cont.) – Pekan 22/1, 2011 at 9:0

(cont.) I was a bit confused about Verb Phrases but with your explanation I no longer have an excuse and its time to go and fill them in. Since I am sure I am not the only one that can benefit from going through this process I will keep my work documented on this site for others to see. In the mean time back to work refining my model. – Pekan 22/1, 2011 at 9:0

@swisscheese. Thank you ! 1) Two tranches. Relax, I already said "no complaint, just reserving the right to change the model". 2) you've got a great attitude. I can see that you are trying to learn the methodology, that is why I posted a full explanation re each point 3) I take it as a serious compliment that my answers inspire a few seekers, wow ! That alone is worth the negativity on this site. 4) Vote please (it is different to choosing an answer), I get 10 cents. – Bivalve 22/1, 2011 at 18:16

@Air I was expecting readers to understand that it was a code fragment, not actual runnable code, from the context, the para immediately above and below. Nevertheless, I have upgraded it. – Bivalve 9/10, 2021 at 5:35

I think this is simpler than you think. You've got two objects - Band and Person, and they can be connected in two different ways, either as a fan or a member. Here is a quickie db script with no foreign keys or anything:

CREATE TABLE [dbo].[XREFBandMembers](
    [MemberID] [int] NOT NULL,
    [BandId] [int] NOT NULL,
 CONSTRAINT [PK_XREFBandMembers] PRIMARY KEY CLUSTERED 
(
    [MemberID] ASC,
    [BandId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[XREFBandFans](
    [FanId] [int] NOT NULL,
    [BandId] [int] NOT NULL,
 CONSTRAINT [PK_XREFBandFans] PRIMARY KEY CLUSTERED 
(
    [FanId] ASC,
    [BandId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[People](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Name] [nvarchar](100) NOT NULL,
 CONSTRAINT [PK_People] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[Bands](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Name] [nvarchar](50) NOT NULL,
 CONSTRAINT [PK_Bands] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

As for the relationship specific attributes, you can place them in the XREF tables, e.g., FanClubMembershipNumber goes in XREFBandFans.

Brink answered 21/1, 2011 at 20:9 Comment(7)

@Chris, I am sure this is simpler than I think but the caveat here is that I am just learning about databases and trying to take a methodical approach to getting from point A to Z. This is only a conceptual project that I am doing to learn from. With that being said I want to take it from a Logical Model to a Physical One before actually building the database and writing scripts to see of my logic holds true and to test its effectiveness. (cont.) – Pekan 21/1, 2011 at 20:52

(cont) With that being said I understand when you say I have 2 objects but in my mind I still haven't cleared how they associate in a logical model. I started the process using diagram 1 but then ran into problems when expanding the model out too far. So here I stand trying to get a solid understanding of the question I presented before moving on. – Pekan 21/1, 2011 at 20:53

I hear you. One criticism I would mount of my model here is that it's not ENFORCING that any Band has members. I mean, I suppose you can have it, but it doesn't really make sense. I guess you'd have to have that in triggers and stuff. But this should at least get you started. – Brink 21/1, 2011 at 20:55

And maybe that is the right approach and would still get me to the final goal of the requirement that I posted but I sure am not the one that can see that far yet...thanks for the help. – Pekan 21/1, 2011 at 21:0

@swiss, I would recommend looking again at Chris's answer, it is exactly what you need. The key point is that databases are in fact very simple, over-thinking them to 'get the point' is the easiest way to miss the point. The details in your question are object-oriented, and databases are not object-oriented. You say you want to get the "logical", in db's the logical and the table design are often one and the same. Chris's first statement, "This is simpler than you think" is the key to understanding his answer. – Yetac 22/1, 2011 at 2:14

@swisscheese. You are absolutely right. It is far too early to be considering tables and the Physical. If you do not get the Conceptual right, then the Logical (extensions, etc) fails. So stick with it, if you want a robust model that sustains unplanned growth; then the Logical; and after that is fully rendered; then the Physical. And triggers will not be necessary. – Bivalve 22/1, 2011 at 2:15

@Ken. That's right, databases are not OO. A different science applies. You cannot apply OO modelling to Db Modelling . Fowler and Ambler "simpler than you think" are dead wrong, and there are many failures to prove it. Just look at all the questions on SO with OO developers struggling with this issue. Give the Db modelling job to the people who have Db Modelling qualifications, not to the Fowlers. – Bivalve 22/1, 2011 at 3:50

Caveat

Data Modelling

Obstacle

Note

Update

Recommended topics

Hot tags