Why primary key is (not) required on fact table in dimensional modelling?
Asked Answered
K

4

15

I have heard a few references that pk is not required on fact table. I believe every single table should have a pk.

How could a person understand a row in a fact table if there is no pk and 10+ foreign keys.

Karena answered 22/1, 2014 at 16:30 Comment(0)
E
35

Primary Key is there

... but Enforcing the primary key constraint in database level is not required.

If you think about this, technically a unique key or primary key is a key that uniquely defines the characteristics of each row. And it can be composed of more than one attributes of that entity. Now in the case of a Fact table, foreign keys flowing-in from the other dimension tables together already act as a compounded primary key. And these foreign-key combinations can uniquely identify each record in the fact table. So, this foreign key combination is the primary key for the fact table.

Why not a Surrogate Key then?

Now if you wanted, you could have defined one surrogate key for the fact table. But what purpose would that serve? You are never going to retrieve one record from that fact table referring its surrogate key (use Indexes instead). Neither you are going to use that surrogate key to join the fact with other tables. Such a surrogate key will be completely waste of space in the database.

Enforcing Database Constraints

When you define this conceptual primary key in the database level, database needs to ensure that this constraint is not getting violated in any of the DML operation performed over it. Ensuring this constraint is a overhead for your database. It might be insignificant for an OLTP system, but for a large OLAP system where data are loaded in batch, this may incur significant performance penalties. Beside, why do you want your database to ensure the integrity of the constraints when you can ensure the same during the data loading phase itself (typically through your ETL coding).

Ericson answered 23/1, 2014 at 3:21 Comment(4)
This is a better answer than mine. I would say though that there are sometimes reasons to create a surrogate key for a fact table. A surrogate can help with certain ETL operations (updates and deletes) and can also support partitioning/archiving/storage strategies. You should still have a "business key" (usually composite) as well of course.Huttan
It's accurate as long as the all FK used in the fact table are PK in other table, BUT if you use view for dimensions and/or union for facts (e.g. to change granularity) the unique key can potentially repeat in the fact table, e.g. same person paid twice on the same day, in same restaurant. SK in fact tables is not completely worthless as mentioned above. it depends on DWH architecture, constraints and business requirements.Animosity
The Surrogate key it think is important depending on business requirements; for example the answer about the SK is not correct when we talk about sales which records FK of customers, FK of warehouses and FK of products, because a real situation would be that business will need some times to fine a specific sale and that is when the SK will help a lot.Cupo
@Cupo You don't use a SK for finding a specific sale. You use the sales Id / Receipt No / Invoice Id etc. (basically, a source side key that can uniquely identify that sales). There is almost no reason (other than over-engineering/bad-design) to put a SK in FactEricson
H
6

You are absolutely right that in principle a fact table should have a key. From the point of view of data modelling it is required. In implementation, key constraints in the database usually require an index however. The overhead of creating and maintaining indexes is such that the uniqueness of the "key" attributes is sometimes maintained by controls in the integration layer ("ETL process") rather than by a constraint in the database.

Whenever practical it does make sense to create the key constraint within the database. If the key isn't explicitly defined in the database then it ought to be clearly documented for users so that they can understand what the data means.

Huttan answered 22/1, 2014 at 17:34 Comment(0)
O
1

As you can read in other answers, primary key constraint is not required, a fact table surrogate key may be helpful at the physical level.

Here a Kimball design tip for Fact Table Surrogate Key:

There are a few circumstances when assigning a surrogate key to the rows in a fact table is beneficial:

  1. Sometimes the business rules of the organization legitimately allow multiple identical rows to exist for a fact table. Normally as a designer, you try to avoid this at all costs by searching the source system for some kind of transaction time stamp to make the rows unique. But occasionally you are forced to accept this undesirable input. In these situations it will be necessary to create a surrogate key for the fact table to allow the identical rows to be loaded.

  2. Certain ETL techniques for updating fact rows are only feasible if a surrogate key is assigned to the fact rows. Specifically, one technique for loading updates to fact rows is to insert the rows to be updated as new rows, then to delete the original rows as a second step as a single transaction. The advantages of this technique from an ETL perspective are improved load performance, improved recovery capability and improved audit capabilities. The surrogate key for the fact table rows is required as multiple identical primary keys will often exist for the old and new versions of the updated fact rows between the time of the insert of the updated row and the delete of the old row.

  3. A similar ETL requirement is to determine exactly where a load job was suspended, either to resume loading or back put the job entirely. A sequentially assigned surrogate key makes this task straightforward.

(source: Design Tip #81 Fact Table Surrogate Key)

Overripe answered 6/10, 2020 at 7:55 Comment(0)
E
0

As we have foreign keys in the Fact table, Which are coming from the primary keys of other dimensions having unique value in each row to identify each record of the fact table so this way foreign keys are itself acting as primary keys.

Edgaredgard answered 6/10, 2020 at 13:31 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.