Using a natural key as the ID of a domain object, or GUID + auto-increment, in Domain-Driven Design
4

7

I've been reading a lot of articles about DDD and noticed that most use a GUID as the ID when persisting to a database. They say that GUIDs scale well and that auto-incrementing IDs are a big no-no when it comes to scalability.

I'm now confused about whether to use a GUID or auto-increment.

Basically, the domain is a membership system (a binary tree) that tracks registered members.

The first requirement is that something uniquely identifies each member in the system (we call it the Account No.), perhaps 7 digits long.

Then new Members can be registered by another Member. We call it referral.

What I'm planning now is to have a MemberId of GUID type as the domain object's ID. It serves as the primary key and is used for joins and foreign keys (on Referral, referer_id would be the GUID MemberId). AccountNo would be an auto-increment column, or it could be acquired from the repository via MAX() + 1. It would mainly be used for search features in the system and in links.

Should the ID of the domain object remain hidden from users of the system, since it's just a technical implementation detail?

Is it OK to combine both: a GUID as the row ID in the database (surrogate key) and an auto-increment value as the natural key?

Is it okay to exclude AccountNo from the constructor because it will be auto-incremented anyway? What about the need to enforce invariants? Or is getting the next ID from the repository and including AccountNo in the constructor the way to go?

Or should I just stick with an auto-increment ID and forget about the GUID: remove MemberId and let AccountNo be the ID of the domain object?

NOTE:

I am not building the next Facebook or anything of the kind.

I just want to practice the Tactical side of DDD to learn how to make hard architectural decisions knowing their PROS and CONS.

I just want to practice the Strategic side of DDD to learn how to make hard architectural decisions knowing their PROS and CONS and their implementation.

Suppose we consider 3 scenarios for member registration:

  • First Scenario: Member registration happens every minute.
  • Second Scenario: Member registration happens every hour.
  • Third Scenario: Member registration happens at most 5 times a day.

How would each scenario affect the decisions?

Technology Stack:

  • ASP.NET MVC 5
  • SQL Server 2014
  • C#
  • Dapper ORM
Luxuriant asked 28/7, 2015 at 9:45
5

I just want to practice the Tactical side of DDD to learn how to make hard architectural decisions knowing their PROS and CONS.

You got that wrong. You can't learn strategy by doing tactics. Tactics are ways to implement a strategy. But you need a strategy first.

Anyway, about your question, it's quite simple: use a Guid. It has 2 advantages:

  1. It is a global identifier.
  2. It can be generated easily from the app. An auto-incremented ID means a complicated service or reliance on the DB. Don't complicate your life.

The natural ID, like AccountNo, should be used too. However, the Guid is there for technical purposes. The natural key's format might change in the future, and the Guid makes it easy to support multiple natural key formats.

As a practice, it's best for your entity ID to be a value object (even if it just wraps a Guid). You can incorporate the Guid in AccountNo too; a VO doesn't need to hold only one value. For example, in my current app I have a ProjectId(Guid organization, Guid idValue) and a ProjectAssetId(Guid organization, Guid projectId, Guid idValue).
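To make the value-object idea concrete, here is a minimal sketch. It is written in Python rather than the C# of the question's stack, and `MemberId` and the snake_case field names are hypothetical; only the shape of `ProjectId(Guid organization, Guid idValue)` comes from the answer.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class MemberId:
    """Identity value object wrapping a single GUID (hypothetical name)."""
    value: UUID

    @staticmethod
    def new() -> "MemberId":
        # Generated in the application, no database round-trip needed.
        return MemberId(uuid4())


@dataclass(frozen=True)
class ProjectId:
    """Composite identity modelled on ProjectId(Guid organization, Guid idValue)."""
    organization: UUID
    id_value: UUID


org = uuid4()
raw = uuid4()
a = ProjectId(org, raw)
b = ProjectId(org, raw)
same = (a == b)  # value objects compare by content, not by reference
```

Because the dataclasses are frozen, an ID cannot be mutated after creation, which matches the rule that an entity's identity must stay stable.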

Jipijapa answered 28/7, 2015 at 15:6 Comment(3)
Thank you for pointing out that it's the strategic side. I've edited the post. Your answer is what I was actually thinking of, and in fact I have implemented it before. I just thought GUID IDs were a way to escape natural key generation; I guess I'm wrong again. Just a follow-up question: if AccountNo is generated late (on persist), can I exclude it from the constructor even though it's stated that every Member should have an AccountNo, which is clearly an invariant that excluding it from the constructor would hide? - Luxuriant
Your business rule conflicts with a technical implementation. The business rule is more important, so you have to change the technical implementation: in this case, don't generate AccountNo after the object has been saved. So, just use a Guid :) - Jipijapa
A Guid is not human-readable, so we can't use it as the AccountNo. I guess I have to fetch the next AccountNo from the repository before persisting. - Luxuriant
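The approach the last comment arrives at, fetching the next AccountNo from the repository before the entity is constructed, might look roughly like the Python sketch below. Everything here is an assumption for illustration: the `InMemoryMemberRepository` class, the `next_account_no` method, and the strict 7-digit range (the question only says "perhaps" 7 digits). A real repository would delegate to a database sequence rather than an in-process counter.

```python
import itertools
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Member:
    member_id: UUID   # technical identity (GUID)
    account_no: int   # domain identity shown to users

    def __post_init__(self) -> None:
        # Invariant enforced at construction: a Member always has a
        # 7-digit AccountNo (the 7-digit rule is an assumption).
        if not 1_000_000 <= self.account_no <= 9_999_999:
            raise ValueError("AccountNo must have exactly 7 digits")


class InMemoryMemberRepository:
    """Hypothetical stand-in for a repository backed by a DB sequence."""

    def __init__(self, first_account_no: int = 1_000_000):
        self._sequence = itertools.count(first_account_no)
        self._members = {}

    def next_account_no(self) -> int:
        # Hands out the next number before the entity exists.
        return next(self._sequence)

    def add(self, member: Member) -> None:
        self._members[member.member_id] = member


repo = InMemoryMemberRepository()
member = Member(member_id=uuid4(), account_no=repo.next_account_no())
repo.add(member)
```

With this shape the constructor keeps both identifiers, so the invariant stays visible in the model, at the cost of one extra call to obtain the number.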
5

There are probably too many questions in your question to give you a complete answer, because ID design is not simple and has many facets. I can recommend the book "Implementing Domain-Driven Design" by Vaughn Vernon; it has a section dedicated to identity design (Chapter 5 "Entities" - Unique Identity).

I'll try to point you in the right direction anyway, without reciting everything from that chapter :-) .

What do you need?

You already stated some questions regarding ID design, but there are more questions that you need to ask. Only then can you decide whether GUIDs, DB-generated IDs, or some other kind of ID is appropriate.

  • What's the domain significance of the ID? Perhaps the ID is just there to facilitate a technical solution and is not part of the domain at all?
  • Who provides the ID? This could be the user, the application, the database or even another Bounded Context.
  • When do you need the ID? Before the associated entity is created, at creation time, or only when the entity is persisted?

The answers to these questions will constrain the type of ID generation that you can use.

What you should be aware of

There are a few rules regarding ID design. Following them is strongly recommended, so that you don't shoot yourself in the foot later:

  • Make sure your IDs are unique. This may especially be a concern with user-provided IDs. You need to enforce uniqueness and give your users a way to detect existing IDs. With application-generated random IDs or DB-generated IDs this is usually not a problem.
  • Make sure your IDs are stable. Never ever change the ID of an entity! After all, the ID is what you use to reference an entity. The consequence of this is that you should not make things part of your ID that may change. E.g. the last name of a person may not be a good choice, because that could change when someone marries (and it may be problematic because of non-uniqueness, too).
  • Hide IDs if they are just a technical concept. Making a technical concept part of your domain model is wrong from a DDD point of view.

Example

Here is an example of how you could find an ID design:

Allowing the ID to be created late (i.e. by the persistence layer) means the entity has no ID when it is first created. So if you need early or just-in-time IDs, you cannot use DB-generated IDs (unless you accept contacting the DB just for ID retrieval).

Then you may decide that the user is not interested in the ID, so having to specify the ID would be strange. This leaves application-generated IDs.

With application-generated IDs, you need to make sure that the ID is unique. This is trivial for single-instance applications, but may be more problematic as soon as you create a load-balanced setup with more than one app instance. This is the reason why many people use random IDs (such as GUIDs) in the first place, so they don't hit a dead end when they scale out.

As you see, this example makes many assumptions. There just is no right or wrong, but with the questions stated above, you should be able to make an informed decision.
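As one way to picture application-generated IDs, here is a small Python sketch in which a factory creates the GUID and hands it to the entity constructor, so the entity is never without an identity. `Member`, `MemberFactory` and the injectable `id_generator` parameter are hypothetical names chosen for illustration, not something this answer prescribes.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass
class Member:
    member_id: UUID   # assigned before the entity is ever persisted
    name: str

    def __post_init__(self) -> None:
        if not isinstance(self.member_id, UUID):
            raise ValueError("a Member must always have an identity")
        if not self.name.strip():
            raise ValueError("name must not be empty")


class MemberFactory:
    """Hypothetical factory: callers never supply the technical ID."""

    def __init__(self, id_generator=uuid4):
        # The generator is injectable so tests can use fixed IDs.
        self._next_id = id_generator

    def create(self, name: str) -> Member:
        return Member(member_id=self._next_id(), name=name)


factory = MemberFactory()
alice = factory.create("Alice")
bob = factory.create("Bob")
```

Since `uuid4` needs no coordination between processes, the same factory works unchanged when several app instances run behind a load balancer.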

Surmise answered 28/7, 2015 at 12:9 Comment(5)
You've mentioned that you can create an entity without the ID initially. From a DDD point of view, is the ID of the domain object considered an invariant? So is it OK to exclude the ID from the entity constructor because we are relying on the persistence layer to provide it? - Luxuriant
If the ID is irrelevant from a domain viewpoint, the constructor (or factory for the purpose of creating an entity) should not require callers to provide one. I usually end up with the following setup: A CreateNewEntityFactory first creates an ID (e.g. GUID) and passes that to the internal entity constructor. This approach avoids logic in the entity constructor. - Surmise
Let's take, for example, the case where AccountNo is a domain concept. Now the hardest decision: if AccountNo should be auto-generated, should you make it the primary key, or use a GUID as the primary key instead? And should both the GUID and AccountNo be present in the constructor? Most DDD examples use a GUID as the ID, but more often than not the GUID representation is not a domain concept, because it's too long to display to users. - Luxuriant
If you are absolutely sure that AccountNo never EVER changes, you should make it your ID (both for the entity and for the persistence). If for some reason that does not work with your persistence mechanism, then store AccountNo as an ordinary field in your DB and use an artificial ID (which is then not part of the domain). - Surmise
Thank you for referring me to the Red Book. Read it last night and learned a lot from that chapter. - Luxuriant
3

Allow me to defend the auto-increment idea. It is true that GUIDs are more scalable. But an important question any designer should ask at some point is "How much scalability do I need?"

The answer is very rarely "As much as possible!" In the real world, everything has limits. Our databases model the real world.

For example, if you are working with people (users, customers, students, etc.), a 64-bit integer can contain the entire population of the Earth many times over. Very many times. We're talking about "population of the galactic empire" here. A signed bigint has about 9.2 × 10^18 positive values, which is over a billion for every person alive today.

Don't get lazy, especially during the design phase. Design a reasonable margin of safety and go on. Anything more unnecessarily increases the complexity and "friction" of the system.
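A quick back-of-the-envelope check of that margin, as a Python sketch. It uses the three registration rates from the question and assumes AccountNo has exactly 7 digits (the question only says "perhaps"); the world population figure of 8 billion is likewise an assumption.

```python
MINUTES_PER_YEAR = 60 * 24 * 365

# Registrations per year for the three scenarios in the question.
scenarios = {
    "every minute": MINUTES_PER_YEAR,
    "every hour": 24 * 365,
    "5 per day": 5 * 365,
}

SEVEN_DIGIT_VALUES = 9_999_999 - 1_000_000 + 1   # numbers with exactly 7 digits
BIGINT_MAX = 2**63 - 1                           # largest signed 64-bit value
WORLD_POPULATION = 8_000_000_000                 # rough assumption

# Years until each key space is exhausted at a constant rate.
years_7_digit = {k: SEVEN_DIGIT_VALUES / v for k, v in scenarios.items()}
years_bigint = {k: BIGINT_MAX / v for k, v in scenarios.items()}
ids_per_person = BIGINT_MAX // WORLD_POPULATION

for name in scenarios:
    print(f"{name}: 7-digit lasts {years_7_digit[name]:.0f} years, "
          f"bigint lasts {years_bigint[name]:.2e} years")
```

At one registration per minute the 7-digit space runs out after roughly 17 years, so the width of the user-facing number is the tighter constraint here, not the width of the integer column.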

My experience in this field is now measured in decades -- several of them. In that time, I have never had to use GUIDs for scalability. The only actual use for GUIDs I have found is when entities are created in different, usually remote, databases which then must be merged together into a central database. GUIDs eliminate (statistically speaking, that is) the possibility of collisions during the merge.

Coypu answered 30/7, 2015 at 19:25 Comment(2)
I don't know much about distributed systems, but is an int or bigint primary key a good fit when scaling horizontally? - Luxuriant
As far as available values, yes. The issue would be in synchronizing the value generation. That is, while the data is distributed, the key generation would have to be centralized. Or merge the locally generated value with the node identifier: A1001, B1001, A1002, B1002, etc. There could be better solutions. I am not a DBA, so I would gladly accept a more authoritative answer. - Coypu
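The node-identifier scheme from the last comment can be sketched in a few lines of Python. `NodeIdGenerator` is a hypothetical name, and the starting value 1001 simply mirrors the A1001, B1001 example; each node keeps its own counter, so no central coordination is needed.

```python
import itertools


class NodeIdGenerator:
    """Hypothetical per-node generator: node prefix + local counter."""

    def __init__(self, node: str, start: int = 1001):
        self._node = node
        self._counter = itertools.count(start)

    def next_id(self) -> str:
        # Unique across nodes as long as node prefixes are distinct.
        return f"{self._node}{next(self._counter)}"


node_a = NodeIdGenerator("A")
node_b = NodeIdGenerator("B")
ids = [node_a.next_id(), node_b.next_id(), node_a.next_id(), node_b.next_id()]
print(ids)  # ['A1001', 'B1001', 'A1002', 'B1002']
```

The trade-off is that the key becomes a string (or a composite of node and number) rather than a plain integer, and node prefixes must themselves be assigned uniquely.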
0

They say that GUIDs scale well and that auto-incrementing IDs are a big no-no when it comes to scalability.

The main scalability problem with auto-increment integers is with the inserts: since new values "bunch up" together at the upper extreme of the value range, they tend to hit the same "side" of a B-Tree (and likely the same leaf node), causing latching and lowering the concurrency.
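The effect can be illustrated with a deliberately simplified Python model. It is not a real B-Tree: leaves are fixed key ranges, and splits, duplicate keys and latching are all ignored. It only counts how many distinct leaves a batch of near-simultaneous inserts would touch; the leaf capacity, batch size and key range are arbitrary assumptions.

```python
import bisect
import random

LEAF_CAPACITY = 100
existing_keys = list(range(0, 1_000_000, 10))   # 100,000 keys already indexed
# First key of each leaf; a new key belongs to the leaf whose range contains it.
leaf_starts = existing_keys[::LEAF_CAPACITY]


def target_leaf(key: int) -> int:
    """Index of the leaf that would receive this key."""
    return bisect.bisect_right(leaf_starts, key) - 1


rng = random.Random(42)
batch = 64  # inserts arriving at about the same time
next_key = existing_keys[-1] + 1

sequential = [next_key + i for i in range(batch)]                # auto-increment style
scattered = [rng.randrange(0, 1_000_000) for _ in range(batch)]  # GUID-like spread

seq_leaves = {target_leaf(k) for k in sequential}
rand_leaves = {target_leaf(k) for k in scattered}
print(len(seq_leaves), "leaf hit by sequential keys;",
      len(rand_leaves), "leaves hit by scattered keys")
```

In this model every sequential key lands in the last leaf, so concurrent inserts would all contend for the same node, while scattered keys spread across many leaves.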

At just once-a-minute insertion, you simply won't see any of this, so pick your key based on other criteria. As far as you have described, auto-increment integers would serve just fine in your scenario... they are more lightweight and are likely to perform better than GUIDs. And if the 32-bit variant is not wide enough, just use the 64-bit one.


BTW, auto-increment integers are not generated by querying for MAX() + 1 - virtually every DBMS has its own version of a high-performance "sequence generator" that you can use directly.

You can also return the generated value directly to the client without requiring an additional round-trip (e.g. Oracle's RETURNING or SQL Server's OUTPUT clause). Unfortunately, ORMs don't always play well with that...
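A runnable illustration of both points, as a Python sketch. SQLite (via the standard `sqlite3` module) stands in for SQL Server purely so the example is self-contained; on SQL Server the equivalent would be an IDENTITY column or a SEQUENCE combined with the OUTPUT clause. The `member` table and its column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE member ("
    " account_no INTEGER PRIMARY KEY AUTOINCREMENT,"
    " name TEXT NOT NULL)"
)


def register(name: str) -> int:
    """Insert a member and return the AccountNo the database generated."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute("INSERT INTO member (name) VALUES (?)", (name,))
        return cursor.lastrowid  # generated key, no extra SELECT needed


first = register("Alice")
second = register("Bob")

# Delete the newest member, then compare the two strategies.
with conn:
    conn.execute("DELETE FROM member WHERE account_no = ?", (second,))

max_plus_one = conn.execute(
    "SELECT MAX(account_no) + 1 FROM member"
).fetchone()[0]
third = register("Carol")
```

After the delete, `MAX() + 1` would hand out Bob's old number again, while the database generator moves on to a fresh value. `MAX() + 1` is also open to races when two registrations run concurrently, which the built-in generator avoids.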

Brackish answered 28/7, 2015 at 18:24 Comment(8)
I question your "bunch up" explanation. These are not just B-Trees, but balanced B-Trees. - Coypu
@Coypu Err... every B-Tree is balanced. This effect is not related to B-Tree becoming "unbalanced", but with what DBMS has to do to update it. - Brackish
Then you have me at a loss. The "main scalability problem" is supposed to be when new values "bunch up" at the upper end. That only happens if the B-Tree becomes unbalanced. - Coypu
@Coypu By "B-Tree always being balanced", I mean that all paths down a B-Tree (from root to leafs) are always equally deep. That is ensured by a simple fact that changing the height of a B-Tree is allowed only through splitting or coalescing the root node, which "raises" or "lowers" all leafs equally. - Brackish
@Coypu What I'm describing has nothing to do with "unbalancing" a B-Tree. Imagine what happens when a key value is inserted into B-Tree: the DBMS needs to traverse non-leaf nodes and find a leaf in which to perform the insertion. It needs to latch that node (to protect against other concurrent transactions), and do the actual insertion (possibly split it etc.). The latching is the problem: it means insertions into the same node cannot be done in parallel. And sequential integers will naturally hit the same leaf far more often than well-distributed random integers (or GUIDs for that matter). - Brackish
@Coypu This problem is fairly rare and impacts only insert-intensive systems, and corresponding resources around the internet seem to be equally rare, but I have dug up this article - look under section "Ever Increasing". Some DBMSes even have "reverse indexes" (not inverted indexes!) to combat it. - Brackish
Now we're getting somewhere. The article explains the situation with more detail than the unfortunate phrase "bunching up." It also specifies that the "problem" would be expected in only 1% of uses. This is far from the "main scalability problem" as you have described. It is good to have this info. But it's like designing a bridge in Kansas and worrying about it withstanding an earthquake. Statistically, the probability is non-zero, but not anything to lose sleep over. Making it "earthquake proof" would be needlessly over-engineering it. Just don't do anything to make it especially fragile. - Coypu
@Coypu I never claimed main == large. And I was very specific that it's probably irrelevant in this case. - Brackish
