How to implement uniqueness where the order of the fields does not matter
Asked Answered
M

8

9

I think the following example will explain the situation best. Let's say we have the following table structure:

-------------------------------------
Member1   int      NOT NULL (FK)(PK)
Member2   int      NOT NULL (FK)(PK)
-------------------------------------
Statust   char(1)  NOT NULL

Here are the table contents for the table:

Member1    Member2    Status
----------------------------
  100        105        A

My question is how do I implement uniqueness so that the following INSERT statement will FAIL based on that one row already in the table.

INSERT status_table (Member1,Member2,Status) VALUES(105,100,'D');

Basically, I'm trying to model a relationship between two members. The Status field is the same whether we have (100,105) or (105,100).

I know I could use a before_insert and before_update trigger to check the contents in the table. But I was wondering if there was a better way to do it... Should my database model be different...

Myoglobin answered 14/3, 2012 at 16:56 Comment(6)
Why dont u use CHECK constraint on ur status column?Milwaukee
@Venk can you show the syntax of the CHECK constraint?Nickeliferous
i think table should be differently designedBonner
Check this odetocode.com/Articles/79.aspxMilwaukee
@Venk i think Aaron asked towrite check constraint with respect to current questionBonner
how would a check constraint help me with the above... please explainMyoglobin
R
6

If you can make sure that all applications/users store the members' IDs in least-to-greatest order (the least MemberID in Member1 and the greatest in Member2), then you could simply add a Check constraint:

ALTER TABLE Status_table
  ADD CONSTRAINT Status_table_Prevent_double_pairs
    CHECK (Member1 < Member2)

If you don't want to do that or you want that extra info to be stored (because you are storing (just an example) that "member 100 invited (liked, killed, whatever) member 150" and not vice versa), then you could use @Tegiri's approach, modified a little (multiplying two big enough integers would be an overflow problem otherwise):

CREATE TABLE Status_table
( Member1 INT NOT NULL
, Member2 INT NOT NULL
, Status CHAR(1) NOT NULL
, MemberOne  AS CASE WHEN Member1 < Member2 THEN Member1 ELSE Member2 END
          --- a computed column
, MemberTwo  AS CASE WHEN Member1 < Member2 THEN Member2 ELSE Member1 END
          --- and another one
, PRIMARY KEY (Member1, Member2)
, UNIQUE (MemberOne, MemberTwo)
, ...                                    --- FOREIGN KEY details, etc 
) ;
Rachael answered 14/3, 2012 at 18:4 Comment(4)
How does this compare to JohnDewey's approach. Which one is the best solution?Myoglobin
I like this one better because it doesn't rely on any assumptions about the likelihood of checksum collisions.Nickeliferous
instead of having both a Primary Key and a Unique Index, couldn't I just make the PrimaryKey (MemberOne, MemberTwo) and that would cover all cases?Myoglobin
@user1269532: yes, you could. Not sure if there are any disadvantages of using computed columns in the Primary Key but you can. (you'll have one index less, at least).Blooper
K
3

The database model fails because you have two entities {Member1, Member2} which, by saying that it doesn’t matter which is which, you are saying are the same entity {Member}. In other words, you have one fact in two places, one of the cardinal sins of relational database design.

A high-level solution would be to better model the nature of the relationship. An example might be a marriage of two individuals. Rather than “Bride and Groom are married” and fussing over which gets listed first, you’d have "Marriage #xyz is between (contains) participants A and B". So, table Marriage with a primary key, table MarriageMember with foreign key to Marriage, foreign key to “Person”, and a primary key on both columns. Lets you have more than two members, which can be useful if you’re in a Heinlein story.

If you’re stuck with existing schemas (and aren’t we all), I’d require the data to be submitted with, say, the lowest value listed first, so that they are always ordered properly. You could do tricks with a checksum on the two columns as a computed column, but that wouldn’t absolutely guarantee uniqueness. But and alas, at the end of the day your model appears to be slighly flawed for your purposes.


Addenda

As per the comments below, if you are modeling members that a given member is related to, then you have a "Member is related to other members" situation. Here, Member1 is the "main" member, and Member2 is an other member that "this" member is related to. (And that's the distinction needed between the two Member columns.) Thus, if a relationship is bi-directional, then you'd need two entries, to cover both "Member A is related to Member B" and "Member B is related to Member A". This, of course, would be enforced with a primary key on {Member1, Member2}, since Status appears to be irrelevant (there's only one relationship, not multiple based on status).

Kandicekandinsky answered 14/3, 2012 at 17:21 Comment(9)
Thank you for this answer. In fact, you are right. What I'm trying to model is the relationship between two members. It does not matter if it's (A,B) or (B,A) - the status of the relationship is the same. But I don't quite understand the model you suggest. Basically, I have a Member table which contains all my members and I have a Status table which contains the status between any two members. Can you explain your model based on my tables.Myoglobin
"It's a members table containg members". This begs the question: members of what? What is the [entity] that has instances where each instance can have its own set of members? Model the entity (in my example, Marriage) and list out its members (participants in the Marriage).Kandicekandinsky
I have two tables: Member and Status. The member table contains all my members and the status table is a relationship table. It defines a many-to-many relationship between the member table and the member table again. "A member can be associated to one or more members". It's a many-to-many relationship on itself. How to model that in the DB world.Myoglobin
Is your answer still valid or would you go with @ypercube's answer?Myoglobin
It's as in your example about marriage. If MemberA is married to MemberB, then MemberB is obviously married to MemberA. I only need one row to represent that relationship. The status for MemberA related to MemberB is the same as for MemberB related to MemberA. Therefore, if the row for (MemberA,MemberB) already exists, I want the insert for row (MemberB,MemberA) to fail. I still have a many-to-many relationship because MemberA can be related to other members besides MemberB and the same is true for MemberB (it can be related to other members besides MemberA - polygamy!).Myoglobin
I'm persisting with this because I do want to make sure that I am not making an error in modelling this situation. I appreciate the time you're taking to answer my questions.Myoglobin
As regards your idea for separate tables for Marriage and MarriageMember respectively, the only viable natural key for the marriage is the compound of its members' identifiers and we are back at square one!Dogeatdog
That's where alternate keys and surrogate keys come in. Marriages, for example, would require a marriage license (a viable alternate key), and other possible attributes (church, date/time, etc.) -- not the best example, since "marriage" covers a whole lotta stuff these days. But then, that's where surrogate keys come in -- an arbitrarily made-up value that uniquely identifies the marriage regardless of members within the marriage. What's best? Depends on the data you are modelling and how you intend to model it.Kandicekandinsky
The one-to-many model is a "purist" argument. As I've said, it depends on what is modeled and how it will be usee. Based on what's been said it sounds like the "require Memeber 1 < Member2" check combined with a primary key on both is best. The issue with this is that it pushes a burden on the aplication(s) and users in that they must prepare the data a certain way (ordered) before it's submitted to the database. A trivial enough cost, but it has to be remembered, maintained, and enforced throughout the entire lifetime of the database and application -- including all future modifications.Kandicekandinsky
C
3

Here's an alternative way to look at this. You could actually enforce the rule that the mutual relationship is always expressed by the presence of two rows, (A,B) and (B,A), instead of just one.

CREATE TABLE MutualRelationship
 (Member1 INT NOT NULL,
  Member2 INT NOT NULL,
  Status CHAR(1),
 PRIMARY KEY (Member1, Member2),
 UNIQUE (Member1, Member2, Status),
 FOREIGN KEY (Member2, Member1, Status) REFERENCES MutualRelationship (Member1, Member2, Status));

INSERT INTO MutualRelationship (Member1, Member2, Status)
VALUES
(100,105,'A'),
(105,100,'A');
Caw answered 14/3, 2012 at 22:7 Comment(0)
K
2

One way to avoid a trigger try a UNIQUE computed column on Member1 and Member2:

create table test (Member1 int not null, Member2 int not null, Status char(1)
, bc as abs(binary_checksum(Member1))+abs(binary_checksum(Member2)) PERSISTED UNIQUE)

INSERT INTO test values(123, 456, 'A'); --succeeds
INSERT INTO test values(123, 789, 'B'); --succeeds
INSERT INTO test values(456, 123, 'D'); --fails with the following error:
--Msg 2627, Level 14, State 1, Line 1
--Violation of UNIQUE KEY constraint 'UQ__test__3213B1084A8F946C'. Cannot insert duplicate key in object 'dbo.test'
Keelung answered 14/3, 2012 at 17:23 Comment(2)
Why abs? Would't that increase the chance of false positives?Kandicekandinsky
@PhilipKelley - No. abs allows the sum to preserve the original entropy of the two checksums, since the range of sums would tend to be reduced wherever positive checksums are combined with negative ones.Keelung
P
2

Here an excerpt from "Symmetric Functions" in "SQL Design patterns" book you may find relevant.

Consider an inventory database of boxes

table Boxes (
   length integer,
   width  integer,
   height integer
)

Box dimensions in the real world, however, are generally not given in any specific order. The choice what dimensions becomes length, width, and height is essentially arbitrary. What if we want to identify the boxes according to their dimensions? For example, we would like to be able to tell that the box with length=1, width=2, and height=3 is the same box as the one with length=3, width=1, and height=2. Furthermore, how about declaring a unique dimensional constraint? More specifically, we won’t allow any two boxes that have the same dimensions.

An analytical mind would have no trouble recognizing that the heart of the problem is the column ordering. The values of the length, width, and height columns can be interchanged to form another legitimate record! Therefore, why don’t we introduce 3 pseudo columns, say A, B, and C such that

A ≤ B ≤ C

Then, a unique constraint on A, B, C should satisfy our requirement! It could be implemented as a function based unique index, as long as we can express A, B, C analytically in terms of length, width, height. Piece of cake: A is the greatest of length, width, height; C is the least of them, but how do we express B? Well, the answer is easy to write

B = least (greatest (length,width),
           greatest (width,height),
           greatest (height,length) )

although difficult to explain.

A mathematical perspective, as usual, clarifies a lot. Consider cubic equation

If we know the roots x1, x2, x3 then, the cubic polynomial could be factored, so that we have

Marrying both equations we express coefficients a, b, c in terms of roots x1, x2, x3

Figure 4.1: A shape of the graph of the polynomial y=(x-x1)(x-x2)(x-x3) is entirely defined by the roots x1, x2, and x3. Exchanging them doesn’t affect anything.

The functions -x1-x2-x3, x1x2+x2x3+x3x1, -x1x2x3 are symmetric. Permuting x1, x2, x3 has no effect on the values a, b, c. In other words, the order among the roots of cubic equation is irrelevant: formally, we speak of a set of roots, not a list of roots1. This is exactly the effect we want in our example with Boxes. Symmetric functions rewritten in terms of length, width, height are

length+width+height
length*width+width*height+height*length
length*width*height

Those expressions were simplified a little by leveraging the fact that the negation of a symmetric function is also symmetric.

Our last solution is strikingly similar to the earlier one, where the greatest operator plays the role of multiplication, while the least operator goes as addition. It is even possible to suggest a solution, which is a mix-in between the two

least(length,width,height)
least(length+width,width+height,height+length)
length+width+height

A reader can check that these three functions are again symmetric2. The last step is recording our solution in formal SQL

table Boxes (
   length integer,
   width  integer,
   height integer
);

create unique index b_idx on Boxes(
   length + width + height,
   length * width + width * height + height * length,
   length * width * height
);

Symmetric functions provide a basis for a nifty solution. In practice however, a problem can often be solved by schema redesign. In the box inventory database example, we don’t even need schema redesign: we can just require to change the practice of inserting unconstrained records (length,width,height), and demand that

length ≥ width ≥ height
Polyphagia answered 14/3, 2012 at 17:24 Comment(0)
N
1

Can't think of a better way to augment the existing unique constraint aside from a trigger. e.g.

CREATE TRIGGER dbo.StatusTable_PreventDualUniques
ON dbo.status_table
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;

    IF EXISTS (
      SELECT 1 FROM inserted AS i 
        INNER JOIN dbo.status_table AS s
        ON i.Member1 = s.Member1 AND i.Member2 = s.Member2
        OR i.Member2 = s.Member1 AND i.Member1 = s.Member2
    )
    BEGIN
        RAISERROR('Duplicate detected', 11, 1);
    END
    ELSE
    BEGIN
        INSERT dbo.status_table(Member1, Member2, Status)
            SELECT Member1, Member2, Status
            FROM inserted;
    END
END

Now off the top of my head this only deals with single-row inserts. The logic can get a little more complicated if you need to handle multi-row inserts, since you need to check for duplicates both within inserted and between inserted and the base table. This also doesn't handle high concurrency at the default isolation level (e.g. another transaction inserts a duplicate row between the check and the insert). But this should be a start.

(You'll also need one for UPDATE...)

Nickeliferous answered 14/3, 2012 at 17:21 Comment(4)
Wouldn't a Check constraint be better than a trigger?: CHECK (Member1 < Member2)Blooper
I suppose, but unless the application and/or users know to enter the lower Member value first, you wouldn't be able to insert 105, 100 even when the table is empty.Nickeliferous
Which approach is better the trigger approach described here or the approach described by ypercube?Myoglobin
@ypercube's is probably more complete.Nickeliferous
G
1

A slight variation on @ypercube's solution would be to create an indexed view and move the unique constraint to the view. Here's a complete script that demonstrates the approach:

/* the reference table (almost irrelevant for the tests,
   but added to make the environment closer to the one in the question) */
CREATE TABLE dbo.Members (
  ID int IDENTITY CONSTRAINT PK_Members PRIMARY KEY,
  Name varchar(50)
);

GO

/* the table to add the constraint on */
CREATE TABLE dbo.Data (
  Member1 int CONSTRAINT FK_Data_Member1 FOREIGN KEY REFERENCES dbo.Members (ID),
  Member2 int CONSTRAINT FK_Data_Member2 FOREIGN KEY REFERENCES dbo.Members (ID),
  Statust char(1),
  CONSTRAINT PK_Data PRIMARY KEY (Member1, Member2)
);

GO

/* the indexed view that the constraint will actually be applied to */
CREATE VIEW dbo.DataView
WITH SCHEMABINDING  /* required with indexed views */
AS
SELECT
  /* the column definitions are practically identical to ypercube's */
  Member1 = CASE WHEN Member1 > Member2 THEN Member2 ELSE Member1 END,
  Member2 = CASE WHEN Member1 > Member2 THEN Member1 ELSE Member2 END
FROM dbo.Data

GO

/* finally, the constraint itself */
CREATE UNIQUE CLUSTERED INDEX UQ_DataView ON dbo.DataView (Member1, Member2);

GO

/* preparing the stage: adding some data to the reference table */
INSERT INTO dbo.Members (Name)
SELECT 'Member A' UNION ALL
SELECT 'Member B' UNION ALL
SELECT 'Member C';

GO

/* the first two rows should and do insert into the target table without issues */
INSERT INTO dbo.Data (Member1, Member2, Statust) VALUES (3, 1, 'A');
INSERT INTO dbo.Data (Member1, Member2, Statust) VALUES (2, 3, 'A');

GO

/* and this one fails, which demonstrates the constraint in work */
INSERT INTO dbo.Data (Member1, Member2, Statust) VALUES (1, 3, 'B');

GO

/* cleaning up */
DROP VIEW dbo.DataView;
DROP TABLE dbo.Data;
DROP TABLE dbo.Members;

Read more about indexed views on MSDN:

Gordie answered 15/3, 2012 at 6:42 Comment(2)
I like the fact that the extra fields are not polluting my real table. On the other hand, if that view gets dropped without me knowing then I will lose the constraint. I feel that @ypercube's solution is more foolproof.Myoglobin
Fair point. When I was offering this solution, I realised that separating the extra fields was likely its only advantage over @ypercube's, and not very material, too. I'm sure I could prefer the indexed view approach in some scenarios, but probably in most cases I would choose to endure the extra fields in the table on the same grounds as you would, i.e. in favour of keeping the constraint fastened to the table as well.Gordie
S
0

Instead of trying to have the table itself enforce this particular business logic, would it be better to have it encapsulated in a stored procedure? You certainly gain more flexibility in enforcing unique relationships between two members.

Seaman answered 14/3, 2012 at 18:34 Comment(1)
The problem with that is that you can't enforce all data manipulation go through a stored procedure. You can try, but in most real-world scenarios it's a non-starter.Nickeliferous

© 2022 - 2024 — McMap. All rights reserved.