SQL: Creating a Relational table with 2 different auto_increment
Asked Answered
A

1

9

I have 2 tables, each with their own auto incremented IDs, which are of course primary keys.

When I want to create a 3rd table to establish the relation between these 2 tables, I always have an error.

First one is that you can have only 1 automatically-incremented column, the second one occurs when I delete the auto_increment statement from those 2, therefore AQL doesn't allow me to make them foreign keys, because of the type matching failure.

Is there a way that I can create a relational table without losing auto increment features?

Another possible (but not preferred) solution may be there is another primary key in the first table, which is the username of the user, not with an auto_increment statement, of course. Is it inevitable?

Thanks in advance.

Alcaic answered 14/4, 2015 at 23:31 Comment(4)
your third table shouldnt have an auto increment column at all, if its a relationship table it should just be taking values from the other two tables. auto increment makes no sense in that situationRelume
I agree, but when I try to do so, I get an error because the types with the original tables don't match.Alcaic
that should not be the case -please post your create table statementsRelume
Please review the Answers and comment, vote, choose.Exponential
E
24

1 Concept

You have misunderstood some basic concepts, and the difficulties result from that. We have to address the concepts first, not the problem as you perceive it, and consequently, your problem will disappear.

auto incremented IDs, which are of course primary keys.

No, they are not. That is a common misconception. And problems are guaranteed to ensue.

An ID field cannot be a Primary Key in the English or technical or Relational senses.

  • Sure, in SQL, you can declare any field to be a PRIMARY KEY, but that doesn't magically transform it into a Primary Key in the English, technical, or Relational senses. You can name a chihuahua "Rottweiller", but that doesn't transform it into a Rottweiller, it remains a chihuahua. Like any language, SQL simply executes the commands that you give it, it does not understand PRIMARY KEY to mean something Relational, it just whacks an unique index on the column (or field).

  • The problem is, since you have declared the ID to be a PRIMARY KEY, you think of it as a Primary Key, and you may expect that it has some of qualities of a Primary Key. Except for the uniqueness of the ID value, it provides no benefit. It has none of the qualities of a Primary Key, or any sort of Relational Key for that matter. It is not a Key in the English, technical, or Relational senses. By declaring a non-key to be a key, you will only confuse yourself, and you will find out that there is something terribly wrong only when the user complains about duplicates in the table.

2 Relational Model

2.1  Relational tables must have row uniqueness

A PRIMARY KEY on an ID field does not provide row uniqueness. Therefore it is not a Relational table containing rows, and if it isn't that, then it is a file containing records. It doesn't have any of the integrity, or power (at this stage you will be aware of join power only), or speed, that a table in a Relational database has.

  • Execute this code (MS SQL) and prove it to yourself. Please do not simply read this and understand it, and then proceed to read the rest of this Answer, this code must be executed before reading further. It has curative value.

    -- [1] Dumb, broken file
    -- Ensures unique RECORDS, allows duplicate ROWS
    
    CREATE TABLE dumb_file (
         id         INT       IDENTITY PRIMARY KEY, 
         name_first CHAR(30), 
         name_last  CHAR(30)
         )
    
    INSERT dumb_file VALUES
         ( 'Mickey', 'Mouse' ),
         ( 'Mickey', 'Mouse' ),
         ( 'Mickey', 'Mouse' )
    
    SELECT *
         FROM dumb_file
    

Notice that you have duplicate rows. Relational tables are required to have unique rows. Further proof that you do not have a relational table, or any of the qualities of one.

Notice that in your report, the only thing that is unique is the ID field, which no user cares about, no user sees, because it is not data, it is some additional nonsense that some very stupid "teacher" told you to put in every file. You have record uniqueness but not row uniqueness.

In terms of the data (the real data minus the extraneous additions), the data name_last and name_first can exist without the ID field. A person has a first name and last name without an ID being stamped on their forehead.

The second thing that you are using that confuses you is the AUTOINCREMENT. If you are implementing a record filing system with no Relational capability, sure, it is helpful, you don't have to code the increment when inserting records. But if you are implementing a Relational Database, it serves no purpose at all, because you will never use it. There are many features in SQL that most people never use.

2.2  Corrective Action

So how do you upgrade, elevate, that dumb_file that is full of duplicate rows to a Relational table, in order to get some of the qualities and benefits of a Relational table ? There are three steps to this.

  1. You need to understand Keys

    And since we have progressed from ISAM files of the 1970's, to the Relational Model, you need to understand Relational Keys. That is, if you wish to obtain the benefits (integrity, power, speed) of a Relational Database.

    In Codd's Relational Model:

    a key is made up from the data

    and

    the rows in a table must be unique

    Your "key" is not made up from the data. It is some additional, non-data parasite, caused by your being infected with the disease of your "teacher". Recognise it as such, and allow yourself the full mental capacity that God gave you (notice that I do not ask you to think in isolated or fragmented or abstract terms, all the elements in a database must be integrated with each other).

    Make up a real key from the data, and only from the data. In this case, there is only one possible Key: (name_last, name_first).

  2. Try this code, declare an unique constraint on the data:

    -- [2] dumb_file fixed, elevated to table, prevents duplicate rows
    -- still dumb
    CREATE TABLE dumb_table (
        id         INT       IDENTITY PRIMARY KEY, 
        name_first CHAR(30), 
        name_last  CHAR(30),
        CONSTRAINT UK
            UNIQUE  ( name_last, name_first )
        )
    INSERT dumb_table VALUES
        ( 'Mickey', 'Mouse' ),
        ( 'Minnie', 'Mouse' )
    
    SELECT *
        FROM dumb_table
    
    INSERT dumb_table VALUES
        ( 'Mickey', 'Mouse' )
    

    Now we have row uniqueness. That is the sequence that happens to most people: they create a file which allows dupes; they have no idea why dupes are appearing in the drop-downs; the user screams; they tweak the file and add an index to prevent dupes; they go to the next bug fix. (They may do so correctly or not, that is a different story.)

  3. The second level. For thinking people who think beyond the fix-its. Since we have now row uniqueness, what in Heaven's name is the purpose of the ID field, why do we even have it ??? Oh, because the chihuahua is named Rotty and we are afraid to touch it.

    The declaration that it is a PRIMARY KEY is false, but it remains, causing confusion and false expectations. The only genuine Key there is, is the (name_last, name_fist), and it is a Alternate Key at this point.

    Therefore the ID field is totally superfluous; and so is the index that supports it; and so is the stupid AUTOINCREMENT; and so is the false declaration that it is a PRIMARY KEY; and any expectations you may have of it are false.

    Therefore remove the superfluous ID field. Try this code:

    -- [3] Relational Table
    -- Now that we have prevented duplicate data, the id field 
    -- AND its additional index serves no purpose, it is superfluous,
    -- like an udder on a bull.  If we remove the field AND the 
    -- supporting index, we obtain a Relational table.
    
    CREATE TABLE relational_table (
        name_first CHAR(30), 
        name_last  CHAR(30),
        CONSTRAINT PK
            PRIMARY KEY ( name_last, name_first )
        )
    
    INSERT relational_table VALUES
        ( 'Mickey', 'Mouse' ),
        ( 'Minnie', 'Mouse' )
    
    SELECT *
        FROM relational_table
    
    INSERT relational_table VALUES
        ( 'Mickey', 'Mouse' )
    

Works just fine, works as intended, without the extraneous fields and indices.

Please remember this, and do it right, every single time.

2.3  False Teachers

In these end times, as advised, we will have many of them. Note well, the "teachers" who propagate ID columns, by virtue of the detailed evidence in this post, simply do not understand the Relational Model or Relational Databases. Especially those who write books about it.

As evidenced, they are stuck in pre-1970 ISAM technology. That is all they understand, and that is all that they can teach. They use an SQL database container, for the ease of Access, recovery, backup, etc, but the content is pure Record Filing System with no Relational Integrity, Power, or speed. AFAIC, it is a serious fraud.

In addition to ID fields, of course, there are several items that are key Relational-or-not concepts, that taken together, cause me to form such a grave conclusion. Those other items are beyond the scope of this post.

One particular pair of idiots is currently mounting an assault on First Normal Form. They belong in the asylum.

3  Solution

Now for the rest of your question.

3.1  Answers

Is there a way that I can create a relational table without losing auto increment features?

That is a self-contradicting sentence. I trust you will understand from my explanation, Relational tables have no need for AUTOINCREMENT "features"; if the file has AUTOINCREMENT, it is not a Relational table.

AUTOINCREMENT or IDENTITY is good for one thing only: if, and only if, you want to create an Excel spreadsheet in the SQL database container, replete with fields named A, B, and C, across the top, and record numbers down the left side. In database terms, that is the result of a SELECT, a flattened view of the data, that is not the source of data, which is organised (Normalised).

Another possible (but not preferred) solution may be there is another primary key in the first table, which is the username of the user, not with an auto increment statement, of course. Is it inevitable?

In technical work, we don't care about preferences, because that is subjective, and it changes all the time. We care about technical correctness, because that is objective, and it does not change.

Yes, it is inevitable. Because it is just a matter of time; number of bugs; number of "can't dos"; number of user screams, until you face the facts, overcome your false declarations, and realise that:

  • the only way to ensure that user rows are unique, that user_names are unique, is to declare an UNIQUE constraint on it

  • and get rid of user_id or id in the user file

  • which promotes user_name to PRIMARY KEY

Yes, because your entire problem with the third table, not coincidentally, is then eliminated.

That third table is an Associative Table. The only Key required (Primary Key) is a composite of the two parent Primary Keys. That ensures uniqueness of the rows, which are identified by their Keys, not by their IDs.

I am warning you about that because the same "teachers" who taught you the error of implementing ID fields, teach the error of implementing ID fields in the Associative Table, where, just as with an ordinary table, it is superfluous, serves no purpose, introduces duplicates, and causes confusion. And it is doubly superfluous because the two keys that provide are already there, staring us in the face.

Since they do not understand the RM, or Relational terms, they call Associative Tables "link" or "map" tables. If they have an ID field, they are in fact, files.

3.2  Lookup Tables

ID fields are particularly Stupid Thing to Do for Lookup or Reference tables. Most of them have recognisable codes, there is no need to enumerate the list of codes in them, because the codes are (should be) unique.

ENUM is just as stupid, but for a different reason: it locks you into an anti-SQL method, a "feature" in that non-compliant "SQL".

Further, having the codes in the child tables as FKs, is a Good Thing: the code is much more meaningful, and it often saves an unnecessary join:

        SELECT ...
            FROM child_table           -- not the lookup table
            WHERE gender_code = "M"    -- FK in the child, PK in the lookup

instead of:

        SELECT ...
            FROM child_table
            WHERE gender_id = 6        -- meaningless to the maintainer

or worse:

        SELECT ...
            FROM child_table C         -- that you are trying to determine
            JOIN lookup_table L
                ON C.gender_id = L.gender_id
            WHERE L.gender_code = "M"  -- meaningful, known

Note that this is something one cannot avoid: you need uniqueness on the lookup code and uniqueness on the description. That is the only method to prevent duplicates in each of the two columns:

        CREATE TABLE gender (
            gender_code  CHAR(2)  NOT NULL,
            name         CHAR(30) NOT NULL

            CONSTRAINT PK 
                PRIMARY KEY ( gender_code )

            CONSTRAINT AK 
                UNIQUE ( name )
            )

3.3  Full Example

From the details in your question, I suspect that you have SQL syntax and FK definition issues, so I will give the entire solution you need as an example (since you have not given file definitions):

    CREATE TABLE user (                 -- Typical Identifying Table
        user_name  CHAR(16) NOT NULL,   -- Short PK
        name_first CHAR(30) NOT NULL,   -- Alt Key.1
        name_last  CHAR(30) NOT NULL,   -- Alt Key.2
        birth_date DATE     NOT NULL    -- Alt Key.3

        CONSTRAINT PK                   -- unique user_name
            PRIMARY KEY ( user_name )

        CONSTRAINT AK                   -- unique person identification
            PRIMARY KEY ( name_last, name_first, birth_date )
        )

    CREATE TABLE sport (                  -- Typical Lookup Table
        sport_code  CHAR(4)  NOT NULL,    -- PK Short code
        name        CHAR(30) NOT NULL     -- AK

        CONSTRAINT PK 
            PRIMARY KEY ( sport_code )

        CONSTRAINT AK 
            PRIMARY KEY ( name )
        )

    CREATE TABLE user_sport (           -- Typical Associative Table
        user_name  CHAR(16) NOT NULL,   -- PK.1, FK
        sport_code CHAR(4)  NOT NULL,   -- PK.2, FK
        start_date DATE     NOT NULL

        CONSTRAINT PK 
            PRIMARY KEY ( user_name, sport_code )

        CONSTRAINT user_plays_sport_fk
            FOREIGN KEY     ( user_name )
            REFERENCES user ( user_name )

        CONSTRAINT sport_occupies_user_fk
            FOREIGN KEY      ( sport_code )
            REFERENCES sport ( sport_code )
        )

There, the PRIMARY KEY declaration is honest, it is a Primary Key; no ID; no AUTOINCREMENT; no extra indices; no duplicate rows; no erroneous expectations; no consequential problems.

3.4  Relational Data Model

Here is the Data Model to go with the definitions.

Data Model for User::Sport

  • As a PDF

  • If you are not used to the Notation, please be advised that every little tick, notch, and mark, the solid vs dashed lines, the square vs round corners, means something very specific. Refer to the IDEF1X Notation.

  • A picture is worth a thousand words; in this case a standard-complaint picture is worth more than that; a bad one is not worth the paper it is drawn on.

  • Please check the Verb Phrases carefully, they comprise a set of Predicates. The remainder of the Predicates can be determined directly from the model. If this is not clear, please ask.

Exponential answered 19/4, 2015 at 5:21 Comment(28)
I have read a lot of your great replies regarding databases, but when creating a "Contacts Database", how would you define the PK? The contact database will be used world-wide, so a contraint on only the first and last name will be problematic. How many fields would you use to create the PK, or is an "auto id" more appropiate?Coze
@Hamster_NL. (a) Thank you. Votes are important in this "democracy". (b) I answer questions in context, no point in scaring ardent beginners who ask for a crucifix, by giving them a visitation. (c) Autoincrement is not a Key, in any table, it guarantees duplicate rows. The standard for Identifying Person (eg. think passport info) is at the bottom of this model.Exponential
@Hamster_NL. (d) Recording it in a real RDb means BirthPlace gets Normalised into at leastCountryCode + Town,and sometimes requires StateCode. I always use the latter, I have never had to upgrade. That means those tables need to be populated first. (e) Such a PK is too wide, a bad option for migrating into the children, but it cannot be abandoned because it provides row uniqueness. Therefore make the PK an AK, and create a surrogate for the PK. A PK that is too wide, is the one & only valid reason for surrogate. Ask a new question if you want more detail, and I will answer it.Exponential
@Exponential Thanks, your answer changed my perspective totally. For better clarification I tried to create tables from your 'Full Example' section. In table user_sport you have declared (name_last, name_first) as PK which is not exists in the table so it gives me error. And user_name is FK but it is not Key in user table. Please suggest whether I'm mistaken anything?Devorahdevore
@Mohit Mehta. The mistake was mine, not yours, humble apologies. Corrected. I added a small data model to the Answer.Exponential
@Hamster_NL. I added a small data model to the Answer, and provided the full quid for Person.Exponential
If you expect names to change often, you may want to use an auto increment PK so you don't need to adjust values that could be embedded in many places across all your data. Also, Arbitrary Capitalization is super annoying!Morril
I know this is an old post but are you seriously suggesting using a persons name as a primary key?!? For one: It'd not unique, lots of people have the same names (some will even have the same DOB). And two: it's not immutable. People can change their names so should be able to in software too.Basidium
Error: near line 7: table dumb_file has 3 columns but 2 values were supplied Error: near line 8: table dumb_file has 3 columns but 2 values were supplied Error: near line 9: table dumb_file has 3 columns but 2 values were suppliedPentlandite
@HackSlash. That is a copy-paste error due to lack of understanding on your part. The id is an IDENTITY or AUTO-INCREMENT. It is supplied by the platform. It cannot be supplied as a value. Again, it is not a Key by any means.Exponential
I think it's actually a limitation of the Online SQLite Compiler I used. You should avoid insulting people as your first reaction. You're the kind of person that gives StackOverflow a bad reputation. Plus, your SQLFiddle errors out with a different error on each DB engine. None of them get past "Build Schema".Pentlandite
@HackSlash. 1) SQLite is not ANSI SQL. 2) Every SQL and NON-sql platform has different syntax. 3) When sample SQL Code is given, developers understand that it is a sample, that it cannot possibly work on every platform, that they must insert the particulars for their particular platform. 4) If you need assistance, please open a new question.Exponential
I know this is an old post, but wanted to ask: Given that there are many Falsehoods Programmers Believe About Names, how do we deal with situations where it is simply impossible to identify uniqueness without an ID? In other words, two people may have the same name and DOB, and still be different people. We may not be allowed to collect further identifying info. Our only way of getting uniqueness is trusting the data inputter.Jeffryjeffy
Furthermore, a wide key is a problem even at fairly small varchar sizes used on names, never mind that names and addresses can be silly long. This is very common in most domains. Another example: sensors generate events. We cannot even use sensorName, timestamp as the PK, we may have two events generated simultaneously. Or a webrequest is made, it might be by the same IP address at the same instant, or we may not have such info, how do we uniquify that without a simple ID column? Sometimes the unique info simply doesn't exist.Jeffryjeffy
@Jeffryjeffy 1) Yes, there are, and such are rare. Yes, more identifying info is required. The point is to not use any ID` as a RecordId for all the reasons stated. That does not mean, never use it: the linked Data Model, left bottom corner, defines one. My IDEF1X Intro further explains it.Exponential
2) Performance considerations. That should never limit a logical [DM] design. All genuine SQL platforms have no problem at all with composite Keys; wide Keys; etc. Never use varAnything in a Key, that is stupid on any platform. Always use fixed width in keys. 3) DATETIME DataType has microsecond resolution on SQL platforms. On the pretenders, add a Sequence column. 4) the unique info simply doesn't exist The Relational Model* demands unique rows. So manufacture a column. 5) You have not appreciated the fact that an ID column, in any form, cannot unique-ify a row.Exponential
Great post! Identity is such a fundamental concept. This natural key vs artificial key distinction makes me think of stateless vs stateful, or R-value vs L-value, or value semantics vs reference semantics, i.e. row identified by value (its natural keys) vs row identified by location (its artificial keys). @KyleHowells raised two important questions: 1. Should the relation between a row and its meaning (the referent entity) be one-to-one (e.g. a row denoting multiple artists vs a single artist)? 2. Should a key (natural or artificial) be immutable (e.g. an immutable email key)?Velez
@Maggyero 0) Person is fully articulate in the linked data model. 1) It is the Relational Key that gives meaning to the row. Yes, 1::1 2) A Key needs to be stable, not immutable. Every thing in the universe is mutable, deal with it (Sally Brown marries Fred Smith and becomes Sally Smith from that moment). Immutability is a meaningless criteria, its only value is to artificially elevate RecordIDs.Exponential
Thanks. In Codd’s example given in my post, is the relation between employees and their children one-to-many or many-to-many?Velez
@Maggyero 0) From your post, it appears there are a few things in the RM that you do not understand (the terms are dated, not in use these days). Eg. tree means Directed Acyclic Graph. The RM is based on the Hierarchic Model, hierarchies are normal in nature. Eg. Codd uses the term domain 83 times: without contradicting the core meaning, it means distinctly different things in different contexts. 1) Fig 3(a) is a tree. 2) Non-simple domain here means repeating group or subordinate relation aka child. 3) Employee::Child is 1::0-to-n.Exponential
4) Get rid of your manufactured IDs: they are not made up from the data; and they will stuff you up.Exponential
@Exponential Yes I used ‘id’ as it is shorter than a composite key, but I really meant natural key. And also because I did not know what natural attributes can identify an artist (his artist name is not enough). Any suggestion?Velez
@Exponential What makes you say that the relation between employees and their children is not many-to-many, i.e. that couples are not considered in Codd’s example? Why would two records with the same childname attribute but different man# attributes in the children relation schema always denote two different children?Velez
@Maggyero 1) (ArtistName, Birthdate) or (CountryCode, ArtistName) 2) Because it is a textbook example, not intended to render reality. As you move from textbook to reality, yes, you will add many tables. That says precisely nothing about the textbook example.Exponential
Claiming that the relation schema with the man# and childname attributes is one-to-many instead of many-to-many means for example that the one-to-many relation {(1, ’Thomas’), (2, ‘Paul’), (2, ’Julia’)} is a possible instance of this schema, and that the many-to-many relation {(1, ’Thomas’), (2, ‘Paul’), (2, ’Julia’), (3, ’Thomas’)} is not. But nowhere does Codd write that the second relation is forbidden. The tuples (1, ’Thomas’) and (3, ’Thomas’) could denote two different children named ’Thomas’ or the same child, i.e. the relation between a tuple and its referent is many-to-one.Velez
Let us continue this discussion in chat.Exponential
@Maggyero You misunderstand the purpose of an academic paper. It is not a "claim", it is a dictate, Codd is giving us an example of his Normalisation, he is not giving a data model that reflects reality in all cases.Exponential
@Exponential Awesome post!Dinh

© 2022 - 2024 — McMap. All rights reserved.