What data type is recommended for ID columns?

I realize this question is very likely to have been asked before, but I've searched around a little among questions on StackOverflow, and I didn't really find an answer to mine, so here goes. If you find a duplicate, please link to it.

For some reason I prefer to use Guids (uniqueidentifier in MS SQL Server) for my primary key fields, but I really don't know why this would be better. In many of the tutorials I've walked myself through lately, an automatically incremented int has been used. I can see pros and cons with both:

  • A Guid is always of the same size and length, and there is no reason to worry about running out of them, whereas there is a limit to how many records you could have before you'd run out of numbers that fit in an int.
  • int is (at least in C#) a nullable type, which opens for a couple of shortcuts when querying for data.
  • And int is easier to read.
  • I bet you could come up with at least a couple more things here.

So, as simply as the title puts it: what is the recommended data type for ID (primary key) columns in a database?

EDIT: After receiving a couple of short answers, I must also add this follow-up question. Without it, your answer is neither compelling nor educational... ;) Why do you think so, and what are the cons of the other option that make you not choose it instead?

Kenleigh asked 31/5, 2009 at 14:12
It should be pointed out that a GUID and an integer are just different ways of displaying and generating a sequence of bytes. Where the ints are generated sequentially, the GUIDs are generated "randomly" and there are more bytes in them. That means you don't need to see the existing state of the database to generate one. Everything can be nullable in C# with a ? on it. – Pliant

Any integer type of sufficient size to store the anticipated range of values. Generally 32-bit ints are viewed as too small (rightly or wrongly) for tables with a lot of rows or changes. A 64-bit int is plenty. Many databases won't have or won't use that integer type, but will use a NUMBER type with specified scale and precision instead. 10-15 digits is a fairly common size.

The reason for choosing integer types is twofold:

  1. Size; and
  2. Speed.

The size of an integer is:

  • 32 bit: 4 bytes;
  • 64 bit: 8 bytes;
  • Binary coded decimal: two digits per byte plus as much as a byte for sign, scale and/or precision.

Compare that to a GUID, which is 128 bits (16 bytes), or to a normal string, which is at least one byte per character (more in certain character encodings) plus an overhead that might be as little as one byte (terminating null) or could be much more in some cases.
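
For reference, here is a small C# sketch of those sizes as .NET sees them (KeySizes is just an illustrative name). On-disk sizes depend on the database, although SQL Server's int, bigint and uniqueidentifier types are also 4, 8 and 16 bytes:

```csharp
using System;

// Reports the .NET in-memory sizes of the candidate key types.
public static class KeySizes
{
    public static int IntBytes => sizeof(int);                           // 32-bit integer
    public static int LongBytes => sizeof(long);                         // 64-bit integer
    public static int GuidBytes => Guid.NewGuid().ToByteArray().Length;  // 128-bit GUID

    // Default ("D") string form: 32 hex digits plus 4 hyphens.
    public static int GuidStringChars => Guid.NewGuid().ToString().Length;

    public static void Print()
    {
        Console.WriteLine($"int:  {IntBytes} bytes");
        Console.WriteLine($"long: {LongBytes} bytes");
        Console.WriteLine($"Guid: {GuidBytes} bytes, {GuidStringChars} chars as a string");
    }
}
```

Calling KeySizes.Print() should report 4, 8 and 16 bytes, plus 36 characters for the default string form of a GUID, which is what you pay if GUIDs end up stored as text.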

Sorting integers is trivial and, assuming they are unique and the range is sufficiently small, can actually be done in O(n) time (for example with a counting sort or a bitmap), compared to, at best, O(n log n) for a general comparison sort.
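
A minimal C# sketch of the O(n) case, assuming the keys are distinct, non-negative and below a known bound: mark each key in a bitmap, then scan the bitmap in order. The cost is O(n + range), so it is only linear when the range is not much larger than n. BoundedSort and SortUnique are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

public static class BoundedSort
{
    // Sorts distinct integers in [0, maxExclusive) in O(n + maxExclusive) time.
    public static int[] SortUnique(IEnumerable<int> values, int maxExclusive)
    {
        var seen = new bool[maxExclusive];   // seen[k] is true when k is present
        int count = 0;
        foreach (int v in values)
        {
            if (v < 0 || v >= maxExclusive)
                throw new ArgumentOutOfRangeException(nameof(values), v, "value outside range");
            if (seen[v])
                throw new ArgumentException($"duplicate value {v}", nameof(values));
            seen[v] = true;
            count++;
        }

        // Scanning the bitmap in index order yields the keys in sorted order.
        var result = new int[count];
        int i = 0;
        for (int k = 0; k < maxExclusive; k++)
            if (seen[k])
                result[i++] = k;
        return result;
    }
}
```

For example, SortUnique(new[] { 7, 2, 9, 0, 4 }, 10) returns { 0, 2, 4, 7, 9 } without a single comparison between keys.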

Also, just as importantly, most databases can generate unique IDs by means of auto-increment columns and/or sequences. Otherwise, guaranteeing uniqueness in an application is actually quite hard and tends to result in bloated keys.

Plus auto-generated integer keys are typically either loosely or absolutely ordered (depending on database and configuration), which is a useful quality. Randomly generated GUIDs are basically unordered, which is far less useful.

Meitner answered 31/5, 2009 at 14:15

Popular databases have allowed larger (64-bit) autoincrement fields for years now, so running out of numbers is much less of an issue.

As for what to use, it's always a choice. Neither is clearly better than the other; they have different characteristics and each is good in different scenarios. I have used both over time, and for the next schema I work on I'll consider both.

Pros for GUID:

  • Should be unique across computers.
  • Random, unmemorable goo means people are likely to use this only for its intended purpose of an opaque identifier.

Pros for autoincrement:

  • Human understandable.
  • Sequential assignment suits a clustered index: new rows are appended at the end rather than inserted in the middle, which helps performance.
  • Suitable for data partitioning.
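
A rough C# simulation of why ordering matters for a clustered index: insert keys one at a time into a sorted List<T> (standing in for the index) and count how many land at the very end. The names are made up, and this says nothing about real page sizes or splits. Note also that SQL Server orders uniqueidentifier values differently from .NET's Guid.CompareTo, though random GUIDs are unordered under either:

```csharp
using System;
using System.Collections.Generic;

public static class InsertOrderDemo
{
    // Inserts keys one at a time into a sorted list (a stand-in for a clustered
    // index) and counts how many of them were appended at the very end.
    public static int CountAppends<T>(IEnumerable<T> keys) where T : IComparable<T>
    {
        var index = new List<T>();
        int appends = 0;
        foreach (T key in keys)
        {
            int pos = index.BinarySearch(key);
            if (pos < 0)
                pos = ~pos;                 // complement of a negative result = insertion point
            if (pos == index.Count)
                appends++;
            index.Insert(pos, key);
        }
        return appends;
    }

    // Keys as an identity column would hand them out: 1, 2, 3, ...
    public static IEnumerable<long> Sequential(int n)
    {
        for (long i = 1; i <= n; i++)
            yield return i;
    }

    // Random GUIDs from Guid.NewGuid().
    public static IEnumerable<Guid> RandomGuids(int n)
    {
        for (int i = 0; i < n; i++)
            yield return Guid.NewGuid();
    }
}
```

CountAppends(Sequential(1000)) returns 1000, since every sequential key goes at the end. For RandomGuids(1000) the expected count is only about 7.5 (the harmonic number H_1000). Every other insert lands somewhere in the middle of the index.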
Luellaluelle answered 31/5, 2009 at 14:24

A big disadvantage of using GUID keys is that it is difficult to perform "ad-hoc" queries by hand. Sometimes it is very useful that you can do this:

SELECT * FROM [User] WHERE UserID = 452245

With GUID keys this can become very annoying.

I would recommend 64-bit integers.

Gellman answered 31/5, 2009 at 14:25
I would like to add that GUIDs are not easily human-readable, so if I call customer support with my transaction ID, I'd rather give a number than a GUID. Only machines should read GUIDs. – Diannediannne

If you use a long, you could create 1,000 keys a second and not run out of primary keys for about 292 million years.
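
The arithmetic, as a quick C# check (helper names are made up). It assumes a constant insert rate, only positive key values, and a 365.25-day year:

```csharp
using System;

public static class KeyExhaustion
{
    // Julian year: 365.25 days = 31,557,600 seconds.
    private const double SecondsPerYear = 365.25 * 24 * 60 * 60;

    // Years until the positive range of a key type is used up at a constant insert rate.
    public static double YearsUntilExhausted(double maxValue, double insertsPerSecond) =>
        maxValue / insertsPerSecond / SecondsPerYear;

    public static void Print(double insertsPerSecond)
    {
        double intYears = YearsUntilExhausted(int.MaxValue, insertsPerSecond);    // 2^31 - 1
        double longYears = YearsUntilExhausted(long.MaxValue, insertsPerSecond);  // 2^63 - 1
        Console.WriteLine($"int:  {intYears * 365.25:F1} days");
        Console.WriteLine($"long: {longYears / 1e6:F0} million years");
    }
}
```

Print(1000) works out to roughly 25 days for a 32-bit int versus about 292 million years for a long at the same rate.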

Others have already mentioned some of the advantages of using an integer type instead of a UUID/GUID. One of the big advantages is the speed and compactness of the indexes.

In a recent application where I did the database design, I needed UUIDs but didn't want to give up the advantages of using longs for primary keys, so I had an "allIds" table that mapped every primary key in the system to a UUID. All my primary keys were generated from a single sequence, so they were all unique across all tables.
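
A hypothetical in-memory model of that design in C#. AllIdsRegistry, NextKey and UuidFor are illustrative names. In the real system the shared sequence and the allIds table would live in the database rather than in application memory:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model: one sequence shared by every table, plus an "allIds"
// map from each long primary key to a UUID.
public sealed class AllIdsRegistry
{
    private readonly object _gate = new object();
    private readonly Dictionary<long, Guid> _allIds = new Dictionary<long, Guid>();
    private long _sequence;   // the single sequence all tables draw from

    // Allocates the next key (for any table) and records a UUID for it.
    public long NextKey()
    {
        lock (_gate)
        {
            long id = ++_sequence;
            _allIds.Add(id, Guid.NewGuid());
            return id;
        }
    }

    // Looks up the UUID that was assigned to a key.
    public Guid UuidFor(long id)
    {
        lock (_gate)
            return _allIds[id];
    }
}
```

Because every table draws from the same sequence, an order key and a customer key can never be equal, and either can be translated to its UUID when one is needed outside the database.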

Tort answered 31/5, 2009 at 14:15

Tell me what criteria you think are important.

The only real requirement is that the ID be unique within the table.

A GUID is a global probabilistically-unique identifier. It's also big. If you need your indices to be unique to within epsilon over every other database installation in the universe, it's a good choice. Otherwise, it's using lots of space unnecessarily.

An autoincrement number is good; it's small, and sure to be unique within the table. On the other hand, it gives you no protection against duplication; two entries, identical except for the magic number, are easy to create.

Using some value that is tied to the entity being described avoids that, but then you have the problem of dealing with uniqueness.
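
To illustrate the duplication point, here is a small C# model of a table whose only key is an auto-increment number, with an optional uniqueness check on an attribute of the entity (an email address here). CustomerTable, Insert and the email column are hypothetical, and the HashSet plays the role of a UNIQUE constraint:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical table whose primary key is an auto-increment int.
// When uniqueEmail is true, a HashSet acts like a UNIQUE constraint on the email.
public sealed class CustomerTable
{
    private int _nextId = 1;
    private readonly Dictionary<int, string> _rows = new Dictionary<int, string>();
    private readonly HashSet<string> _emailIndex;   // null when there is no constraint

    public CustomerTable(bool uniqueEmail)
    {
        _emailIndex = uniqueEmail ? new HashSet<string>(StringComparer.Ordinal) : null;
    }

    public int Count => _rows.Count;

    // Returns the new surrogate key, or throws if the email is already present
    // and the uniqueness check is enabled.
    public int Insert(string email)
    {
        if (_emailIndex != null && !_emailIndex.Add(email))
            throw new InvalidOperationException($"duplicate email: {email}");
        int id = _nextId++;
        _rows.Add(id, email);
        return id;
    }
}
```

With uniqueEmail: false, inserting the same address twice produces two rows with keys 1 and 2. With uniqueEmail: true, the second insert throws. The surrogate key stays small either way, but preventing duplicates is a separate job.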

Dona answered 31/5, 2009 at 14:26

If the database is distributed, where you could get records from other databases, the primary key needs to be unique within a table across all the databases. GUID solves this issue, albeit at the cost of space. A combination of autoincrement and namespace would be a good tradeoff.

It would be nice if databases provided built-in support for autoincrements with "prefixes", so that one database generates IDs like X1, X2, X3 and so on, whereas another generates Y1, Y2, Y3 and so on.
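
As a sketch of one way to combine an autoincrement with a namespace while keeping an integer key, the high bits of a 64-bit value can hold a database number and the low bits a local counter. The 48-bit counter / 15-bit database split and all names here are arbitrary assumptions for illustration, not a feature of any particular database:

```csharp
using System;
using System.Threading;

// Hypothetical per-database key generator: the high bits of a 64-bit key hold
// a database number and the low bits hold that database's own counter.
public sealed class PrefixedKeyGenerator
{
    private const int CounterBits = 48;                         // about 2.8e14 keys per database
    private const long CounterMask = (1L << CounterBits) - 1;
    private const int MaxDatabases = 1 << (63 - CounterBits);   // 15 bits, keeps keys positive

    private readonly long _prefix;
    private long _counter;

    public PrefixedKeyGenerator(int databaseId)
    {
        if (databaseId < 0 || databaseId >= MaxDatabases)
            throw new ArgumentOutOfRangeException(nameof(databaseId));
        _prefix = (long)databaseId << CounterBits;
    }

    // Thread-safe: Interlocked.Increment gives each caller a distinct counter value.
    public long Next()
    {
        long n = Interlocked.Increment(ref _counter);
        if (n > CounterMask)
            throw new InvalidOperationException("counter exhausted for this database");
        return _prefix | n;
    }

    public static int DatabaseOf(long key) => (int)(key >> CounterBits);
    public static long CounterOf(long key) => key & CounterMask;
}
```

The first key from new PrefixedKeyGenerator(1) is 2^48 + 1, while the first key from database 2 is 2 × 2^48 + 1. Keys from different databases therefore never collide, and the originating database can still be decoded with DatabaseOf.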

Humidor answered 31/5, 2009 at 14:47
And what data type do you suppose your X1 and Y2 will be? Strings? In that case you're better off using a GUID... – Ordnance

I asked a similar question which has a few answers that might help. Replication seems to be the biggest advantage of using GUIDs.

Reasons not to use an auto-incrementing number for a primary key

Fossil answered 31/5, 2009 at 14:49

Follow Cletus's advice, with the additional caveat that it largely depends on what you're storing. Never, ever use a GUID. GUIDs have a whole bundle of downsides and only one or two upsides.

Ardithardme answered 21/6, 2009 at 18:53

I never liked integers and incremented identifiers. They cause problems when you want to copy data across different tables (two tables, same ID) or across different databases. A Guid, on the other hand, is long in its string representation, which is a problem when you include IDs in web application URLs. So I decided to use a short string version of a Guid, which in the DB is something like varchar(16). See the code below (method WebHash()):

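using System;
using System.Collections.Generic;
using System.Text;

public static class IdentifyGenerator
{
    // Base-36 alphabet (digits and lowercase letters), safe to use in URLs.
    private static readonly char[] symbols = {
                         '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
                         'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
                         'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
                         'u', 'v', 'w', 'x', 'y', 'z',
                     };

    /// <summary>
    /// Creates a new 16-character identifier from the bytes of a GUID.
    /// It is only probabilistically unique and has fewer random bits than the GUID itself.
    /// </summary>
    /// <param name="fromGuid">GUID to derive the identifier from; a new GUID is used when omitted.</param>
    /// <returns>A 16-character lowercase alphanumeric string.</returns>
    public static string WebHash(Guid fromGuid = default(Guid))
    {
        // No lock needed: no shared state is mutated and Guid.NewGuid() is thread-safe.
        return RandomString(16, fromGuid != default(Guid) ? fromGuid.ToByteArray() : null);
    }

    public static string RandomString(int length, byte[] customBytes = null)
    {
        Stack<byte> bytes = customBytes != null ? new Stack<byte>(customBytes) : new Stack<byte>();
        var output = new StringBuilder(length);   // avoids repeated string concatenation

        for (int i = 0; i < length; i++)
        {
            // Refill from a fresh GUID whenever the available bytes run out.
            if (bytes.Count == 0)
                bytes = new Stack<byte>(Guid.NewGuid().ToByteArray());
            byte pop = bytes.Pop();
            // Note: 256 % 36 != 0, so '0'..'3' come up slightly more often than other symbols.
            output.Append(symbols[pop % symbols.Length]);
        }
        return output.ToString();
    }
}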

The only disadvantage is when you create new rows directly in SQL: then you have to create a similar SQL function.

I'd be happy to receive any criticism.

Stapleton answered 6/5, 2019 at 11:40

© 2022 - 2024 — McMap. All rights reserved.