Why are the primary keys generated by aspnet_regsql GUIDs?

Asked 26/4, 2011 at 1:27 Answered 26/4, 2011 at 2:24

I am developing an ASP.NET website in Visual Studio 2010 (with Service Pack 1, thank you very much). I want to utilize .NET's built-in Membership and Role providers for SQL Server 2008.

Now, I've been developing Microsoft technologies for a very long time, and have rubbed elbows with some of the finest SQL Server DBA's in the business. Every one of them has told me to stay away from GUIDS as primary keys when constructing a database table that will:

Have a very high record count.
Have a high volume of inserts and deletes.

Reason: Because the primary key is a clustered index!

This basically means that each record inserted into the table must obey the constraints of the index. So if the index is sorted ASC, the record, with the newly generated GUID, must be physically wedged, in proper sequence, into the data table in question.

This would be just fine for a table with only a few thousand records or so. SQL Server would only have to re-position a handful. However if the data table has several millions of records, and finds it must insert the new record at row 216. it could take a substantial amount of time (by web standards) to accomplish. It has to physically move all those rows down, to insert the new one.

So my question is simply this. Since Microsoft, and all the DBS's we know and love, have said NO to GUIDs as primary keys... why does the ASPNET_REGSQL tool create tables using a GUID as a primary key?

Or am I missing something? Is there a new feature to the SQL Profiler engine in 2008 that no longer sees GUIDS as a task?

Brusquerie answered 26/4, 2011 at 1:27 Comment(1)

Sequential GUIDs are one solution to this problem. Still though, not a fan of GUIDs for keys – Caress 26/4, 2011 at 1:35

Guids do have some strengths; For example, if you are generating guids in the application code, they can be generated in a web farm without worrying about ending up with the same ID. Another benefit is that pages in the database can be locked without likely causing any issues since it is unlikely that two randomly chosen rows will exist in the same page of data.

As far as what you said about several million rows of data - Guids are going to be ok as long as you are always asking the SQL server to return a single row of data. The biggest problem is going to be when you ask for some large subset of the data or when you are batch inserting a large number of rows; Then you are likely to be doing a lot of random I/O to get all of the rows that match your criteria or insert all of the rows into the random locations that the guids end up pointing to. Also, SQL doesn't have to "physically move all those rows down to insert the new one"; Data is stored on pages, and SQL usually will only have to change the data on one page in the data file to insert the row, possibly with a couple other pages updated, but it is not like inserting a row into a massive text file.

All of that said - yeah, I also generally prefer an integer for a primary key, but just wanted to point out that there are definitely situations in which guids make some sense.

Birdhouse answered 26/4, 2011 at 1:51 Comment(3)

What you said is true, and proportionate to the amount of data stored in the table using a primary GUID. But it's been my experience that it's a resource hog (according to SQL Profiler) on tables with a large amount of data (2 million +). I prefer to use INT myself, making the primary an identity. No brainer because the new records are always appended.. so no need for SQL to re-build the index pointers. So the question is.. should I use a custom provider with my own schema? Never assume Microsoft has all the correct solutions (sorry Joel)... – Brusquerie 26/4, 2011 at 2:25

I would probably say test out whatever you are considering. Just as a quick note though - With 2 million rows in a table, guids will be taking approximately 24MB of extra space (16 bytes per record instead of 4 bytes per record), and realistically speaking most databases don't end up with 2 million user accounts. Now if you are creating any tables that include the guid as a foreign key then you could have a problem; But be sure to measure your problem and the cost to work around it before going too far. – Birdhouse 26/4, 2011 at 11:28

One solution I used in the past (which I admit I didn't measure performance characteristics of) was to use the default membership table AND my own user table. My user table had an int primary key which was used in all of the other tables in the DB as the foreign key, and also a guid foreign key reference to the membership user table; On login I looked up my user table record and stored the int ID in the session (which was used then throughout the app). This let me use standard membership code without the (partially assumed) drain of guids littering my DB schema. – Birdhouse 26/4, 2011 at 11:36

There is nothing wrong with using GUIDs as a primary key. They can have some detriments if not used properly, sure, but consider a scenario where you have various databases at stores or other points of sale and, each night, you need to take all of the data from each location and combine it into a single master database at corporate. GUIDs are a great option here because you don't need to worry about identity conflicts.

Every one of them has told me to stay away from GUIDS as primary keys when constructing a database ... because the primary key is a clustered index!

A primary key doesn't have to use a clustered index, that's just the default index type used when creating a primary key.

In fact, if you look at the database schema used by the SqlMembershipProvider you'll see that there is a nonclustered index on the primary key column.

The following is the SQL script from the InstallCommon.sql script in %WINDIR%\Microsoft.NET\Framework\v4.0.30319:

  CREATE TABLE [dbo].aspnet_Users (
    ApplicationId    uniqueidentifier    NOT NULL FOREIGN KEY REFERENCES [dbo].aspnet_Applications(ApplicationId),
    UserId           uniqueidentifier    NOT NULL PRIMARY KEY NONCLUSTERED DEFAULT NEWID(),
    UserName         nvarchar(256)       NOT NULL,
    LoweredUserName  nvarchar(256)       NOT NULL,
    MobileAlias      nvarchar(16)        DEFAULT NULL,
    IsAnonymous      bit                 NOT NULL DEFAULT 0,
    LastActivityDate DATETIME            NOT NULL)

   CREATE UNIQUE CLUSTERED INDEX aspnet_Users_Index ON [dbo].aspnet_Users(ApplicationId, LoweredUserName)
   CREATE NONCLUSTERED INDEX aspnet_Users_Index2 ON [dbo].aspnet_Users(ApplicationId, LastActivityDate)

Note that the primary key column (UserId) is created using the statement PRIMARY KEY NONCLUSTERED and that the table's CLUSTERED index is created as a composite index on ApplicationId and LoweredUserName.

Median answered 26/4, 2011 at 2:24 Comment(2)

Are not the row ID's supposed to handle duplicates in a sync situation? I know you can choose NON-CLUSTERED, however a CLUSTERED index yeilds the best performance in a data heavy table. In any case, I will just go with the flow and address it IF it becomes an issue. Thanks for all your input, guys!! Always a blast to learn something new! – Brusquerie 26/4, 2011 at 2:42

@dotnettex: If the primary key is a numeric integer then you run the risk of collisions when replicating the data from the stores' databases to the corporate database. And the aspnet_Users table (as an example) has a CLUSTERED index - it's on ApplicationId and LoweredUsername. This makes sense since often you will be searching on username for a particular application (e.g., Membership.GetUser()). – Median 26/4, 2011 at 3:0

Recommended topics

Hot tags