Behavior of unique index, varchar column and (blank) spaces

I'm using Microsoft SQL Server 2008 R2 (with latest service pack/patches) and the database collation is SQL_Latin1_General_CP1_CI_AS.

The following code:

SET ANSI_PADDING ON;
GO

CREATE TABLE Test (
   Code VARCHAR(16) NULL
);
CREATE UNIQUE INDEX UniqueIndex
    ON Test(Code);

INSERT INTO Test VALUES ('sample');
INSERT INTO Test VALUES ('sample ');

SELECT '>' + Code + '<' FROM Test WHERE Code = 'sample        ';
GO

produces the following results:

(1 row(s) affected)

Msg 2601, Level 14, State 1, Line 8

Cannot insert duplicate key row in object 'dbo.Test' with unique index 'UniqueIndex'. The duplicate key value is (sample ).

The statement has been terminated.

‐‐‐‐‐‐‐‐‐‐‐‐

>sample<

(1 row(s) affected)

My questions are:

  1. I assume the index cannot store trailing spaces. Can anyone point me to official documentation that specifies/defines this behavior?
  2. Is there a setting to change this behavior, that is, to make it recognize 'sample' and 'sample ' as two different values (which they are, by the way) so that both can be in the index?
  3. Why on Earth is the SELECT returning a row? SQL Server must be doing something really funny/clever with the spaces in the WHERE clause because if I remove the uniqueness in the index, both INSERTs will run OK and the SELECT will return two rows!

Any help/pointer in the right direction would be appreciated. Thanks.

Sudatorium answered 27/2, 2012 at 6:1 Comment(0)

Trailing blanks explained:

SQL Server follows the ANSI/ISO SQL-92 specification (Section 8.2, <Comparison Predicate>, General rules #3) on how to compare strings with spaces. The ANSI standard requires padding for the character strings used in comparisons so that their lengths match before comparing them. The padding directly affects the semantics of WHERE and HAVING clause predicates and other Transact-SQL string comparisons. For example, Transact-SQL considers the strings 'abc' and 'abc ' to be equivalent for most comparison operations.

The only exception to this rule is the LIKE predicate. When the right side of a LIKE predicate expression features a value with a trailing space, SQL Server does not pad the two values to the same length before the comparison occurs. Because the purpose of the LIKE predicate, by definition, is to facilitate pattern searches rather than simple string equality tests, this does not violate the section of the ANSI SQL-92 specification mentioned earlier.

Here's a well-known example of all the cases mentioned above:

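DECLARE @a VARCHAR(10);
DECLARE @b VARCHAR(10);

SET @a = '1';
SET @b = '1 '; -- with trailing blank

-- Returns one row because all three predicates are true
SELECT 1
WHERE
    @a = @b          -- equality pads both sides to the same length
AND @a NOT LIKE @b   -- the trailing blank in the pattern is significant
AND @b LIKE @a;      -- for VARCHAR, trailing blanks in the matched value are ignored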

Here's some more detail about trailing blanks and the LIKE clause.
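
A small sketch (not from the quoted article) that may help separate storage from comparison: the trailing blank is kept in the value, it is only ignored when comparing. LEN skips trailing blanks while DATALENGTH counts the stored bytes, so for the two variables below LEN should report 1 for both, DATALENGTH should report 1 and 2, and the comparison should still say 'equal'.

```sql
DECLARE @a VARCHAR(10) = '1';
DECLARE @b VARCHAR(10) = '1 ';  -- one trailing blank

SELECT
    LEN(@a)        AS LenA,     -- LEN ignores trailing blanks
    LEN(@b)        AS LenB,
    DATALENGTH(@a) AS BytesA,   -- DATALENGTH counts the stored bytes
    DATALENGTH(@b) AS BytesB,
    CASE WHEN @a = @b THEN 'equal' ELSE 'different' END AS PaddedCompare;
```

This is why the SELECT in the question finds the row: the literal in the WHERE clause and the stored value are padded to the same length before they are compared.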

Regarding indexes:

An insertion into a column whose values must be unique will fail if you supply a value that is differentiated from existing values by trailing spaces only. The following strings will all be considered equivalent by a unique constraint, primary key, or unique index. Likewise, if you have an existing table with the data below and try to add a unique restriction, it will fail because the values are considered identical.

PaddedColumn
------------
'abc'
'abc '
'abc  '
'abc    '

(Taken from here.)
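
As far as I know there is no setting that switches this comparison rule off for a unique index. One possible workaround, offered as a sketch under assumptions rather than a documented recommendation, is to add the byte length as a computed column and include it in the unique index, so that values differing only in trailing blanks get different keys. The names Test2, CodeLen and UniqueIndex2 are made up for this illustration, and indexing a computed column assumes the usual session options (ANSI_PADDING, QUOTED_IDENTIFIER, ARITHABORT and so on) are ON.

```sql
SET ANSI_PADDING ON;
GO

-- Hypothetical table: same shape as the question's, plus a computed length
CREATE TABLE Test2 (
    Code    VARCHAR(16) NULL,
    CodeLen AS DATALENGTH(Code)   -- stored byte count, trailing blanks included
);

-- Uniqueness is now checked on the pair (Code, CodeLen)
CREATE UNIQUE INDEX UniqueIndex2
    ON Test2(Code, CodeLen);

INSERT INTO Test2 (Code) VALUES ('sample');
INSERT INTO Test2 (Code) VALUES ('sample ');  -- same Code after padding, different CodeLen

-- Equality alone still matches both rows; filter on the length to pick one
SELECT '>' + Code + '<'
FROM Test2
WHERE Code = 'sample'
  AND CodeLen = DATALENGTH('sample');
```

With this layout both inserts should succeed, but queries have to filter on CodeLen as well whenever the exact number of trailing blanks matters, because Code = 'sample' on its own keeps matching every padded variant.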

Harmonics answered 27/2, 2012 at 6:12 Comment(3)
Thanks for the pointers, guys. Mea culpa for being too lazy to Google that by myself. In my opinion, the behavior defined by the standard is not intuitive. I'd imagine that 9 out of 10 developers would say that 'a' and 'a ' are NOT the same string, but oh well.Sudatorium
This is one of the most unintuitive things I've encountered in Azure SQL so far...Hudnut
You explained the problem very clearly. But what should we do to overcome this?Slurp
