SQL Server 2008 Empty String vs. Space
Y

9

95

I ran into something a little odd this morning and thought I'd submit it for commentary.

Can someone explain why the following SQL query prints 'equal' when run against SQL Server 2008? The database compatibility level is set to 100.

if '' = ' '
    print 'equal'
else
    print 'not equal'

And this returns 0:

select (LEN(' '))

It appears to be auto trimming the space. I have no idea if this was the case in previous versions of SQL Server, and I no longer have any around to even test it.

I ran into this because a production query was returning incorrect results. I cannot find this behavior documented anywhere.

Does anyone have any information on this?

Yoheaveho answered 9/9, 2009 at 13:56 Comment(6)
SQL 2005: select len(' ') returns 0. - Lagting
It does the same on SQL Server 2000. - Timbre
This is a fascinating question. It seems to return equal no matter how many spaces you put in either string, whether they match or not. After more experimentation I noticed that it is effectively doing an RTRIM on both sides of the equality operator before the comparison. It looks like you got an answer on the LEN function, but I am really interested in a more thorough answer than "varchars and equality are thorny in TSQL" for the equality part of your question. - Keeper
Oracle does this too, I believe. - Aerugo
In general I find that storing an empty string is a bad idea, and this is one of the reasons. I prefer the use of NULL and find many problems when people try to turn null information into a value like an empty string or data way out of the normal range. - Berstine
I'm so happy it does this. Of course you should trim your user input and perhaps force NULL when an empty string is sent, but you might be dealing with an older DB, or you might just miss that logic for a piece of code. It's nice to be able to search for '' and NULL and not have to worry about whether there is mistakenly a space in the field. - Tani
S
99

Varchars and equality are thorny in T-SQL. The documentation for the LEN function says:

Returns the number of characters, rather than the number of bytes, of the given string expression, excluding trailing blanks.

You need to use DATALENGTH to get a true byte count of the data in question. If you have Unicode data (nchar/nvarchar), note that the byte count will not be the same as the length of the text, because each character takes at least two bytes.

print(DATALENGTH(' ')) --1
print(LEN(' '))        --0
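
A small sketch of the difference between the two functions for varchar and nvarchar values. The literals and column aliases are only illustrative and are not taken from the original post:

```sql
-- LEN ignores trailing blanks; DATALENGTH counts every stored byte.
SELECT
    LEN('ab ')         AS len_varchar,     -- 2: the trailing blank is not counted
    DATALENGTH('ab ')  AS bytes_varchar,   -- 3: one byte per character
    LEN(N'ab ')        AS len_nvarchar,    -- 2
    DATALENGTH(N'ab ') AS bytes_nvarchar;  -- 6: two bytes per character
```

For nvarchar data, dividing DATALENGTH by 2 gives a character count that includes trailing blanks, as long as the text contains no supplementary characters.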

When it comes to equality of expressions, the two strings are compared for equality like this:

  • Take the shorter string
  • Pad it with blanks until its length equals that of the longer string
  • Compare the two

It's the middle step that is causing unexpected results - after that step, you are effectively comparing whitespace against whitespace - hence they are seen to be equal.
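
The padding steps above can be seen in a quick sketch like the following. The literals are illustrative only; the point is that padding happens on the right, so trailing blanks vanish from the comparison while leading blanks do not:

```sql
-- Blank-padding applies to the end of the shorter string only.
SELECT
    CASE WHEN 'abc' = 'abc   ' THEN 'equal' ELSE 'not equal' END AS trailing_blanks,  -- equal
    CASE WHEN 'abc' = ' abc'   THEN 'equal' ELSE 'not equal' END AS leading_blank,    -- not equal
    CASE WHEN ''    = '     '  THEN 'equal' ELSE 'not equal' END AS empty_vs_spaces;  -- equal
```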

LIKE behaves better than = in the "blanks" situation because it doesn't perform blank-padding on the pattern you were trying to match:

if '' = ' '
print 'eq'
else
print 'ne'

Will give eq while:

if '' LIKE ' '
print 'eq'
else
print 'ne'

Will give ne

Be careful with LIKE, though: it is not symmetrical. It treats trailing whitespace as significant in the pattern (RHS) but not in the match expression (LHS). The following example, taken from another post, shows this:

declare @Space nvarchar(10)
declare @Space2 nvarchar(10)

set @Space = ''
set @Space2 = ' '

if @Space like @Space2
print '@Space Like @Space2'
else
print '@Space Not Like @Space2'

if @Space2 like @Space
print '@Space2 Like @Space'
else
print '@Space2 Not Like @Space'

@Space Not Like @Space2
@Space2 Like @Space
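
Commenters below report that later versions print "Not Like" for both tests. One possible explanation, offered here as an assumption rather than something this answer states, is the nvarchar declaration: the LIKE documentation says trailing blanks are significant for Unicode data but not for non-Unicode data. A minimal sketch for checking this on your own server, with hypothetical variable names and no claimed output:

```sql
-- Hypothetical check: run the same LIKE tests with varchar and nvarchar operands
-- and compare the columns side by side.
DECLARE @v_empty varchar(10) = '',   @v_space varchar(10) = ' ';
DECLARE @n_empty nvarchar(10) = N'', @n_space nvarchar(10) = N' ';

SELECT
    CASE WHEN @v_space LIKE @v_empty THEN 'like' ELSE 'not like' END AS varchar_space_like_empty,
    CASE WHEN @n_space LIKE @n_empty THEN 'like' ELSE 'not like' END AS nvarchar_space_like_empty,
    CASE WHEN @v_empty LIKE @v_space THEN 'like' ELSE 'not like' END AS varchar_empty_like_space,
    CASE WHEN @n_empty LIKE @n_space THEN 'like' ELSE 'not like' END AS nvarchar_empty_like_space;
```

If the varchar and nvarchar columns disagree, the data type rather than the server version is the likely cause of the differing reports.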
Splanchnology answered 9/9, 2009 at 14:14 Comment(6)
Nice answer. I had not noticed that in the LEN documentation. It's not limited to LEN, though. The RIGHT and LEFT functions exhibit similar behavior, but there it's not documented. It seems to be the literal with a space that causes the issue. I noticed this also returns equal: if '' = SPACE(1) print 'equal' else print 'not equal'. I'm not really interested in getting the true length; I was just confused about why, when I was looking for a space in a column, all the columns which were empty strings were returned. - Yoheaveho
Also, nice information about the LIKE statement. I guess the moral of the story is to try not to get yourself into the position where you need to compare a space and an empty string. - Yoheaveho
The issue is bigger than comparing a space to an empty string. Comparing any two strings that end in a different number of spaces exhibits the same behavior. - Keeper
@butterchicken: Sorry for such a late post, I just saw this question, but when I ran this (the last one) on my SQL Server 2008 R2 I get "@Space Not Like @Space2" and "@Space2 Not Like @Space". Any idea why? - Jacobite
Confirmed on SQL Server 2012 and SQL Server 2014: the result is "@Space Not Like @Space2" and "@Space2 Not Like @Space". - Graham
@butterchicken: For step 2, "Pad with blanks until length equals that of longer string", does "length equals" mean equality in number of characters or in number of bytes? - Graham
N
22

The = operator in T-SQL is not so much "equals" as it is "are the same word/phrase, according to the collation of the expression's context," and LEN is "the number of characters in the word/phrase." No collations treat trailing blanks as part of the word/phrase preceding them (though they do treat leading blanks as part of the string they precede).

If you need to distinguish 'this' from 'this ', you shouldn't use the "are the same word or phrase" operator because 'this' and 'this ' are the same word.

Contributing to the way = works is the idea that the string-equality operator should depend on its arguments' contents and on the collation context of the expression, but it shouldn't depend on the types of the arguments, if they are both string types.

The natural language concept of "these are the same word" isn't typically precise enough to be able to be captured by a mathematical operator like =, and there's no concept of string type in natural language. Context (i.e., collation) matters (and exists in natural language) and is part of the story, and additional properties (some that seem quirky) are part of the definition of = in order to make it well-defined in the unnatural world of data.

On the type issue, you wouldn't want words to change when they are stored in different string types. For example, the types VARCHAR(10), CHAR(10), and CHAR(3) can all hold representations of the word 'cat', and ? = 'cat' should let us decide if a value of any of these types holds the word 'cat' (with issues of case and accent determined by the collation).
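
A minimal sketch of that point, using hypothetical variable names: the same word stored in a fixed-width and a variable-width type occupies a different number of bytes, yet the two values compare as equal.

```sql
DECLARE @fixed    char(10)    = 'cat';  -- padded with blanks to 10 characters
DECLARE @variable varchar(10) = 'cat';  -- stored as 3 characters

SELECT
    DATALENGTH(@fixed)    AS fixed_bytes,     -- 10
    DATALENGTH(@variable) AS variable_bytes,  -- 3
    CASE WHEN @fixed = @variable THEN 'equal' ELSE 'not equal' END AS comparison;  -- equal
```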

Response to JohnFx's comment:

See Using char and varchar Data in Books Online. Quoting from that page, emphasis mine:

Each char and varchar data value has a collation. Collations define attributes such as the bit patterns used to represent each character, comparison rules, and sensitivity to case or accenting.

I agree it could be easier to find, but it's documented.

Worth noting, too, is that SQL's semantics, where = has to do with the real-world data and the context of the comparison (as opposed to something about bits stored on the computer), have been part of SQL for a long time. The premise of RDBMSs and SQL is the faithful representation of real-world data, hence its support for collations many years before similar ideas (such as CultureInfo) entered the realm of Algol-like languages. The premise of those languages (at least until very recently) was problem-solving in engineering, not management of business data. (Recently, the use of similar languages in non-engineering applications like search is making some inroads, but Java, C#, and so on are still struggling with their non-businessy roots.)

In my opinion, it's not fair to criticize SQL for being different from "most programming languages." SQL was designed to support a framework for business data modeling that's very different from engineering, so the language is different (and better for its goal).

Heck, when SQL was first specified, some languages didn't have any built-in string type. And in some languages still, the equals operator between strings doesn't compare character data at all, but compares references! It wouldn't surprise me if in another decade or two, the idea that == is culture-dependent becomes the norm.

Noteworthy answered 9/9, 2009 at 15:20 Comment(4)
BOL describes the = operator thusly: "Compares the equality of two expressions (a comparison operator)." Whether the behavior is correct or not, you have to admit it is extremely confusing and non-standard in terms of the usage of this operator in most programming languages. MS should at least add a warning to the documentation about this behavior. - Keeper
@JohnFx: See my too-long-for-a-comment response in my answer. - Noteworthy
If your idea that the semantic of '=' is "is the same word/phrase" is correct, why then is ' this' = 'this' not true in SQL? - Tushy
I didn't say that = between strings is same word/phrase. I said it is "not so much 'equals' as it is 'are the same word/phrase [...]'." The language designers' choices were motivated by the semantic I describe, but there were certainly details to be made precise based on many considerations, including practical ones with how collations are implemented. Treating leading and trailing spaces differently was one, perhaps made to facilitate sorting or other needs. - Noteworthy
K
10

I found this blog article which describes the behavior and explains why.

The SQL standard requires that string comparisons, effectively, pad the shorter string with space characters. This leads to the surprising result that N'' = N' ' (the empty string equals a string of one or more space characters) and more generally any string equals another string if they differ only by trailing spaces. This can be a problem in some contexts.

More information is also available in Microsoft KB article 316626 (MSKB316626).

Keeper answered 9/9, 2009 at 15:3 Comment(3)
Thanks. I am surprised that it is in the standard. I'm sure somebody much smarter than I am had a good reason for this. - Yoheaveho
@John: did you mean to write ≠ (not equals) in your comment? - Noteworthy
The original quote had an error in it which I copied directly. I updated the quote to reflect what the original author meant. - Keeper
Y
5

There was a similar question a while ago, where I looked into a similar problem.

Instead of LEN(' '), use DATALENGTH(' '). It returns the number of bytes stored, including trailing spaces, so it gives you the correct value here.

The solutions were to use a LIKE clause as explained in my answer in there, and/or include a 2nd condition in the WHERE clause to check DATALENGTH too.

Have a read of that question and links in there.

Yttrium answered 9/9, 2009 at 14:12 Comment(0)
H
3

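To compare a value to a literal space, you may also use this technique as an alternative to the LIKE operator. ASCII returns the code of the first character only, so it is checked against 32, the code for a space: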

IF ASCII('') = 32 PRINT 'equal' ELSE PRINT 'not equal'
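
A hedged sketch of what ASCII returns in the relevant cases; the literals are illustrative. Because only the leftmost character is examined, the test also matches any longer value that merely starts with a space, so it may need a second condition on DATALENGTH:

```sql
SELECT
    ASCII(' ')  AS single_space,   -- 32
    ASCII(' x') AS leading_space,  -- 32 as well: only the first character is examined
    ASCII('')   AS empty_string;   -- NULL, so ASCII('') = 32 is not true

-- Stricter variant: exactly one space, nothing else.
DECLARE @value varchar(10) = ' ';
IF ASCII(@value) = 32 AND DATALENGTH(@value) = 1
    PRINT 'single space'
ELSE
    PRINT 'something else'
```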
Hegyera answered 24/2, 2011 at 20:3 Comment(0)
T
1

Sometimes one has to deal with spaces in data, with or without any other characters, even though the idea of using Null is better - but not always usable. I did run into the described situation and solved it this way:

... where ('>' + @space + '<') <> ('>' + @space2 + '<')

Of course you wouldn't do that for a large amount of data, but it works quickly and easily for a few hundred rows.
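
A self-contained sketch of why the delimiter trick works, using the same variable names as above with illustrative values. Wrapping each value in '>' and '<' moves the trailing blank into the middle of the string, where it is no longer padded away:

```sql
DECLARE @space  varchar(10) = '';
DECLARE @space2 varchar(10) = ' ';

SELECT
    CASE WHEN @space <> @space2
         THEN 'different' ELSE 'same' END AS plain_comparison,      -- same
    CASE WHEN ('>' + @space + '<') <> ('>' + @space2 + '<')
         THEN 'different' ELSE 'same' END AS delimited_comparison;  -- different
```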

Trammell answered 16/1, 2015 at 14:37 Comment(2)
The question was why SQL Server behaved as it did, not how to handle such behavior in general. jhale would probably rather not modify his program code, only his server configuration. - Devoted
Still, a good solution for doing a quick check. :) - Gossett
S
1

As SQL-92, section 8.2 (comparison predicate), says:

If the length in characters of X is not equal to the length in characters of Y, then the shorter string is effectively replaced, for the purposes of comparison, with a copy of itself that has been extended to the length of the longer string by concatenation on the right of one or more pad characters, where the pad character is chosen based on CS. If CS has the NO PAD attribute, then the pad character is an implementation-dependent character different from any character in the character set of X and Y that collates less than any string under CS. Otherwise, the pad character is a <space>.

Sike answered 2/4, 2022 at 10:38 Comment(0)
U
0

How to distinguish records when selecting on char/varchar fields in SQL Server. Example:

declare @mayvar as varchar(10)

set @mayvar = 'data '

select mykey, myfield from mytable where myfield = @mayvar

expected:

mykey (int) | myfield (varchar(10))
1 | 'data '

obtained:

mykey | myfield
1 | 'data'
2 | 'data '

Even if I write select mykey, myfield from mytable where myfield = 'data' (without the final blank), I get the same results.

How did I solve it? Like this:

select mykey, myfield
from mytable
where myfield = @mayvar 
and DATALENGTH(isnull(myfield,'')) = DATALENGTH(@mayvar)

And if there is an index on myfield, it can still be used in each case, since the equality predicate on myfield is unchanged.
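
A self-contained sketch of the same idea that can be run without an existing table. The table variable @t and its rows are hypothetical, and the comments assume the default ANSI_PADDING ON behavior, under which varchar columns keep their trailing blanks:

```sql
-- Hypothetical data: two rows that differ only by a trailing blank.
DECLARE @t TABLE (mykey int, myfield varchar(10));
INSERT INTO @t (mykey, myfield) VALUES (1, 'data '), (2, 'data');

DECLARE @mayvar varchar(10) = 'data ';

-- Equality alone matches both rows, because trailing blanks are padded away.
SELECT mykey FROM @t WHERE myfield = @mayvar;

-- Adding the byte-length check keeps only the row with the trailing blank (mykey = 1).
SELECT mykey
FROM @t
WHERE myfield = @mayvar
  AND DATALENGTH(myfield) = DATALENGTH(@mayvar);
```

The ISNULL wrapper from the query above is dropped here only because a NULL myfield can never satisfy the equality predicate anyway.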

I hope it will be helpful.

Urdar answered 14/4, 2015 at 15:45 Comment(0)
M
0

Another way is to put the string into a state where the space has a value, e.g. by replacing each space with a known character such as _:

if REPLACE('hello',' ','_') = REPLACE('hello ',' ','_')
    print 'equal'
else
    print 'not equal'

returns: not equal

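Not ideal, and probably slow, but it is another way forward when something is needed quickly. Note that it gives a false match if the data can already contain the replacement character.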

Mindszenty answered 12/4, 2019 at 3:9 Comment(0)