select top 1 * vs select top 1 1
Asked Answered
A

4

12

I know there's a lot of these questions, but I can't find one that relates to my question.

Looking at this question, Is Changing IF EXIST(SELECT 1 FROM ) to IF EXIST(SELECT TOP 1 FROM ) has any side effects?

Specifically referring to this section in the answer:

select * from sys.objects
select top 1 * from sys.objects
select 1 where exists(select * from sys.objects)
select 1 where exists(select top 1 * from sys.objects)

I'm running some of my own tests to properly understand it. As indicated in the answer:

select 1 where exists(select top 1 * from sys.objects)
select 1 where exists(select top 1 1 from sys.objects)

both cause the same execution plan and also causes the same plan as

select 1 where exists(select * from sys.objects)
select 1 where exists(select 1 from sys.objects)

From my research into questions like this one, “SELECT TOP 1 1” VS “IF EXISTS(SELECT 1”. I'm deducing that this is the agreed best practice:

select 1 where exists(select * from sys.objects)

My first question is why is this preferred over this:

select 1 where exists(select 1 from sys.objects)

In trying to understand it, I broke them down to their more basic expressions (I'm using 'top 1' to mimic an execution plan resembling exists):

select top 1 * from sys.objects
select top 1 1 from sys.objects

I now see that the first is 80% of the execution time (relative to the batch of 2) whilst the second is only 20%. Would it then not be better practice to use

select 1 where exists(select 1 from sys.objects)

as it can be applied to both scenarios and thereby reduce possible human error?

Adoration answered 11/9, 2015 at 2:50 Comment(2)
NB: There's a good write up on this here: sqlservercentral.com/articles/T-SQL/154149Spurling
Yeah, I also just read that article, it's a definitive test. I'm dying to find the time to test his results when using null instead of * or 1 just for completeness since null uses a pointer on the CPU vs memory.Adoration
K
13

SQL Server detects EXISTS predicate relatively early in the query compilation / optimisation process, and eliminates actual data retrieval for such clauses, replacing them with existence checks. So your assumption:

I now see that the first is 80% of the execution time (relative to the batch of 2) whilst the second is only 20%.

is wrong, because in the preceding comparison you have actually retrieved some data, which doesn't happen if the query is put into the (not) exists predicate.

Most of the time, there is no difference how to test for the existence of rows, except for a single yet important catch. Suppose you say:

if exists (select * from dbo.SomeTable)
...

somewhere in the code module (view, stored procedure, function etc.). Then, later, when someone else will decide to put WITH SCHEMABINDING clause into this code module, SQL Server will not allow it and instead of possibly binding to the current list of columns it will throw an error:

Msg 1054, Level 15, State 7, Procedure BoundView, Line 6
Syntax '*' is not allowed in schema-bound objects.

So, in short:

if exists (select 0 from ...)

is a safest, fastest and one-size-fits-all way for existence checks.

Kennykeno answered 11/9, 2015 at 5:3 Comment(2)
I really like this answer due to the WITH SCHEMABINDING information as that is definitely a big game changer and is the best argument for SELECT 1 vs SELECT *. Do you possibly have a link to a reference around that please?Adoration
Well, I have only heard about such a loophole, truth be told. Now that I have tested it myself, SQL Server doesn't allow you to shoot your own leg by throwing an error 1054, Syntax '*' is not allowed in schema-bound objects. Which makes my argument somewhat weak, though still valid as a one-size-fits-all approach. Oh yes, I got this error on 2012 version - don't know how older ones would respond.Kennykeno
F
5

The difference between these two:

select top 1 * from sys.objects
select top 1 1 from sys.objects

Is that in the first clause SQL server must fetch all the columns from the table (from any random row), but in the second it's just ok to fetch "1" from any index.

Things change when these clauses are inside exists clause, because in that case SQL Server knows that it doesn't actually have to fetch the data because it will not be assigned to anything, so it can handle select * the same way it would handle select 1.

Since exists checks just one row, it has internal top 1 built into it, so adding it manually doesn't change anything.

Weather to have select * or select 1 in exists clause is just based on opinion, and instead of 1 you could of course have 2 or 'X' or whatever else you like. Personally I always use ... and exists (select 1 ...

Foreboding answered 11/9, 2015 at 4:49 Comment(1)
Thanks, I get all that (most of it is in my op). My questions are around why the one is preferred over the other (since they're semantically the same when nested within and EXIST statement). For me it appears that the SELECT 1 seems to be the 'safer general rule of thumb', specially when advising technical consultants who do not study SQL as a careerAdoration
U
4

EXISTS is a type of subquery which can only return a boolean value based upon whether any rows are returned by the subquery. Selecting 1, or * or, whatever doesn't matter within this context because the result is always just true or false.

You can verify this by testing that these two statements produce the exact same plan.

select 1 where exists(select * from sys.objects)
select 1 where exists(select 1 from sys.objects)

What you select in your outer query DOES matter. As you found, these two statements produce very different execution plans:

select top 1 * from sys.objects
select top 1 1 from sys.objects

The first one will be slower because it has to actually return real data. In this case, joining to the three underlying tables: syspalnames, syssingleobjrefs, and sysschobjs.

As to the preference of what you put inside your EXISTS subqueries - SELECT 1 or SELECT * - it doesn't matter. I usually say SELECT 1, but SELECT * is just as good and you'll see it in a lot of Microsoft documentation.

Unfailing answered 11/9, 2015 at 5:43 Comment(0)
B
0

I was looking for an answer to just the actual question contained in the title. I found it at this link:

Select Top 1 or Top n basically returns the first n rows of data based on the sql query. Select Top 1 1 or Top n s will return the first n rows with data s depending on the sql query.

For example, the query below produces the first name and last name of the first 10 matches. This query will return first name and last name only.

SELECT TOP 10 FirstName, LastName
  FROM [tblUser]
  where EmailAddress like 'john%'

Now, look at this query with select top 10 'test' - this will produce the same number of rows as in the previous query (same database, same condition) but the values will be 'test'.

SELECT TOP 10 'test'
  FROM [tblUser]
  where EmailAddress like 'john%'

So, select TOP 1 * returns the first row, while select TOP 1 1 returns one row containing just "1". This if the query returns at least one row, otherwise Null will be returned in both cases.

As additional example, this:

SELECT TOP 10 'test', FirstName
  FROM [tblUser]
  where EmailAddress like 'john%'

will return a table containing a column filled with "test" and another column filled with the first name of the first 10 matches of the query.

Bizerte answered 16/9, 2015 at 12:34 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.