Inconsistent results with NEWID() and PERSISTED computed column
Asked Answered
E

2

6

I am getting odd results when using NEWID() in combination with a persistent computed column. Am I using some function wrong?

Not using persisted when creating the column, and therefore calculating values when selecting them, will return correct values. Updating the column (col1) will also return correct values.

DECLARE @test TABLE (
    Col1 INT,
    Contains2 AS CASE WHEN 2 IN (Col1) THEN 1 ELSE 0 END PERSISTED)

INSERT INTO @test (Col1) VALUES
    (ABS(CHECKSUM(NEWID()) % 5)),
    (ABS(CHECKSUM(NEWID()) % 5)),
    (ABS(CHECKSUM(NEWID()) % 5)),
    (ABS(CHECKSUM(NEWID()) % 5)),
    (ABS(CHECKSUM(NEWID()) % 5))

SELECT * FROM @test
UPDATE @test SET Col1 = Col1*1
SELECT * FROM @test

/*
Col1    Contains2
2   0
2   0
0   1
4   0
3   0

Col1    Contains2
2   1
2   1
0   0
4   0
3   0
*/
Exurb answered 30/6, 2016 at 10:38 Comment(4)
I think it's noteworthy that one gets the expected behavior with the PERSISTED keyword omitted. You might want to call that out in your question.Kopp
@DanGuzman Good point, updated the question.Exurb
cross-posted to dba.seApus
It's not the same question there although the code is the same.Errhine
V
4

Apparently, the query engine calculates the random number twice for each row.

First time for Col1, second time for the CASE statement of the persisted column.

Optimiser doesn't know, or doesn't care in this case that NEWID is a non-deterministic function and calls it twice.

Actually, it may even not have a choice. Do you want optimiser to create a temporary table behind the scenes, populate its Col1 with results of the expression that generates random numbers, then read that temporary table back and use these saved intermediary results for calculating the result of the CASE expression, then perform the final INSERT? In this case, it is cheaper for optimiser to calculate the expression twice without writing intermediary results to disk. In some other cases (say, when you have not 5, but 5 billion rows, or extra indexes), the estimated costs may be different and this behaviour would change.

I don't think you can do much about it. Just be aware of this behaviour. Always explicitly save the generated set of random numbers to a table, then perform further calculations based on them.

I reproduced it in SQL Server 2008 and 2014. Here is an execution plan that I got in SQL Server 2008, but it is not really interesting. In 2014 the plan is the same, except there is no Top operator.

plan 2008

Constant Scan operator outputs a Union1009 list, which is used in Compute Scalar later. I guess, it comes down to implementation details of Constant Scan and/or Compute Scalar operators.

Observed behaviour tells us that newid() is called twice per row here.

Vadim answered 30/6, 2016 at 13:17 Comment(9)
Strangely, the execution plan shows that the random values are only computed once. They are computed as part of the constant scan. After that, a compute scalar calculated the computed column.Errhine
@usr, I see a Constant scan operator and then a Compute Scalar for calculating CASE that uses output from a Constant scan. I don't see explicitly in the plan that results of Constant scan operator are stored somewhere in the memory and not recalculated as needed. In any case, the observed behaviour tells us that NEWID() is called twice per row.Vadim
I agree that's the problem. It's just not visible in the plan. Someone with deep internals knowledge probably can explain how this is implemented. Normally, scalars are computed lazily and once. No temp table needed for that.Errhine
@usr, by the way, the name of the operator - Constant Scan gives us a hint that optimiser believes that result of (ABS(CHECKSUM(NEWID()) % 5)) is a constant, so it is allowed to call this function twice.Vadim
Interesting, indeed it is NEWID(), even this generates unintuitive results: DECLARE @val TABLE (i int); INSERT INTO @val VALUES (0),(1),(2),(3),(4); SELECT * FROM @val v WHERE v.i = ABS(CHECKSUM(NEWID())) % 5;Exurb
@Kristofer, that query may produce non-intuitive result, but it is easy to see what is going on. Loop through 5 rows. Row i: generate a new random number, it can be equal to i or not, if equal, include the row in the result set. Such query can return anything between 0 and 5 rows, though with different probabilities. The effect is somewhat similar to your original question.Vadim
Vladimir, I don't see the 'values' or 'scalar operator' in my execution plan results. I'm running 2012, is it something in 2016 or an available add-on?Ommiad
In a related post, user Martin Smith commented "Already reported here BTW" connect.microsoft.com/SQLServer/Feedback/Details/2751288Ommiad
@John, I tested it on 2008 Standard and 2014 Express.Vadim
O
1

During testing, I removed the functions unrelated to NEWID, and showed the results if the NEWID were calculated ahead of time. It may be helpful to others.

DECLARE @test TABLE (
InsertType VARCHAR(30),
Col1 VARCHAR(5),
Contains2 AS CASE WHEN (Col1) LIKE '%2%' THEN 1 ELSE 0 END) --depends on Col1

INSERT INTO @test (InsertType, Col1) VALUES
    ('Compute With Insert', LEFT(NEWID(), 5)),
    ('Compute With Insert', LEFT(NEWID(), 5)),
    ('Compute With Insert', LEFT(NEWID(), 5)),
    ('Compute With Insert', LEFT(NEWID(), 5)),
    ('Compute With Insert', LEFT(NEWID(), 5))

SELECT * FROM @test

DECLARE @A VARCHAR(5) = LEFT(NEWID(), 5);
DECLARE @B VARCHAR(5) = LEFT(NEWID(), 5);
DECLARE @C VARCHAR(5) = LEFT(NEWID(), 5);
DECLARE @D VARCHAR(5) = LEFT(NEWID(), 5);
DECLARE @E VARCHAR(5) = LEFT(NEWID(), 5);

SELECT @A, @B, @C, @D, @E;

INSERT INTO @Test (InsertType, Col1) VALUES
('Compute Before Insert', @A), ('Compute Before Insert', @B), ('Compute Before Insert', @C), ('Compute Before Insert', @D), ('Compute Before Insert', @E)

SELECT * FROM @test

InsertType                 Col1        Contains2
Compute With Insert        C5507        0
Compute With Insert        C17D7        0
Compute With Insert        D9087        1
Compute With Insert        E2DB0        0
Compute With Insert        7D1AF        1
Compute Before Insert      31050        0
Compute Before Insert      2954C        1
Compute Before Insert      9E205        1
Compute Before Insert      DDF05        0
Compute Before Insert      ED708        0
Ommiad answered 30/6, 2016 at 16:42 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.