Entity Framework Core 8 Where IN vs Where IN OPENJSON

Asked 22/12, 2023 at 13:42 Answered 10/6, 2024 at 7:9

Solved c#entity-framework-core sql-server-2017 ef-core-8.0

EF Core 8 now inlines values in where statements using a combination of WHERE IN and OPENJSON as opposed to the previous WHERE IN(...).

This change is noted in the documentation and the stated reason is as follows:

The inlining of values here is done in such a way that there is no chance of a SQL injection attack. The change to use JSON described below is all about performance, and nothing to do with security.

Unfortunately, OPENJSON performance on our 2017 SQL Server instance is poor.

The query below, generated by EF Core 8, takes 1.8 seconds to run and results in nearly 400,000 reads:

DECLARE @__scheduleTagIds_0 nvarchar(4000) = N'[5835,5970,6563,6564,6565,6645,6835,6850,7034,7127]';

SELECT  [s].[ScheduleTagId]
       ,[s].[MustStartProdBy]
FROM    [ScheduleTagMustStartBy] AS [s]
WHERE   [s].[ScheduleTagId] IN (
    SELECT  [s0].[value]
    FROM    OPENJSON(@__scheduleTagIds_0) WITH ([value] int '$') AS [s0]
)

If I refactor the query to use a standard WHERE IN(...), the execution time drops to 120ms and 29,000 reads:

SELECT  [s].[ScheduleTagId]
       ,[s].[MustStartProdBy]
FROM    [ScheduleTagMustStartBy] AS [s]
WHERE   [s].[ScheduleTagId] IN (5835,5970,6563,6564,6565,6645,6835,6850,7034,7127)

I have hundreds of queries in my application that use .Where(x => [collection].Contains(x.Id)) and I'm very concerned about the performance degradation I'm seeing in select queries.

Question

What can I do to to mitigate this issue? I'm open to options either in EF or on the SQL Server side (though don't want to change the compatibility level of the database).

Sangria answered 22/12, 2023 at 13:42 Comment(10)

SQL Server 2016 is out of mainstream support (even 2017 is), so the EF 8 team doesn't have to limit itself to only stuff that works fine with 2016 – Macle 22/12, 2023 at 13:48

In my experience it's worth using it when you have large amounts of items. With these small amounts you better generate queries with chained "or" predicates. Ideally, EF would have some switch in their query translator that does that automatically, but it hasn't. All in all, I've experienced that it's a huge improvement with large amounts, performance-wise, but also because it prevents query plan pollution. – Chapfallen 22/12, 2023 at 13:49

What's the patch level of the server? – Macle 22/12, 2023 at 13:49

I looked at the wrong SQL Server. This one is running 2017. I updated the question/tags to reflect that. I think the comments are still valid, however. Current version is Microsoft SQL Server 2017 (RTM-GDR) (KB5029375) – Sangria 22/12, 2023 at 13:53

Please share the execution plan for both slow and fast queries, please also show the tables and index definitions. I would imagine you would need an index (ScheduleTagId) INCLUDE (MustStartProdBy) – Axum 22/12, 2023 at 14:8

@Axum why would index requirements be different for the queries? They are both asking the same question, right? – Sangria 22/12, 2023 at 14:19

I didn't say they would be different,, they probably are the same, but seeing the existing access patterns might inform us better. Eg query 1 might use a naive index scan with a nested loop against the JSON, and query 2 might use a hash join. Both are inefficient, but query 2 is much more inefficient. Point is: we need more info. – Axum 22/12, 2023 at 14:23

See also github.com/dotnet/efcore/issues/32394 – Chapfallen 22/12, 2023 at 15:23

So the problem is that OPENJSON gives a single execution plan that is used for any size collection but this has a fixed cardinality guess that it will emit 50 rows. Maybe they should have adopted Nick Craver's original suggested solution and just parameterise the actual values with variants for 5, 10, 50, 100, 500, 1000. github.com/dotnet/efcore/issues/13617 – Carolinecarolingian 22/12, 2023 at 22:10

We are seeing the exact same problem. Updating to EF8 has caused signficant performance issues because of OpenJSON. Yes, it saves the query plan, but the query itself has caused significant performance issues. Wish this were an option, something like .ContainsWithOpenJson or .ContainsAsParameterized We can't reverse compatibility mode for other reasons but we use .Contians a lot. This is a big problem for us. – Tinishatinker 31/12, 2023 at 20:41

Based on the documentation, the only way to prevent the use of OPENJSON(... is to change the DB compatibility level to 120 (lower than SQL Server 2016). This is not an ideal solution. Is there any other way to limit/prevent the usage of OPENJSON?

You don't have to change the DB compatibility level to 120. You can call UseCompatibilityLevel(120) to tell EF to generate the old style SQL regardless of what the compat level is actually set to.

There will also be an option to avoid this global setting and just constant-ize the specific queries that are problematic. But I don't think this EF.Constant approach has made its way to the official release yet so you would need to use a daily build.

Is there something configuration-wise on the SQL Server that would improve the performance of OPENJSON? It seems clear from the profiler data that SQL Server is not able to properly use index(es) when using OPENJSON.

It certainly can use indexes with OPENJSON it just might choose not to.

When you pass constants it can determine

How many items are you passing?
How many duplicates are you passing? If any?
What are the exact values you are passing? (which may be useful if you have skewed cardinalities)

With OPENJSON it can't tell any of that so it just falls back to guesses. This may lead to a different (worse) plan.

Carolinecarolingian answered 3/1, 2024 at 8:35 Comment(2)

Thanks for summing all of this up. I've been following the issue on github since someone shared it in a comment. Kind of a shocking regression from the EF team. I hope they deal with it quickly. – Sangria 3/1, 2024 at 10:58

EF.Constant was already relased in EF Core 8.0.2. – Hintze 19/2, 2024 at 6:43

I have experienced this issue too. Seems like using TOP you can achieve a separation between queries having a few items in the list where it can be very bad to assume ~50 items and queries having many items.

Modifying the SQL is for instance possible using a DbCommandInterceptor. An example implementation could look like this:

/// <summary>Contains query optimize interceptor</summary>
/// <remarks>
/// A problem with contains queries translated into OPENJSON are that SQL server cannot determine how many rows the query will yield.
/// We will therefore divide queries into groups of different sizes to provide SQL server with a clear max count to allow for better
/// query paths to be taken
/// </remarks>
public partial class SqlServerContainsQueryOptimizeInterceptor : DbCommandInterceptor {
    private static readonly int[] sizeGroups = [1, 2, 5, 10, 20, 50];

    /// <inheritdoc/>
    public override InterceptionResult<DbDataReader> ReaderExecuting(DbCommand command, CommandEventData eventData, InterceptionResult<DbDataReader> result) {
        ManipulateCommand(command);
        return result;
    }

    /// <inheritdoc/>
    public override ValueTask<InterceptionResult<DbDataReader>> ReaderExecutingAsync(DbCommand command, CommandEventData eventData, InterceptionResult<DbDataReader> result, CancellationToken cancellationToken = default) {
        ManipulateCommand(command);
        return new ValueTask<InterceptionResult<DbDataReader>>(result);
    }

    /// <inheritdoc/>
    public override InterceptionResult<object> ScalarExecuting(DbCommand command, CommandEventData eventData, InterceptionResult<object> result) {
        ManipulateCommand(command);
        return result;
    }

    /// <inheritdoc/>
    public override ValueTask<InterceptionResult<object>> ScalarExecutingAsync(DbCommand command, CommandEventData eventData, InterceptionResult<object> result, CancellationToken cancellationToken = default) {
        ManipulateCommand(command);
        return new ValueTask<InterceptionResult<object>>(result);
    }

    /// <summary>Handle the actual command manipulation</summary>
    private static void ManipulateCommand(DbCommand command) {
        var sql = command.CommandText;
        var newSql = openJsonMatcher.Replace(sql, m => {
            try {
                // Read the parameter value to determine the size of the array
                var paramName = m.Groups[1].Value;
                var paramValue = command.Parameters[paramName].Value?.ToString();
                if (paramValue == null) return m.Value;
                var items = JsonNode.Parse(paramValue)!.AsArray();

                // Replace the select statement with a top statement to help the query planner
                var group = sizeGroups.FirstOrDefault(x => x >= items.Count);
                return group != 0
                    ? m.Value.Replace("SELECT", "SELECT TOP " + group)
                    : m.Value;
            }
            catch {
                return m.Value;
            }
        });

        if (newSql != sql) {
            command.CommandText = newSql;
        }
    }

    private static readonly Regex openJsonMatcher = GetOpenJsonRegEx();
    [GeneratedRegex(@"IN\s+\(\s*SELECT\s+[^\s]+\s+FROM\s+OPENJSON\(([^)]+)\)(?: WITH \([^)]+\))?[^)]+\)")]
    private static partial Regex GetOpenJsonRegEx();
}

An interceptor can then be added to the opts builder when setting up SQL server, e.g.

optionsBuilder.UseSgsSql(connectionStr);
optionsBuilder.AddInterceptors(new SqlServerContainsQueryOptimizeInterceptor());

This seems like a reasonable solution to me, but I of course have been wrong before. Any feedback would be appreciated.

Apollonian answered 10/6, 2024 at 7:9 Comment(0)

Recommended topics

Hot tags