Why is Linq-to-sql incorrectly removing fields in my union?
Asked Answered
S

2

8

I am trying to aggregate several queries together to provide a latest updates display for my users using a common data structure for union support. In my first function I have the following select clause:

.Select(x => new PlayerUpdateInfo
{
    FirstName = x.PodOptimizedSearch.FirstName,
    LastName = x.PodOptimizedSearch.LastName,
    RecruitId = x.RecruitId,
    Date = x.Date,
    UpdateMessage = x.IsAddedAction
      ? "Player was added to this recruiting board by " + x.Person.FirstName
        + " " + x.Person.LastName
      : "Player was removed from this recruiting board by " + 
        x.Person.FirstName + " " + x.Person.LastName,

    // Defaults for union support
    Team = string.Empty,
    UpdateComments = x.Comments,
    TeamId = 0,
    State = string.Empty
});

When called this correctly generates a query that returns 9 fields. The second method's select is:

select new PlayerUpdateInfo
{
    FirstName = recruit.FirstName,
    LastName = recruit.LastName,
    RecruitId = recruit.RecruitId,
    Date = asset.CreateDate,
    UpdateMessage = "New Full Game Added",

    // Defaults for union support
    Team = null,
    UpdateComments = null,
    TeamId = 0,
    State = null
    };

When this query is run by itself it correctly returns 9 values. However, when I try to do

var query = GetFirstQuery();
query = query.union(GetSecondQuery()); 

I get a sql exception that the query failed because all the queries do not have the same number of fields. Investigating the generated SQL shows that the first query is correct, but Linq is generating the 2nd (unioned) query to only have 7 fields. After testing it appears that Linq seems to be "optimizing" out the nulls so it's only returning one null column instead of 3, thus causing a mis-match.

Why is Linq-to-sql incorrectly generating the union and how can I work around this?


Edit:

Ok the issue seems to deal with union chaining in Linq-to-Sql for .Net 3.5. This does not happen in 4 apparently. With the following code:

    protected IQueryable<PlayerUpdateInfo> Test1()
    {
        return PodDataContext.Assets
                             .Select(x => new PlayerUpdateInfo
                             {
                                 Date = DateTime.Now,
                                 FirstName = x.Title,
                                 LastName = string.Empty,
                                 RecruitId = 0,
                                 State = string.Empty,
                                 Team = string.Empty,
                                 TeamId = 0,
                                 UpdateComments = string.Empty,
                                 UpdateMessage = string.Empty
                             });
    }

    protected IQueryable<PlayerUpdateInfo> Test2()
    {
        return PodDataContext.SportPositions
                             .Select(x => new PlayerUpdateInfo
                             {
                                 Date = DateTime.Now,
                                 FirstName = string.Empty,
                                 LastName = x.Abbreviation,
                                 RecruitId = 0,
                                 State = string.Empty,
                                 Team = string.Empty,
                                 TeamId = 0,
                                 UpdateComments = string.Empty,
                                 UpdateMessage = string.Empty
                             });
    }

I then union chain via: var q2 = Test1().Union(Test2()).Union(Test1());

In .Net 3.5 I get the following sql, which has a mis-match and fails

SELECT [t4].[value] AS [RecruitId], [t4].[Title] AS [FirstName], [t4].[value2] AS [LastName], [t4].[value22] AS [Team], [t4].[value3] AS [Date]
FROM (
    SELECT [t2].[value], [t2].[Title], [t2].[value2], [t2].[value2] AS [value22], [t2].[value3]
    FROM (
        SELECT @p0 AS [value], [t0].[Title], @p1 AS [value2], @p2 AS [value3]
        FROM [dbo].[Assets] AS [t0]
        UNION
        SELECT @p3 AS [value], @p4 AS [value2], [t1].[Abbreviation], @p5 AS [value3]
        FROM [dbo].[SportPositions] AS [t1]
        ) AS [t2]
    UNION
    SELECT @p6 AS [value], [t3].[Title], @p7 AS [value2], @p8 AS [value3]
    FROM [dbo].[Assets] AS [t3]
    ) AS [t4]

In .net 4 the following code is generated:

SELECT [t4].[value] AS [RecruitId], [t4].[Title] AS [FirstName], [t4].[value2] AS [LastName], [t4].[value3] AS [Team], [t4].[value4] AS [TeamId], [t4].[value5] AS [State], [t4].[value6] AS [UpdateMessage], [t4].[value7] AS [UpdateComments], [t4].[value8] AS [Date]
FROM (
    SELECT [t2].[value], [t2].[Title], [t2].[value2], [t2].[value3], [t2].[value4], [t2].[value5], [t2].[value6], [t2].[value7], [t2].[value8]
    FROM (
        SELECT @p0 AS [value], [t0].[Title], @p1 AS [value2], @p2 AS [value3], @p3 AS [value4], @p4 AS [value5], @p5 AS [value6], @p6 AS [value7], @p7 AS [value8]
        FROM [dbo].[Assets] AS [t0]
        UNION
        SELECT @p8 AS [value], @p9 AS [value2], [t1].[Abbreviation], @p10 AS [value3], @p11 AS [value4], @p12 AS [value5], @p13 AS [value6], @p14 AS [value7], @p15 AS [value8]
        FROM [dbo].[SportPositions] AS [t1]
        ) AS [t2]
    UNION
    SELECT @p16 AS [value], [t3].[Title], @p17 AS [value2], @p18 AS [value3], @p19 AS [value4], @p20 AS [value5], @p21 AS [value6], @p22 AS [value7], @p23 AS [value8]
    FROM [dbo].[Assets] AS [t3]
    ) AS [t4]

That is valid sql and works. As we are unable to bring our production systems up to .net 4 for various reasons, does anyone know a way I can work around this in .net 3.5?

Shortfall answered 25/6, 2012 at 22:5 Comment(7)
Interesting question. Have you tried to make a super-simple minimal example? Can you reproduce the same problem on the simplified example?Regulate
Try using a constructor with 9 parameters instead. Maybe LINQ behaves differently then.Sharla
Could you union on Anons, and Cast to PlayerUpdateInfo after "unioning" (this is not an answer, but i'd be curious to know if that's the select "as" PlayerUpdateInfo which creates the "optimization".Orris
I couldnt get the error in a simple example, but it looks like null and string.Empty literals in the select are treated somewhat differently ('select null as [field]' for nulls, and 'select @p0 as [field]' with a parameter for string.Empty). I guess GetFirstQuery and GetSecondQuery both return IQueryable?Touchdown
Just got home, I'll see if I can whip up a simple example of this happening. And yes, both functions return IQueryable<PlayerUpdateInfo>Shortfall
Unfortunately, getting a simple example to repro this is proving hard. I have figured out that this is apparently fixed in L2S in .net 4, as it only happens when my app pool is set to v2.0, and it works when it's v4, which is unfortunately not a solution since we can't move production to .net 4 :(Shortfall
I found a way to reproduce the issue and put it in the main questionShortfall
A
3

This is an updated answer - it would seem that in 3.5 the optimization cannot be switched off. I've tried numerous things in LinqPad, and as soon as there's a C# expression that's not derived from a column, the query compiler takes it out of the SQL.

So we need something that'll be derived from a column expression but which we can force always to be null.

Here's one solution I've been able to get to work - write a conditional expression that returns null when a column value is equal to itself; thus always returning null. It's not going to be pretty, because Linq to Sql in 3.5 appears to be very aggressive in optimizing a query (we have to use unique comparisons for each null we want, e.g. use == in one, then != on the next and so on); and it'll get uglier when we modify it to deal with nulls if you have to do this hack on a nullable column:

I'm just picking out column names that I can see from your code - but you might want to use different ones:

select new PlayerUpdateInfo  
{  
  FirstName = recruit.FirstName,  
  LastName = recruit.LastName,  
  RecruitId = recruit.RecruitId,  
  Date = asset.CreateDate,  
  UpdateMessage = "New Full Game Added",  

  // Defaults for union support  
  Team = recruit.FirstName = recruit.FirstName ? null : "",  
  UpdateComments = recruit.FirstName != recruit.FirstName ? "" : null,  
  TeamId = recruit.LastName = recruit.LastName ? null : "",  
  State = recruit.LastName != recruit.LastName ? "" : null
};

In LinqPad I get a column expression in the generated SQL for each of the statements where I use a ? :. The "" on the other side of the expression is because SQL moans about CASE statements having more than one null.

Clearly - this solution, apart from being ugly as sin, is bounded by how many columns are available for you to do these fake comparisons on, and indeed how many unique comparisons are mapped by L2S.

Avens answered 25/6, 2012 at 22:44 Comment(6)
Yeah, that's how I know it's an optimization relating to the constants, because making them different for each in the same select statement does change the number of fields. The problem is that it requires me to do something like make one of the fields a space, which changes the meaning of the queryShortfall
oh missed that sorry. I'll give that a shotShortfall
The nullstring trick didn't work, it's still getting optimized out :(Shortfall
Damnit we'll have to try a different value thenAvens
@Shortfall - have updated my answer. The Linq to Sql 3.5 query compiler is ridiculously aggressive in its optimisations; I think I've found a solution, but it's dirty as hell.Avens
Ugly as all hell, but it works and it's better than me creating custom stored procs/views for accomplishing this! Thank you very much!Shortfall
J
0
  1. Team, UpdateComments,State are null string or null. So you should give same value for both cases.
  2. If you use default comparer, UpdateMessage is different so objects are different. So you can concat instead of union.
Jezabella answered 25/6, 2012 at 22:55 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.