LINQ Expression to return Property value?
Asked Answered
S

5

23

I'm trying to create a generic function to help me select thousands of records using LINQ to SQL from a local list. SQL Server (2005 at least) limits queries to 2100 parameters and I'd like to select more records than that.

Here would be a good example usage:

var some_product_numbers = new int[] { 1,2,3 ... 9999 };

Products.SelectByParameterList(some_product_numbers, p => p.ProductNumber);

Here is my (non-working) implementation:

public static IEnumerable<T> SelectByParameterList<T, PropertyType>(Table<T> items, 

IEnumerable<PropertyType> parameterList, Expression<Func<T, PropertyType>> property) where T : class
{
    var groups = parameterList
        .Select((Parameter, index) =>
            new
            {
                GroupID = index / 2000, //2000 parameters per request
                Parameter
            }
        )
        .GroupBy(x => x.GroupID)
        .AsEnumerable();

    var results = groups
    .Select(g => new { Group = g, Parameters = g.Select(x => x.Parameter) } )
    .SelectMany(g => 
        /* THIS PART FAILS MISERABLY */
        items.Where(item => g.Parameters.Contains(property.Compile()(item)))
    );

    return results;
}

I have seen plenty of examples of building predicates using expressions. In this case I only want to execute the delegate to return the value of the current ProductNumber. Or rather, I want to translate this into the SQL query (it works fine in non-generic form).

I know that compiling the Expression just takes me back to square one (passing in the delegate as Func) but I'm unsure of how to pass a parameter to an "uncompiled" expression.

Thanks for your help!

**** EDIT:** Let me clarify further:

Here is a working example of what I want to generalize:

var local_refill_ids = Refills.Select(r => r.Id).Take(20).ToArray();

var groups = local_refill_ids
    .Select((Parameter, index) =>
        new
        {
            GroupID = index / 5, //5 parameters per request
            Parameter
        }
    )
    .GroupBy(x => x.GroupID)
    .AsEnumerable();

var results = groups
.Select(g => new { Group = g, Parameters = g.Select(x => x.Parameter) } )
.SelectMany(g => 
    Refills.Where(r => g.Parameters.Contains(r.Id))
)
.ToArray()
;

Results in this SQL code:

SELECT [t0].[Id], ... [t0].[Version]
FROM [Refill] AS [t0]
WHERE [t0].[Id] IN (@p0, @p1, @p2, @p3, @p4)

... That query 4 more times (20 / 5 = 4)
Sair answered 20/2, 2009 at 1:27 Comment(0)
S
9

Easiest way to do this: Use LINQKit (Free, non-restrictive license)

Working version of code:

public static IEnumerable<T> SelectByParameterList<T, PropertyType>(this Table<T> items, IEnumerable<PropertyType> parameterList, Expression<Func<T, PropertyType>> propertySelector, int blockSize) where T : class
{
    var groups = parameterList
        .Select((Parameter, index) =>
            new
            {
                GroupID = index / blockSize, //# of parameters per request
                Parameter
            }
        )
        .GroupBy(x => x.GroupID)
        .AsEnumerable();

    var selector = LinqKit.Linq.Expr(propertySelector);

    var results = groups
    .Select(g => new { Group = g, Parameters = g.Select(x => x.Parameter) } )
    .SelectMany(g => 
        /* AsExpandable() extension method requires LinqKit DLL */
        items.AsExpandable().Where(item => g.Parameters.Contains(selector.Invoke(item)))
    );

    return results;
}

Example usage:

    Guid[] local_refill_ids = Refills.Select(r => r.Id).Take(20).ToArray();

    IEnumerable<Refill> results = Refills.SelectByParameterList(local_refill_ids, r => r.Id, 10); //runs 2 SQL queries with 10 parameters each

Thanks again for all your help!

Sair answered 20/2, 2009 at 16:1 Comment(4)
I'd be interested in what the TSQL for that does, compared to my InRange reply...Striking
SELECT [t0].[Id], ... [t0].[Version] FROM [Refill] AS [t0] WHERE [t0].[Id] IN (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8, @p9) ... That query 2times (20 / 10 = 2)Sair
What is your suggestion for blockSize in order to optimize queries using LinqToSql? Or, to put it differently, is it better to have less queries with bigger blocks, or more queries with smaller blocks?Chew
You generally want as few blocks as possible. The main issue is that SQL Server only allows 2100 parameters per query/command. I'll usually use a blockSize of 2000.Sair
S
46

I've come up with a way to chunk the query into pieces - i.e. you give it 4000 values, so it might do 4 requests of 1000 each; with full Northwind example. Note that this might not work on Entity Framework, due to Expression.Invoke - but is fine on LINQ to SQL:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

namespace ConsoleApplication5 {
    /// SAMPLE USAGE
    class Program {
        static void Main(string[] args) {
            // get some ids to play with...
            string[] ids;
            using(var ctx = new DataClasses1DataContext()) {
                ids = ctx.Customers.Select(x => x.CustomerID)
                    .Take(100).ToArray();
            }

            // now do our fun select - using a deliberately small
            // batch size to prove it...
            using (var ctx = new DataClasses1DataContext()) {
                ctx.Log = Console.Out;
                foreach(var cust in ctx.Customers
                        .InRange(x => x.CustomerID, 5, ids)) {
                    Console.WriteLine(cust.CompanyName);
                }
            }
        }
    }

    /// THIS IS THE INTERESTING BIT
    public static class QueryableChunked {
        public static IEnumerable<T> InRange<T, TValue>(
                this IQueryable<T> source,
                Expression<Func<T, TValue>> selector,
                int blockSize,
                IEnumerable<TValue> values) {
            MethodInfo method = null;
            foreach(MethodInfo tmp in typeof(Enumerable).GetMethods(
                    BindingFlags.Public | BindingFlags.Static)) {
                if(tmp.Name == "Contains" && tmp.IsGenericMethodDefinition
                        && tmp.GetParameters().Length == 2) {
                    method = tmp.MakeGenericMethod(typeof (TValue));
                    break;
                }
            }
            if(method==null) throw new InvalidOperationException(
                "Unable to locate Contains");
            foreach(TValue[] block in values.GetBlocks(blockSize)) {
                var row = Expression.Parameter(typeof (T), "row");
                var member = Expression.Invoke(selector, row);
                var keys = Expression.Constant(block, typeof (TValue[]));
                var predicate = Expression.Call(method, keys, member);
                var lambda = Expression.Lambda<Func<T,bool>>(
                      predicate, row);
                foreach(T record in source.Where(lambda)) {
                    yield return record;
                }
            }
        }
        public static IEnumerable<T[]> GetBlocks<T>(
                this IEnumerable<T> source, int blockSize) {
            List<T> list = new List<T>(blockSize);
            foreach(T item in source) {
                list.Add(item);
                if(list.Count == blockSize) {
                    yield return list.ToArray();
                    list.Clear();
                }
            }
            if(list.Count > 0) {
                yield return list.ToArray();
            }
        }
    }
}
Striking answered 20/2, 2009 at 8:49 Comment(5)
This handles the case of queryable.Where(o => values.Contains(o.propertyToTest)) by replacing it with queryable.InRange(o => o.propertyToTest, blockSize, values) (if I understand it correctly), but I'm looking at a similar overflow on the 2100 parameters limit, with e.g. queryable.Where(o => !values.Contains(o.propertyToTest)). I'm trying to modify InRange() to obtain a NotInRange() equivalent, and I'm not sure how to do the boolean negation. My thinking was at the foreach (T record in source.Where(lambda)) line?Denticulation
Actually, after much hunting, I think I've found what's needed, appropriately from an answer you'd given a month previously to this one: #457816, ref "This also works well to negate a single operation:"Denticulation
@Marc - How would this be handled in VB? yield return obviously doesn't exist for us.Behre
What is the reason for making the batch size configurable? Wouldn't I always want this to be like 2090? So I'm just under the 2100 limit with a bit of play?Dogeared
The Invoke isn't necessary. You can re-use the selector lambda and parameter, although I admit it's a bit dirty. Avoiding the invoke will also make it usable on other ORMs I think. (I made it working on LLBLGen Pro using: gist.github.com/FransBouma/5e7031fe557df4b5b688Judiejudith
S
9

Easiest way to do this: Use LINQKit (Free, non-restrictive license)

Working version of code:

public static IEnumerable<T> SelectByParameterList<T, PropertyType>(this Table<T> items, IEnumerable<PropertyType> parameterList, Expression<Func<T, PropertyType>> propertySelector, int blockSize) where T : class
{
    var groups = parameterList
        .Select((Parameter, index) =>
            new
            {
                GroupID = index / blockSize, //# of parameters per request
                Parameter
            }
        )
        .GroupBy(x => x.GroupID)
        .AsEnumerable();

    var selector = LinqKit.Linq.Expr(propertySelector);

    var results = groups
    .Select(g => new { Group = g, Parameters = g.Select(x => x.Parameter) } )
    .SelectMany(g => 
        /* AsExpandable() extension method requires LinqKit DLL */
        items.AsExpandable().Where(item => g.Parameters.Contains(selector.Invoke(item)))
    );

    return results;
}

Example usage:

    Guid[] local_refill_ids = Refills.Select(r => r.Id).Take(20).ToArray();

    IEnumerable<Refill> results = Refills.SelectByParameterList(local_refill_ids, r => r.Id, 10); //runs 2 SQL queries with 10 parameters each

Thanks again for all your help!

Sair answered 20/2, 2009 at 16:1 Comment(4)
I'd be interested in what the TSQL for that does, compared to my InRange reply...Striking
SELECT [t0].[Id], ... [t0].[Version] FROM [Refill] AS [t0] WHERE [t0].[Id] IN (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8, @p9) ... That query 2times (20 / 10 = 2)Sair
What is your suggestion for blockSize in order to optimize queries using LinqToSql? Or, to put it differently, is it better to have less queries with bigger blocks, or more queries with smaller blocks?Chew
You generally want as few blocks as possible. The main issue is that SQL Server only allows 2100 parameters per query/command. I'll usually use a blockSize of 2000.Sair
S
3

LINQ-to-SQL still works via standard SQL parameters, so writing a fancy expression isn't going to help. There are 3 common options here:

  • pack the ids into (for example) csv/tsv; pass down as a varchar(max) and use a udf to split it (at the server) into a table variable; join to the table variable
  • use a table-valued-parameter in SQL Server 2008
  • have a table on the server that you could push the ids into (perhaps via SqlBulkCopy) (perhaps with a "session guid" or similar); join to this table

The first is the simplest; getting a "split csv udf" is trivial (just search for it). Drag the udf onto the data-context and consume from there.

Striking answered 20/2, 2009 at 4:34 Comment(1)
This isn't necessary. See my answer below.Sair
A
0

Pass IQuerable to the Contains function instead of list or array. please see the below example

var df_handsets = db.DataFeed_Handsets.Where(m => m.LaunchDate != null).
                  Select(m => m.Name);
var Make = (from m in db.MobilePhones
    where (m.IsDeleted != true || m.IsDeleted == null)
        && df_handsets.Contains(m.Name)
    orderby m.Make
    select new { Value = m.Make, Text = m.Make }).Distinct();

when you pass list or array it is passed in form of parameters and its exceed the counts when the list items count is greater than 2100.

Alonso answered 13/4, 2012 at 10:22 Comment(1)
You are assuming the collection to match against is coming from the database itself. This is not always the case.Sair
C
0

You can create your own QueryProvider

public class QueryProvider : IQueryProvider
{
    // Translates LINQ query to SQL.
    private readonly Func<IQueryable, DbCommand> _translator;

    // Executes the translated SQL and retrieves results.
    private readonly Func<Type, string, object[], IEnumerable> _executor;

    public QueryProvider(
        Func<IQueryable, DbCommand> translator,
        Func<Type, string, object[], IEnumerable> executor)
    {

        this._translator = translator;
        this._executor = executor;
    }

    #region IQueryProvider Members

    public IQueryable<TElement> CreateQuery<TElement>(Expression expression)
    {
        return new Queryable<TElement>(this, expression);
    }

    public IQueryable CreateQuery(Expression expression)
    {
        throw new NotImplementedException();
    }

    public TResult Execute<TResult>(Expression expression)
    {
        bool isCollection = typeof(TResult).IsGenericType &&
            typeof(TResult).GetGenericTypeDefinition() == typeof(IEnumerable<>);
        var itemType = isCollection
            // TResult is an IEnumerable`1 collection.
            ? typeof(TResult).GetGenericArguments().Single()
            // TResult is not an IEnumerable`1 collection, but a single item.
            : typeof(TResult);
        var queryable = Activator.CreateInstance(
            typeof(Queryable<>).MakeGenericType(itemType), this, expression) as IQueryable;

        IEnumerable queryResult;

        // Translates LINQ query to SQL.
        using (var command = this._translator(queryable))
        {
            var parameters = command.Parameters.OfType<DbParameter>()
                .Select(parameter => parameter)
                .ToList();

            var query = command.CommandText;
            var newParameters = GetNewParameterList(ref query, parameters);

            queryResult = _executor(itemType,query,newParameters);
        }

        return isCollection
            ? (TResult)queryResult // Returns an IEnumerable`1 collection.
            : queryResult.OfType<TResult>()
                         .SingleOrDefault(); // Returns a single item.
    }       

    public object Execute(Expression expression)
    {
        throw new NotImplementedException();
    }

    #endregion

     private static object[] GetNewParameterList(ref string query, List<DbParameter> parameters)
    {
        var newParameters = new List<DbParameter>(parameters);

        foreach (var dbParameter in parameters.Where(p => p.DbType == System.Data.DbType.Int32))
        {
            var name = dbParameter.ParameterName;
            var value = dbParameter.Value != null ? dbParameter.Value.ToString() : "NULL";
            var pattern = String.Format("{0}[^0-9]", dbParameter.ParameterName);
            query = Regex.Replace(query, pattern, match => value + match.Value.Replace(name, ""));
            newParameters.Remove(dbParameter);
        }

        for (var i = 0; i < newParameters.Count; i++)
        {
            var parameter = newParameters[i];
            var oldName = parameter.ParameterName;
            var pattern = String.Format("{0}[^0-9]", oldName);
            var newName = "@p" + i;
            query = Regex.Replace(query, pattern, match => newName + match.Value.Replace(oldName, ""));
        }      

        return newParameters.Select(x => x.Value).ToArray();
    }
}


    static void Main(string[] args)
    {
        using (var dc=new DataContext())
        {
            var provider = new QueryProvider(dc.GetCommand, dc.ExecuteQuery);

            var serviceIds = Enumerable.Range(1, 2200).ToArray();

            var tasks = new Queryable<Task>(provider, dc.Tasks).Where(x => serviceIds.Contains(x.ServiceId) && x.CreatorId==37 && x.Creator.Name=="12312").ToArray();

        }

    }
Christenachristendom answered 7/2, 2014 at 16:38 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.