Pass LINQ expression to another QueryProvider

I have a simple custom QueryProvider that takes an expression, translates it to SQL and queries a SQL database.

I want to create a small cache in the QueryProvider that stores commonly accessed objects so retrieval can happen without a database hit.

The QueryProvider has the method

public object Execute(System.Linq.Expressions.Expression expression)
{
    /// Builds an SQL statement from the expression, 
    /// executes it and returns matching objects
}

The cache sits as a field in this QueryProvider class and is a simple generic List.

If I call List.AsQueryable and pass the above expression to the Execute method of the resulting queryable's Provider, it doesn't work as desired. It looks as though, once an expression is compiled, the initial QueryProvider becomes an integral part of it.

Is it possible to pass an expression to a subsequent QueryProvider and execute the expression as desired?

The calling code looks vaguely as follows:

public class QueryProvider<Entity>
{
    private List<Entity> cache = new List<Entity>();

    public object Execute(System.Linq.Expressions.Expression expression)
    {
        /// check whether expression expects single or multiple result
        bool isSingle = true;

        if (isSingle)
        {
            var result = this.cache.AsQueryable<Entity>().Provider.Execute(expression);
            if (result != null) 
                return result;
        }

        /// cache failed, hit database
        var qt = new QueryTranslator();
        string sql = qt.Translate(expression);
        /// .... hit database
    }
} 

It doesn't return an error; instead it gets stuck in a loop where this same provider is called over and over again.

Here's some more code showing what I'm trying to do:

Collection:

class Collection<Entity>
{

    internal List<Entity> cacheOne { get; private set; }
    internal Dictionary<Guid, Entity> cacheTwo { get; private set; }

    internal Collection()
    {
        this.cacheOne = new List<Entity>();
        this.cacheTwo = new Dictionary<Guid, Entity>();
    }

    public IQueryable<Entity> Query()
    {
        return new Query<Entity>(this.cacheOne, this.cacheTwo);
    }

}

Query:

class Query<Entity> : IQueryable<Entity>
{
    internal Query(List<Entity> cacheOne, Dictionary<Guid, Entity> cacheTwo)
    {
        this.Provider = new QueryProvider<Entity>(cacheOne, cacheTwo);
        this.Expression = Expression.Constant(this);
    }

    internal Query(IQueryProvider provider, Expression expression)
    {
        this.Provider = provider;
        if (expression != null)
            this.Expression = expression;
    }

    public IEnumerator<Entity> GetEnumerator()
    {
        return this.Provider.Execute<IEnumerator<Entity>>(this.Expression);
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }

    public Type ElementType
    {
        get { return typeof(Entity); }
    }

    public System.Linq.Expressions.Expression Expression { get; private set; }

    public IQueryProvider Provider { get; private set; }
}

QueryProvider:

class QueryProvider<Entity> : IQueryProvider
{

    private List<Entity> cacheOne;
    private Dictionary<Guid, Entity> cacheTwo;

    internal QueryProvider(List<Entity> cacheOne, Dictionary<Guid, Entity> cacheTwo)
    {
        this.cacheOne = cacheOne;
        this.cacheTwo = cacheTwo;   
    }

    public IQueryable<TElement> CreateQuery<TElement>(System.Linq.Expressions.Expression expression)
    {
        return new Query<TElement>(this, expression);
    }

    public IQueryable CreateQuery(System.Linq.Expressions.Expression expression)
    {
        throw new NotImplementedException();
    }

    public TResult Execute<TResult>(System.Linq.Expressions.Expression expression)
    {
        return (TResult)this.Execute(expression);
    }

    public object Execute(System.Linq.Expressions.Expression expression)
    {
        Iterator<Entity> iterator = new Iterator<Entity>(expression, cacheOne, cacheTwo);
        return (iterator as IEnumerable<Entity>).GetEnumerator();
    }
}

Iterator:

class Iterator<Entity> : IEnumerable<Entity>
{
    private Expression expression;
    private List<Entity> cacheOne;
    private Dictionary<Guid, Entity> cacheTwo;

    internal Iterator(Expression expression, List<Entity> cacheOne, Dictionary<Guid, Entity> cacheTwo)
    {
        this.expression = expression;
        this.cacheOne = cacheOne;
        this.cacheTwo = cacheTwo;
    }

    public IEnumerator<Entity> GetEnumerator()
    {
        foreach (var result in (IEnumerable<Entity>)this.cacheOne.AsQueryable<Entity>().Provider.Execute(expression))
        {
            yield return result;
        }

        foreach (var more in (IEnumerable<Entity>)this.cacheTwo.Values.AsQueryable<Entity>().Provider.Execute(expression))
        {
            yield return more;
        }
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }
}

Program:

class Program
{
    static void Main(string[] args)
    {
        /// Create collection + caches
        var collection = new Collection<Giraffe>();
        collection.cacheOne.AddRange(new Giraffe[] {
            new Giraffe() { Id = Guid.NewGuid(), DateOfBirth = new DateTime(2011, 03, 21), Height = 192, Name = "Percy" },
            new Giraffe() { Id = Guid.NewGuid(), DateOfBirth = new DateTime(2005, 12, 25), Height = 188, Name = "Santa" },
            new Giraffe() { Id = Guid.NewGuid(), DateOfBirth = new DateTime(1999, 04, 01), Height=144, Name="Clown" }
        });
        var cachetwo = new List<Giraffe>(new Giraffe[] {
            new Giraffe() { Id = Guid.NewGuid(), DateOfBirth = new DateTime(1980, 03,03), Height = 599, Name="Big Ears" },
            new Giraffe() { Id = Guid.NewGuid(), DateOfBirth = new DateTime(1985, 04, 02), Height= 209, Name="Pug" }
        });
        foreach (var giraffe in cachetwo)
            collection.cacheTwo.Add(giraffe.Id, giraffe);

        /// Iterate through giraffes born before a certain date
        foreach (var result in collection.Query().Where(T => T.DateOfBirth < new DateTime(2006, 01, 01)))
        {
            Console.WriteLine(result.Name);
        }

    }
}

Giraffe:

class Giraffe
{
    public Guid Id { get; set; }
    public string Name { get; set;  }
    public long Height { get; set; }
    public DateTime DateOfBirth { get; set; }
}

Special cases, e.g. SingleOrDefault, are left out. The part I want to get working happens in Iterator, which first executes the List's QueryProvider and then the Dictionary's.

One of the two Queryable objects might be a database, or something else.

Embroidery answered 22/5, 2012 at 14:23 Comment(3)
Could you add the calling code? - Sacken
Could you give an example of the LINQ expression that you are using to call your QueryProvider? (I am trying to reconstruct your code locally). Also, do you implement the generic version of Execute too? public TResult Execute<TResult>(System.Linq.Expressions.Expression expression) { ... } - Spokane
Updated with example of the LINQ expression and other code in Query, QueryProvider and Collection classes. - Embroidery

No, a query does not become bound to a provider. That's why you have the IQueryable interface: it provides both the Expression and the Provider, so LINQ can call the provider to execute the expression.

The problem in your implementation is in the way Query<Entity> represents itself: you're setting the root expression to Expression.Constant(this), where this is the query (not the collection).

So when you execute the query with LINQ-to-Objects, it will call GetEnumerator on Query<>, which then calls LINQ-to-Objects to execute Expression, which has a root expression Expression.Constant(this) (of type Query<>), and LINQ-to-Objects then iterates this root expression by calling GetEnumerator on this Query<>, etc.

The problem lies in

(IEnumerable<Entity>)this.cacheOne.AsQueryable<Entity>().Provider.Execute(expression)

which is basically equal to

new Entity[0].AsQueryable().Provider.Execute(expression)

or

linqToObjectsProvider.Execute(expression)

The Provider returned by a query is not linked to the source (this.cacheOne), so you're just re-executing the expression, not querying over your cache.
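A minimal sketch of that point, assuming two plain in-memory lists (the names ProviderDemo, a and b are made up for illustration). The LINQ-to-Objects provider obtained from b still evaluates the query against a, because the data source is the constant at the root of the expression, not the list the provider was obtained from:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ProviderDemo
{
    public static List<int> Run()
    {
        var a = new List<int> { 1, 2, 3 };
        var b = new List<int> { 10, 20, 30 };

        // The expression's root is a constant wrapping list a.
        IQueryable<int> queryOverA = a.AsQueryable().Where(x => x > 1);

        // The provider comes from b, but it only executes the expression it
        // is given, and that expression still points at a.
        IEnumerable<int> result = b.AsQueryable().Provider
            .Execute<IEnumerable<int>>(queryOverA.Expression);

        return result.ToList(); // elements of a greater than 1: 2, 3
    }
}
```

If the root constant is the custom Query<Entity> itself, the same mechanism ends up enumerating that query again, which is where the endless loop comes from.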

What's wrong with the following?

class Collection<Entity>
{
    ...

    public IQueryable<Entity> Query()
    {
        return this.cacheOne.Concat(this.cacheTwo.Values).AsQueryable();
    }
}

Note that Concat uses delayed evaluation, so only when you execute the query are cacheOne and cacheTwo concatenated and then manipulated using the additional LINQ operators.

(In which case, I'd make Collection<Entity> an IQueryable with Expression equal to Expression.Constant(this.cacheOne.Concat(this.cacheTwo.Values)). I think you can do away with all the other classes.)
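As a rough, self-contained sketch of the Concat approach (the property names CacheOne and CacheTwo, the trimmed Giraffe class and the ConcatDemo helper are assumptions for illustration, not the asker's exact code), the following shows that items added after the query is built are still picked up when it is finally enumerated:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Giraffe
{
    public Guid Id { get; set; }
    public string Name { get; set; } = "";
    public DateTime DateOfBirth { get; set; }
}

class Collection<TEntity>
{
    internal List<TEntity> CacheOne { get; } = new List<TEntity>();
    internal Dictionary<Guid, TEntity> CacheTwo { get; } = new Dictionary<Guid, TEntity>();

    // Concat is deferred: nothing is enumerated until the query is executed.
    public IQueryable<TEntity> Query() => CacheOne.Concat(CacheTwo.Values).AsQueryable();
}

static class ConcatDemo
{
    public static List<string> Run()
    {
        var collection = new Collection<Giraffe>();

        // Build the query first, while both caches are still empty.
        IQueryable<string> names = collection.Query()
            .Where(g => g.DateOfBirth < new DateTime(2006, 1, 1))
            .OrderBy(g => g.Name)
            .Select(g => g.Name);

        collection.CacheOne.Add(new Giraffe { Id = Guid.NewGuid(), Name = "Percy", DateOfBirth = new DateTime(2011, 3, 21) });
        collection.CacheOne.Add(new Giraffe { Id = Guid.NewGuid(), Name = "Santa", DateOfBirth = new DateTime(2005, 12, 25) });
        var pug = new Giraffe { Id = Guid.NewGuid(), Name = "Pug", DateOfBirth = new DateTime(1985, 4, 2) };
        collection.CacheTwo.Add(pug.Id, pug);

        // Both caches are read only now, at enumeration time.
        return names.ToList(); // "Pug", "Santa"
    }
}
```

The OrderBy is there only to make the result order independent of the dictionary's enumeration order.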


Original answer

However, I don't think this way of piggy-backing LINQ to Objects will ever be able to do what you think it should.

At the very least, you should keep the original query provider so you can call that one when you have a cache miss. If you don't, and use your own query provider (you did not show the code you are using to do the actual call), your own query provider will call itself again and again.

So you'll need to create a CachingQueryProvider and a CachingQuery:

class CachingQuery<T> : IQueryable<T>
{
    private readonly CachingQueryProvider _provider;
    private readonly Expression _expression;

    public CachingQuery(CachingQueryProvider provider, Expression expression)
    {
        _provider = provider;
        _expression = expression;
    }

    // etc.
}

class CachingQueryProvider : IQueryProvider
{
    private readonly IQueryProvider _original;

    public CachingQueryProvider(IQueryProvider original)
    {
        _original = original;
    }

    // etc.
}

public static class CachedQueryable
{
    public static IQueryable<T> AsCached<T>(this IQueryable<T> source)
    {
        return new CachingQuery<T>(
             new CachingQueryProvider(source.Provider), 
             source.Expression);
    }
}

Also if you want to cache a result, you'll need to materialize the result before you cache it, otherwise you cache the query, not the result. And the result itself should never be executed again, as it is already the data you should return.
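To illustrate the difference between caching a query and caching its result, here is a small sketch over an in-memory list (DeferredDemo and its variable names are hypothetical). The un-materialized query is re-evaluated against the current source every time, while the list produced by ToList() is a fixed snapshot:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredDemo
{
    public static (int QueryCount, int ListCount) Run()
    {
        var source = new List<int> { 1, 2, 3 };

        // "Caching" the query only stores a description of the work.
        IEnumerable<int> cachedQuery = source.Where(x => x > 1);

        // Materializing stores the actual data at this point in time.
        List<int> cachedResult = source.Where(x => x > 1).ToList();

        source.Add(4);

        // The query sees the new element; the materialized list does not.
        return (cachedQuery.Count(), cachedResult.Count); // (3, 2)
    }
}
```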

The direction I would head into is as follows:

class CachingQueryProvider : IQueryProvider
{
    public object Execute(Expression expression)
    {
        var key = TranslateExpressionToCacheKey(expression);

        object cachedValue;
        if (_cache.TryGetValue(key, out cachedValue))
            return cachedValue;

        object result = _originalProvider.Execute(expression);

        // Won't compile because we don't know T at compile time
        IEnumerable<T> sequence = result as IEnumerable<T>;
        if (sequence != null && !(sequence is ICollection<T>)) 
        {
            result = sequence.ToList<T>();
        }

        _cache[key] = result; 

        return result;
    }
}

For the part marked as Won't compile, you'll have to do some reflection trickery.

And a caution: string implements IEnumerable<char>, so be careful not to try to materialize a single string result value.
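One possible shape for that reflection step, as a sketch only: the ResultMaterializer class and its Materialize method are invented names, and the approach (look up the IEnumerable<T> interface on the runtime type, then call Enumerable.ToList through MakeGenericMethod) is just one way to do it. Strings and values that are already collections are returned untouched:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

static class ResultMaterializer
{
    // Enumerable.ToList has a single overload, so a lookup by name is enough.
    private static readonly MethodInfo ToListMethod =
        typeof(Enumerable).GetMethod(nameof(Enumerable.ToList));

    public static object Materialize(object result)
    {
        // A string is an IEnumerable<char>, but it is a single value here.
        if (result == null || result is string)
            return result;

        // Find the IEnumerable<T> interface implemented by the runtime type, if any.
        Type sequenceType = result.GetType().GetInterfaces()
            .FirstOrDefault(i => i.IsGenericType &&
                                 i.GetGenericTypeDefinition() == typeof(IEnumerable<>));
        if (sequenceType == null)
            return result; // scalar result, nothing to materialize

        Type elementType = sequenceType.GetGenericArguments()[0];

        // Already an in-memory collection: keep it as it is.
        if (typeof(ICollection<>).MakeGenericType(elementType).IsInstanceOfType(result))
            return result;

        // Equivalent to Enumerable.ToList<T>(sequence), with T known only at run time.
        return ToListMethod.MakeGenericMethod(elementType).Invoke(null, new[] { result });
    }
}
```

A type that implements IEnumerable<T> for more than one T would need extra handling; this sketch simply takes the first match.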

Pryce answered 25/5, 2012 at 15:3 Comment(18)
Thanks Ruben, that's helpful but why do you think piggy-backing LINQ-to-Objects won't ever do what I'm hoping? Your workaround is good but I'm curious as to why you don't feel piggy-backing can work. - Embroidery
LINQ to Objects is meant for enumerating in-memory collections (like arrays). That's the only thing it can do. So if you let LINQ to Objects execute a query like from entity in table where ... select entity, it will ask table to return all elements, and then apply the where to the result. And table will use its own data context to do so (and thus execute SELECT * FROM Table every time you use it). So you must execute the query and transform the result into an in-memory structure and cache that. I don't see where L2O fits in here. - Pryce
Also IQueryProvider.Execute should always return the result of the query. It should not return an intermediate representation. Perhaps that's the confusion? - Pryce
@Pryce Could IQueryProvider.PartialEval be used in this instance? - Colpitis
@Colpitis That's probably something you need to calculate the cache key, because all locals and fields in the query expression will be represented by MemberExpressions. Or I'm completely and utterly misunderstanding the original question. My train of thought here is: how can I cache a random query (with a separate cache entry per distinct query). If you want to cache the complete table but apply all query operations in memory, you'll need a different implementation altogether. - Pryce
@Pryce I see what you mean if it's an expression with table references, but if it's just a .Where(T => T.Id == 34) for example, I would've thought that should be transferrable to a L2O query? - Embroidery
@Anthony Correct, but a .Where( => ) is always called on something, and that something is also part of the expression. So you can either have an expression like T => T.Id == 34 without context (i.e., the argument to a .Where or even a .Select), or a query like table.Where(T => T.Id == 34). You can't have something in the middle, such as .Where(T => T.Id == 34) which is only a fragment. - Pryce
Ruben, I don't see why TranslateExpressionToCacheKey(expression) has to happen when LINQ-to-Objects should be able to do it. - Embroidery
@Anthony How do you mean "LINQ-to-Objects should be able to do it"? What should it do for you? LINQ-to-Objects does not care about caching or anything like that. When its Execute is called, it will always re-iterate the collection passed, at the deepest level of your LINQ query, which for queries over Entity Framework and LINQ-to-SQL table objects would mean re-executing a SELECT * FROM Table. - Pryce
@Pryce LINQ-to-Objects is the method of querying the cache. I'm calling Execute on LINQ-to-Objects, not on the LINQ-to-SQL provider. They're two different collections, with two different Providers. Given that the expression isn't bound to a provider, I don't understand why the LINQ-to-Objects Execute method suddenly jumps execution back to the custom LINQ-to-SQL provider. - Embroidery
@Anthony The expression/query may not be bound to a provider, but your source data may very well be. For instance, in Entity Framework, a query like from e in dbcontext.SomeTable where e.Id == 3 select e when executed by LINQ-to-Objects will execute a SELECT * FROM SomeTable, because the EntitySet must retrieve the data. The where is handled by L2O later. And it is likely it will have to re-execute the query when you re-iterate the EntitySet. LINQ-to-Objects has no knowledge of the query, it just executes dbcontext.SomeTable.GetEnumerator() every single iteration. - Pryce
It doesn't really make sense to cache queries as such because queries are not data. They're just the description of how to retrieve the data (and it's up to the provider to make sense of that). You should cache the result of a query. Note that by caching a List<SomeType> rather than an EntityQuery<SomeType> (simply by calling .ToList() on your query), you are still able to execute additional queries on the cached data, simply by calling .AsQueryable() on the list, which returns a LINQ-to-Objects query, not over EntitySets and such, but over the list. - Pryce
@Anthony I've updated my answer to reflect your actual implementation rather than some hypothetical situation. - Pryce
@Pryce Thanks, the non-hypothetical situation is starting to make sense, I think. It looks like it's the line Expression.Constant(this) and the fact that the provider's not linked to the source that's causing me the confusion. So it's LINQ-to-Objects that calls GetEnumerator() on the original query, which is where it's all going wrong? Your suggestion to concatenate the caches would work but in the interest of simplicity in the example I replaced what was a database recordset with cacheTwo. Concatenating the recordset I think would mean that all the records will be pulled from the database? - Embroidery
@Anthony Correct, concatenating the two would cause the entire recordset to be pulled (every time again if you don't cache/materialize the recordset). If you want more control, you could rewrite the expression inside your query provider to replace the Expression.Constant(source) with the actual (cached) data. Mind you, it can get pretty complex when you re-execute the query over two separate sets. For instance, OrderBy and Take/Skip will not work properly in the current design, as these are executed over both sets independently, so a Take(10) will return up to 20 results. - Pryce
I'll give that a go Ruben and check the results. I thought it might get complex and planned to check the expression if there's an OrderBy to pass this straight to the database and completely bypass the cache. If there's a SingleOrDefault(), for example, that would be an occasion where it would check the cache first. I'll post back and let you know how I get on. - Embroidery
Thanks Ruben it seemed to do the trick - it was the Expression.Constant(this) line change to the cache that did it. I'm not too sure what's happened to the bounty I added to this question, I don't have the points any more and I can't see where to award it. Have the points disappeared into the ether? - Embroidery
@Anthony I didn't get any as you only accepted my answer just now, after the bounty had expired. And there were too few upvotes to reward me with half of the bounty before it expired. So into the ether they went! - Pryce
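
The idea from the comments of replacing the Expression.Constant root with the cached data could be sketched roughly as follows. Everything here is illustrative: StubQuery<T> is a hypothetical stand-in for a custom query root such as Query<Entity>, RootReplacer<T> is an invented ExpressionVisitor subclass, and the check used to recognise the root (any constant holding an IQueryable<T> that is not already a LINQ-to-Objects EnumerableQuery<T>) is an assumption that a real provider would probably tighten:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical custom root: it only records expressions and cannot execute them.
sealed class StubQuery<T> : IQueryable<T>, IQueryProvider
{
    public StubQuery() { Expression = Expression.Constant(this); }
    public StubQuery(Expression expression) { Expression = expression; }

    public Type ElementType => typeof(T);
    public Expression Expression { get; }
    public IQueryProvider Provider => this;

    public IQueryable<TElement> CreateQuery<TElement>(Expression expression)
        => new StubQuery<TElement>(expression);
    public IQueryable CreateQuery(Expression expression) => throw new NotSupportedException();
    public TResult Execute<TResult>(Expression expression) => throw new NotSupportedException();
    public object Execute(Expression expression) => throw new NotSupportedException();
    public IEnumerator<T> GetEnumerator() => throw new NotSupportedException();
    IEnumerator IEnumerable.GetEnumerator() => throw new NotSupportedException();
}

// Swaps the custom root constant for a LINQ-to-Objects source.
sealed class RootReplacer<T> : ExpressionVisitor
{
    private readonly IQueryable<T> _replacement;

    public RootReplacer(IQueryable<T> replacement) { _replacement = replacement; }

    protected override Expression VisitConstant(ConstantExpression node)
    {
        // Assumption: any non-LINQ-to-Objects IQueryable<T> constant is the root.
        if (node.Value is IQueryable<T> && !(node.Value is EnumerableQuery<T>))
            return Expression.Constant(_replacement, typeof(IQueryable<T>));
        return base.VisitConstant(node);
    }
}

static class RootReplaceDemo
{
    public static List<int> Run()
    {
        var cache = new List<int> { 1, 2, 3, 4 };
        IQueryable<int> cacheQuery = cache.AsQueryable();

        // A query built against the custom root; it cannot run on its own.
        IQueryable<int> root = new StubQuery<int>();
        IQueryable<int> query = root.Where(x => x % 2 == 0);

        // Re-point the expression at the cache, then let LINQ-to-Objects run it.
        Expression rewritten = new RootReplacer<int>(cacheQuery).Visit(query.Expression);
        return cacheQuery.Provider.CreateQuery<int>(rewritten).ToList(); // 2, 4
    }
}
```

Because the rewritten expression no longer contains the custom query, LINQ-to-Objects never calls back into the custom provider, which avoids the loop described in the question.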
