How can I use Nhibernate to retrieve data when the "WHERE IN()" have thousands of values? (too many parameters in the sql)
Asked Answered
T

7

22

The problem: Nhibernate parses each value in the "WHERE IN()" sql as parameters and MS SQL server doesn't support enough parameters (over 2000).

I am using Nhibernate with Linq to retrive my data from the SQL server and I need to load alot of entities based on already known ID's.

My code looks something like this:

int[] knownIds = GetIDsFromFile();
var loadedEntities = _Repository.GetAll()
                                .Where(x => knownIds.Contains(x.ID))
                                .ToList();

Which give an sql like this:

SELECT id, name FROM MyTable 
WHERE id IN (1 /* @p0 */,2 /* @p1 */,3 /* @p2 */,4 /* @p3 */, 5 /* @p4 */)

If there is too many values in knownIds, then this code will throw an Exception because of the many parameters that NHibernate uses.

I think the best solution would be if I could make NHibernate use only 1 parameter for the whole "WHERE IN()", but I don't know how to do this:

SELECT id, name FROM MyTable WHERE id IN (1, 2, 3, 4, 5 /* @p0 */)

I'll be glad to hear any ideas of how to solve this - either by extending the LINQ provider or by other means. One solution is to simple do the query x times (knownIds.Count / 1000), but I rather want a generic solution which would work for all my entities.

I have tried looking at extending the LINQ provider by searching google and Stackoverflow, however I can't find a solution and I dont have any experience with either HQL or the treebuilder. Here are a few of the sites I have been at:

UPDATE: I know it ain't good practice by having so many values in the IN clause, but I don't know a better solution for what I want to do.
Consider a company where all the customers pay for the company's services one time each month. The company don't handle the payments itself, but have another company to collect the money. One time each month the company receives a file containing the status of these payments: if they have been paid or not. The file only contains the ID of the specific payment, and not the ID of the customer. A company with 3000 monthly customers, will then make 3000 LogPayments each month, where the status need to be updated. After 1 year there will be around 36.000 LogPayments, so just loading them all doesn't seem like a good solution either.

MY SOLUTION: Thanks for all the usefull answers. In the end I choosed to use a combination of the answers. For this specific case I did something like Fourth suggested, as that would increase performance a great deal. However I have allso implemented the generic method Stefan Steinegger suggested, because I like that I can do this, if that is what I really want. Besides, I don't want my program to crash with an exception, so in the future I will allso use this ContainsAlot-method as a safeguard.

Thromboplastic answered 9/6, 2011 at 13:16 Comment(5)
You can't. And having thousands of items for IN statement has really bad code smell and seems like your data layer isn't designed correctly.Changeless
If you could store the knownids in the database somehow, the you could use a simple inner join across the two columnsHorologium
Are you using NHibernate for doing an ETL like process?Undesigning
@Euphoric: Please look at my update for the reason. @Undesigning : Sorry, but what do you mean about ETL? @Horologium : Thanks, I will consider that.Thromboplastic
MS SQL server doesn't support enough parameters (over 2000). I think it does support enough parameters. Honestly, 640 parameters out to be enough for everybody. Seriously though: if you need > 100 parameters you're probably doing it wrong.Bobsledding
C
13

See this similar question: NHibernate Restrictions.In with hundreds of value

I usually set up several queries, which all get for instance 1000 entries. Just split you array of ids into several pieces.

Something like this:

// only flush the session once. I have a using syntax to disable
// autoflush within a limited scope (without direct access to the
// session from the business logic)
session.Flush();
session.FlushMode = FlushMode.Never;

for (int i = 0; i < knownIds; i += 1000)
{
  var page = knownIds.Skip(i).Take(1000).ToArray();
  loadedEntities.AddRange(
    Repository.GetAll()
      .Where(x => page.Contains(x.ID)));
}

session.FlushMode = FlushMode.Auto;

Generic implementation using criteria (only filtering a single property, which is a common case):

public IList<T> GetMany<TEntity, TProp>(
  Expression<Func<TEntity, TProp>> property,
  IEnumerable<TProp> values)
{
    string propertyName = ((System.Linq.Expressions.MemberExpression)property.Body).Member.Name;

    List<T> loadedEntities = new List<T>();

    // only flush the session once. 
    session.Flush();
    var previousFlushMode = session.FlushMode;
    session.FlushMode = FlushMode.Never;

    for (int i = 0; i < knownIds; i += 1000)
    {
      var page = knownIds.Skip(i).Take(1000).ToArray();

      loadedEntities.AddRange(session
        .CreateCriteria(typeof(T))
        .Add(Restriction.PropertyIn(propertyName, page)
        .List<TEntity>();
    }

    session.FlushMode = previousFlushMode;
    return loadedEntities;
}

To be used like this:

int[] ids = new [] {1, 2, 3, 4, 5 ....};
var entities = GetMany((MyEntity x) => x.Id, ids);

string[] names = new [] {"A", "B", "C", "D" ... };
var users = GetMany((User x) => x.Name, names);
Cupidity answered 9/6, 2011 at 13:55 Comment(4)
I thought of that too, but I would like a generic way of doing this, so I don't have to write the logic each time. Do you by any chance have an idea to how I can do this? I have tried a method which looks like this: public static IList<T> ContainsAlot<T, TRelated>(this IQueryable<T> query, Expression<Func<T, int>> exp, int[] bigList) where T : Entity But the problem is that I don't think this part would work with NHibernate: query.Where(x => bigListPart.Any(y => y == exp.Compile().Invoke(x))).ToList();Thromboplastic
You could implement the generic way using criteria or QueryOver or even HQL. (Why bother about a type-safe interface when implementing generic stuff?)Cupidity
I added a generic implementation in criteria.Cupidity
Sorry, just saw that you can pass an arbitrary linq expression as argument, so my implementation is more restrictive. I can't provide a generic implementation for that, I'm not using the linq provider yet.Cupidity
L
2

http://ayende.com/blog/2583/nhibernates-xml-in has a possible solution, passing the parameters as XML (unfortunately most of the links in the page are broken..)

Lerma answered 9/6, 2011 at 13:31 Comment(0)
E
2

WHERE IN should not be the norm and should be used only in specific, and limited, cases. If you find yourself using it a lot, it probably indicates a problem with your data model. What I would probably do in your case is get ALL the entities from the database in a lazy-load, and then as I iterate through the IDs that I have, pull them out of the entity collection. This way the performance hit is distributed across many queries and you dont hit the WHERE IN threshold.

Just to note, if the IDs will represent most of the entities rather than a small subset (that is you know you are going to wind up getting them all, or most, anyway) then dont lazy-load.

Edit based on your update

If you are talking about 36,000 records after 1 year, BUT you are only dealing with loads in the recent time, then eager load the recent records you care about. I would do something like: create a criteria to load the records for the past... month? Then I will have all the records I might need, match them up to the IDs in the file via code, and bingo-bango-bongo.

The table will definitely grow in size over time so it doesn't make sense to always pull back everything, but if you have a way of saying "i only care about these records" then SQL can do that constraint for you.

Eudemon answered 9/6, 2011 at 14:1 Comment(1)
Please see my update where I describe my scenario. How could I lazy-load them, won't it result in around 3000 queries? (Lazy-loading each)Thromboplastic
M
1

The only place I've seen code like that before where the Ids extend into the thousands, was where that list of IDs had just been loaded from the database as a separate query. It should instead by created as a DetachedCriteria, and then consumed using a Subqueries.PropertyNotIn or PropertyIn criteria query (rather than LINQ).


Another way to look at this sort of thing is - 2100 parameters feels like an arbitrary limit. I'm sure SQL Server could be modified to accept more parameters (but I'm sure a Connect request would get closed almost immediately), or you can use workarounds (such as sending XML, or pre-populating a table) to pass this many parameters. But if you're hitting this limit, shouldn't you step back and consider that there's something else broken in what you're doing?

Monophony answered 9/6, 2011 at 13:50 Comment(0)
A
0

You can't make the IN list only one parameter (e.g. an array), because that is not supported by SQL. The only way I know of to have more than 1000 elements in the IN list is to put a sub query there.
Having said that, one workaround would be to put the known IDs into a temporary table and change your NHibernate statement to use this table so it will result in a subquery in the SQL statement.

Almaraz answered 9/6, 2011 at 13:28 Comment(0)
C
0

I would definitely recomend using temporary table for this kind of data.

You will get ability to check if the parameters are correct, by querying this temporary table. And you can have foreign key constraints, so you are baked up before bad IDs. And you can have history of them in your database.

Changeless answered 10/6, 2011 at 7:31 Comment(0)
C
0

I faced the same problem in Oracle which doesn't allow more than 1000 elements inside IN condition as well. The error is: "ORA-01795: maximum number of expressions in a list is 1000". Here is my solution:

    //partition an IEnumerable into fixed size IEnumerables
    public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> source, int partitionSize)
    {
        return source
            .Select((value, index) => new { Index = index, Value = value })
            .GroupBy(i => i.Index / partitionSize)
            .Select(i => i.Select(i2 => i2.Value));
    }

    public IEnumerable<T> Get(List<long> listOfIDs)
    {
        var partitionedList = listOfIDs.Partition(1000).ToList();
        List<ICriterion> criterions = new List<ICriterion>();
        foreach (var ids in partitionedList)
        {
            criterions.Add(Restrictions.In("Id", ids.ToArray()));
        }
        var criterion = criterions.Aggregate(Restrictions.Or);
        var criteria = session.CreateCriteria<T>().Add(criterion);
        return criteria.Future<T>();
    }

First part is an extension method to IEnumerable, to partition a big list into fixed size lists. Second part uses NHibernate Criterions to dynamically generate multiple IN conditions to be later joined with OR conditions.

Currycomb answered 5/12, 2017 at 12:42 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.