Is the order of execution of Linq the reason for this catch?
Asked Answered
P

3

1

I have this function to repeat a sequence:

public static List<T> Repeat<T>(this IEnumerable<T> lst, int count)
{
    if (count < 0)
        throw new ArgumentOutOfRangeException("count");

    var ret = Enumerable.Empty<T>();

    for (var i = 0; i < count; i++)
        ret = ret.Concat(lst);

    return ret.ToList();
}

Now if I do:

var d = Enumerable.Range(1, 100);
var f = d.Select(t => new Person()).Repeat(10); 
int i = f.Distinct().Count();

I expect i to be 100, but its giving me 1000! My question strictly is why is this happening? Shouldn't Linq be smart enough to figure out that it's the first selected 100 persons I need to concatenate with variable ret? I'm getting a feeling that here the Concat is being given preference when it's used with a Select when its executed at ret.ToList()..

Edit:

If I do this I get the correct result as expected:

var f = d.Select(t => new Person()).ToList().Repeat(10); 
int i = f.Distinct().Count(); //prints 100

Edit again:

I have not overridden Equals. I'm just trying to get 100 unique persons (by reference of course). My question is can someone elucidate to me why is Linq not doing the select operation first and then concatenation (of course at the time of execution)?

Palmirapalmistry answered 21/11, 2012 at 19:26 Comment(2)
@BrianRasmussen yes, but the distinct count would still be 100 right?Palmirapalmistry
Distint is applied to a list of 1000 instances of Person. I assume Person is a class they are not reference equal.Oilstone
E
4

The problem is that unless you call ToList, the d.Select(t => new Person()) is re-enumerated each time the Repeat goes through the list, creating duplicate Persons. The technique is known as the deferred execution.

In general, LINQ does not assume that each time it enumerates a sequence it would get the same sequence, or even a sequence of the same length. If this effect is not desirable, you can always "materialize" the sequence inside your Repeat method by calling ToList right away, like this:

public static List<T> Repeat<T>(this IEnumerable<T> lstEnum, int count) {
    if (count < 0)
        throw new ArgumentOutOfRangeException("count");

    var lst = lstEnum.ToList(); // Enumerate only once
    var ret = Enumerable.Empty<T>();

    for (var i = 0; i < count; i++)
        ret = ret.Concat(lst);

    return ret.ToList();
}
Examen answered 21/11, 2012 at 19:33 Comment(8)
Ok, that is the only possible explanation for this which I was thinking. But in the end don't you think it is a bug!! I feel so :)Palmirapalmistry
@Palmirapalmistry No, that's definitely a feature. LINQ keeps you in control of enumerating your sequences, rather than grabbing that control for itself. There is a cost in terms of memory for "materializing" a sequence; LINQ lets you decide to pay it or use slightly more CPU cycles.Examen
I am not clear of your explanation. Can you explain what you meant by the d.Select(t => new Person()) is re-enumerated each time the Repeat goes through the list I am not understanding the each time part. Repeat is called just once? The entire expression is enumerated just once since its executed only once at ret.ToList()Palmirapalmistry
@Palmirapalmistry inside Repeat you have a loop of ret.Concat(lst) that creates essentially an expression that looks like this: Enumerable.Empty<T>.Concat(list).Concat(list).Concat(list) repeated count times. Each of these Concat calls re-enumerates lst individually, separately from the Concats created in prior iterations.Examen
Das, I get your point. I'll do some thinking now and re-edit my question if something comes up. Will let u know again if so :)Palmirapalmistry
Das, I answered myself to this question. Can you kindly look into it and tell me which of our answer is more accurate and helpful to answering my question? I will accept that answer.Palmirapalmistry
@Palmirapalmistry I think both answers are good - yours incorporates my last comment as an illustration, but either one explains the problem.Examen
Ok accepted yours since it has more votes, but I found my code snippet simpler to follow than your technical explanation :) I was more worried about my answer being wrong conceptually..Palmirapalmistry
P
1

I could break down my problem to something less trivial:

var d = Enumerable.Range(1, 100);
var f = d.Select(t => new Person());

Now essentially I am doing this:

f = f.Concat(f);

Mind you query hasn't been executed till now. At the time of execution f is still d.Select(t => new Person()) unexecuted. So the last statement at the time of execution can broken down to:

f = f.Concat(f); 
//which is 
f = d.Select(t => new Person()).Concat(d.Select(t => new Person()));

which is obvious to create 100 + 100 = 200 new instances of persons. So

f.Distinct().ToList(); //yields 200, not 100

which is the correct behaviour.

Edit: I could rewrite the extension method as simple as,

public static IEnumerable<T> Repeat<T>(this IEnumerable<T> source, int times)
{
    source = source.ToArray();
    return Enumerable.Range(0, times).SelectMany(_ => source);
}

I used dasblinkenlight's suggestion to fix the issue.

Palmirapalmistry answered 21/11, 2012 at 21:32 Comment(0)
C
0

Each Person object is a separate object. All 1000 are distinct.

What is the definition of equality for the Person type? If you don't override it, that definition will be reference equality, meaning all 1000 objects are distinct.

Cockup answered 21/11, 2012 at 19:32 Comment(2)
See my edit. I understand that. My question is why it happening. It definitely has got to do with execution thing in LinqPalmirapalmistry
Reference equality is what the function is checking since I have not overridden anything. But isn't it strange that Linq is getting this as 1000 different objects? :oPalmirapalmistry

© 2022 - 2024 — McMap. All rights reserved.