How to define IEnumerable behavior by contract?

Consider these two methods that return IEnumerable<MyClass>:

    private IEnumerable<MyClass> GetYieldResult(int qtResult)
    {
        for (int i = 0; i < qtResult; i++)
        {
            count++;
            yield return new MyClass() { Id = i+1 };
        }
    }

    private IEnumerable<MyClass> GetNonYieldResult(int qtResult)
    {
        var result = new List<MyClass>();

        for (int i = 0; i < qtResult; i++)
        {
            count++;
            result.Add(new MyClass() { Id = i + 1 });
        }

        return result;
    }

The following tests show two different behaviors when calling the same IEnumerable method (First()) twice:

    [TestMethod]
    public void Test1()
    {
        count = 0;

        IEnumerable<MyClass> yieldResult = GetYieldResult(1);

        var firstGet = yieldResult.First();
        var secondGet = yieldResult.First();

        Assert.AreEqual(1, firstGet.Id);
        Assert.AreEqual(1, secondGet.Id);

        Assert.AreEqual(2, count);//calling "First()" twice runs the iterator twice
        Assert.AreNotSame(firstGet, secondGet);//and creates a different instance of the item each time
    }

    [TestMethod]
    public void Test2()
    {
        count = 0;

        IEnumerable<MyClass> yieldResult = GetNonYieldResult(1);

        var firstGet = yieldResult.First();
        var secondGet = yieldResult.First();

        Assert.AreEqual(1, firstGet.Id);
        Assert.AreEqual(1, secondGet.Id);

        Assert.AreEqual(1, count);//as expected, it creates only 1 result set
        Assert.AreSame(firstGet, secondGet);//and calling "First()" several times will always return same instance of MyClass
    }

It's simple to choose which behavior I want when my code returns IEnumerables, but how can I explicitly define that some method gets, as a parameter, an IEnumerable that creates a single result set regardless of how many times it calls the "First()" method?

Of course, I don't want to force all items to be created unnecessarily, and I want to define the parameter as IEnumerable to say that no item will be added to or removed from the collection.

EDIT: Just to be clear, the question is not about how yield works or why an IEnumerable can return different instances on each call. The question is how I can specify that a parameter should be a "search only" collection that returns the same instances of MyClass when I call methods like "First()" or "Take(1)" several times.

Any ideas?

Thanks in advance!

Wilhoit answered 29/10, 2010 at 13:13 Comment(0)

Of course, I don't want to force all items to be created unnecessarily

In which case you need to allow the method to create them on demand, and if objects are created on demand (and without some form of cache) they will be different objects (at least in the sense of being different references—the default definition of equality for non-value objects).

If your objects are inherently unique (i.e. they don't define some value based equality) then each call to new will create a different object (whatever the constructor parameters).

So the answer to

but how can I explicitly define that some method gets, as a parameter, an IEnumerable that creates a single result set regardless of how many times it calls the "First()" method?

is "you can't" except by creating one set of objects and repeatedly returning the same set, or by defining equality to be something different.
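As a sketch of the "define equality differently" option, assuming a hypothetical variant of MyClass (called MyClassById here) whose identity is its Id:

```csharp
using System;

// Hypothetical variant of the question's MyClass: two instances are
// considered equal when their Ids match, regardless of reference.
public sealed class MyClassById : IEquatable<MyClassById>
{
    public MyClassById(int id)
    {
        Id = id;
    }

    // Read-only, so the hash code cannot change after construction.
    public int Id { get; }

    public bool Equals(MyClassById other)
    {
        return other != null && Id == other.Id;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as MyClassById);
    }

    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }
}
```

If MyClass were defined like this, Assert.AreEqual(firstGet, secondGet) in Test1 would pass while Assert.AreNotSame would still hold: only the meaning of "equal" changes, the objects remain distinct instances, so a property changed on one is not seen through the other.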


Additional (based on comments): if you really want to be able to replay (for want of a better term) the same set of objects without building the whole collection, you could cache what has already been generated and replay that first. Something like:

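private static readonly List<MyData> cache = new List<MyData>();

// MyData, MakeNextItem and maxItems are assumed to be defined elsewhere.
// A single index-based loop avoids "Collection was modified" errors when
// another enumeration appends to the cache while this one is replaying it.
public IEnumerable<MyData> GetData() {
  for (var position = 0; position < maxItems; position++) {
    if (position == cache.Count) {
      // First time this position is requested: generate and cache the item.
      cache.Add(MakeNextItem(position));
    }
    // Replay the cached instance (the same object on every enumeration).
    yield return cache[position];
  }
}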

I expect it would be possible to build such a caching wrapper around an iterator as well (the generation loop would become a foreach over the underlying iterator, but you would need to cache that iterator, or Skip to the required position, if the caller iterated beyond the caching List).

NB: any caching approach would be hard to make thread-safe.
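The iterator-wrapping variant described above can be sketched as an extension method. Memoize is a hypothetical name (it is not a BCL method), and, as the NB says, this version is not thread-safe:

```csharp
using System.Collections.Generic;

public static class MemoizeExtensions
{
    // Hypothetical helper: every enumeration of the result shares one cache
    // and one underlying enumerator, so each source item is produced once.
    // Not thread-safe.
    public static IEnumerable<T> Memoize<T>(this IEnumerable<T> source)
    {
        var cache = new List<T>();
        IEnumerator<T> underlying = null; // created on first demand
        bool exhausted = false;
        return Replay();

        IEnumerable<T> Replay()
        {
            for (int i = 0; ; i++)
            {
                if (i == cache.Count)
                {
                    // The caller has gone beyond the cache: pull one more item.
                    if (exhausted)
                        yield break;
                    if (underlying == null)
                        underlying = source.GetEnumerator();
                    if (!underlying.MoveNext())
                    {
                        exhausted = true;
                        underlying.Dispose();
                        yield break;
                    }
                    cache.Add(underlying.Current);
                }
                // Replay the cached item: the same instance every time.
                yield return cache[i];
            }
        }
    }
}
```

Calling First() twice on GetYieldResult(n).Memoize() would run the iterator body only once and return the same MyClass instance; a later full enumeration continues the underlying iterator from where the cache ends instead of restarting it.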

Greenback answered 29/10, 2010 at 13:26 Comment(5)
Oh yeah, that's the third option: redefine how they test that they are equal. - Indecent
@Wilhoit That kind of caching can only be done by the method you are calling. A caller cannot impose that kind of policy on methods it calls without their cooperation. - Greenback
Comparing those objects isn't really the problem. The problem is that by re-instantiating the MyClass items I lose the references to the earlier instances. So, if some method iterates over some items and changes some of their properties, and then I pass this IEnumerable to another method, that method gets new instances of MyClass with the original property values. So I hope there is a way to enjoy the benefits of filling the collection on demand without risking the creation of multiple instances of the same collection item. - Wilhoit
@Wilhoit I've expanded the answer with an idea that I think may be what you are looking for: a hybrid approach. - Greenback
@Richard: thanks. I've changed my code to avoid this situation. I'm still worried about not having language or framework support for this contract, but fortunately I could change my code, as it would be undesirable to spread one more "remember to" note around the team. - Wilhoit

Unless I'm misreading you, your question may be caused by a misunderstanding. Nothing ever returns an IEnumerable. The first method returns an enumerator object, which supports foreach, allowing you to get instances of MyClass one at a time. It (the function's return value) is typed as IEnumerable to indicate that it supports the foreach behavior (and a few others).

The second function actually returns a List, which of course also implements IEnumerable (foreach behavior). But it is an actual, concrete collection of MyClass objects, created by the method you called (the second one).

The first method doesn't return any MyClass objects at all. It returns that enumerator object, which is generated behind the scenes by the C# compiler and coded to instantiate a new MyClass object each time you iterate over it.
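To make the distinction concrete, here is a small sketch contrasting an iterator method with a method that returns a List. The names IteratorDemo, Produce, Materialize and BodyRuns are hypothetical:

```csharp
using System.Collections.Generic;

public static class IteratorDemo
{
    // Counts how many times the iterator body has started executing.
    public static int BodyRuns;

    // Iterator method: the C# compiler rewrites it into a hidden class that
    // implements IEnumerable<object>; the body runs only when enumerated.
    public static IEnumerable<object> Produce()
    {
        BodyRuns++;
        yield return new object();
    }

    // Ordinary method: builds a concrete List once and returns it.
    public static IEnumerable<object> Materialize()
    {
        return new List<object> { new object() };
    }
}
```

Calling Produce() by itself runs nothing and the result is not a List. Each First() call starts a fresh enumeration, so after two calls BodyRuns is 2 and the two results are different objects. Materialize() hands back the List itself, so First() returns the same instance every time.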

EDIT (more detail): A more important distinction is whether you want the items to be held statefully for you within the class while you iterate, or whether you want them created for you as you iterate.

Another consideration: are the items you want returned already in existence somewhere else? That is, is this method going to iterate through a set (or filtered subset) of some existing collection, or is it creating the items on the fly? If the latter, does it matter whether the item is the exact same instance each time you "get" it? For objects that represent what could be called an entity (something with a defined identity), you probably want successive fetches to return the same instance.

But maybe another instance with the same state is totally equivalent? (This would be called a value object, like a telephone number, an address, or a point on the screen.) Such objects have no identity except that implied by their state. In this latter case, it doesn't matter whether the enumerator returns the same instance or a newly created identical copy each time you "get" it. Such objects are generally immutable: they are the same, they stay the same, and they function identically.
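For the value-object case, one possible sketch uses a C# 9 positional record; PhoneNumber is a hypothetical example type. The compiler generates value-based equality and init-only properties for it:

```csharp
// Hypothetical value object. A positional record (C# 9+) gets compiler-
// generated value-based Equals/GetHashCode/== and init-only properties.
public sealed record PhoneNumber(string CountryCode, string Number);
```

Two separately created new PhoneNumber("+1", "555-0100") instances compare equal with == even though they are different references, and a "changed" number is a new copy made with a with-expression (p with { Number = "555-0199" }), leaving the original untouched.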

Presentment answered 29/10, 2010 at 13:28 Comment(6)
Pedantically: iterator methods return IEnumerable<T>, so they certainly return an enumerable, from which one can obtain an enumerator (implementing IEnumerator or, better, IEnumerator<T>). - Greenback
Pedantically, there is NO SUCH OBJECT as IEnumerable or IEnumerable<T>. These are interfaces that define a contract. All they do is specify that whatever object the method actually returns must satisfy that contract. This is exactly the gist of the point I am making. - Presentment
Thank you for the answer. I understand that with the first method the MyClass instances are created when I iterate the enumerator, and that the second function returns a List, which is already the collection instance. But I want to use an IEnumerable parameter to say "hey, I will only read this collection, so don't worry about items being added or removed". I still want to be sure that I'm not getting new collection items instead of repeatedly reading the same collection that is filled on demand. - Wilhoit
The more important distinction is whether you want the items to be held statefully for you within the class while you iterate, or whether you want them created for you as you iterate. See expanded answer. - Presentment
@Charles: actually I want to be sure that my test method can partially iterate the IEnumerable several times (calling "First()", "Take(n)" or whatever) and always get the same instances of MyClass, without changing the strategy the IEnumerable source chose. By using IEnumerable as the parameter type I make clear that I will not add or remove items, but I'm not specifying it as a "search only" collection with immutable search results, which is what I need. - Wilhoit
@Antonio, then you have to "hold on" to each instance in the getResult() method so that the next fetch (from a foreach's MoveNext(), or a call to First()) can simply grab that already-created instance. For a method that is typed as IEnumerable, that means you would need to create an internal storage collection object to hold these instances, and have the yield return operation fetch a designated item from that internal collection. This would be sort of pointless; you might as well just return the collection to the client code directly. - Presentment

I've been trying to find an elegant solution to the problem for a while now. I wish that the framework designers had added a little "IsImmutable" or similar property getter to IEnumerable so that one could easily add an Evaluate (or similar) extension method that doesn't do anything for an IEnumerable that is already in its "fully evaluated" state.

However, since that doesn't exist, here's the best I've been able to come up with:

  1. I've created my own interface that exposes the immutability property, and I implement it in all of my custom collection types.
  2. My implementation of the Evaluate extension method is aware of this new interface as well as the immutability of the subset of relevant BCL types that I consume most frequently.
  3. I avoid returning "raw" BCL collection types from my APIs in order to increase the efficiency of my Evaluate method (at least when running against my own code).
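The three steps above might look roughly like this. IImmutableEnumerable<T>, ImmutableSnapshot<T> and Evaluate are hypothetical names for illustration, not BCL types, and this sketch only recognizes its own marker interface:

```csharp
using System.Collections.Generic;

// Hypothetical marker interface: implemented by collections whose contents
// are fully evaluated and will never change.
public interface IImmutableEnumerable<T> : IEnumerable<T>
{
}

// Hypothetical immutable collection type that implements the marker.
public sealed class ImmutableSnapshot<T> : IImmutableEnumerable<T>
{
    private readonly T[] items;

    public ImmutableSnapshot(IEnumerable<T> source)
    {
        // Copy once; the private array is never exposed or modified.
        items = new List<T>(source).ToArray();
    }

    public IEnumerator<T> GetEnumerator()
    {
        return ((IEnumerable<T>)items).GetEnumerator();
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class EvaluateExtensions
{
    // Returns the source unchanged when it is already known to be fully
    // evaluated and immutable; otherwise copies it exactly once.
    public static IImmutableEnumerable<T> Evaluate<T>(this IEnumerable<T> source)
    {
        if (source is IImmutableEnumerable<T> alreadyEvaluated)
            return alreadyEvaluated;
        // A fuller version might also recognize known-immutable BCL types here.
        return new ImmutableSnapshot<T>(source);
    }
}
```

A method can then declare its parameter as IImmutableEnumerable<T> (or call Evaluate() on entry) so that repeated First() or Take(n) calls always see the same instances, at the cost of evaluating everything up front.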

It's rather kludgy, but it's the least intrusive approach I've been able to find so far to address the problem of allowing an IEnumerable consumer to create a local copy only when this is actually necessary. I very much hope that your question lures some more interesting solutions out of the woodwork...

Cuyp answered 29/10, 2010 at 14:8 Comment(2)
In truth, one can create a wrapper class, as I suggested in my answer, and use extension methods to easily convert any IEnumerable to the new interface that marks the immutability and is implemented by the wrapper class; then you can simply specify at the function parameter level that the new marking interface is required. - Berlioz
They're much the same style of approach, and they have pretty much the same weaknesses. First, except in a few rare cases, the extension method has to make a guess regarding the immutability of the source, and that guess may be unnecessarily expensive. Second, in order to opt into the "don't copy me" path, an IEnumerable<T> publisher has to jump through some otherwise unnecessary hoops (some of which may end up causing a perf hit). - Cuyp

You can mix the suggestions: implement a generic wrapper class that takes the IEnumerable and returns a new one that builds a cache on each MoveNext, and reuses the partial cache as needed on further enumerations. It is not easy, but it will create objects only once and only as needed (in truth, this only matters for iterators that construct objects on the fly). The hardest part is to be sure when to switch from the partial cache back to the original enumerator, and how to make it transactional (consistent).

Update with tested code:

using System;
using System.Collections;
using System.Collections.Generic;

public interface ICachedEnumerable<T> : IEnumerable<T>
{
}

internal class CachedEnumerable<T> : ICachedEnumerable<T>
{
    private readonly List<T> cache = new List<T>();
    private readonly IEnumerator<T> source;
    private bool sourceIsExhausted = false;

    public CachedEnumerable(IEnumerable<T> source)
    {
        this.source = source.GetEnumerator();
    }

    public T Get(int where)
    {
        lock (cache)
        {
            if (where < 0 || !SyncUntil(where))
                throw new InvalidOperationException();
            return cache[where];
        }
    }

    public bool GoesBeyond(int where)
    {
        lock (cache)
        {
            return SyncUntil(where);
        }
    }

    // Pulls items from the source until index 'where' is cached.
    // Returns false if the source ends first. Caller must hold the lock.
    private bool SyncUntil(int where)
    {
        while (where >= cache.Count && !sourceIsExhausted)
        {
            // MoveNext() returns true while there ARE more items.
            if (source.MoveNext())
            {
                cache.Add(source.Current);
            }
            else
            {
                sourceIsExhausted = true;
                source.Dispose();
            }
        }
        return where < cache.Count;
    }

    public IEnumerator<T> GetEnumerator()
    {
        return new CachedEnumerator(this);
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    // Not generic itself: it reuses the outer T instead of shadowing it.
    private class CachedEnumerator : IEnumerator<T>
    {
        private readonly CachedEnumerable<T> parent;
        private int where;

        public CachedEnumerator(CachedEnumerable<T> parent)
        {
            this.parent = parent;
            Reset();
        }

        public T Current
        {
            get { return parent.Get(where); }
        }

        object IEnumerator.Current
        {
            get { return Current; }
        }

        public bool MoveNext()
        {
            // Check the *next* position, so an empty source yields nothing.
            if (parent.GoesBeyond(where + 1))
            {
                where++;
                return true;
            }
            return false;
        }

        public void Reset()
        {
            where = -1;
        }

        public void Dispose()
        {
        }
    }
}

public static class CachedEnumerableExtensions
{
    public static ICachedEnumerable<T> AsCachedEnumerable<T>(this IEnumerable<T> source)
    {
        return new CachedEnumerable<T>(source);
    }
}

With this you can now add a new Test that shows it works:

    [Test]
    public void Test3()
    {
        count = 0;

        ICachedEnumerable<MyClass> yieldResult = GetYieldResult(1).AsCachedEnumerable();

        var firstGet = yieldResult.First();
        var secondGet = yieldResult.First();

        Assert.AreEqual(1, firstGet.Id);
        Assert.AreEqual(1, secondGet.Id);

        Assert.AreEqual(1, count);//calling "First()" twice, but the source is enumerated only once
        Assert.AreSame(firstGet, secondGet);//so both calls return the same cached instance
    }

The code will be incorporated into my project http://github.com/monoman/MSBuild.NUnit, and may later appear in the Managed.Commons project too.

Berlioz answered 29/10, 2010 at 14:15 Comment(0)

Then you need to cache the result: a lazily evaluated IEnumerable (such as one produced with yield return) is re-executed every time you call something that iterates over it. I tend to use:

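private List<MyClass> mEnumerable;

public IEnumerable<MyClass> GenerateEnumerable()
{
    // Build the list on the first call only; later calls reuse it.
    mEnumerable = mEnumerable ?? CreateEnumerable();
    return mEnumerable;
}

private List<MyClass> CreateEnumerable()
{
    var list = new List<MyClass>();
    // Code to generate the list items here
    return list;
}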
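If GenerateEnumerable might be called from several threads, the same build-once-then-reuse idea can be sketched with Lazy<T>, whose default mode (LazyThreadSafetyMode.ExecutionAndPublication) runs the factory at most once. MyClassSource and CreateCalls are hypothetical names, and MyClass here is a minimal stand-in for the question's class:

```csharp
using System;
using System.Collections.Generic;

// Minimal stand-in for the question's MyClass.
public class MyClass
{
    public int Id { get; set; }
}

public class MyClassSource
{
    // Default Lazy<T> mode: CreateList runs at most once, even when
    // several threads read items.Value concurrently.
    private readonly Lazy<List<MyClass>> items;

    public MyClassSource()
    {
        items = new Lazy<List<MyClass>>(CreateList);
    }

    // How many times the list has been built (for illustration only).
    public int CreateCalls { get; private set; }

    public IEnumerable<MyClass> GenerateEnumerable()
    {
        return items.Value;
    }

    private List<MyClass> CreateList()
    {
        CreateCalls++;
        // Placeholder contents; the real list would be built here.
        return new List<MyClass> { new MyClass { Id = 1 } };
    }
}
```

Like the field-based version, this still builds the whole list on first access; it only makes that one-time construction safe under concurrent callers.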

Granted, on the other side (say, in your example) you can add a ToList() call at the end: it iterates once and stores the items in a list, and yieldResult is still an IEnumerable without any issue.

[TestMethod]
public void Test1()
{
    count = 0;


    IEnumerable<MyClass> yieldResult = GetYieldResult(1).ToList();

    var firstGet = yieldResult.First();
    var secondGet = yieldResult.First();

    Assert.AreEqual(1, firstGet.Id);
    Assert.AreEqual(1, secondGet.Id);

    Assert.AreEqual(1, count);//calling "First()" twice, but ToList() ran the iterator only once
    Assert.AreSame(firstGet, secondGet);
}
Indecent answered 29/10, 2010 at 13:26 Comment(1)
Both suggestions avoid creating MyClass twice, but they also involve creating all items before calling the "First()" method. - Wilhoit
