Caching IEnumerable
public IEnumerable<ModuleData> ListModules()
{
    foreach (XElement m in Source.Descendants("Module"))
    {
        yield return new ModuleData(m.Element("ModuleID").Value);
    }
}

Initially the above code is great since there is no need to evaluate the entire collection if it is not needed.

However, once all the Modules have been enumerated, it becomes needlessly expensive to repeatedly query the XDocument when nothing has changed.

So, as a performance improvement:

public IEnumerable<ModuleData> ListModules()
{
    if (Modules == null)
    {
        Modules = new List<ModuleData>();
        foreach (XElement m in Source.Descendants("Module"))
        {
            Modules.Add(new ModuleData(m.Element("ModuleID").Value, 1, 1));
        }
    }
    return Modules;
}

Which is great if I am repeatedly using the entire list but not so great otherwise.

Is there a middle ground where I can yield return until the entire list has been iterated, then cache it and serve the cache to subsequent requests?

Dipstick answered 8/10, 2009 at 10:50 Comment(2)
Am I getting something wrong? Your code seems to do exactly what you ask for...Cadaverine
The second code block will always iterate the entire enumerable even though it may not be required to do so.Dipstick

You can look at Saving the State of Enumerators, which describes how to create a lazy list (one that caches items once they have been iterated).
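
Since the linked post may not survive, the core idea can be sketched as a memoized "cons list": each node caches its head and lazily materialises its tail, so items are pulled from the source at most once. This is a rough reconstruction of the technique, not Wes Dyer's exact code (all names here are mine):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public sealed class LazyList<T> : IEnumerable<T>
{
    private readonly bool _hasValue;
    private readonly T _head;
    private readonly Lazy<LazyList<T>> _tail;

    private LazyList() { }                       // empty list terminator

    private LazyList(T head, Func<LazyList<T>> tail)
    {
        _hasValue = true;
        _head = head;
        _tail = new Lazy<LazyList<T>>(tail);     // Lazy<T> guarantees the tail is built once
    }

    public static LazyList<T> Create(IEnumerable<T> source)
    {
        return Create(source.GetEnumerator());
    }

    private static LazyList<T> Create(IEnumerator<T> e)
    {
        // Note: this pulls the first element eagerly.
        return e.MoveNext()
            ? new LazyList<T>(e.Current, () => Create(e))
            : new LazyList<T>();
    }

    public IEnumerator<T> GetEnumerator()
    {
        // Walking the list only forces nodes that have not been built yet;
        // already-built nodes are served from the cached fields.
        for (var node = this; node._hasValue; node = node._tail.Value)
            yield return node._head;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Repeated or interleaved enumerations then share the same chain of nodes, so the underlying source is enumerated exactly once. Like the other answers here, this is not thread-safe.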

Gneiss answered 8/10, 2009 at 11:9 Comment(5)
Very cool! Thanks for the link; this totally solved a similar problem I was having with a query that read from disk.Weirdo
For posterity, could you include the relevant parts of the link which you found useful in your answer? That way, if the link goes down, changes, etc., your answer won't be rendered useless. Many thanks.Ep
The link is broken; if only SO had a rule against link-only answers...Atencio
The post from Wes Dyer can still be found at web.archive.org/web/20190120224839/https://… but the interesting content should be copied into the answer.Galway
Beware that other answers offer more performant solutions, as the solution proposed in this article is recursive and allocates an object for each element of the enumeration.Yoheaveho

I like @tsemer's answer, but I would like to propose my own solution, which has nothing to do with FP. It is a naive approach, but it generates far fewer allocations. And it is not thread-safe.

public class CachedEnumerable<T> : IEnumerable<T>, IDisposable
{
    IEnumerator<T> _enumerator;
    readonly List<T> _cache = new List<T>();

    public CachedEnumerable(IEnumerable<T> enumerable) 
        : this(enumerable.GetEnumerator())
    {
    }

    public CachedEnumerable(IEnumerator<T> enumerator)
    {
        _enumerator = enumerator;
    }

    public IEnumerator<T> GetEnumerator()
    {
        // The index of the current item in the cache.
        int index = 0;

        // Enumerate the _cache first
        for (; index < _cache.Count; index++)
        {
            yield return _cache[index];
        }

        // Continue enumeration of the original _enumerator, 
        // until it is finished. 
        // This adds items to the cache and increments the index.
        for (; _enumerator != null && _enumerator.MoveNext(); index++)
        {
            var current = _enumerator.Current;
            _cache.Add(current);
            yield return current;
        }

        if (_enumerator != null)
        {
            _enumerator.Dispose();
            _enumerator = null;
        }

        // Some other users of the same instance of CachedEnumerable
        // can add more items to the cache, 
        // so we need to enumerate them as well
        for (; index < _cache.Count; index++)
        {
            yield return _cache[index];
        }
    }

    public void Dispose()
    {
        if (_enumerator != null)
        {
            _enumerator.Dispose();
            _enumerator = null;
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

This is how the matrix test from @tsemer's answer will work:

var ints = new [] { 1, 2, 3, 4, 5 };
var cachedEnumerable = new CachedEnumerable<int>(ints); 
foreach (var x in cachedEnumerable)
{
    foreach (var y in cachedEnumerable)
    {
        //Do something
    }
}
  1. The outer loop (x) skips the first for loop, because _cache is empty;
  2. x fetches one item from the _enumerator into the _cache;
  3. x pauses before the second for loop;
  4. The inner loop (y) enumerates the one element in the _cache;
  5. y fetches all the remaining elements from the _enumerator into the _cache;
  6. y skips the third for loop, because its index variable equals 5;
  7. x resumes; its index equals 1. It skips the second for loop because the _enumerator is finished;
  8. x enumerates one element from the _cache using the third for loop;
  9. x pauses inside the third for loop;
  10. y enumerates 5 elements from the _cache using the first for loop;
  11. y skips the second for loop, because the _enumerator is finished;
  12. y skips the third for loop, because its index equals 5;
  13. x resumes and increments its index. It enumerates one more element from the _cache using the third for loop;
  14. x pauses;
  15. if the index variable of x is less than 5, go to 10;
  16. end.
Higgs answered 6/1, 2016 at 12:42 Comment(6)
Nice and clean, and I also like that this solution doesn't enumerate the first item upon instantiationSandstrom
Looks clean and straightforward. Please could you add an explanation of why the third for block is needed?Dipstick
@Dipstick I added some infoHiggs
Your code edits do not compile; you should remove the readonly from _enumerator. As a side comment, the disposing code, while useful, is part of the boilerplate code I was trying to avoid. Plus, consumers are now implicitly advised to use this class inside a using statement (or otherwise dispose manually), which increases usage complexity.Sandstrom
@Sandstrom thanks for the correction. This is what happens when you write code in a text box :) I know about usings, etc. This is intentional, because otherwise the resource can leak (we are caching the enumerable for heavy resources, aren't we?). So an open DB connection or a huge XML file is a much bigger problem for me than the overhead of the using keyword.Higgs
Looks like your code could return enumerated elements in a different order.Scurvy

Check out MemoizeAll() in the Reactive Extensions for .NET library (Rx). As it is evaluated lazily you can safely set it up during construction and just return Modules from ListModules():

Modules = Source.
    Descendants("Module").
    Select(m => new ModuleData(m.Element("ModuleID").Value, 1, 1)).
    MemoizeAll();

There's a good explanation of MemoizeAll() (and some of the other less obvious Rx extensions) here.
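
In recent versions the operator lives in the System.Interactive (Ix.NET) package and appears to be called Memoize() rather than MemoizeAll(). A quick sketch of the caching behaviour, assuming that package is installed:

```csharp
using System;
using System.Linq; // Memoize() is an extension in the System.Interactive (Ix.NET) NuGet package

class MemoizeDemo
{
    static void Main()
    {
        var evaluations = 0;
        var memoized = Enumerable.Range(1, 5)
            .Select(x => { evaluations++; return x; })
            .Memoize(); // caches each element the first time it is produced

        memoized.Take(3).ToArray(); // evaluates elements 1..3
        memoized.ToArray();         // evaluates only the remaining 4..5

        Console.WriteLine(evaluations); // 5, not 8
    }
}
```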

Microampere answered 3/11, 2010 at 9:3 Comment(2)
This is very nice, I like the use of Rx. I'm still trying to find time and an excuse to play around with it more thoroughly.Dipstick
For future reference: as an Rx newbie I lost 20 minutes trying to find where MemoizeAll() is. It is not in the main Reactive NuGet package (System.Reactive) but in Reactive's System.Interactive NuGet package, and MemoizeAll() no longer seems to exist (not even in their repo); instead I could only find Memoize().Extinct

I've seen a handful of implementations out there, some older and not taking advantage of the newest .NET classes, some too elaborate for my needs. I ended up with the most concise and declarative code I could muster, which added up to a class with roughly 15 lines of (actual) code. It seems to align well with the OP's needs:

Edit: Second revision, better support for empty enumerables

/// <summary>
/// A <see cref="IEnumerable{T}"/> that caches every item upon first enumeration.
/// </summary>
/// <seealso cref="http://blogs.msdn.com/b/matt/archive/2008/03/14/digging-deeper-into-lazy-and-functional-c.aspx"/>
/// <seealso cref="http://blogs.msdn.com/b/wesdyer/archive/2007/02/13/the-virtues-of-laziness.aspx"/>
public class CachedEnumerable<T> : IEnumerable<T> {
  private readonly bool _hasItem; // Needed so an empty enumerable will not return null but an actual empty enumerable.
  private readonly T _item;
  private readonly Lazy<CachedEnumerable<T>> _nextItems;

  /// <summary>
  /// Initialises a new instance of <see cref="CachedEnumerable{T}"/> using <paramref name="item"/> as the current item
  /// and <paramref name="nextItems"/> as a value factory for the <see cref="CachedEnumerable{T}"/> containing the next items.
  /// </summary>
  protected internal CachedEnumerable(T item, Func<CachedEnumerable<T>> nextItems) {
    _hasItem = true;
    _item = item;
    _nextItems = new Lazy<CachedEnumerable<T>>(nextItems);
  }

  /// <summary>
  /// Initialises a new instance of <see cref="CachedEnumerable{T}"/> with no current item and no next items.
  /// </summary>
  protected internal CachedEnumerable() {
    _hasItem = false;
  }

  /// <summary>
  /// Instantiates and returns a <see cref="CachedEnumerable{T}"/> for a given <paramref name="enumerable"/>.
  /// Notice: The first item is always iterated through.
  /// </summary>
  public static CachedEnumerable<T> Create(IEnumerable<T> enumerable) {
    return Create(enumerable.GetEnumerator());
  }

  /// <summary>
  /// Instantiates and returns a <see cref="CachedEnumerable{T}"/> for a given <paramref name="enumerator"/>.
  /// Notice: The first item is always iterated through.
  /// </summary>
  private static CachedEnumerable<T> Create(IEnumerator<T> enumerator) {
    return enumerator.MoveNext() ? new CachedEnumerable<T>(enumerator.Current, () => Create(enumerator)) : new CachedEnumerable<T>();
  }

  /// <summary>
  /// Returns an enumerator that iterates through the collection.
  /// </summary>
  public IEnumerator<T> GetEnumerator() {
    if (_hasItem) {
      yield return _item;

      var nextItems = _nextItems.Value;
      if (nextItems != null) {
        foreach (var nextItem in nextItems) {
          yield return nextItem;
        }
      }
    }
  }

  /// <summary>
  /// Returns an enumerator that iterates through a collection.
  /// </summary>
  IEnumerator IEnumerable.GetEnumerator() {
    return GetEnumerator();
  }
}

A useful extension method could be:

public static class IEnumerableExtensions {
  /// <summary>
  /// Instantiates and returns a <see cref="CachedEnumerable{T}"/> for a given <paramref name="enumerable"/>.
  /// Notice: The first item is always iterated through.
  /// </summary>
  public static CachedEnumerable<T> ToCachedEnumerable<T>(this IEnumerable<T> enumerable) {
    return CachedEnumerable<T>.Create(enumerable);
  }
}

And for the unit testers amongst you (if you don't use ReSharper, just take out the [SuppressMessage] attributes):

/// <summary>
/// Tests the <see cref="CachedEnumerable{T}"/> class.
/// </summary>
[TestFixture]
public class CachedEnumerableTest {
  private int _count;

  /// <remarks>
  /// This test case is only here to emphasise the problem with <see cref="IEnumerable{T}"/> which <see cref="CachedEnumerable{T}"/> attempts to solve.
  /// </remarks>
  [Test]
  [SuppressMessage("ReSharper", "PossibleMultipleEnumeration")]
  [SuppressMessage("ReSharper", "ReturnValueOfPureMethodIsNotUsed")]
  public void MultipleEnumerationAreNotCachedForOriginalIEnumerable() {
    _count = 0;

    var enumerable = Enumerable.Range(1, 40).Select(IncrementCount);

    enumerable.Take(3).ToArray();
    enumerable.Take(10).ToArray();
    enumerable.Take(4).ToArray();

    Assert.AreEqual(17, _count);
  }

  /// <remarks>
  /// This test case is only here to emphasise the problem with <see cref="IList{T}"/> which <see cref="CachedEnumerable{T}"/> attempts to solve.
  /// </remarks>
  [Test]
  [SuppressMessage("ReSharper", "PossibleMultipleEnumeration")]
  [SuppressMessage("ReSharper", "ReturnValueOfPureMethodIsNotUsed")]
  public void EntireListIsEnumeratedForOriginalListOrArray() {
    _count = 0;
    Enumerable.Range(1, 40).Select(IncrementCount).ToList();
    Assert.AreEqual(40, _count);

    _count = 0;
    Enumerable.Range(1, 40).Select(IncrementCount).ToArray();
    Assert.AreEqual(40, _count);
  }

  [Test]
  [SuppressMessage("ReSharper", "ReturnValueOfPureMethodIsNotUsed")]
  public void MultipleEnumerationsAreCached() {
    _count = 0;

    var cachedEnumerable = Enumerable.Range(1, 40).Select(IncrementCount).ToCachedEnumerable();

    cachedEnumerable.Take(3).ToArray();
    cachedEnumerable.Take(10).ToArray();
    cachedEnumerable.Take(4).ToArray();

    Assert.AreEqual(10, _count);
  }

  [Test]
  public void FreshCachedEnumerableDoesNotEnumerateExceptFirstItem() {
    _count = 0;

    Enumerable.Range(1, 40).Select(IncrementCount).ToCachedEnumerable();

    Assert.AreEqual(1, _count);
  }

  /// <remarks>
  /// Based on Jon Skeet's test mentioned here: http://www.siepman.nl/blog/post/2013/10/09/LazyList-A-better-LINQ-result-cache-than-List.aspx
  /// </remarks>
  [Test]
  [SuppressMessage("ReSharper", "LoopCanBeConvertedToQuery")]
  public void MatrixEnumerationIteratesAsExpectedWhileStillKeepingEnumeratedValuesCached() {
    _count = 0;

    var cachedEnumerable = Enumerable.Range(1, 5).Select(IncrementCount).ToCachedEnumerable();

    var matrixCount = 0;

    foreach (var x in cachedEnumerable) {
      foreach (var y in cachedEnumerable) {
        matrixCount++;
      }
    }

    Assert.AreEqual(5, _count);
    Assert.AreEqual(25, matrixCount);
  }

  [Test]
  public void OrderingCachedEnumerableWorksAsExpectedWhileStillKeepingEnumeratedValuesCached() {
    _count = 0;

    var cachedEnumerable = Enumerable.Range(1, 5).Select(IncrementCount).ToCachedEnumerable();

    var orderedEnumerated = cachedEnumerable.OrderBy(x => x);
    var orderedEnumeratedArray = orderedEnumerated.ToArray(); // Enumerated first time in ascending order.
    Assert.AreEqual(5, _count);

    for (int i = 0; i < orderedEnumeratedArray.Length; i++) {
      Assert.AreEqual(i + 1, orderedEnumeratedArray[i]);
    }

    var reorderedEnumeratedArray = orderedEnumerated.OrderByDescending(x => x).ToArray(); // Enumerated second time in descending order.
    Assert.AreEqual(5, _count);

    for (int i = 0; i < reorderedEnumeratedArray.Length; i++) {
      Assert.AreEqual(5 - i, reorderedEnumeratedArray[i]);
    }
  }

  private int IncrementCount(int value) {
    _count++;
    return value;
  }
}
Sandstrom answered 6/1, 2016 at 11:36 Comment(0)

I quite like hazzik's answer... nice and simple is always the way. BUT there is a bug in GetEnumerator.

It sort of realises there is a problem, and that's why there is a strange third loop after the second enumerator loop... but it isn't quite as simple as that. The problem that triggers the need for the third loop is general, so it has to be handled on every step rather than patched on at the end.

The answer, though, looks even simpler:

    public IEnumerator<T> GetEnumerator()
    {
        int index = 0;

        while (true)
        {
            if (index < _cache.Count)
            {
                yield return _cache[index];
                index = index + 1;
            }
            else
            {
                if (_enumerator.MoveNext())
                {
                    _cache.Add(_enumerator.Current);
                }
                else
                {
                    yield break;
                }
            }
        }
    }

yes, you can make it a tiny bit more efficient by yielding the current item... but I'll take the microsecond hit... it only ever happens once per element.
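
For completeness, here is what that "yield current" variant might look like, folded into a minimal copy of hazzik's class so it stands on its own (a sketch; the class shell is reconstructed, only the loop body differs from the answer above):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public class CachedEnumerable<T> : IEnumerable<T>, IDisposable
{
    private readonly IEnumerator<T> _enumerator;
    private readonly List<T> _cache = new List<T>();

    public CachedEnumerable(IEnumerable<T> enumerable)
    {
        _enumerator = enumerable.GetEnumerator();
    }

    public IEnumerator<T> GetEnumerator()
    {
        int index = 0;
        while (true)
        {
            if (index < _cache.Count)
            {
                yield return _cache[index];
                index++;
            }
            else if (_enumerator.MoveNext())
            {
                var current = _enumerator.Current;
                _cache.Add(current);
                index++;
                yield return current; // yield directly instead of looping back to the cache branch
            }
            else
            {
                yield break;
            }
        }
    }

    public void Dispose() => _enumerator.Dispose();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```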

and it's not thread-safe... but who cares about that.

Freshet answered 19/12, 2016 at 12:48 Comment(1)
See https://mcmap.net/q/539199/-thread-safe-cached-enumerator-lock-with-yield/5683904 for an (attempted?) thread-safe versionYounglove

Just to sum up things a bit:

  • In this answer a solution is presented, complete with an extension method for easy use and unit tests. However, as it uses recursion and allocates an object per element, its performance can be expected to be worse than that of the non-recursive solutions.
  • In this answer a non-recursive solution is presented, including some code to account for the case where the enumerable is enumerated twice concurrently. In this situation, however, it might not maintain the order of the original enumerable, and it does not scale to more than two concurrent enumerations.
  • In this answer the enumerator method is rewritten to generalize the solution to any number of concurrent enumerations while preserving the order of the original enumerable.

Combining the code from these answers, we get the class below. Beware that this code is not thread-safe: concurrent enumeration is only safe when all enumerations happen on the same thread.

public class CachedEnumerable<T> : IEnumerable<T>, IDisposable
{
    private readonly IEnumerator<T> enumerator;
    private readonly List<T> cache = new List<T>();

    public CachedEnumerable(IEnumerable<T> enumerable) : this(enumerable.GetEnumerator()) { }

    public CachedEnumerable(IEnumerator<T> enumerator)
        => this.enumerator = enumerator ?? throw new ArgumentNullException(nameof(enumerator));

    public IEnumerator<T> GetEnumerator()
    {
        int index = 0;
        while (true) {
            if (index < cache.Count) {
                yield return cache[index];
                index++;
            }
            else if (enumerator.MoveNext())
                cache.Add(enumerator.Current);
            else
                yield break;
        }
    }

    public void Dispose() => enumerator.Dispose();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

With the static extension method for easy use:

public static class EnumerableUtils
{
    public static CachedEnumerable<T> ToCachedEnumerable<T>(this IEnumerable<T> enumerable) 
        => new CachedEnumerable<T>(enumerable);
}

And the corresponding unit tests:

public class CachedEnumerableTest
{
    private int _count;

    [Test]
    public void MultipleEnumerationAreNotCachedForOriginalIEnumerable()
    {
        _count = 0;

        var enumerable = Enumerable.Range(1, 40).Select(incrementCount);

        enumerable.Take(3).ToArray();
        enumerable.Take(10).ToArray();
        enumerable.Take(4).ToArray();

        Assert.AreEqual(17, _count);
    }

    [Test]
    public void EntireListIsEnumeratedForOriginalListOrArray()
    {
        _count = 0;
        Enumerable.Range(1, 40).Select(incrementCount).ToList();
        Assert.AreEqual(40, _count);

        _count = 0;
        Enumerable.Range(1, 40).Select(incrementCount).ToArray();
        Assert.AreEqual(40, _count);
    }

    [Test]
    public void MultipleEnumerationsAreCached()
    {
        _count = 0;

        var cachedEnumerable = Enumerable.Range(1, 40).Select(incrementCount).ToCachedEnumerable();

        cachedEnumerable.Take(3).ToArray();
        cachedEnumerable.Take(10).ToArray();
        cachedEnumerable.Take(4).ToArray();

        Assert.AreEqual(10, _count);
    }

    [Test]
    public void FreshCachedEnumerableDoesNotEnumerateExceptFirstItem()
    {
        _count = 0;

        Enumerable.Range(1, 40).Select(incrementCount).ToCachedEnumerable();

        Assert.That(_count <= 1);
    }

    [Test]
    public void MatrixEnumerationIteratesAsExpectedWhileStillKeepingEnumeratedValuesCached()
    {
        _count = 0;

        var cachedEnumerable = Enumerable.Range(1, 5).Select(incrementCount).ToCachedEnumerable();

        var matrixCount = 0;

        foreach (var x in cachedEnumerable) {
            foreach (var y in cachedEnumerable) {
                matrixCount++;
            }
        }

        Assert.AreEqual(5, _count);
        Assert.AreEqual(25, matrixCount);
    }

    [Test]
    public void OrderingCachedEnumerableWorksAsExpectedWhileStillKeepingEnumeratedValuesCached()
    {
        _count = 0;

        var cachedEnumerable = Enumerable.Range(1, 5).Select(incrementCount).ToCachedEnumerable();

        var orderedEnumerated = cachedEnumerable.OrderBy(x => x);
        var orderedEnumeratedArray = orderedEnumerated.ToArray(); // Enumerated first time in ascending order.
        Assert.AreEqual(5, _count);

        for (int i = 0; i < orderedEnumeratedArray.Length; i++) {
            Assert.AreEqual(i + 1, orderedEnumeratedArray[i]);
        }

        var reorderedEnumeratedArray = orderedEnumerated.OrderByDescending(x => x).ToArray(); // Enumerated second time in descending order.
        Assert.AreEqual(5, _count);

        for (int i = 0; i < reorderedEnumeratedArray.Length; i++) {
            Assert.AreEqual(5 - i, reorderedEnumeratedArray[i]);
        }
    }

    private int incrementCount(int value)
    {
        _count++;
        return value;
    }
}
Yoheaveho answered 25/6, 2020 at 15:9 Comment(0)

I don't see any serious problem with the idea of caching results in a list, just like in the above code. It would probably be cleaner to construct the list using the ToList() method.

public IEnumerable<ModuleData> ListModules()
{
    if (Modules == null)
    {
        Modules = Source.Descendants("Module")
                      .Select(m => new ModuleData(m.Element("ModuleID").Value, 1, 1))
                      .ToList();
    }
    return Modules;
}
Animadvert answered 8/10, 2009 at 11:12 Comment(1)
That's much tidier than mine, but calling ToList() iterates the entire enumerable anyway, so it doesn't solve my problem.Dipstick

© 2022 - 2024 — McMap. All rights reserved.