Must IList be finite?

Asked 11/7, 2012 at 15:14 Answered 11/7, 2012 at 20:39

Must .NET's IList be finite? Suppose I write a class FibonacciList implementing IList<BigInteger>

The property Item[n] returns the nth Fibonacci number.
The property IsReadOnly returns true.
The methods IndexOf and Contains we can implement easily enough because the Fibonacci sequence is increasing - to test if the number m is Fibonacci, we need only to compute the finite sequence of Fibonacci numbers up to m.
The method GetEnumerator() doing the right thing

We've now implemented all the methods expected of read-only ILists except Count().

Is this cool, or an abuse of IList?

Fibonacci numbers get impractically big quickly (hence IList<BigInteger> above) . A bounded infinite sequence might be more sensible, it could implement IList<long> or IList<double>.

Addendum II: Fibonacci sequence may have been a bad example, because computing distant values is expensive - to find the nth value one has to compute all earlier values. Thus as Mošmondor said, one might as well make it an IEnumerable and use .ElementAt. However there exist other sequences where one can compute distant values quickly without computing earlier values. (Surprisingly the digits of pi are such a sequence). These sequences are more 'listy', they truly support random access.

Edit: No-one argues against infinite IEnumerables. How do they handle Count()?

Lysias answered 11/7, 2012 at 15:14 Comment(12)

You cannot implement Count(), so this indeed looks like an abuse of IList<T>. After all, you cannot partially implement an interface. – Photooffset 11/7, 2012 at 15:17

In a similar vein, some streams don't fulfill all of the base Stream properties, throwing exceptions where the method cannot be implemented (e.g. unknown length streams). – Jumble 11/7, 2012 at 15:17

I'd simply return int.MaxValue. Indices are limited by the size of int too, and enumeration won't get there in realistic timeframes anyways. – Teahouse 11/7, 2012 at 15:19

In java collections with a size >Integer.MAX_VALUE should return Integer.MAX_VALUE for size: docs.oracle.com/javase/6/docs/api/java/util/… – Teahouse 11/7, 2012 at 15:24

Is there a mathmatical shortcut to the nth member of the fibonnaci series or is it simply case of summing intermediate members? i.e. would the indexer be faster than a summed enumeration? – Halifax 11/7, 2012 at 15:25

Really interesting question, I like it :) – Jumble 11/7, 2012 at 15:26

@Halifax look at this. It's amazingly simple. – Guardhouse 11/7, 2012 at 15:28

@Colonel Panic There is an explicit formula for the fibonacci sequence. – Guardhouse 11/7, 2012 at 15:47

The explicit formula uses real numbers (namely sqrt(5)), so it will be hard to use precisely. – Lysias 11/7, 2012 at 15:52

@phg it is pretty amazing, a c# implementation of Binet's formula is listed here, among others en.wikibooks.org/wiki/Algorithm_Implementation/Mathematics/… – Halifax 11/7, 2012 at 15:54

Fibonacci[500] is 139423224561697880139724382870407283950070256587697307264108962948325571622863290691557658876222521294125 – Lysias 11/7, 2012 at 16:35

@AustinSalonen I miscalculated. – Teahouse 11/7, 2012 at 16:36

To most developers, IList and ICollection imply that you have a pre-evaluated, in-memory collection to work with. With IList specifically, there is an implicit contract of constant-time Add^* and indexing operations. This is why LinkedList<T> does not implement IList<T>. I would consider a FibonacciList to be a violation of this implied contract.

Note the following paragraph from a recent MSDN Magazine article discussing the reasons for adding read-only collection interfaces to .NET 4.5:

IEnumerable<T> is sufficient for most scenarios that deal with collections of types, but sometimes you need more power than it provides:

Materialization: IEnumerable<T> does not allow you to express whether the collection is already available (“materialized”) or whether it’s computed every time you iterate over it (for example, if it represents a LINQ query). When an algorithm requires multiple iterations over the collection, this can result in performance degradation if computing the sequence is expensive; it can also cause subtle bugs because of identity mismatches when objects are being generated again on subsequent passes.

As others have pointed out, there is also the question of what you would return for .Count.

It's perfectly fine to use IEnumerable or IQueryable in for such collections of data, because there is an expectation that these types can be lazily evaluated.

Regarding Edit 1: .Count() is not implemented by the IEnumerable<T> interface: it is an extension method. As such, developers need to expect that it can take any amount of time, and they need to avoid calling it in cases where they don't actually need to know the number of items. For example, if you just want to know whether an IEnumerable<T> has any items, it's better to use .Any(). If you know that there's a maximum number of items you want to deal with, you can use .Take(). If a collection has more than int.MaxValue items in it, .Count() will encounter an operation overflow. So there are some workarounds that can help to reduce the danger associated with infinite sequences. Obviously if programmers haven't taken these possibilities into account, it can still cause problems, though.

Regarding Edit 2: If you're planning to implement your sequence in a way that indexing is constant-time, that addresses my main point pretty handily. Sixlettervariables's answer still holds true, though.

^{*Obviously there's more to this: Add is only expected to work if IList.IsFixedSize returns false. Modification is only possible if IsReadOnly returns false, etc. IList was a poorly-thought-out interface in the first place: a fact which may finally be remedied by the introduction of read-only collection interfaces in .NET 4.5.}

Update

Having given this some additional thought, I've come to the personal opinion that IEnumerable<>s should not be infinite either. In addition to materializing methods like .ToList(), LINQ has several non-streaming operations like .OrderBy() which must consume the entire IEnumerable<> before the first result can be returned. Since so many methods assume IEnumerable<>s are safe to traverse in their entirety, it would be a violation of the Liskov Substitution Principle to produce an IEnumerable<> that is inherently unsafe to traverse indefinitely.

If you find that your application often requires segments of the Fibonacci sequence as IEnumerables, I'd suggest creating a method with a signature similar to Enumerable.Range(int, int), which allows the user to define a starting and ending index.

If you'd like to embark on a Gee-Whiz project, you could conceivably develop a Fibonacci-based IQueryable<> provider, where users could use a limited subset of LINQ query syntax, like so:

// LINQ to Fibonacci!
var fibQuery = from n in Fibonacci.Numbers // (returns an IQueryable<>)
               where n.Index > 5 && n.Value < 20000
               select n.Value;
var fibCount = fibQuery.Count();
var fibList = fibQuery.ToList();

Since your query provider would have the power to evaluate the where clauses as lambda expressions, you could have enough control to implement Count methods and .GetEnumerator() in a way as to ensure that the query is restrictive enough to produce a real answer, or throw an exception as soon as the method is called.

But this reeks of being clever, and would probably be a really bad idea for any real-life software.

Aphyllous answered 11/7, 2012 at 15:16 Comment(17)

Well, an IList<T> can be ReadOnly, so Add isn't required. – Atalanta 11/7, 2012 at 15:17

Ever heard of arrays? :P (IList<T> like most .net collection interfaces is simply badly designed) – Teahouse 11/7, 2012 at 15:17

"To most developers"? Why? And where is this contract stated? Why would anyone assume this? – Smacker 11/7, 2012 at 15:18

The big thing here, in my eyes, is that the indexer is quite inneficient, which isn't expected of an IList. – Rufus 11/7, 2012 at 15:19

@Rufus I think fast random access to Fibonacci numbers is possible. But of course after a few hundred numbers single fibonacci numbers will exceed the 2GB limit. – Teahouse 11/7, 2012 at 15:21

@CodeInChaos The numbers aren't pre-computed; they're computed on the fly, so thisWeirdList[10000] will need to generate the 10,000 values just to give me one. – Rufus 11/7, 2012 at 15:23

@luiscubal: I updated my answer with a reference showing that there is an implicit expectation (at least for MSDN magazine) that a "collection" will be materialized if it implements more than just the IEnumerable<T> interface. – Aphyllous 11/7, 2012 at 15:27

"adding read-only collection interfaces to .NET 4.5" Hell it's about time. They're just 2.5 versions late... – Teahouse 11/7, 2012 at 15:28

@sixlettervariables: Good point. I updated my answer with the appropriate caveats. – Aphyllous 11/7, 2012 at 15:38

It's too bad there's no way to retroactively improve some of the core interfaces in .net. Among other things, IEnumerable should include a property indicate various abilities and characteristics (e.g. will repeated enumerations always yield the same data, should getting a count be considered fast, slow, or impossible, etc., along with a GetCount() function (whose utility could be ascertained via the aforementioned property). As it is, a lot of lists get defensively duplicated needlessly. – Moller 28/9, 2012 at 23:44

@supercat: The problems you cite should mostly be addressed by the new read-only interfaces in .NET 4.5. If you really only expect to be given a cheap, stable series of values, your parameter type can be an IReadOnlyCollection<T>. Compile-time solutions like this are, in my opinion, the correct way to go about this. Using a property like the IList.IsReadOnly property just leads to more opportunities for runtime exceptions and more complicated code. – Aphyllous 1/10, 2012 at 15:52

@StriplingWarrior: The new IReadOnlyCollection<out T> makes it possible for code which expects a sequentially-indexed collection of Animal to accept a collection of SiameseCat, but I didn't notice anything in there about immutability. Suppose Enumerable.Range(1,1000000) is passed to a class which wants to persist it. Later, that class passes it to another class, which passes it to a third. Net result: each class holds a million-item array, and if two classes want to check whether they hold the same sequence, they need to compare a million items. – Moller 1/10, 2012 at 16:20

@StriplingWarrior: If instead there were a way for Enumerable.Range(1,1000000) to report itself as immutable, and the classes either checked before copying the sequence whether it was immutable, or else called an AsImmutable method which did so, then all the objects could end up with the same instance returned by Enumerable.Range. The SequenceEqual method could note that the sequences being compared were the same immutable object, and return true without having to compare anything. Major performance win. – Moller 1/10, 2012 at 16:23

@StriplingWarrior: Having some "Immutable" interfaces defined could be helpful as well, but in many cases code would want to accept both mutable and non-mutable objects but simply treat them differently (e.g. by making defensive copies of non-immutable ones). Further, in some cases it may be possible for some instances of a class to be immutable even though others aren't (e.g. because a class implements "popsicle immunity"). Thus, an instance method would seem advantageous. – Moller 1/10, 2012 at 16:29

@StriplingWarrior: Oops--I didn't notice which question I was commenting on. See #12654867 (a recent question of mine, which is very strongly related) – Moller 1/10, 2012 at 16:31

@supercat: How often (in real-life code used by actual users) have you created multiple IEnumerable.Range(1,1000000)s, and then compared them to each other to see if they hold the same sequence? The added complexity of the system you're describing isn't worth it for basic, everyday interfaces like IEnumerable<>. See my response to your other question. – Aphyllous 1/10, 2012 at 19:6

@StriplingWarrior: I've created and used a variety of immutable objects which don't store all of the state that can be read out via their interface. It's a pretty common pattern, actually. I mentioned IEnumerable.Range not because people would often call ToArray on it, but because it's probably the most widely known example of a class whose state, when read out via the normal method, would appear to be much larger than its internal representation. – Moller 1/10, 2012 at 22:14

I would imagine that a conforming implementation must be finite, otherwise what would you return for ICollection<T>.Count?

/// <summary>
/// Gets the number of elements contained in the <see cref="ICollection{T}" />.
/// </summary>
int Count { get; }

Another consideration is CopyTo, which under its normal overload would never stop in a Fibonacci case.

What this means is an appropriate implementation of a Fibonacci Sequence would be simply IEnumerable<int> (using a generator pattern). (Ab)use of an IList<T> would just cause problems.

Atalanta answered 11/7, 2012 at 15:16 Comment(4)

One thought would be to return the max index calculated up until that point. So if you've done up to fib(10), it might be reasonable to return 10 as the current count. If my next request asks for fib(100), you'll have to add all those intermediate values. – Phloem 11/7, 2012 at 15:17

@duffymo: supposing my first usage is if (fib.Count > 0) ..., what now? – Atalanta 11/7, 2012 at 15:20

@cjk: that would likely break any L2O optimizations. – Atalanta 11/7, 2012 at 15:20

@Jumble NotImplementedException is wrong. It should be NotSupportedException. – Teahouse 11/7, 2012 at 15:22

In your case, I would rather 'violate' IEnumerable and have my way with yield return.

Dacca answered 11/7, 2012 at 15:19 Comment(2)

IEnumerable actually supports infinite sequences quite well, and there is a precedence for it. It's not abusive in the least. – Rufus 11/7, 2012 at 15:20

Personally, I wouldn't call that a rape. I'd even expect that a lazy list can hold potentially infinite data (aka "stream"). – Guardhouse 11/7, 2012 at 15:22

An infinite collection would probably best be implemented as an IEnumerable<T>, not an IList<T>. You could also make use of the yield return syntax when implementing, like so (ignore overflow issues, etc.):

public IEnumerable<long> Fib()
{
    yield return 1;
    yield return 1;
    long l1 = 1;
    long l2 = 1;
    while (true)
    {
        long t = l1;
        l1 = l2;
        l2 = t + l1;
        yield return l2;
    }
}

Tennilletennis answered 11/7, 2012 at 15:23 Comment(0)

EDIT

Having had a day to reflect on my answer and, in light of @StriplingWarrior's comment. I fear I have to make a reversal. I started trying this out last night and now I wonder what would I really lose by abandoning IList?

I think it would wiser to implement just IEnumerable and, declare a local Count() method that throws a NotSupportedException method to prevent the enumerator running until an OutOfMemoryException occurs. I would still add an IndexOf and Contains method and Item indexer property to expose higher performance alternatives like Binet's Formula but, I'd be free to change the signatures of these members to use extended datatypes potentially, even System.Numerics.BigInteger.

If I were implementing multiple series I would declare an ISeries interface for these members. Who know's, perhaps somthing like this will eventually be part of the framework.

I disagree with what appears to be a consensus view. Whilst IList has many members that cannot be implemented for an infinite series it does have an IsReadOnly member. It seems acceptable, certainly in the case of ReadOnlyCollection<>, to implement the majority of members with a NotSupportedException. Following this precedent, I don't see why this should be unacceptable if it is a side effect of some other gain in function.

In this specific Fibbonaci series case, there are established algortihms, see here and here, for shortcircuiting the normal cumalitive enumeration approach which I think would yield siginifcant performance benefits. Exposing these benefits through IList seems warranted to me.

Ideally, .Net would support some other, more appropriate super class of interface, somewhat closer to IEnumerable<> but, until that arrives in some future version, this has got to be a sensible approach.

I'm working on an implementation of IList<BigInteger> to illustrate

Halifax answered 11/7, 2012 at 17:9 Comment(3)

It is worth noting that the documentation for the ICollection<T>.Add method and the IList<T>.Item indexer both specify that a NotSupportedException will be thrown if the collection is read-only, and there is an IsReadOnly property that users can check to determine whether the implementation is read-only before they call one of these methods. The CopyTo method and the Count properties, on the other hand, make no special concessions for infinite collections. – Aphyllous 12/7, 2012 at 15:29

@StriplingWarrior, noted, on reflection I've edited my answer. Just glad I'm not a politician. – Halifax 12/7, 2012 at 16:34

I know what you mean. By the way, declaring a local Count() method will not be sufficient to prevent execution if people call the LINQ extension method with your object cast as an IEnumerable<>. Your class would have to implement ICollection<> for that to work. :-( – Aphyllous 12/7, 2012 at 17:57

As @CodeInChaos pointed out in the comments, the Item property of IList has signature

T this[ int index ] { get; set; }

We see ILists are indexed by ints, so their length is bounded by Int32.MaxValue . Elements of greater index would be inaccessible. This occurred to me when writing the question, but I left it out, because the problem is fun to think about otherwise.

Lysias answered 11/7, 2012 at 20:39 Comment(0)

Summarising what I've seen so far:

You can fulfil 5 out of 6, throwing a NotSupportedException on Count()

I would have said this is probably good enough to go for it, however as servy has pointed out, the indexer is incredibly inefficient for any non-calculated and cached number.

In this case, I would say the only contract that fits your continual stream of calculations is IEnumerable.

The other option you have is to create something that looks a lot like an IList but isn't actually.

Jumble answered 11/7, 2012 at 15:25 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Update

Recommended topics

Hot tags