Must IList be finite?
Asked Answered
L

7

15

Must .NET's IList be finite? Suppose I write a class FibonacciList implementing IList<BigInteger>

  • The property Item[n] returns the nth Fibonacci number.
  • The property IsReadOnly returns true.
  • The methods IndexOf and Contains we can implement easily enough because the Fibonacci sequence is increasing - to test if the number m is Fibonacci, we need only to compute the finite sequence of Fibonacci numbers up to m.
  • The method GetEnumerator() doing the right thing

We've now implemented all the methods expected of read-only ILists except Count().

Is this cool, or an abuse of IList?


Fibonacci numbers get impractically big quickly (hence IList<BigInteger> above) . A bounded infinite sequence might be more sensible, it could implement IList<long> or IList<double>.

Addendum II: Fibonacci sequence may have been a bad example, because computing distant values is expensive - to find the nth value one has to compute all earlier values. Thus as Mošmondor said, one might as well make it an IEnumerable and use .ElementAt. However there exist other sequences where one can compute distant values quickly without computing earlier values. (Surprisingly the digits of pi are such a sequence). These sequences are more 'listy', they truly support random access.

Edit: No-one argues against infinite IEnumerables. How do they handle Count()?

Lysias answered 11/7, 2012 at 15:14 Comment(12)
You cannot implement Count(), so this indeed looks like an abuse of IList<T>. After all, you cannot partially implement an interface.Photooffset
In a similar vein, some streams don't fulfill all of the base Stream properties, throwing exceptions where the method cannot be implemented (e.g. unknown length streams).Jumble
I'd simply return int.MaxValue. Indices are limited by the size of int too, and enumeration won't get there in realistic timeframes anyways.Teahouse
In java collections with a size >Integer.MAX_VALUE should return Integer.MAX_VALUE for size: docs.oracle.com/javase/6/docs/api/java/util/…Teahouse
Is there a mathmatical shortcut to the nth member of the fibonnaci series or is it simply case of summing intermediate members? i.e. would the indexer be faster than a summed enumeration?Halifax
Really interesting question, I like it :)Jumble
@Halifax look at this. It's amazingly simple.Guardhouse
@Colonel Panic There is an explicit formula for the fibonacci sequence.Guardhouse
The explicit formula uses real numbers (namely sqrt(5)), so it will be hard to use precisely.Lysias
@phg it is pretty amazing, a c# implementation of Binet's formula is listed here, among others en.wikibooks.org/wiki/Algorithm_Implementation/Mathematics/…Halifax
Fibonacci[500] is 139423224561697880139724382870407283950070256587697307264108962948325571622863290691557658876222521294125Lysias
@AustinSalonen I miscalculated.Teahouse
A
17

To most developers, IList and ICollection imply that you have a pre-evaluated, in-memory collection to work with. With IList specifically, there is an implicit contract of constant-time Add* and indexing operations. This is why LinkedList<T> does not implement IList<T>. I would consider a FibonacciList to be a violation of this implied contract.

Note the following paragraph from a recent MSDN Magazine article discussing the reasons for adding read-only collection interfaces to .NET 4.5:

IEnumerable<T> is sufficient for most scenarios that deal with collections of types, but sometimes you need more power than it provides:

  • Materialization: IEnumerable<T> does not allow you to express whether the collection is already available (“materialized”) or whether it’s computed every time you iterate over it (for example, if it represents a LINQ query). When an algorithm requires multiple iterations over the collection, this can result in performance degradation if computing the sequence is expensive; it can also cause subtle bugs because of identity mismatches when objects are being generated again on subsequent passes.

As others have pointed out, there is also the question of what you would return for .Count.

It's perfectly fine to use IEnumerable or IQueryable in for such collections of data, because there is an expectation that these types can be lazily evaluated.

Regarding Edit 1: .Count() is not implemented by the IEnumerable<T> interface: it is an extension method. As such, developers need to expect that it can take any amount of time, and they need to avoid calling it in cases where they don't actually need to know the number of items. For example, if you just want to know whether an IEnumerable<T> has any items, it's better to use .Any(). If you know that there's a maximum number of items you want to deal with, you can use .Take(). If a collection has more than int.MaxValue items in it, .Count() will encounter an operation overflow. So there are some workarounds that can help to reduce the danger associated with infinite sequences. Obviously if programmers haven't taken these possibilities into account, it can still cause problems, though.

Regarding Edit 2: If you're planning to implement your sequence in a way that indexing is constant-time, that addresses my main point pretty handily. Sixlettervariables's answer still holds true, though.

*Obviously there's more to this: Add is only expected to work if IList.IsFixedSize returns false. Modification is only possible if IsReadOnly returns false, etc. IList was a poorly-thought-out interface in the first place: a fact which may finally be remedied by the introduction of read-only collection interfaces in .NET 4.5.

Update

Having given this some additional thought, I've come to the personal opinion that IEnumerable<>s should not be infinite either. In addition to materializing methods like .ToList(), LINQ has several non-streaming operations like .OrderBy() which must consume the entire IEnumerable<> before the first result can be returned. Since so many methods assume IEnumerable<>s are safe to traverse in their entirety, it would be a violation of the Liskov Substitution Principle to produce an IEnumerable<> that is inherently unsafe to traverse indefinitely.

If you find that your application often requires segments of the Fibonacci sequence as IEnumerables, I'd suggest creating a method with a signature similar to Enumerable.Range(int, int), which allows the user to define a starting and ending index.

If you'd like to embark on a Gee-Whiz project, you could conceivably develop a Fibonacci-based IQueryable<> provider, where users could use a limited subset of LINQ query syntax, like so:

// LINQ to Fibonacci!
var fibQuery = from n in Fibonacci.Numbers // (returns an IQueryable<>)
               where n.Index > 5 && n.Value < 20000
               select n.Value;
var fibCount = fibQuery.Count();
var fibList = fibQuery.ToList();

Since your query provider would have the power to evaluate the where clauses as lambda expressions, you could have enough control to implement Count methods and .GetEnumerator() in a way as to ensure that the query is restrictive enough to produce a real answer, or throw an exception as soon as the method is called.

But this reeks of being clever, and would probably be a really bad idea for any real-life software.

Aphyllous answered 11/7, 2012 at 15:16 Comment(17)
Well, an IList<T> can be ReadOnly, so Add isn't required.Atalanta
Ever heard of arrays? :P (IList<T> like most .net collection interfaces is simply badly designed)Teahouse
"To most developers"? Why? And where is this contract stated? Why would anyone assume this?Smacker
The big thing here, in my eyes, is that the indexer is quite inneficient, which isn't expected of an IList.Rufus
@Rufus I think fast random access to Fibonacci numbers is possible. But of course after a few hundred numbers single fibonacci numbers will exceed the 2GB limit.Teahouse
@CodeInChaos The numbers aren't pre-computed; they're computed on the fly, so thisWeirdList[10000] will need to generate the 10,000 values just to give me one.Rufus
@luiscubal: I updated my answer with a reference showing that there is an implicit expectation (at least for MSDN magazine) that a "collection" will be materialized if it implements more than just the IEnumerable<T> interface.Aphyllous
"adding read-only collection interfaces to .NET 4.5" Hell it's about time. They're just 2.5 versions late...Teahouse
@sixlettervariables: Good point. I updated my answer with the appropriate caveats.Aphyllous
It's too bad there's no way to retroactively improve some of the core interfaces in .net. Among other things, IEnumerable should include a property indicate various abilities and characteristics (e.g. will repeated enumerations always yield the same data, should getting a count be considered fast, slow, or impossible, etc., along with a GetCount() function (whose utility could be ascertained via the aforementioned property). As it is, a lot of lists get defensively duplicated needlessly.Moller
@supercat: The problems you cite should mostly be addressed by the new read-only interfaces in .NET 4.5. If you really only expect to be given a cheap, stable series of values, your parameter type can be an IReadOnlyCollection<T>. Compile-time solutions like this are, in my opinion, the correct way to go about this. Using a property like the IList.IsReadOnly property just leads to more opportunities for runtime exceptions and more complicated code.Aphyllous
@StriplingWarrior: The new IReadOnlyCollection<out T> makes it possible for code which expects a sequentially-indexed collection of Animal to accept a collection of SiameseCat, but I didn't notice anything in there about immutability. Suppose Enumerable.Range(1,1000000) is passed to a class which wants to persist it. Later, that class passes it to another class, which passes it to a third. Net result: each class holds a million-item array, and if two classes want to check whether they hold the same sequence, they need to compare a million items.Moller
@StriplingWarrior: If instead there were a way for Enumerable.Range(1,1000000) to report itself as immutable, and the classes either checked before copying the sequence whether it was immutable, or else called an AsImmutable method which did so, then all the objects could end up with the same instance returned by Enumerable.Range. The SequenceEqual method could note that the sequences being compared were the same immutable object, and return true without having to compare anything. Major performance win.Moller
@StriplingWarrior: Having some "Immutable" interfaces defined could be helpful as well, but in many cases code would want to accept both mutable and non-mutable objects but simply treat them differently (e.g. by making defensive copies of non-immutable ones). Further, in some cases it may be possible for some instances of a class to be immutable even though others aren't (e.g. because a class implements "popsicle immunity"). Thus, an instance method would seem advantageous.Moller
@StriplingWarrior: Oops--I didn't notice which question I was commenting on. See #12654867 (a recent question of mine, which is very strongly related)Moller
@supercat: How often (in real-life code used by actual users) have you created multiple IEnumerable.Range(1,1000000)s, and then compared them to each other to see if they hold the same sequence? The added complexity of the system you're describing isn't worth it for basic, everyday interfaces like IEnumerable<>. See my response to your other question.Aphyllous
@StriplingWarrior: I've created and used a variety of immutable objects which don't store all of the state that can be read out via their interface. It's a pretty common pattern, actually. I mentioned IEnumerable.Range not because people would often call ToArray on it, but because it's probably the most widely known example of a class whose state, when read out via the normal method, would appear to be much larger than its internal representation.Moller
A
2

I would imagine that a conforming implementation must be finite, otherwise what would you return for ICollection<T>.Count?

/// <summary>
/// Gets the number of elements contained in the <see cref="ICollection{T}" />.
/// </summary>
int Count { get; }

Another consideration is CopyTo, which under its normal overload would never stop in a Fibonacci case.

What this means is an appropriate implementation of a Fibonacci Sequence would be simply IEnumerable<int> (using a generator pattern). (Ab)use of an IList<T> would just cause problems.

Atalanta answered 11/7, 2012 at 15:16 Comment(4)
One thought would be to return the max index calculated up until that point. So if you've done up to fib(10), it might be reasonable to return 10 as the current count. If my next request asks for fib(100), you'll have to add all those intermediate values.Phloem
@duffymo: supposing my first usage is if (fib.Count > 0) ..., what now?Atalanta
@cjk: that would likely break any L2O optimizations.Atalanta
@Jumble NotImplementedException is wrong. It should be NotSupportedException.Teahouse
D
1

In your case, I would rather 'violate' IEnumerable and have my way with yield return.

:)

Dacca answered 11/7, 2012 at 15:19 Comment(2)
IEnumerable actually supports infinite sequences quite well, and there is a precedence for it. It's not abusive in the least.Rufus
Personally, I wouldn't call that a rape. I'd even expect that a lazy list can hold potentially infinite data (aka "stream").Guardhouse
T
1

An infinite collection would probably best be implemented as an IEnumerable<T>, not an IList<T>. You could also make use of the yield return syntax when implementing, like so (ignore overflow issues, etc.):

public IEnumerable<long> Fib()
{
    yield return 1;
    yield return 1;
    long l1 = 1;
    long l2 = 1;
    while (true)
    {
        long t = l1;
        l1 = l2;
        l2 = t + l1;
        yield return l2;
    }
}
Tennilletennis answered 11/7, 2012 at 15:23 Comment(0)
H
1

EDIT

Having had a day to reflect on my answer and, in light of @StriplingWarrior's comment. I fear I have to make a reversal. I started trying this out last night and now I wonder what would I really lose by abandoning IList?

I think it would wiser to implement just IEnumerable and, declare a local Count() method that throws a NotSupportedException method to prevent the enumerator running until an OutOfMemoryException occurs. I would still add an IndexOf and Contains method and Item indexer property to expose higher performance alternatives like Binet's Formula but, I'd be free to change the signatures of these members to use extended datatypes potentially, even System.Numerics.BigInteger.

If I were implementing multiple series I would declare an ISeries interface for these members. Who know's, perhaps somthing like this will eventually be part of the framework.


I disagree with what appears to be a consensus view. Whilst IList has many members that cannot be implemented for an infinite series it does have an IsReadOnly member. It seems acceptable, certainly in the case of ReadOnlyCollection<>, to implement the majority of members with a NotSupportedException. Following this precedent, I don't see why this should be unacceptable if it is a side effect of some other gain in function.

In this specific Fibbonaci series case, there are established algortihms, see here and here, for shortcircuiting the normal cumalitive enumeration approach which I think would yield siginifcant performance benefits. Exposing these benefits through IList seems warranted to me.

Ideally, .Net would support some other, more appropriate super class of interface, somewhat closer to IEnumerable<> but, until that arrives in some future version, this has got to be a sensible approach.


I'm working on an implementation of IList<BigInteger> to illustrate

Halifax answered 11/7, 2012 at 17:9 Comment(3)
It is worth noting that the documentation for the ICollection<T>.Add method and the IList<T>.Item indexer both specify that a NotSupportedException will be thrown if the collection is read-only, and there is an IsReadOnly property that users can check to determine whether the implementation is read-only before they call one of these methods. The CopyTo method and the Count properties, on the other hand, make no special concessions for infinite collections.Aphyllous
@StriplingWarrior, noted, on reflection I've edited my answer. Just glad I'm not a politician.Halifax
I know what you mean. By the way, declaring a local Count() method will not be sufficient to prevent execution if people call the LINQ extension method with your object cast as an IEnumerable<>. Your class would have to implement ICollection<> for that to work. :-(Aphyllous
L
1

As @CodeInChaos pointed out in the comments, the Item property of IList has signature

T this[ int index ] { get; set; }

We see ILists are indexed by ints, so their length is bounded by Int32.MaxValue . Elements of greater index would be inaccessible. This occurred to me when writing the question, but I left it out, because the problem is fun to think about otherwise.

Lysias answered 11/7, 2012 at 20:39 Comment(0)
J
0

Summarising what I've seen so far:

You can fulfil 5 out of 6, throwing a NotSupportedException on Count()

I would have said this is probably good enough to go for it, however as servy has pointed out, the indexer is incredibly inefficient for any non-calculated and cached number.

In this case, I would say the only contract that fits your continual stream of calculations is IEnumerable.

The other option you have is to create something that looks a lot like an IList but isn't actually.

Jumble answered 11/7, 2012 at 15:25 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.