Is C# type system sound and decidable?
Asked Answered
F

3

57

I know that Java's type system is unsound (it fails to type check constructs that are semantically legal) and undecidable (it fails to type check some construct).

For instance, if you copy/paste the following snippet in a class and compile it, the compiler will crash with a StackOverflowException (how apt). This is undecidability.

static class ListX<T> {}
static class C<P> extends ListX<ListX<? super C<C<P>>>> {}
ListX<? super C<Byte>> crash = new C<Byte>();

Java uses wildcards with type bounds, which are a form of use-site variance. C# on the other hand, uses declaration site variance annotation (with the in and out keywords). It is known that declaration-site variance is weaker than use-site variance (use-site variance can express everything declaration-site variance can, and more -- on the down side, it's much more verbose).

So my question is: Is C# type system sound and decidable? If not, why?

Fernand answered 29/5, 2014 at 17:20 Comment(4)
Actually, a bit more search turned out this: research.microsoft.com/en-us/um/people/akenn/generics/… which states that C# type system is undecidable (as well as that of Java and Scala) and goes on to give some insight on why.Fernand
Not sure if this quite qualifies as the type system as being unsound, but you can cause the C# compiler to emit arbitrarily large assemblies with only a few lines. See this question.Mosora
@mikez type checking doesn't have anything to do with code generation. Anyway, Norswap, since you found your answer you should post it, with a reference to that link (yes it is perfectly acceptable).Nubianubian
Actually, unsound means accepting programs that can "go wrong" (for some agreed-upon definition of what it means for a program's execution to "go wrong"); you seem to be describing an incomplete type system, which rejects programs that can't actually "go wrong".Errhine
G
63

Is C# type system decidable?

A type system is "decidable" if the compiler is in theory always able to decide whether the program type checks or not in finite time.

The C# type system is not decidable.

C# has "nominal" subtyping -- that is, you give classes and interfaces names and say what the base classes and interfaces are by name when you declare a class.

C# also has generic types, and, as of C# 4, covariance and contravariance of generic interfaces.

Those three things -- nominal subtyping, generic interfaces, and contravariance -- are sufficient to make a type system undecidable (in the absence of other restrictions on the ways that subtypes may mention each other.)

When this answer was originally written in 2014, that was suspected but not known. The history of this discovery is interesting.

First, the designers of the C# generic type system wondered the same thing, and wrote a paper in 2007 describing different ways in which type checking can go wrong, and what restrictions one can put on a nominal subtyping system that make it decidable.

https://www.microsoft.com/en-us/research/publication/on-decidability-of-nominal-subtyping-with-variance/

A more gentle introduction to the problem can be found on my blog, here:

https://ericlippert.com/2008/05/07/covariance-and-contravariance-part-11-to-infinity-but-not-beyond/

I have written about this subject on SE sites before; a researcher noticed the problem mentioned in that posting and solved it; we now know that nominal subtyping is in general undecidable if there is generic contravariance thrown into the mix. You can encode a Turing Machine into the type system and force the compiler to emulate its operation, and since the question "does this TM halt?" is undecidable, so must type checking be undecidable.

See https://arxiv.org/abs/1605.05274 for the details.

Is the C# type system sound?

A type system is "sound" if we are guaranteed that a program which type checks at compile time has no type errors at runtime.

The C# type system is not sound.

There are many reasons why it is not, but my least favourite is array covariance:

Giraffe[] giraffes = new[] { new Giraffe() };
Animal[] animals = giraffes; // This is legal!
animals[0] = new Tiger(); // crashes at runtime with a type error

The idea here is that most methods that take arrays only read the array, they do not write it, and it is safe to read an animal out of an array of giraffes. Java allows this, and so the CLR allows it because the CLR designers wanted to be able to implement variations on Java. C# allows it because the CLR allows it. The consequence is that every time you write anything into an array of a base class, the runtime must do a check to verify that the array is not an array of an incompatible derived class. The common case gets slower so that the rare error case can get an exception.

That brings up a good point though: C# is at least well-defined as to the consequences of a type error. Type errors at runtime produce sane behaviour in the form of exceptions. It's not like C or C++ where the compiler can and will blithely generate code that does arbitrarily crazy things.

There are a few other ways in which the C# type system is unsound by design.

  • If you consider getting a null reference exception to be a kind of runtime type error, then C# pre C# 8 is very unsound in that it does almost nothing to prevent this kind of error. C# 8 has many improvements in support for detecting nullity errors statically, but the null reference type checking is not sound; it has both false positives and false negatives. The idea is that some compile-time checking is better than none, even if it is not 100% reliable.

  • Many cast expressions allow the user to override the type system and declare "I know this expression will be of a more specific type at runtime, and if I'm wrong, throw an exception". (Some casts mean the opposite: "I know this expression is of type X, please generate code to convert it to an equivalent value of type Y". Those are generally safe.) Since this is a place where the developer is specifically saying that they know better than the type system, one can hardly blame the type system for the resulting crash.

There are also a handful of features that generate cast-like behaviour even though there is no cast in the code. For example, if you have a list of animals you can say

foreach(Giraffe g in animals)

and if there is a tiger in there, your program will crash. As the specification notes, the compiler simply inserts a cast on your behalf. (If you want to loop over all the giraffes and ignore the tigers, that's foreach(Giraffe g in animals.OfType<Giraffe>()).)

  • The unsafe subset of C# makes all bets off; you can break the rules of the runtime arbitrarily with it. Turning off a safety system turns a safety system off, so it should not be surprising that C# is not sound when you turn off soundness checking.
Gogol answered 31/5, 2014 at 8:36 Comment(6)
As far as I can tell this answers the 'decidability' part but not the 'soundness' part. If I understand correctly, the C# type system is unsound because of nulls?Highbinder
@yawar C# is not null safe yes. Also array covariance is unsound. But it is sound in the sense that you are guaranteed that you get an exception when you hit a type error at runtime.Gogol
The gentle introduction link doesn't work. Is there a chance to fix it or access that blog post some other way for those of us interested?Sarcasm
@TKharaishvili: Thanks for the note. For reasons unknown to me, a number of my MSDN blog articles are missing from the Microsoft archive, and that is one of them. I've ported the original article to my own server and added some new commentary. ericlippert.com/2008/05/07/…Gogol
@EricLippert: not related to this answer but Method Hiding Apologia, learn.microsoft.com/en-us/archive/blogs/ericlippert/…, doesn’t seem to be on your website and searching for sentences via google fails to find it on MS’s archive (that last is a complaint against google, obviously you aren’t responsible for their failures)Harter
This is an exciting topic that recently caught my interest. I tried the example from the blog but modified it to interface IC<T> : IN<IN<IC<T>>> (omitting the last IC), because that generates an endless non-expanding recursion and is still rejected by the compiler. Is the assignment here actually safe (no runtime exceptions related to typing?), even though the compiler can't prove it? In my mind, it should be, because the type checker enters a closed cycle (i.e. potential endless recursion at runtime, but no type errors...).Aim
W
4

It's not particularly hard to create problems that the C# complier cannot solve in a reasonable amount of time. Some of the problems it is posed with (often related to generics/type inference) are NP-hard problems. Eric Lippert describes one such example here:

class MainClass
{
    class T{}
    class F{}
    delegate void DT(T t);
    delegate void DF(F f);
    static void M(DT dt)
    {
        System.Console.WriteLine("true");
        dt(new T());
    }
    static void M(DF df)
    {
        System.Console.WriteLine("false");
        df(new F());
    }
    static T Or(T a1, T a2, T a3){return new T();}
    static T Or(T a1, T a2, F a3){return new T();}
    static T Or(T a1, F a2, T a3){return new T();}
    static T Or(T a1, F a2, F a3){return new T();}
    static T Or(F a1, T a2, T a3){return new T();}
    static T Or(F a1, T a2, F a3){return new T();}
    static T Or(F a1, F a2, T a3){return new T();}
    static F Or(F a1, F a2, F a3){return new F();}
    static T And(T a1, T a2){return new T();}
    static F And(T a1, F a2){return new F();}
    static F And(F a1, T a2){return new F();}
    static F And(F a1, F a2){return new F();}
    static F Not(T a){return new F();}
    static T Not(F a){return new T();}
    static void MustBeT(T t){}
    static void Main()
    {
        // Introduce enough variables and then encode any Boolean predicate:
        // eg, here we encode (!x3) & ((!x1) & ((x1 | x2 | x1) & (x2 | x3 | x2)))
        M(x1=>M(x2=>M(x3=>MustBeT(
          And(
            Not(x3), 
            And(
              Not(x1), 
              And(
                Or(x1, x2, x1), 
                Or(x2, x3, x2))))))));
    }
}
Windsor answered 29/5, 2014 at 17:25 Comment(20)
I don't see how your answer is related to the question. The OP asked about decidibility and soundness, not complexity of type checking in C#.Nubianubian
@Nubianubian The code in question won't actually finish when given a non-trivial amount of input; it'll just sit there spinning for some technically finite, but extremely long period of time. For practical purposes, you can assert that it won't finish at all.Windsor
And who cares for practical purposes? The OP didn't ask "is C#'s type-system sound and decidable in a 10-minutes coffee break time?".Nubianubian
@Nubianubian Who cares for theoretical purposes? I'd expect just about everyone to care for practical purposes. If the program I want to compile is going to take 20 years to complete then as far as I care, it isn't sound. The fact that if I wait long enough it must, in theory, complete isn't really relevant to me.Windsor
@Windsor Yes, but you are answering a question posed by someone else who never mentioned complexity. What you wrote as answer could have been a good comment over the practical aspects that the OP never asked about but that are related to the topic with a different perspective. Posting this as an answer, without further development of the subject, is simply an incomplete answer.Nubianubian
@Nubianubian That is an appropriate comment, which is in start contrast to your previous comment that nobody would care. Most everyone cares whether the compiler is practical. Very few people are interested in whether it will theoretically complete at some point in the distant future. Since that's the information that's useful for the vast majority of readers, I considered it worthy of an answer. If you disagree, that's your right.Windsor
dotnetfiddle.net/crCCCE - This code compiles and runs fine with .Net 5.0.Chiasma
@TheSharpNinja Yeah, that's the demonstration of the pattern that you'd need to follow which scales extremely poorly. To produce code that doesn't run, you'd need to add some more overloads.Windsor
The point is that with .net 5.0 the performance is very good.Chiasma
@TheSharpNinja The scaling is inherent to the problem it's trying to solve, not one specific implementation. And of course the runtime version is going to have no bearing on the compiler's ability to compile stuff.Windsor
You do know that the compiler got massively upgrade with .net 5.0...Chiasma
@TheSharpNinja No, it was upgraded with C# 6.0. The compiler and runtime are two very different things. The runtime has nothing to do with compiler behavior. And again, the problem here is not about the implementation, it's about the requirements of the problem even if implemented theoretically perfectly.Windsor
Yes, quit aware there are more than one piece of the puzzle. But the .Net 5.0 compiler is very much changed from the version of Roslyn shipped with .Net Core 3.x and .NET Framework 4.8. One of the changes is a new binary format that is not backwards compatible with previous runtimes which required both the compiler and runtime be changed. The .Net 5.0 SDK ships with matching compiler and common language runtime while the .Net 5.0 Runtime package excludes a public interface to the compiler (which is still there and callable at runtime).Chiasma
@TheSharpNinja The runtime is not relevant to a discussion about compiler behavior. This entire discussion is entirely about getting code to even compile, not how it behaves when it's running. And again, the changes to the C# compiler since this answer were written doesn't change anything about this answer; everything that was true before is still true, as they aren't dependent on implementation details.Windsor
@Windsor - OK, give an example that actually takes a long time to compile on .NET 5.0.Chiasma
@TheSharpNinja .NET isn't a compiler, so you can't compile anything on it.Windsor
@TheSharpNinja That both are shipped together by an installer doesn't mean you use the runtime to compile your code. You use the compiler to do that, you use the runtime to run already compiled code. This is a discussion about the compiler, not about dynamically generating and compiling code at runtime by an unspecified program calling out to a compiler in an unknown way, which is well beyond the scope of this question.Windsor
You don't use any generics in your example, and the link is broken... I'm not really convinced that this is a demonstration that the compiler will solve 3-SAT, it looks more like type-checking in the simply-typed lambda calculus, which is well known to be P complete.Partain
@NathanChappell The link was to a blog post of Eric who, at the time he posted that, was a developer of the C# compiler. 10 years later I don't remember much of the explanation, but he was describing the actual implementation of the feature at the time. I don't remember at this point if the explanation forced the complier into that level of complexity in any implementation, or if that was just the current implementation at the time. My earlier comments suggests the former.Windsor
@Windsor thanks for trying to clarify, but all of what you said is essentially obvious other than you not remembering the explanation. Like I said in a comment on another answer, I'm skeptical of the claims made about the C# typing system because they are quite restricted when compared to java, for which typing is known to be undecidable. I wouldn't be too surprised if it could be coerced into solving NP-Hard or complete problems, but I'm not convinced that the example above demonstrates that.Partain
D
4

Of course the answer of @Eric-Lippert is authoritative. I would just like to stress though that the Variance Problem above applies only to Arrays.

It goes away when using Generics, because then you can have only Co-, Contra- or In-Variance. This disallows one of these applications: either Member Assignment, Member Queries or Collection Assignment:

InVariance disallows Collection Assignment:

IList<Giraffe> giraffes3 = new List<Giraffe>{new()};
IList<Animal> animals3 = giraffes3; // ! does NOT compile!

Co-Variance disallows Member Assignment:

IReadOnlyList<Giraffe> giraffes1 = new List<Giraffe>{new()};
IReadOnlyList<Animal> animals1 = giraffes1; // This is legal!
animals1[0] = new Tiger(); // ! does NOT compile!

Contra-Variance disallows passing other Subtypes:

IObserver<Animal> animals2 = new MyObserver<Animal>(); 
IObserver<Giraffe> giraffes2 = animals2; // This is legal!
giraffes2.OnNext(new Giraffe());
animals2.OnNext(new Tiger());
giraffes2.OnNext(new Tiger()); // ! does NOT compile!

Full Variance allows everything but fails at Runtime (which is the worst):

Giraffe[] giraffes = {new()};
Animal[] animals = giraffes; // This is legal!
animals[0] = new Tiger(); // ! Runtime Exception !

So as long as you try to avoid using arrays in APIs and use them only internally e.g. for performance, you should be quite fine.

Disinter answered 2/1, 2021 at 23:39 Comment(1)
Yea something is bothering me about the claim that C# typing is undecidable due to generics... I understand this is the case for Java, but C# restricts the use of contravariance pretty significantly...Partain

© 2022 - 2024 — McMap. All rights reserved.