Problem understanding C# type inference as described in the language specification

The C# language specification describes type inference in Section §7.5.2. There is a detail in it that I don’t understand. Consider the following case:

// declaration
void Method<T>(T obj, Func<string, T> func);

// call
Method("obj", s => (object) s);

Both the Microsoft and Mono C# compilers correctly infer T = object, but my understanding of the algorithm in the specification would yield T = string and then fail. Here is how I understand it:

The first phase

If Ei is an anonymous function, an explicit parameter type inference (§7.5.2.7) is made from Ei to Ti

⇒ has no effect, because the lambda expression has no explicit parameter types. Right?
Otherwise, if Ei has a type U and xi is a value parameter then a lower-bound inference is made from U to Ti.

⇒ the first parameter is of static type string, so this adds string to the lower bounds for T, right?

The second phase

All unfixed type variables Xi which do not depend on (§7.5.2.5) any Xj are fixed (§7.5.2.10).

⇒ T is unfixed; T doesn’t depend on anything... so T should be fixed, right?

§7.5.2.11 Fixing

The set of candidate types Uj starts out as the set of all types in the set of bounds for Xi.

⇒ { string (lower bound) }
We then examine each bound for Xi in turn: [...] For each lower bound U of Xi all types Uj to which there is not an implicit conversion from U are removed from the candidate set. [...]

⇒ doesn’t remove anything from the candidate set, right?
If among the remaining candidate types Uj there is a unique type V from which there is an implicit conversion to all the other candidate types, then Xi is fixed to V.

⇒ Since there is only one candidate type, this is vacuously true, so Xi is fixed to string. Right?

So where am I going wrong?

UPDATE: My initial investigation on the bus this morning was incomplete and wrong. The text of the first phase specification is correct. The implementation is correct.

The spec is wrong in that it gets the order of events wrong in the second phase. We should be specifying that we make output type inferences before we fix the non-dependent parameters.

Man, this stuff is complicated. I have rewritten this section of the spec more times than I can remember.

I have seen this problem before, and I distinctly recall making revisions such that the incorrect term "type variable" was replaced everywhere with "type parameter". (Type parameters are not storage locations whose contents can vary, so it makes no sense to call them variables.) I think at the same time I noted that the ordering was wrong. Probably what happened was we accidentally shipped an older version of the spec on the web. Many apologies.

I will work with Mads to get the spec updated to match the implementation. I think the correct wording of the second phase should go something like this:

If no unfixed type parameters exist then type inference succeeds.

Otherwise, if there exists one or more arguments Ei with corresponding parameter type Ti such that the output type of Ei with type Ti contains at least one unfixed type parameter Xj, and none of the input types of Ei with type Ti contains any unfixed type parameter Xj, then an output type inference is made from all such Ei to Ti.

Whether or not the previous step actually made an inference, we must now fix at least one type parameter, as follows:

If there exists one or more type parameters Xi such that Xi is unfixed, and Xi has a non-empty set of bounds, and Xi does not depend on any Xj then each such Xi is fixed. If any fixing operation fails then type inference fails.

Otherwise, if there exists one or more type parameters Xi such that Xi is unfixed, and Xi has a non-empty set of bounds, and there is at least one type parameter Xj that depends on Xi then each such Xi is fixed. If any fixing operation fails then type inference fails.

Otherwise, we are unable to make progress and there are unfixed parameters. Type inference fails.

If type inference neither fails nor succeeds then the second phase is repeated.

The idea here is that we want to ensure that the algorithm never goes into an infinite loop. At every repetition of the second phase it either succeeds, fails, or makes progress. It cannot possibly loop more times than there are type parameters to fix to types.

Thanks for bringing this to my attention.

Recommended topics

Hot tags