What does it mean for a language to be statically typed?

P

6

14

My understanding is that it means that one can potentially write a program to formally prove that a program written in a statically typed language will be free of a certain (small) subset of defects.

My problem with this is as follows:

Assume that we have two turing complete languages, A and B. A is presumed to be 'type safe' and 'B' is presumed not to be. Suppose I am given a program L to check the correctness of any program written in A. What is to stop me from translating any program written in B to A, applying L. If P translates from A to B then why isn't PL a valid type checker for any program written in B?

I'm trained in Algebra and am only just starting to study CS so there might be some obvious reason that this doesn't work but I would very much like to know. This whole 'type safety' thing has smelt fishy to me for a while.

Phase answered 2/9, 2010 at 1:53 Comment(4)

Related Question: #468371 – Vernice 2/9, 2010 at 2:3

someone, please add C++ also to the tag of this question. – Machismo 2/9, 2010 at 2:4

@chubsdad: Er...why? It's not about c++. – Allwein 2/9, 2010 at 2:10

@chubsdad My initial spamming served it's purpose. John Kugelman beat me to editing it by 4 minutes ;) – Phase 2/9, 2010 at 2:14

A

5

Let A be your Turing-complete language which is supposed to be statically typed and let A' be the language you get from A when you remove the type checking (but not the type annotations because they also serve other purposes). The accepted programs of A will be a subset of the accepted programs of A'. So in particular, A' will also be Turing-complete.

Given your translator P from B to A (and vice versa). What is it supposed to do? It could do one of two things:

Firstly, it could translate every program y of B to a program of A. In this case, LPy would always return True as programs of A are by definition correctly typed.
Secondly, P could translate every program y of B to a program of A'. In this case, LPy would return True if Py happens to be a program of A and False if not.

As the first case doesn't yield anything interesting, let us stick to the second case, which is probably what you mean. Does the function LP defined on programs of B tell us anything interesting about programs of B? I say no, because it is not invariant under a change of P. As A is Turing-complete, even in the second case P could be chosen so that its image happens to lie in A. Then LP would be constantly True. On the other hand, P could be chosen so that some programs are mapped to the complement of A in A'. In this case LP would spit out False for some (possibly all) programs of B. As you can see, you don't get anything which only depends on y.

I can also put it more mathematically in the following way: There is a category C of programming languages whose objects are the programming languages and whose morphisms are translators from one programming language to another one. In particular if there is a morphism P: X -> Y, Y is at least as expressive as X. Between each pair of Turing-complete languages there are morphisms in both directions. For each object X of C (i.e. for each programming language) we have an associated set, say {X} (bad notation, I know) of those partially defined functions that can be computed by programs of X. Each morphism P: X -> Y then induces an inclusion {X} -> {Y} of sets. Let us formally invert all those morphisms P: X -> Y that induce the identity {X} -> {Y}. I will call the resulting category (which is, in mathematical terms, a localization of C) by C'. Now the inclusion A -> A' is a morphism in C'. However, it is not preserved under automorphisms of A', that is the morphism A -> A' is not an invariant of A' in C'. In other words: from this abstract point of view the attribute "statically typed" is not definable and can be arbitrarily attached to a language.

To make my point clearer you can also think of C' as the category of, say, geometrical shapes in three-dimensional space together with the Euclidean motions as morphisms. A' and B are then two geometrical shapes and P and Q are Euclidean motions bringing B to A' and vice versa. For example, A' and B could be two spheres. Now let us fix a point on A', which shall stand for the subset A of A'. Let us call this point "statically typed". We want to know whether a point of B is statically typed. So we take such a point y, map it via P to A' and test, whether it is our marked point on A'. As one can easily see, this depends on the chosen map P or, to put in other words: A marked point on a sphere is not preserved by automorphisms (that are Euclidean motions that map the sphere onto itself) of that sphere.

Anallese answered 6/10, 2010 at 9:18 Comment(7)

First off, thanks for fixing my mistake with mapping the result of the typecheck back to B! My intuition was change of basis and that threw me off a little. You category formulation is much more compelling (and more likely to induce clear thought about that matter). I don't see why you need to construct the localization though as (AFAICT) it doesn't add any new automorphisms. That is, A -> A' (which is already present in C) is or is not an invariant of A' in C', then such is already the case in C. – Phase 6/10, 2010 at 12:55

@Aaron: Change of basis is, of course, the right intuition! A and B are two bases and L is something like the first coordinate with respect to A. As to why localization seems to be necessary: Given a translator P: A -> B between two Turing-complete languages (so that P is an isomorphism in C'), how do you want to construct an inverse in C? P is allowed to map equivalent programs to the same program in B! – Anallese 6/10, 2010 at 13:22

@Marc, Got distracted for a minute there and forgot about this. Then somebody else answered and I got reminded. I understand why the localization is necessary to have the inverse. I guess I meant that it seems unnecessary to go even that far sincec demonstrating that A->A' isn't even an invariant (let alone A -> B) shuts down my idea. – Phase 29/10, 2010 at 7:44

@Marc: Why do you consider A and A' to be different languages? A formal language is nothing more than a particular set of strings (expressions). Likewise, a well-typed formal language is nothing more than a set of strings which satisfy particular constraints. Whether or not one bothers to verify that the constraints have been satisfied for a given expression (i.e. performs type-checking) is irrelevant to the definition of the language--if type-checking were to be performed, and the expression would fail the check, then the expression is simply not in the language. As such, A = A'. – Knocker 29/10, 2010 at 20:7

Oops. Above, I meant to say "a type-safe formal language" rather than "a well-typed formal language". – Knocker 29/10, 2010 at 20:52

The fact that "statically typed" isn't definable for an abstract language either misses the point: static typing is a property of the concrete representation, or, more generously, it shows this point :) – Rotifer 16/12, 2010 at 3:6

@user359996: By "removing the type checking from language A" I mean that more expressions are part of language A' than of language A. Thus A' is a strict superset of A. I have not specified what happens if a program of A' is run that would fail the type-checking of A (i.e. a program in A'\A) but one may assume that it simply stops. – Anallese 29/12, 2010 at 20:3

B

7

If you can translate every B' (a program written in B) into an equivalent A' (which is correct if B' is), then language B enjoys just as much "type-safety" as language A (in a theoretical sense, of course;-) -- basically this would mean that B is such that you can do perfect type inferencing. But that's extremely limited for a dynamic language -- e.g., consider:

if userinput() = 'bah':
    thefun(23)
else:
    thefun('gotcha')

where thefun (let's assume) is typesafe for int argument, but not for str argument. Now -- how do you translate this to language A in the first place...?

Bedight answered 2/9, 2010 at 2:0 Comment(9)

How to translate doesn't really concern me. Once it was translated (as guaranteed by church-turing), it would not be typesafe in A and hence not typesafe in Python. – Phase 2/9, 2010 at 2:13

If language A (C) is what the B (Python) interpreter is written in, then it's easy: Write a program in A that includes the B program as a string literal, and pass it to the interpreter. It won't be a very idiomatic or very efficient A program, but it counts. – Spermatic 2/9, 2010 at 2:29

"which is correct", AKA typesafe, "iff B' is": that's the part of my A these comments' authors seem to not have read. Nothing in church-turing guarantees that translation "preserves type-safety" because there's no "type safety" concept there -- so the translators could insert arbitrary "casts" (and the interpreter will have a lot of casts), or the interpreter or translator could map every Python type down to PyObject* so that typesafety in A is guaranteed but doesn't mean jackblip about the typesafety of the original B program, and so forth. – Bedight 2/9, 2010 at 2:46

@Alex Martelli Type safety as I have seen it defined seems to be implicit in church turing because they both relate to provable behavior. If I can prove that A' is or is not typesafe, then I can prove that B' is or is not typesafe because they will both exhibit identical behavior. – Phase 2/9, 2010 at 2:58

+1 for mentioning type inferencing. This is exactly what type reconstruction was originally researched to solve... if typed lambda calculi save us from certain errors, but the untyped lambda calculus is simpler to express, is there a way we can translate the untyped version (language B) into a typed equivalent (language A)? And, @AaronMcSmooth, it has been proven that this transformation becomes undecidable as the complexity of the type system increases beyond a certain point. In a type system powerful enough to express a full specification of all program behavior, it's utterly hopeless. – Equiponderate 6/10, 2010 at 9:29

@pelotom Excellent point. Moreover, as explained in my answer, unlike the untyped lambda calculus, where the reduction semantics of any syntactically well-formed expression are always well-defined at execution time, this need not be true of dynamically-typed languages, in the general case. BTW, do you have a reference for the type reconstruction proof to share with us? Sounds like an interesting read. – Knocker 28/10, 2010 at 6:18

@user359996: I believe this result shows undecidability of type reconstruction for full System F: macs.hw.ac.uk/~jbw/papers/#Wells:APAL-1999-v98. It seems to be common knowledge that type reconstruction for System F restricted to rank-2 polymorphism is decidable, while rank-3 and above becomes undecidable, but I can't seem to find where that was proven. – Equiponderate 28/10, 2010 at 7:24

@Alex Martelli: You can simplify this--We can rule out any B such that every B' can be translated to an A', because this would contradict the assertion that B is not type-safe. – Knocker 29/10, 2010 at 7:2

@aaronasterling: Alex Martelli's question was rhetorical. When he asked how? he meant it is impossible. Church-Turing is about computable functions, not the evaluation of meaningless expressions. A non-type-safe language allows one to construct these latter, whereas a type-safe language does not. – Knocker 31/10, 2010 at 20:47

A

5

Let A be your Turing-complete language which is supposed to be statically typed and let A' be the language you get from A when you remove the type checking (but not the type annotations because they also serve other purposes). The accepted programs of A will be a subset of the accepted programs of A'. So in particular, A' will also be Turing-complete.

Given your translator P from B to A (and vice versa). What is it supposed to do? It could do one of two things:

Firstly, it could translate every program y of B to a program of A. In this case, LPy would always return True as programs of A are by definition correctly typed.
Secondly, P could translate every program y of B to a program of A'. In this case, LPy would return True if Py happens to be a program of A and False if not.

As the first case doesn't yield anything interesting, let us stick to the second case, which is probably what you mean. Does the function LP defined on programs of B tell us anything interesting about programs of B? I say no, because it is not invariant under a change of P. As A is Turing-complete, even in the second case P could be chosen so that its image happens to lie in A. Then LP would be constantly True. On the other hand, P could be chosen so that some programs are mapped to the complement of A in A'. In this case LP would spit out False for some (possibly all) programs of B. As you can see, you don't get anything which only depends on y.

I can also put it more mathematically in the following way: There is a category C of programming languages whose objects are the programming languages and whose morphisms are translators from one programming language to another one. In particular if there is a morphism P: X -> Y, Y is at least as expressive as X. Between each pair of Turing-complete languages there are morphisms in both directions. For each object X of C (i.e. for each programming language) we have an associated set, say {X} (bad notation, I know) of those partially defined functions that can be computed by programs of X. Each morphism P: X -> Y then induces an inclusion {X} -> {Y} of sets. Let us formally invert all those morphisms P: X -> Y that induce the identity {X} -> {Y}. I will call the resulting category (which is, in mathematical terms, a localization of C) by C'. Now the inclusion A -> A' is a morphism in C'. However, it is not preserved under automorphisms of A', that is the morphism A -> A' is not an invariant of A' in C'. In other words: from this abstract point of view the attribute "statically typed" is not definable and can be arbitrarily attached to a language.

To make my point clearer you can also think of C' as the category of, say, geometrical shapes in three-dimensional space together with the Euclidean motions as morphisms. A' and B are then two geometrical shapes and P and Q are Euclidean motions bringing B to A' and vice versa. For example, A' and B could be two spheres. Now let us fix a point on A', which shall stand for the subset A of A'. Let us call this point "statically typed". We want to know whether a point of B is statically typed. So we take such a point y, map it via P to A' and test, whether it is our marked point on A'. As one can easily see, this depends on the chosen map P or, to put in other words: A marked point on a sphere is not preserved by automorphisms (that are Euclidean motions that map the sphere onto itself) of that sphere.

Anallese answered 6/10, 2010 at 9:18 Comment(7)

First off, thanks for fixing my mistake with mapping the result of the typecheck back to B! My intuition was change of basis and that threw me off a little. You category formulation is much more compelling (and more likely to induce clear thought about that matter). I don't see why you need to construct the localization though as (AFAICT) it doesn't add any new automorphisms. That is, A -> A' (which is already present in C) is or is not an invariant of A' in C', then such is already the case in C. – Phase 6/10, 2010 at 12:55

@Aaron: Change of basis is, of course, the right intuition! A and B are two bases and L is something like the first coordinate with respect to A. As to why localization seems to be necessary: Given a translator P: A -> B between two Turing-complete languages (so that P is an isomorphism in C'), how do you want to construct an inverse in C? P is allowed to map equivalent programs to the same program in B! – Anallese 6/10, 2010 at 13:22

@Marc, Got distracted for a minute there and forgot about this. Then somebody else answered and I got reminded. I understand why the localization is necessary to have the inverse. I guess I meant that it seems unnecessary to go even that far sincec demonstrating that A->A' isn't even an invariant (let alone A -> B) shuts down my idea. – Phase 29/10, 2010 at 7:44

@Marc: Why do you consider A and A' to be different languages? A formal language is nothing more than a particular set of strings (expressions). Likewise, a well-typed formal language is nothing more than a set of strings which satisfy particular constraints. Whether or not one bothers to verify that the constraints have been satisfied for a given expression (i.e. performs type-checking) is irrelevant to the definition of the language--if type-checking were to be performed, and the expression would fail the check, then the expression is simply not in the language. As such, A = A'. – Knocker 29/10, 2010 at 20:7

Oops. Above, I meant to say "a type-safe formal language" rather than "a well-typed formal language". – Knocker 29/10, 2010 at 20:52

The fact that "statically typed" isn't definable for an abstract language either misses the point: static typing is a property of the concrete representation, or, more generously, it shows this point :) – Rotifer 16/12, 2010 at 3:6

@user359996: By "removing the type checking from language A" I mean that more expressions are part of language A' than of language A. Thus A' is a strict superset of A. I have not specified what happens if a program of A' is run that would fail the type-checking of A (i.e. a program in A'\A) but one may assume that it simply stops. – Anallese 29/12, 2010 at 20:3

C

1

Another way to make the same point as has been made is that your question constitutes a proof by contradiction that either:

A cannot be mapped to B
type safety is not a lexical property of a language

or both. My intuition says that the latter is probably the sticking point: that type-safety is a meta-linguistic property.

Colbycolbye answered 2/9, 2010 at 2:14 Comment(3)

This is my feeling as well though I'm going to have to look more deeply into precisely what a meta-linguistic property is. If I'm not mistaken, any Turing complete language can be mapped to any other. – Phase 2/9, 2010 at 2:24

A cannot be mapped to B in a way that guarantees to preserve type safety -- e.g. the Python C interpreter uses PyObject* everywhere so it is typesafe whether the Python code it's interpreting is, or not -- the translation doesn't preserve that elusive quality (which is not part of the theorem @aaron recalls, or of anything else Turing ever wrote in his short life AFAIK). – Bedight 2/9, 2010 at 2:48

I think you have A and B backwards. – Knocker 28/10, 2010 at 5:29

K

1

There's nothing "fishy" about it. ;)

The set of Turing-complete languages which are type-safe with respect to any nontrivial [1] type system T is a strict subset of the Turing-complete languages. As such, in the general case, no translator P^-1 from B to A exists; therefore, neither does any translator-cum-type-checker LP^-1.

A knee-jerk reaction to this sort of claim might be: Nonsense! Both A and B are Turing-complete, so I can express any computable function in either language! And, indeed, this is correct--you can express any computable function in either language; however, quite often, you can also express quite a bit more. In particular you can construct expressions whose denotational semantics are not well-defined, such as those which might happily try to take the arithmetic sum of the character strings "foo" and "bar" (this is the gist of ~~Chubsdad~~ Alex Martelli's answer). These sorts of expressions may be "in" the language B, but may simply not be expressible in the language A, because the denotational semantics are undefined, thus there is no sensible way to translate them.

This is one of the great strengths of static typing: If your type system is unable to prove, at compile time, that the aforementioned function will receive any parameters but those for which the outcome of the arithmetic addition operator is well-defined, it can be rejected as ill-typed.

Note that while the above is the usual sort of example given to explain the merits of a static type system, it is perhaps too modest. In general, a static type system need not be limited to merely enforcing type-correctness of parameters, but indeed can express any desired property of a program which can be proven at compile time. For example, it is possible to construct type systems which enforce the constraint that one release a filesystem handle (e.g. to a database, file, network socket, etc.) within the same scope in which it was acquired. Obviously, this is tremendously valuable in such domains as life-support systems, among others, where provable correctness of as many parameters of the system as possible is absolutely essential. If you satisfy the static type system, you can get these sorts of proofs for free.

[1] By nontrivial, I mean such that not all possible expressions are well-typed.

Knocker answered 28/10, 2010 at 5:26 Comment(0)

H

0

My understanding is that this has to do with compile-time vs. run-time. In a statically typed language the majority of type checking is performed during compile-time. In a dynamically typed language, the majority of its type checking is performed at run-time.

Hornbeck answered 2/9, 2010 at 2:1 Comment(0)

M

-1

Let me answer this the other way round:

There are two different types of "dynamic" programming.

One is "dynamically typed", which means you have some sort of a shell where you can program by typing definitions into that shell, think of it like Python's IDLE shell.

The other type of dynamic programming, is a more theoretical one. A dynamic program, is one that can change its own source. It needs some level of introjection, and is often used to change program memory on microcontrollers. Sometimes generating lookup tables for number crunching is called dynamic programming.

Marielamariele answered 2/9, 2010 at 2:0 Comment(2)

-1 No. 1) Dynamic programming solely refers to a specific style of solving problems, namely by breaking them into subproblems and exploiting redundant information in the intermediate results in a particular way. 2) Dynamically typed has nothing to do with a shell--it just means (at least some) type information is determined at runtime, "dynamically". 3) A program that changes its own source (more correctly, its own bytecode or machine code) is called reflective, may use introspection (note the "p") and is an example of what one calls metaprogramming. – Knocker 28/10, 2010 at 5:40

Also, the mechanism whereby one can "program by typing definitions into [a] shell" is called a read-eval-print loop, often abbreviated REPL. – Knocker 28/10, 2010 at 6:5

Recommended topics

Hot tags