Why are interpreted languages mostly duck-typed while compiled languages have strong typing?

I just don't know: is there any technical reason for it? Is it more difficult to implement a compiler for a language with weak typing, or what is it?

Tractor answered 18/12, 2008 at 1:19 Comment(2)
It seems you're conflating some terms. Python, for example, is a dynamically typed language with duck typing, yet it is strongly typed; C has static types but is weakly typed.Spoilt
Yes, I guess I am then. Thanks for the tip, I'll go google some definitions and clarify these terms.Tractor

The premises behind the question are a bit dodgy. It is not true that interpreted languages are mostly duck-typed. It is not true that compiled languages mostly have strong typing. The type system is a property of a language. Compiled versus interpreted is a property of an implementation.

Examples:

  • The programming language Scheme is dynamically typed (aka duck-typed), and it has many dozens of interpreted implementations, but also some fine native-code compilers including Larceny, Gambit, and PLT Scheme (which includes both an interpreter and a JIT compiler making seamless transitions).

  • The programming language Haskell is statically typed; the two most famous implementations are the interpreter Hugs and the compiler GHC. There are several other honorable implementations split about evenly between compiling to native code (yhc) and interpretation (Helium).

  • The programming language Standard ML is statically typed, and it has had many native-code compilers, of which one of the best and most actively maintained is MLton, but one of the most useful implementations is the interpreter Moscow ML.

  • The programming language Objective Caml is statically typed. It comes with only one implementation (from INRIA in France) but this implementation includes both an interpreter and a native-code compiler.

  • The programming language Pascal is statically typed, but it became popular in the 1970s because of the excellent implementation built at UCSD, which was based on a P-code interpreter. In later years fine native-code compilers became available, such as the IBM Pascal/VS compiler for the 370 series of computers.

  • The programming language C is statically typed, and today almost all implementations are compiled, but in the 1980s those of us lucky enough to be using Saber C were using an interpreter.

Nevertheless there is some truth behind your question, so you deserve a more thoughtful answer. The truth is that dynamically typed languages do seem to be correlated with interpreted implementations. Why might that be?

  • Many new languages are defined by an implementation. It is easier to build an interpreter than to build a compiler. It is easier to check types dynamically than to check them statically. And if you are writing an interpreter, there is little performance benefit to static type-checking.

  • Unless you are creating or adapting a very flexible polymorphic type system, a static type system is likely to get in the programmer's way. But if you are writing an interpreter, one reason may be to create a small, lightweight implementation that stays out of the programmer's way.

  • In some interpreted languages, many fundamental operations are so expensive that the additional overhead of checking types at run time doesn't matter. A good example is PostScript: if you're going to run off and rasterize Bezier curves at the drop of a hat, you won't balk at checking a type tag here or there.

Incidentally, please be wary of the terms "strong" and "weak" typing, because they don't have a universally agreed technical meaning. By contrast, static typing means that programs are checked before being executed, and a program might be rejected before it starts. Dynamic typing means that the types of values are checked during execution, and a poorly typed operation might cause the program to halt or otherwise signal an error at run time. A primary reason for static typing is to rule out programs that might have such "dynamic type errors". (This is another reason people who write interpreters are often less interested in static typing; execution happens immediately after type checking, so the distinction and the nature of the guarantee aren't as obvious.)
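To make "checked during execution" concrete, here is a small Python sketch (Python being a dynamically typed language; the function name is made up for illustration). The badly typed call is accepted without complaint and only fails when it is actually executed:

```python
def first_upper(x):
    # No type is checked until this attribute lookup actually runs.
    return x.upper()

print(first_upper("duck"))   # fine: str has an .upper() method

try:
    first_upper(42)          # fails only now, at run time
except AttributeError as e:
    print("dynamic type error:", e)
```

A static type checker would have rejected the second call before the program ever started; the dynamic check merely stops it once it is reached.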

Strong typing generally means that there are no loopholes in the type system, whereas weak typing means the type system can be subverted (invalidating any guarantees). The terms are often used incorrectly to mean static and dynamic typing. To see the difference, think of C: the language is type-checked at compile time (static typing), but there are plenty of loopholes; you can pretty much cast a value of any type to another type of the same size---in particular, you can cast pointer types freely. Pascal was a language that was intended to be strongly typed but famously had an unforeseen loophole: a variant record with no tag.
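The contrast can be sketched in Python as well. Python will not silently coerce across unrelated types (no implicit loophole), and the nearest thing to a C-style cast, reinterpreting the raw bytes of a value, has to be done explicitly, here via the standard `struct` module:

```python
import struct

# Strong typing: no implicit loophole, the operation is refused outright.
try:
    "pointer" + 1
except TypeError as e:
    print("strongly typed:", e)

# The closest analogue to a C cast: explicitly reinterpret the four
# bytes of a 32-bit float as an unsigned 32-bit integer.
bits = struct.unpack("<I", struct.pack("<f", 1.0))[0]
print(hex(bits))  # 0x3f800000, the IEEE-754 bit pattern of 1.0
```

The point is that the loophole exists, but the programmer must ask for it by name rather than getting it from an unchecked cast.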

Implementations of strongly typed languages often acquire loopholes over time, usually so that part of the run-time system can be implemented in the high-level language. For example, Objective Caml has a function called Obj.magic which has the run-time effect of simply returning its argument, but at compile time it converts a value of any type to one of any other type. My favorite example is Modula-3, whose designers called their type-casting construct LOOPHOLE.

In summary:

  • Static vs dynamic is the language.

  • Compiled vs interpreted is the implementation.

  • In principle the two choices can be and are made orthogonally, but for sound technical reasons dynamic typing frequently correlates with interpretation.

Reprehend answered 18/12, 2008 at 3:23 Comment(9)
Very informative, thank you. But there's still one important question left: while I understood the pitfalls of implementing interpreter for strong typed languages, you said nothing about compiling dynamic typed languages.Tractor
I'm not an expert. There's a huge body of literature on compiling Scheme and Lisp to native machine code. Post a question and I'll try to answer :-)Reprehend
There is a follow-up over here: https://mcmap.net/q/66396/-is-c-strongly-typed#430414Geniality
I hope you don't mind my asking. If Dynamic typing means that the types of values are checked during execution, then how can any interpreted language be statically typed?Caveat
@Venemo: You run a compiler that generates bytecodes, which are then interpreted. Examples include Objective Caml, Moscow ML, Saber C (of fond memory), UCSD Pascal (of ancient lore), and many others.Reprehend
@Norman, so following that logic, every interpreted language that is not compiled to bytecode is dynamically typed, right?Caveat
@DrewHall apparently with the "new" bounty system (it's not that new), if you really want to give me +100, you can :-) blog.stackoverflow.com/2010/06/improvements-to-bounty-systemReprehend
Wirth had a Pascal compiler for the CDC 6000 mid 1970. On Micro computers Pascal interpreters came before compilers due to UCSDs very popular implementation, but on mainframes and mini's compiled Pascal was a staple. Even Unix (BSD) had one, Berkeley Pascal.Gluttonous
dynamic typing != duck typing. Scheme isn't duck typedBirchard

The reason that you do early binding (strong typing) is performance. With early binding, you find the location of the method at compile time, so that at run time it already knows where it lives.

However, with late binding, you have to go searching for a method that seems like the method that the client code called. And of course, with many, many method calls in a program, that's what makes dynamic languages 'slow'.

But sure, you could create a statically compiled language that does late binding, which would negate many of the advantages of static compilation.
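A rough sketch of late binding in a dynamic language (class and method names here are hypothetical): the lookup for the method happens by name at call time, so any object carrying that method qualifies:

```python
class Duck:
    def quack(self):
        return "quack"

class Robot:
    def quack(self):
        return "beep"

def make_noise(thing):
    # Late binding: the search for 'quack' happens here, at run time,
    # every time the function is called.
    return thing.quack()

print(make_noise(Duck()), make_noise(Robot()))
```

This per-call lookup is exactly the cost that method caches and dynamic compilers try to amortize away.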

Coz answered 18/12, 2008 at 1:32 Comment(4)
So what you're saying is, a compiled language with late binding will make the program slow at runtime? But would it still be slower than an interpreter?Tractor
Travis is oversimplifying. C++ virtual member functions have late binding but a compiler finds lots of useful work to do. The Self language has really late binding, but thanks to Urs Hoelzle's brilliant dynamic compiler, it gets great performance. No wonder Urs is now a big shot at Google :-)Reprehend
Yer, it was a bit of an oversimplification, and yes, there's a lot you can do to speed up late-bound methods, e.g. lookup caching etc. (hence the single quotes around 'slow').Coz
Early/late binding is irrelevant to strong/weak typing.Donica

It's pretty much because people who write and use interpreted languages tend to prefer duck typing, and people who develop and use compiled languages prefer strong explicit typing. (I think the consensus reason for this would be somewhere in the area of 90% for error prevention, and 10% for performance.) For most programs written today, the speed difference would be insignificant. Microsoft Word has run on p-code (uncompiled) for - what - 15 years now?

The best case in point I can think of is classic Visual Basic (VB6/VBA/etc.). The same program could be written in VB and run with identical results and comparable speed either compiled or interpreted. Furthermore, you have the option of type declarations (in fact variable declarations) or not. Most people preferred type declarations, usually for error prevention. I've never heard or read anywhere to use type declarations for speed. And this goes back at least two orders of magnitude in hardware speed and capacity.

Google is getting a lot of attention lately because of their work on a JIT compiler for JavaScript, which requires no changes to the language and no extra consideration on the part of the programmer. In this case, the only benefit is speed.

Corncob answered 18/12, 2008 at 2:21 Comment(0)

Because compiled languages need to take the amount of memory used into account when they are compiled.

When you see something like:

int a;

in C++, the compiler spits out code that reserves four bytes of memory and then assigns the local symbol "a" to point to that memory. If you have a typeless scripting language like JavaScript, the interpreter allocates the required memory behind the scenes. You can do:

var a = 10;  // a is probably a four byte int here
a = "hello world"; // now a is a 12 byte char array

There is a lot that happens between those two lines. The interpreter deletes the memory at a, allocates the new buffer for the chars, then assigns the a var to point to that new memory. In a strongly typed language, there is no interpreter that manages that for you and thus the compiler must write instructions that take into account type.

int a = 10; // we now have four bytes on the stack.
a = "hello world"; // wtf? we can't push 12 bytes into a four byte variable! Throw an error!

So the compiler stops that code from compiling, so the CPU doesn't blindly write 12 bytes into a four byte buffer and cause misery.

The added overhead of a compiler writing extra instructions to take care of types would slow down the language significantly and remove the benefit of languages like C++.

:)

-nelson

EDIT in response to comment

I don't know much about Python, so I can't say much about that. But loose typing slows down runtime considerably. Each instruction that the interpreter (VM) executes has to be evaluated and, if necessary, the variables coerced into the expected types. If you have:

mov a, 10
mov b, "34"
div a, b

Then the interpreter has to make sure that a is a variable and a number, then it would have to coerce b into a number before processing the instruction. Add that overhead for every instruction that the VM executes and you have a mess on your hands :)
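That per-instruction overhead might be sketched in Python like this (a hypothetical toy, not how any real VM is written): before the division can run, the "VM" has to inspect and possibly coerce both operands, on every single call:

```python
def vm_div(a, b):
    # What a dynamic-language VM effectively does before a 'div'
    # instruction: check each operand's type and coerce if needed.
    if not isinstance(a, (int, float)):
        a = float(a)        # coerce, e.g. "10" -> 10.0
    if not isinstance(b, (int, float)):
        b = float(b)        # coerce, e.g. "34" -> 34.0
    return a / b

print(vm_div(10, "34"))     # checks and coercion happen every call
```

Multiply this by every arithmetic instruction in a program and the cost adds up, which is the mess the answer describes.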

Motherless answered 18/12, 2008 at 1:31 Comment(5)
There's no reason why that sort of memory management couldn't be done in an VM like Java or .NET, but it's not because there are other advantages of strong typing.Thirzia
They don't because of performance. A better question is "why would you want a loosely typed language?" Call me old fashion or an elitist but I don't feel comfortable that my variables' types are not enforced by the interpreter.Motherless
Oh, and the CLR isn't really a VM because it JITs the IL into machine language before it runs.Motherless
I've written a lot of perl, and a lot of C/C++/Java. They both have their uses. I wouldn't want to write a payroll system in a loosely typed language, but I also don't want to have to define interfaces and classes for my file parser.Thirzia
Could you please clarify that: what means 'slow down the language'? Is it slowing down at compiling time or at runtime? And, as far as I know, Python has a compiler. What's wrong with it then?Tractor

There are basically two reasons to use static typing over duck typing:

  1. Static error checking.
  2. Performance

If you have an interpreted language, then there's no compile time for static error checking to take place. There goes one advantage. Furthermore, if you already have the overhead of the interpreter, then the language is already not going to be used for anything performance critical, so the performance argument becomes irrelevant. This explains why statically typed interpreted languages are rare.

Going the other way, duck typing can be emulated to a large degree in statically typed languages, without totally giving up the benefits of static typing. This can be done via any of the following:

  1. Templates. In this case, if the type you instantiate your template with supports all the methods called from within the template, your code compiles and works. Otherwise it gives a compile time error. This is sort of like compile-time duck typing.
  2. Reflection. You try to invoke a method by name, and it either works or throws an exception.
  3. Tagged unions. These are basically container classes for other types that contain some memory space and a field describing the type currently contained. These are used for things like algebraic types. When a method is invoked, it either works or throws, depending on whether the type currently contained supports it.
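Option 2 (reflection) gives a statically typed language roughly the runtime behavior of this Python sketch (Python itself is dynamically typed, but `getattr` shows the mechanism: fetch the method by name, then either call it or fail):

```python
class Duck:
    def quack(self):
        return "quack"

class Brick:
    pass  # no quack method at all

def reflect_call(obj, name):
    # Reflection-style duck typing: look the method up by name at
    # run time; it either works or raises.
    method = getattr(obj, name)   # raises AttributeError if missing
    return method()

print(reflect_call(Duck(), "quack"))

try:
    reflect_call(Brick(), "quack")
except AttributeError:
    print("Brick cannot quack")
```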

This explains why there are few dynamically typed, compiled languages.

Dorree answered 18/12, 2008 at 2:36 Comment(1)
"If you have an interpreted language, then there's no compile time for static error checking to take place." This incorrect. You can check types at parse time.Lanate

I'm guessing that languages with dynamic (duck) typing employ lazy evaluation, which is favored by lazy programmers, and lazy programmers don't like to write compilers ;-)

Lavettelavigne answered 18/12, 2008 at 1:25 Comment(11)
drive-by downvoters have no sense of humor. fortunately, i do.Lavettelavigne
When you've got 10k (or nearly that, in my case) reputation, it's not too painful to sacrifice one or two for a joke, is it?Thirzia
A programmers best attribute is that he/she is lazy!Lasandralasater
I appreciate the joke, but how do you handle that this gets higher than a real answer? Maybe as I do: keep reading all non-negative reputation answers :)Giarla
I handle it by laughing. One funny answer - with a grain of truth to it! - does not prevent non-funny answers from appearing and/or being voted up.Lavettelavigne
Heh, it's a nice play on words, but what's the grain of truth? Most languages with dynamic (duck) typing don't do lazy evaluation (by default), and lazy evaluation is most famous in Haskell which has a static type system and an excellent compiler. :)Phenoxide
@ShreevatasaR: i'm confused, i don't think it is possible to do duck typing without lazy evaluation (late binding/dynamic binding)...Lavettelavigne
Duck typing is associated with late binding, but lazy evaluation is something quite different: en.wikipedia.org/wiki/Lazy_evaluation is.gd/bZbT etc, think Unix pipes: is.gd/dAtP (and of course, the awesome "Why FP" paper is.gd/dAtY which is really about lazy evaluation)Phenoxide
@ShreevatasaR: not to argue about a joke post, but "call-by-name and lazy evaluation are each a form of late binding" -- academic.udayton.edu/SaverioPerugini/courses/cps343/… ;-)Lavettelavigne
That's interesting, thanks. :) Hadn't thought of lazy evaluation (or call-by-name) as a form of "late binding". (But in any case the converse is not true; not all late binding is lazy evaluation, and most dynamically typed languages do not have lazy evaluation...) It's all just words, in the end. :)Phenoxide
IMO lazy evaluation has little to do with late binding: lazy evaluation means that you reduce an expression when its value is required, late binding refers to when you bind a name to an expression. In Haskell, e.g., if you write x = 12 + y, x is immediately bound to a thunk, but the thunk is evaluated lazily (later). At least this is my intuition of how lazy evaluation works in Haskell.Lombardy

Languages with weak typing can be compiled, for example, Perl5 and most versions of Lisp are compiled languages. However, the performance benefits of compiling are often lost because much of the work that the language runtime has to perform is to do with determining what type a dynamic variable really has at a particular time.

Take for example the following code in Perl:

$x=1;
$x="hello";
print $x;

It is obviously pretty difficult for the compiler to determine what type $x really has at a given point in time. At the time of the print statement, work needs to be done to figure that out. In a statically typed language, the type is fully known so performance at runtime can be increased.
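The same rebinding written in Python makes the point visible: the type tag travels with the value at run time, which is exactly what a static compiler cannot pin down ahead of time:

```python
x = 1
print(type(x).__name__)   # prints "int"
x = "hello"
print(type(x).__name__)   # prints "str": the type belongs to the value, not the variable
```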

Jangro answered 18/12, 2008 at 1:33 Comment(2)
Compilers are perfectly capable of providing code to deal with whatever types show up at execution time. Visual Basic 6 is a very popular compilable language that requires no type declarations - but lets you use them if you like.Corncob
VB6 variables that aren't specified are implicitly typed as variants. These are roughly analogous to a scalar type in Perl and needs to have the same kind of behind the scenes carry on that all the other dynamically typed languages haveJangro

A guess:

In a compiled language, one system (the compiler) gets to see all the code required to do strong typing. Interpreters generally only see a tiny bit of the program at a time, and so can't do that sort of cross checking.

But this isn't a hard and fast rule - it would be quite possible to make a strongly typed interpreted language, but that would go against the sort of "loose" general feel of interpreted languages.

Thirzia answered 18/12, 2008 at 1:30 Comment(6)
The question is why would you do that? So that people writing C++ don't have to understand what their code actually does to the computer, and write a bunch of inefficient code?Motherless
Because some programmers like the freedom of late binding. Others like the comfort of the compiler enforcing some discipline. I like both in their place.Thirzia
Even in loosely typed languages it is commonly regarded as "bad practice" to use the same var for multiple types at runtime, or to write a function/method that returns values of different types. So why is it a good thing?Motherless
One thing I like about perl is that I can parse a string from a file, and if it looks like a number, use it as a number without conversion. Much quicker to write than Integer.parseInt/catch NumberFormatException.Thirzia
All great until you get a badly formed file and your program happily uses zero everywhere.Darter
@Zan, That's why I use perl for some things, and Java for others.Thirzia

Some languages are meant to run perfectly under non-exceptional conditions, at the cost of horrible performance when exceptional conditions arise, hence very strong typing. Others just balance the two with additional processing.

At times, there's way more in play than just typing. Take ActionScript, for instance: 3.0 introduced stronger typing, but ECMAScript still lets you modify classes as you see fit at runtime, and ActionScript has support for dynamic classes. Very neat, but the fact that they state dynamic classes should not be used in "standard" builds means they're a no-no when you need to play it safe.

Dahliadahlstrom answered 18/12, 2008 at 2:47 Comment(0)
