Choosing an intermediate language

I am currently playing around with programming languages. I have spent some time writing parsers and interpreters in high-level languages (most notably Haxe).

I have had some results that I think are actually quite nice, but now I'd like to make them fast.

My idea was to translate the input language into C.
My C knowledge is limited to what you learn at university. Beyond some exercises, I have never written actual C programs. But I feel confident I can make it work.

Of course I could try to write a frontend for LLVM, or generate MSIL or JVM bytecode. But I feel that's too much to learn right now, and I don't actually see much gain.
Also, C is perfectly human-readable, so if I screw up, it's much easier to understand why. And C is, after all, high-level: I can translate concepts from the input language without too much mind-bending. I should have something up and running in a reasonable amount of time and can then optimize it as I see fit.

So: Are there any downsides to using C? Can you recommend an alternative?
Thank you for your insight :)


Edit: Some Clarification

  • The reason I want to go all the way down is that I am writing a language with OOP support, and I want to implement method dispatch by hand, because I have something very specific in mind.
  • A primary area of use would be writing HTTP services, but I could imagine adding bindings to a GUI library (wxWidgets, maybe) or whatever.
Skinflint answered 14/6, 2011 at 16:26 Comment(2)
Most compilers that I can think of that take an intermediate step before native code go to C, so, yeah, I think C is a good choice, especially since that automatically gives you a great deal of portability. If your language is object-oriented, you might have a better time translating to C++ or Objective-C; likewise, if it's functional, you might have a better time translating to Haskell. – Trite
@Rafe Kettler: post it as an answer, why limit it to a comment? :) – Uglify

C is a good and quite popular choice for what you're trying to do.

Still, take a look at LLVM's intermediate representation (IR). It's pretty readable, and I think it's cleaner and easier to generate and parse than C. LLVM comes with quite a big collection of tools for working with it. You can generate native code for a variety of platforms (as with C, but with slightly more control over the output) or for virtual machines. The possibility of JIT compilation is also a plus.

See The Architecture of Open Source Applications, Chapter 11, for an introduction to the LLVM approach and some snippets of IR.


What is your target environment? That might help us give you a better answer.

Yetta answered 14/6, 2011 at 18:4 Comment(1)
@rubenvb: Yes, it was. That's why it's worth taking a look at it even if you don't use it in the end. And you might not be able to use it directly (e.g. LLVM is written in C++, so it might be hard to use in some environments). Or it might inspire you to do the exact opposite of your initial plan: use an LLVM frontend for some existing (and popular) language and provide your own customized backend or VM for your application. – Yetta

C is actually a pretty good choice of target language for a small or experimental compiler: it's widely available on many platforms, so your compiler becomes immediately useful in many environments. The main drawback is dealing with things that are not well supported in C, or are not well defined in the C spec. For example, if you want to do dynamic code generation (JIT compilation), C is problematic. Things like stack unwinding and reflection are tricky to do in C (though setjmp/longjmp and careful use of structs for which you generate layout descriptions can do a lot). Things like word sizes, big- or little-endian layout, and arithmetic precision vary between C compilers and platforms, so you have to be aware of them, but those are things you need to deal with anyway if you want to support multiple target machines.

Other languages can be used as well -- the main advantage of C is its ubiquity.

Ferryman answered 14/6, 2011 at 17:54 Comment(1)
Big/little endian will be an issue no matter the language. Of course, languages that are too high-level to even do things at the byte level won't have any endianness issues, because they don't support byte-level access at all. – Selfmade

You might consider C--, a C-like language intended to be a better target for code generation than C.

Tor answered 14/6, 2011 at 21:49 Comment(0)

C is a good choice, IMHO. Unlike many languages, C is generally considered "elegant" in that it has only 32 keywords (in C89; later standards add a handful more) and very basic constructs (sequence, selection, iteration), with a simple, consistent set of tokens and operators.

Because C's syntax is very consistent (brackets and braces, blocks and statements, use of expressions), you're not marching into an unbounded world of language expansion. C is a mature language, has weathered time nicely, and nowadays is a "known quantity" (which is hard to say about many other languages, even "mature" ones).

Goble answered 15/6, 2011 at 17:32 Comment(0)
