How to approach creating a JVM programming language?
A

9

101

I have created a compiler in C (using Lex & Bison) for a dynamically typed programming language that supports loops, function declarations inside functions, recursive calls, etc. I also created a virtual machine that runs the intermediate code produced by the compiler.

I was thinking about compiling it to Java bytecode instead of my own intermediate code.

I saw that the question about creating a JVM language has already been asked, but I don’t find the answer very informative.

So here are my questions:

  1. I guess reading the JVM specification is a must for creating a JVM language. What other books can you suggest (apart from the Dragon Book, of course)? I’m mostly interested in books or tutorials on how to create a JVM language, not a compiler in general.
  2. There are many Java libraries to read, write and change .class files like jclasslib, bcel, gnu bytecode, etc. Which one would you suggest? Also, are you aware of C libraries that do the same job?
  3. I was thinking about having a look at maybe another language that targets the JVM like Clojure, Jython or JRuby. But all these languages are very high level and complicated (to create a compiler for them). I was looking for a simpler (I don't mind if it's unknown or unused) programming language that targets the JVM and with an open-source compiler. Any ideas?
Abra answered 1/8, 2010 at 2:28 Comment(0)
B
68

I would also recommend ASM, but have a look at Jasmin as well. I used it (or rather, had to use it) for a university project, and it worked quite well. I wrote a lexer-parser-analyzer-optimizer-generator combination for a programming language using Java and Jasmin, so it generated JVM code. I uploaded the code here; the interesting part should be the source code itself. In bytecode/InsanelyFastByteCodeCreator.java you will find a piece of code that transforms an AST into the input format of the Jasmin assembler. It is quite straightforward.
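To give a feel for what such an AST-to-Jasmin step can look like, here is a minimal sketch in Java. It is not the code from the linked project: the Expr, IntLit, BinOp and JasminEmitter names are made up for illustration, it only handles integer arithmetic, and the fixed `.limit stack 16` is an assumption that a real generator would replace with a computed maximum depth.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mini-AST for integer expressions (illustration only). */
interface Expr {
    /** Appends Jasmin instructions that leave this expression's value on the operand stack. */
    void emit(List<String> out);
}

final class IntLit implements Expr {
    private final int value;
    IntLit(int value) { this.value = value; }
    public void emit(List<String> out) {
        // ldc pushes a constant; it works for any int, so no special-casing of small values here.
        out.add("ldc " + value);
    }
}

final class BinOp implements Expr {
    private final char op;
    private final Expr left;
    private final Expr right;
    BinOp(char op, Expr left, Expr right) {
        this.op = op;
        this.left = left;
        this.right = right;
    }
    public void emit(List<String> out) {
        // Post-order walk: both operands first, then the operator that consumes them.
        left.emit(out);
        right.emit(out);
        switch (op) {
            case '+': out.add("iadd"); break;
            case '-': out.add("isub"); break;
            case '*': out.add("imul"); break;
            default: throw new IllegalArgumentException("unsupported operator: " + op);
        }
    }
}

final class JasminEmitter {
    /** Wraps the expression in a class whose main method prints the result. */
    static String emitProgram(String className, Expr body) {
        List<String> code = new ArrayList<>();
        body.emit(code);
        StringBuilder sb = new StringBuilder();
        sb.append(".class public ").append(className).append('\n');
        sb.append(".super java/lang/Object\n\n");
        sb.append(".method public static main([Ljava/lang/String;)V\n");
        sb.append("    .limit stack 16\n");   // assumption: generous fixed limit
        sb.append("    .limit locals 1\n");   // only the args array
        sb.append("    getstatic java/lang/System/out Ljava/io/PrintStream;\n");
        for (String insn : code) {
            sb.append("    ").append(insn).append('\n');
        }
        sb.append("    invokevirtual java/io/PrintStream/println(I)V\n");
        sb.append("    return\n");
        sb.append(".end method\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // (2 + 3) * 4
        Expr e = new BinOp('*', new BinOp('+', new IntLit(2), new IntLit(3)), new IntLit(4));
        System.out.print(emitProgram("Demo", e));
    }
}
```

The printed text is meant to be saved as a .j file and fed to the Jasmin assembler, which turns it into a .class file.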

The source language (which the lexer-parser-analyzer transforms into the AST) is a subset of Java called MiniJava. It lacks some “complicated” features like inheritance, constructors, static methods, and private fields and methods. None of those features is difficult to implement, but there was a second task to write an x86 backend (that is, to generate machine assembly), and those things tend to get difficult when you have no JVM to handle some of them for you.

In case you wonder about the strange class name: the task of the university project was to transform the AST into an SSA graph (representing the input code), optimize the graph, and then turn it into Java bytecode. That was about ¾ of the work of the project, and the InsanelyFastByteCodeCreator was just a shortcut to test everything.

Have a look at the “Java Virtual Machine” book by Jon Meyer and Troy Downing. This book heavily references the Jasmin assembler; it’s quite helpful for understanding the JVM internals.

Burgoo answered 1/8, 2010 at 20:35 Comment(5)
Thanks for your answer, I will take a look at Jasmin. And also I would be glad if you could upload the source so I can take a look. About the book you suggested, it seems interesting but it's out of print and quite old :(.Abra
The book is very cheap second-hand though. I found a copy for a few dollars.Busybody
Have a look at my edit above, if you have any questions, I'll be glad to help.Burgoo
The link to the "source-code itself" is broken. Although I guess that is to be expected after 8 years.Shenyang
@LlewVallis, if I interpret all information right, the code seems to be here: github.com/replimoc/compiler.Trangtranquada
D
16

Last semester I attended a "Compiler Construction" course. Our project was exactly what you want to do.

I wrote my compiler in Scala. It runs on the JVM but supports a lot of advanced features that Java doesn't, while remaining fully compatible with a plain Java JVM.

To output Java bytecode I used the Scala CAFEBABE library. It is well documented, and you don't have to dig deep into Java classes to understand what to do.

Besides the book, I think you can find a lot of information by going through the labs we did during the course.

Demonology answered 2/8, 2010 at 21:10 Comment(3)
This sounds like a great course. Would you mind sharing your notes or code?Owings
No problem, I will check where my backups are and post a link here so you can download it asap.Demonology
Neat, I've been looking for a hands-on compiler course that targets the JVM with all the material online for self-study.Busybody
O
5

Last weekend, I was asking myself the same question about porting my toy language to the JVM.

I spent only a few hours searching for information, so take these references with a grain of salt.

  • Language Implementation Patterns. I hate ANTLR, but this book looks very good. If you don't like ANTLR either, there is a very good book about parsing: "Parsing Techniques. A Practical Guide."

    Learn to build configuration file readers, data readers, model-driven code generators, source-to-source translators, source analyzers, and interpreters. You don’t need a background in computer science—ANTLR creator Terence Parr demystifies language implementation by breaking it down into the most common design patterns. Pattern by pattern, you’ll learn the key skills you need to implement your own computer languages.

    Chapter 10 covers this topic in 30 pages (too fast, IMO), but there are other chapters that will probably interest you.

    • 10 Building Bytecode Interpreters
      • 10.1 Programming Bytecode Interpreters
      • 10.2 Defining an Assembly Language Syntax
      • 10.3 Bytecode Machine Architecture
      • 10.4 Where to Go from Here
      • P.26. Bytecode Assembler
      • P.27. Stack-Based Bytecode Interpreter
      • P.28. Register-Based Bytecode Interpreter
      http://pragprog.com/titles/tpdsl/language-implementation-patterns
    • The Implementation of Lua 5.0. This is a great paper about register-based bytecode machines. Go and read it, even if only for its own sake.

    • Lisp in Small Pieces. This book teaches how to write two Scheme compilers that compile to C. Many lessons can be learned from this book. I own a copy and it is really good for anyone interested in Lisp, but it may not be your cup of tea.

      This is a comprehensive account of the semantics and the implementation of the whole Lisp family of languages, namely Lisp, Scheme and related dialects. It describes 11 interpreters and 2 compilers ...

    http://www.amazon.com/Lisp-Small-Pieces-Christian-Queinnec/dp/0521562473

Check the Dalvik VM, a register-based VM. The DVM operates on bytecodes that are transformed from the Java class files compiled by a Java compiler.

There is a mailing list about the topic, jvm-languages.

Are you planning to upload the code to anyplace? I would like to take a look.

Owings answered 1/8, 2010 at 2:28 Comment(1)
"Are you planning to upload the code to anyplace?" I'm not proud of that code :( ... I would probably rewrite the whole thing. Anyway, if I do, I will let you know. Thank you very much for your suggestions.Abra
C
5

ASM can be a solution for generating bytecode. To start, check the topics on generating elements from the manual.

Conch answered 1/8, 2010 at 2:46 Comment(0)
S
4

I was thinking to have a look at maybe another language that targets the JVM like Clojure, Jython or JRuby. But all these languages are very high level and complicated (to create a compiler for them).

Suggestion: You could have a look at the Lua programming language; there are JVM implementations of it, like LuaJ.

Lightweight, fast, Java-centric Lua interpreter written for J2ME and J2SE, with libraries for basic, string, table, package, math, io, os, debug and coroutine packages, a compiler, luajava bindings, and JSR-223 pluggable scripting engine bindings.

(Not to be confused with LuaJava, which uses native libraries via JNI.)

Supertanker answered 1/8, 2010 at 20:53 Comment(0)
P
2

Of course one could use Java to write a new language. With the Java reflection API you can achieve a lot. If speed doesn't matter too much, I would prefer plain Java over ASM. Programming is easier and less error-prone in Java (IMHO). Take a look at the RPN language 7th. It is written entirely in Java.
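As a rough illustration of the reflection approach (a sketch only, not how 7th itself works): an interpreter written in Java can resolve a name from the source program to an existing Java method at run time instead of generating bytecode for the call. The ReflectiveCall class and its callStatic helper are hypothetical names, and the helper is limited to public static methods taking two ints.

```java
import java.lang.reflect.Method;

final class ReflectiveCall {
    /**
     * Looks up a public static method (int, int) by class and method name
     * and invokes it. Hypothetical helper for illustration only.
     */
    static Object callStatic(String className, String methodName, int a, int b) {
        try {
            Class<?> cls = Class.forName(className);
            Method m = cls.getMethod(methodName, int.class, int.class);
            // The receiver is null because the method is static.
            return m.invoke(null, a, b);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot call " + className + "." + methodName, e);
        }
    }

    public static void main(String[] args) {
        // The strings could come straight from the interpreted program's source.
        System.out.println(callStatic("java.lang.Math", "max", 3, 7)); // prints 7
    }
}
```

Every such call pays for the lookup and the boxing of arguments, which is the speed trade-off mentioned above.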

Paronymous answered 1/8, 2010 at 2:28 Comment(0)
C
2

I would recommend that you first learn how JVM assembly works, if you don't already know it.

Many instructions have the form ?name, where ? is i if the instruction works with an integer type and a if it works with a reference type.

Basically, the JVM is a stack machine with no registers, so all instructions work with data directly on the stack. You can push constants with instructions such as bipush and ldc, discard the top value with pop, and move data between local variables (stack locations referenced by offsets) and the top of the stack using ?store/?load. Some other important instructions are invoke??? and if_???.
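To make the stack model concrete, here is a toy simulation in Java of a handful of integer instructions. It is only a sketch: TinyStackMachine is a made-up class, the instructions are read from plain strings rather than real bytecode, and there are no branches, invokes or types other than int.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class TinyStackMachine {
    /** Runs simplified instructions and returns the value left on top of the stack. */
    static int run(String[] program, int maxLocals) {
        Deque<Integer> stack = new ArrayDeque<>(); // operand stack
        int[] locals = new int[maxLocals];         // local variable slots
        for (String line : program) {
            String[] parts = line.trim().split("\\s+");
            switch (parts[0]) {
                case "bipush": // push a small constant
                    stack.push(Integer.parseInt(parts[1]));
                    break;
                case "istore": // pop the top of the stack into a local slot
                    locals[Integer.parseInt(parts[1])] = stack.pop();
                    break;
                case "iload":  // push a copy of a local slot
                    stack.push(locals[Integer.parseInt(parts[1])]);
                    break;
                case "iadd": {
                    int b = stack.pop();
                    int a = stack.pop();
                    stack.push(a + b);
                    break;
                }
                case "imul": {
                    int b = stack.pop();
                    int a = stack.pop();
                    stack.push(a * b);
                    break;
                }
                case "pop":    // discard the top of the stack
                    stack.pop();
                    break;
                default:
                    throw new IllegalArgumentException("unknown instruction: " + parts[0]);
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // x = 2; y = 3; result = (x + y) * 4
        String[] program = {
            "bipush 2", "istore 0",
            "bipush 3", "istore 1",
            "iload 0", "iload 1", "iadd",
            "bipush 4", "imul"
        };
        System.out.println(run(program, 2)); // prints 20
    }
}
```

Note how no instruction names its operands: iadd and imul simply consume whatever the preceding instructions left on the stack.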

For my university's compiler course we used Jasmin to assemble the programs. I don't know if this is the best way, but at least it is an easy place to start.

Here is an instruction reference for an old version of the JVM, which might contain fewer instructions than a new one.

Consort answered 1/8, 2010 at 2:28 Comment(0)
L
1

First I'd back off, modify my compiler to output actual Java instead of Java byte codes (which means creating more of a translator than compiler), and compile the Java output with whatever Java environment is convenient (which would probably generate better object code than my own compiler).
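A minimal sketch of that translator idea, in Java: walk the AST and print Java source text instead of bytecode. The Node, Num, Op and JavaSourceEmitter names are hypothetical, and only integer expressions are handled; the generated text would then be saved to a .java file and handed to javac.

```java
/** Hypothetical expression node that can print itself as Java source. */
interface Node {
    String toJava();
}

final class Num implements Node {
    private final int value;
    Num(int value) { this.value = value; }
    public String toJava() { return Integer.toString(value); }
}

final class Op implements Node {
    private final String op;
    private final Node left;
    private final Node right;
    Op(String op, Node left, Node right) {
        if (op.length() != 1 || !"+-*/".contains(op)) {
            throw new IllegalArgumentException("unsupported operator: " + op);
        }
        this.op = op;
        this.left = left;
        this.right = right;
    }
    public String toJava() {
        // Always parenthesize so precedence in the output matches the tree shape.
        return "(" + left.toJava() + " " + op + " " + right.toJava() + ")";
    }
}

final class JavaSourceEmitter {
    /** Wraps the expression in a class whose main method prints its value. */
    static String emitClass(String className, Node body) {
        return "public class " + className + " {\n"
             + "    public static void main(String[] args) {\n"
             + "        System.out.println(" + body.toJava() + ");\n"
             + "    }\n"
             + "}\n";
    }

    public static void main(String[] args) {
        Node tree = new Op("+", new Num(1), new Op("*", new Num(2), new Num(3)));
        System.out.print(emitClass("Generated", tree));
    }
}
```

For a dynamically typed source language the emitted Java would need a runtime value type and helper calls rather than bare int arithmetic, so treat this only as the shape of the approach.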

You could use the same technique (e.g., compile to C#) to generate CLI byte codes, or compile to Pascal to generate P-code, etc.

It's not clear why you're considering Java bytecode instead of using your own VM, but if it's for performance then of course you should also consider compiling to actual machine code.

Larrylars answered 1/8, 2010 at 2:28 Comment(1)
Compiling for the JVM will allow one's code to be run more widely than if one compiles to native code. Further, compiling to bytecode will make it possible for code to do some things which are not possible within the Java language itself.Coleorhiza
L
0

These days, I'd suggest Truffle as the perfect starting point. Once you have your AST, you can use Truffle's tooling and Graal for compilation. And since JDK 9, the Graal compiler can be used straight from the JDK itself. Truffle's API is IMHO as friendly as it gets, and by utilizing Graal you go in the same direction as Java itself.

Lite answered 1/8, 2010 at 2:28 Comment(0)
