How do I design and implement a programming language? [closed]
Asked Answered
O

4

12

This question is related to

The past couple of years I've been thinking about things I like and don't like about languages I use. I always wanted to write my own language, but never did so.

I also own both the Lego RCX and NXT, but most of the time I never actually make my robots do anything because of their restrictive visual programming environments.

I think I will design my programming language for the NXT because there are already tons of general purpose languages and the NXT gives me a concrete set of problems and goals and hopefully a nice sandbox to play with.

Now what? Where do I start? What do I need to know?

If possible, I'd write the compiler in Python or Clojure. There is an SDK for the NXT, but also an Assembly language. What would be the best/easiest route?

The Lego NXT has a small screen, USB and Bluetooth, it has 4 sensor ports both digital and analogue, 3 output ports and 2 ARM processors, one main processor and one co-processor. http://mindstormsnxt.blogspot.com/2006/08/whats-inside-nxt-brick.html

Programming the NXT is going to all about handling data and events, so some sort of monoiconic dataflow/reactive style seem appropriate. It should also handle parallel tasks well, so I'm thinking functional. I'm currently thinking of stack based as well.

In my head I'm already trying to unify these concepts and think of sample code. I'm thinking of a tree rather than a stack, where functional branches can run in parallel. An example:

# implicit main stack
5 5 +
# 10

# quoted branch or list
[1 -]
# 10 [1 -]

# eval list and recur until false
loop
# [9 8 7 6 5 4 3 2 1 0]

# define stack as a function
[1 = [1 8 motor] [1 0 motor] if] fn
# [9 8 7 6 5 4 3 2 1 0] <function>

# define function as a symbol
"handle-press" def
# [9 8 7 6 5 4 3 2 1 0]

# reactively loop over infinite lazy stack returned by sensor
# in a parallel branch
|4 sensor handle-press for|
# [9 8 7 6 5 4 3 2 1 0] [8 nil nil nil 8 ...]

There are obviously still gaping holes in the reasoning behind this, but I'm posting this rough sketch anyway to spark some helpful answers and discussion.

Outgeneral answered 25/10, 2010 at 11:55 Comment(1)
B
23

Now what? Where do I start? What do I need to know?

Start by learning more programming languages.

After learning several languages, buy a book on compilers. There are many. Google will help. It doesn't matter which one you buy. You'll need several. It's okay to read many books.

Once you've learned languages and read up on compilers, do the following.

  1. Build the run-time libraries you need. Implement them in some suitable language like C or Python or whatever.

  2. Once you have run-time libraries which really work. Really totally work. Totally. You can think about syntax and lexical scanning and compiling. Those are hard problems, but not half as hard as getting your run-time libraries to work.

Fooling around with syntax (i.e., a Domain Specific Language) is an attractive nuisance. Many people have "improved" syntax but no usable run-time libraries. So their "language" is incomplete because it doesn't do anything.

Get your language to do something first.

Bandler answered 25/10, 2010 at 12:7 Comment(5)
What about writing a self hosted language? What about using the runtime of the host language?Outgeneral
@Pepijn: "self hosted language"? You mean like LISP or Forth where the language is written in itself? What would you like to know? Perhaps you should open another question. First, identify the specific things you'd like to know. Then Google for those things. Then ask questions here to clarify anything that was confusing.Bandler
Could you comment more on "1. Build the run-time libraries". In my head the term "run-time library" is broad (could be anything built into language). What kind of run-time libraries do you mean?Mesozoic
@Halst: "What kind of run-time libraries do you mean?" run-time library could be anything built into language. Anything. Step 1 is to get something that works first, and fiddle with syntax later. For "works" you have to pick a subject area or domain or focus or something. It's important to avoid being vague about what a programming language is going to do; pick something -- anything -- and get the run-time support squared away and working.Bandler
It is impossible to do a RT if the IR-instructions is not known. The Intermediate Representation falls out naturally from CT and DT. The commands come from the concepts/ideas of the domain. The machine is not the reality out there, if you want a domain-specific language. Don't let the machine design the language. The domain and concepts you want into the heads of the users should be the commands. The IR instruction set should be as near as possible the commands, to get the fastest interpreter. Sorry I couldn't do Answer, Q was closed.Rectrix
C
5

The easiest route is using a concatenative programming language, like Forth, Factor, or your own design of one.

The Forth interpreter is very easy to implement and does not need to take up more than a few KB; important for the Lego device. You need to understand how the Forth interpreter works. This is covered, for instance, in chapter 9 of Starting Forth.

Crapulous answered 25/10, 2010 at 15:34 Comment(0)
L
5

Read fun books about language design!

the author of Clojure recommended following the book "lisp in small Pieces" by Christian Queinnec. The Clojure Reading list covers many books that incluenced the design of the Clojure language.

Longbow answered 26/10, 2010 at 18:42 Comment(0)
T
5

Don't afraid to write a compiler, which compiles to an existing language, and not to object code. For example, Lightweight C++ is a C++ -> C compiler is based on this idea (altough, C++ does the same job somewhere): http://linux.wareseeker.com/Programming/lightweight-c-1.3.2.zip/331414

If you have a small-but-smart idea on how to improve programming, it's a quick win way.

There's a similar situation with search engines. If I say, that I can do better than Google, maybe I can do it with a Google mashup, which reorganizes Google's result set, and I don't need to buy 343 Zigabytes of storage to set up a second Google just for changing the number of results from 10 to 15. (Unfortunatelly, it does not works if I have different ranking or crawling ideas.)

Maybe, Twitter is a better example. Write your own Twitter by using Twitter API. (Of course, only if your idea fits into the Twitter's base model.)

We're now working on a dataflow engine (see Wikipedia: flow-based programming, dataflow programming). We've developed a very lite new language, which has 3 instruction types (component creation, parameter setting, message declaration), and 2 block types (component declaration and implementation). It's compiled to C++ code, so the compiler is simple, and result is optimal fast. Also, there're several cases, when our language script is generated from configurations, or, more elegant, it supports metaprogramming.

We should break off the 1-step (source->executable) and 0-step (the source script is the executable) complilation languages; 3-4 level is easy yet to overview, and - if we do it right - it can make the developement more effective.

This answered 1/11, 2010 at 22:42 Comment(2)
Is there a place one can read about your dataflow engine? Is it somehow related to digital hardware? Anyway, it is an interesting topic for me.Mesozoic
see homeaut.com, and feel free to e-mail your questionsThis

© 2022 - 2024 — McMap. All rights reserved.