Code generation by genetic algorithms

Asked 20/4, 2011 at 15:33 Answered 1/3, 2015 at 22:52

Solved algorithm code-generation genetic-algorithm genetic-programming evolutionary-algorithm

Evolutionary programming seems to be a great way to solve many optimization problems. The idea is very easy and the implementation does not make problems.

I was wondering if there is any way to evolutionarily create a program in ruby/python script (or any other language)?

The idea is simple:

Create a population of programs
Perform genetic operations (roulette-wheel selection or any other selection), create new programs with inheritance from best programs, etc.
Loop point 2 until program that will satisfy our condition is found

But there are still few problems:

How will chromosomes be represented? For example, should one cell of chromosome be one line of code?
How will chromosomes be generated? If they will be lines of code, how do we generate them to ensure that they are syntactically correct, etc.?

Example of a program that could be generated:

Create script that takes N numbers as input and returns their mean as output.

If there were any attempts to create such algorithms I'll be glad to see any links/sources.

Brigettebrigg answered 20/4, 2011 at 15:33 Comment(5)

Could be funny if one of the generated program would erase the drive. Surely you would need some way to sandbox that, and then be careful when you open pandora's box. I believe there was a book (can't remember the name though) where such a program would eventually evolve such a way that it would take control over all the machines of the world and begin killing humans ;) – Moolah 20/4, 2011 at 15:39

A book? This idea has been beaten to death in the cheap SF literature. – Calondra 20/4, 2011 at 16:12

I tried this once. After the program became self-aware it took over the printer and used the laser to shoot everyone in the building. – Coupler 20/4, 2011 at 17:36

What is the fitness function? These evolutionary methods need a way to calculte how well they match some criteria. – Fimbria 20/4, 2011 at 17:52

Jivlain's answer is the one you wanna mark as answer – Whiteness 25/4, 2011 at 14:30

If you are sure you want to do this, you want genetic programming, rather than a genetic algorithm. GP allows you to evolve tree-structured programs. What you would do would be to give it a bunch of primitive operations (while($register), read($register), increment($register), decrement($register), divide($result $numerator $denominator), print, progn2 (this is GP speak for "execute two commands sequentially")).

You could produce something like this:

progn2(
  progn2(
    read($1)
    while($1
      progn2(
        while($1
          progn2( #add the input to the total
            increment($2)
            decrement($1)
          )
        )
        progn2( #increment number of values entered, read again
          increment($3)
          read($1)
        )
      )
    )
  )
  progn2( #calculate result
    divide($1 $2 $3)
    print($1)
  )
)

You would use, as your fitness function, how close it is to the real solution. And therein lies the catch, that you have to calculate that traditionally anyway*. And then have something that translates that into code in (your language of choice). Note that, as you've got a potential infinite loop in there, you'll have to cut off execution after a while (there's no way around the halting problem), and it probably won't work. Shucks. Note also, that my provided code will attempt to divide by zero.

*There are ways around this, but generally not terribly far around it.

Clevelandclevenger answered 20/4, 2011 at 22:28 Comment(4)

Note that you can count on any solutions actually produced by GP being far more obfuscated than that :) – Clevelandclevenger 20/4, 2011 at 22:33

I would point towards this project jgap.sourceforge.net/doc/robocode/robocode.html – Mertiemerton 21/4, 2011 at 9:19

GP evolves computer programs. These are traditionally represented in memory as tree structures but also commonly as a linear structure (series of commands) or graphs. Whatever representation works for you - nothing restricts GP to tree structures. – Coupler 17/12, 2011 at 9:3

you have to calculate that traditionally anyway Not necessarily, especially when you're trying to find an inverse function (which is often harder to calculate than the original function). For example, if you're evolving a square root function f(x), you can measure fitness by how close f(x*x) is to x. – Parson 15/1, 2016 at 9:7

It can be done, but works very badly for most kinds of applications.

Genetic algorithms only work when the fitness function is continuous, i.e. you can determine which candidates in your current population are closer to the solution than others, because only then you'll get improvements from one generation to the next. I learned this the hard way when I had a genetic algorithm with one strongly-weighted non-continuous component in my fitness function. It dominated all others and because it was non-continuous, there was no gradual advancement towards greater fitness because candidates that were almost correct in that aspect were not considered more fit than ones that were completely incorrect.

Unfortunately, program correctness is utterly non-continuous. Is a program that stops with error X on line A better than one that stops with error Y on line B? Your program could be one character away from being correct, and still abort with an error, while one that returns a constant hardcoded result can at least pass one test.

And that's not even touching on the matter of the code itself being non-continuous under modifications...

Unreflective answered 20/4, 2011 at 16:5 Comment(5)

Are you saying that Genetic Programming doesn't exist? en.wikipedia.org/wiki/Genetic_programming - this approach is currently used quite successfully in a huge number of search problems – Whiteness 24/4, 2011 at 23:5

@JohnIdol: My impression is that it's something that makes academics burst out in tears of joy over the potential for interesting papers to be written about it, but has yielded very few practical results that weren't either trivial or constrained to specific problem classes that lend themselves to proper fitness functions. – Unreflective 25/4, 2011 at 0:41

As I am not an academic I can understand that could be your impression (as I entertained the same thoughts before I looked into this stuff), but there are a number of instances where genetic programming solved problems far more efficiently abd with better results than standard genetic algorithms could. For an example, this is the first famous example of such an approach applied to cellular automata [ citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.55.7754 ] and it's dated 1996. We've come a long way since then. – Whiteness 25/4, 2011 at 14:16

@JohnIdol: That's what the wikipedia article says, but I've noticed that a lot of the linked-to projects seem to be stagnant. As for the paper you linked to, I'd say it proves my point: the fitness function is conveniently continuous, the program being evolved is extremely simple (are boolean functions even turing complete?), and the problem it solves has purely academical relevance. – Unreflective 25/4, 2011 at 20:38

The problem has not a purely academical relevance, since CA synchronization has direct applications in noise tolerance over digital transmissions, cryptography and others (even so, I don't see how the purely academical relevance would make any difference since we are talking about generic programs). I won't push this any longer since I am not on a crusade or anything, but the material is available out there if you're interested to get to know more before you pontificate further. – Whiteness 25/4, 2011 at 21:9

Well this is very possible and @Jivlain correctly points out in his (nice) answer that genetic Programming is what you are looking for (and not simple Genetic Algorithms).

Genetic Programming is a field that has not reached a broad audience yet, partially because of some of the complications @MichaelBorgwardt indicates in his answer. But those are mere complications, it is far from true that this is impossible to do. Research on the topic has been going on for more than 20 years.

Andre Koza is one of the leading researchers on this (have a look at his 1992 work) and he demonstrated as early as 1996 how genetic programming can in some cases outperform naive GAs on some classic computational problems (such as evolving programs for Cellular Automata synchronization).

Here's a good Genetic Programming tutorial from Koza and Poli dated 2003.

For a recent reference you might wanna have a look at A field guide to genetic programming (2008).

Whiteness answered 25/4, 2011 at 14:29 Comment(2)

the Genetic Programming tutorial link is dead, do you know where it has moved since? – Herakleion 3/4, 2018 at 11:0

@Wolf, I fixed the link. It probably got broken when geneticprogramming.com was redesigned a couple of years ago. Speaking of which, geneticprogramming.com itself is another great resource for this stuff - it has a lot of accessible high-level summaries of different GP techniques. – Hopfinger 3/4, 2018 at 22:41

Since this question was asked, the field of genetic programming has advanced a bit, and there have been some additional attempts to evolve code in configurations other than the tree structures of traditional genetic programming. Here are just a few of them:

PushGP - designed with the goal of evolving modular functions like human coders use, programs in this system store all variables and code on different stacks (one for each variable type). Programs are written by pushing and popping commands and data off of the stacks.
FINCH - a system that evolves Java byte-code. This has been used to great effect to evolve game-playing agents.
Various algorithms have started evolving C++ code, often with a step in which compiler errors are corrected. This has had mixed, but not altogether unpromising results. Here's an example.
Avida - a system in which agents evolve programs (mostly boolean logic tasks) using a very simple assembly code. Based off of the older (and less versatile) Tierra.

Hopfinger answered 1/3, 2015 at 22:52 Comment(0)

The language isn't an issue. Regardless of the language, you have to define some higher-level of mutation, otherwise it will take forever to learn.

For example, since any Ruby language can be defined in terms of a text string, you could just randomly generate text strings and optimize that. Better would be to generate only legal Ruby programs. However, it would also take forever.

If you were trying to build a sorting program and you had high level operations like "swap", "move", etc. then you would have a much higher chance of success.

In theory, a bunch of monkeys banging on a typewriter for an infinite amount of time will output all the works of Shakespeare. In practice, it isn't a practical way to write literature. Just because genetic algorithms can solve optimization problems doesn't mean that it's easy or even necessarily a good way to do it.

Extravehicular answered 20/4, 2011 at 16:3 Comment(3)

"In theory, a bunch of monkeys banging on a typewriter for an infinite amount of time will output all the works of Shakespeare." The key words being "in theory" and "infinite amount of time". The probabilty of a monkey getting just one line, say "To be or not to be, that is the question", by randomly hitting keys, even assuming we ignore spacing, punctuation, and capitalization, is something like 1 in 26^30 = 1 in 2.8E42. If you have 10 billion monkeys typing a line every second, the chance that you'll get even that one line in 10 billion years is still just 1 in 9e14. – Coupler 20/4, 2011 at 16:29

@Coupler and Larry, that's why genetic algorithms only work with a continuous fitness function. If your only test for fitness is whether or not string x equals string y, then there is no way for the genetic algorithm/program to determine if string x is closer to string y. This is what Michael was pointing out: you can evolve one program to be exactly like another (letter by letter), but you can't evolve it function by function because there is no gradual state... the program either works or it doesn't (e.g. throws an exception). – Billetdoux 21/4, 2011 at 0:34

If you know what the desired end state is, and you compare each new iteration to the desired end state in some incremental fashion, then sure, eventually you'll get there. But that requires knowing the desired end state and being able to determine whether something is closer or farther away. If you can do that, why not just write the code for the desired end state? I'll have to read some of the articles referenced, maybe there's something here that I'm missing, but I just don't see it. – Coupler 26/4, 2011 at 16:21

The biggest selling point of genetic algorithms, as you say, is that they are dirt simple. They don't have the best performance or mathematical background, but even if you have no idea how to solve your problem, as long as you can define it as an optimization problem you will be able to turn it into a GA.

Programs aren't really suited for GA's precisely because code isn't good chromossome material. I have seen someone who did something similar with (simpler) machine code instead of Python (although it was more of an ecossystem simulation then a GA per se) and you might have better luck if you codify your programs using automata / LISP or something like that.

On the other hand, given how alluring GA's are and how basically everyone who looks at them asks this same question, I'm pretty sure there are already people who tried this somewhere - I just have no idea if any of them succeeded.

Embraceor answered 20/4, 2011 at 16:6 Comment(3)

..thousands of people succeeded in writing GAs that successfully evolved computer programs (= Genetic Programming, GP) that can compete or outperform solutions coded by humans :-) – Coupler 17/12, 2011 at 9:10

Oh, and by the way: It is just pure fun watching meaningful behaviour emerge from a primordial pool of totally random 'programs' ;-) – Coupler 17/12, 2011 at 9:10

Just a little punctualization. Genetic algoritms do have mathematical background (and even not so easy): they maximize the fitness function and with crossover and mutation they avoid local optimums. – Ommatidium 10/6, 2014 at 0:3

Good luck with that.

Sure, you could write a "mutation" program that reads a program and randomly adds, deletes, or changes some number of characters. Then you could compile the result and see if the output is better than the original program. (However we define and measure "better".) Of course 99.9% of the time the result would be compile errors: syntax errors, undefined variables, etc. And surely most of the rest would be wildly incorrect.

Try some very simple problem. Say, start with a program that reads in two numbers, adds them together, and outputs the sum. Let's say that the goal is a program that reads in three numbers and calculates the sum. Just how long and complex such a program would be of course depends on the language. Let's say we have some very high level language that lets us read or write a number with just one line of code. Then the starting program is just 4 lines:

read x
read y
total=x+y
write total

The simplest program to meet the desired goal would be something like

read x
read y
read z
total=x+y+z
write total

So through a random mutation, we have to add "read z" and "+z", a total of 9 characters including the space and the new-line. Let's make it easy on our mutation program and say it always inserts exactly 9 random characters, that they're guaranteed to be in the right places, and that it chooses from a character set of just 26 letters plus 10 digits plus 14 special characters = 50 characters. What are the odds that it will pick the correct 9 characters? 1 in 50^9 = 1 in 2.0e15. (Okay, the program would work if instead of "read z" and "+z" it inserted "read w" and "+w", but then I'm making it easy by assuming it magically inserts exactly the right number of characters and always inserts them in the right places. So I think this estimate is still generous.)

1 in 2.0e15 is a pretty small probability. Even if the program runs a thousand times a second, and you can test the output that quickly, the chance is still just 1 in 2.0e12 per second, or 1 in 5.4e8 per hour, 1 in 2.3e7 per day. Keep it running for a year and the chance of success is still only 1 in 62,000.

Even a moderately competent programmer should be able to make such a change in, what, ten minutes?

Note that changes must come in at least "packets" that are correct. That is, if a mutation generates "reax z", that's only one character away from "read z", but it would still produce compile errors, and so would fail.

Likewise adding "read z" but changing the calculation to "total=x+y+w" is not going to work. Depending on the language, you'll either get errors for the undefined variable or at best it will have some default value, like zero, and give incorrect results.

You could, I suppose, theorize incremental solutions. Maybe one mutation adds the new read statement, then a future mutation updates the calculation. But without the calculation, the additional read is worthless. How will the program be evaluated to determine that the additional read is "a step in the right direction"? The only way I see to do that is to have an intelligent being read the code after each mutation and see if the change is making progress toward the desired goal. And if you have an intelligent designer who can do that, that must mean that he knows what the desired goal is and how to achieve it. At which point, it would be far more efficient to just make the desired change rather than waiting for it to happen randomly.

And this is an exceedingly trivial program in a very easy language. Most programs are, what, hundreds or thousands of lines, all of which must work together. The odds against any random process writing a working program are astronomical.

There might be ways to do something that resembles this in some very specialized application, where you are not really making random mutations, but rather making incremental modifications to the parameters of a solution. Like, we have a formula with some constants whose values we don't know. We know what the correct results are for some small set of inputs. So we make random changes to the constants, and if the result is closer to the right answer, change from there, if not, go back to the previous value. But even at that, I think it would rarely be productive to make random changes. It would likely be more helpful to try changing the constants according to a strict formula, like start with changing by 1000's, then 100's then 10's, etc.

Coupler answered 20/4, 2011 at 17:31 Comment(9)

@Jay, you could evolve a program to compile by having strict mutation rules that comply with the compiler, that's not necessarily the hard part... the hard part is evolving a program that WORKS (i.e. doesn't crash when run or performs something useful). You've made a good analysis otherwise. – Billetdoux 21/4, 2011 at 0:39

I thought a little about that, but my post was too long as it was. You could have the mutations generate tokens rather than individual letters, i.e. complete keywords, function names, variables names, etc. But it would still be hard to have anything resembling a random process that still guaranteed syntactic correctness. Like, you could make it generate only complete keywords like "if" and "else" and not "ib" and "els". But even a simple rule like making sure that all ELSE's are paired with an IF would require analysis of program structure that would take us a long way from "random mutations". – Coupler 21/4, 2011 at 1:11

@Jay, the "rules" are really easy to enforce. Each node in the tree has a given number of children, for example IFGT (if-greater-than) has four children: IFGT(A,B,CaseA,CaseB). The children can be other functions, such as ADD(A,B), SWAP(A,B,) or MULT(A,B)... so now you can make a program: IFGT(1,2,MULT(5,6),ADD(8,2)) where 1, 2, 5, 6 and 8 are all terminal nodes. Now you can easily mutate that program, but things get a bit more difficult if you add non-numeric nodes like SUBSTR(S1,S2). So making a program that compiles is not hard... – Billetdoux 21/4, 2011 at 1:58

But now you're clearly moving from "random mutations" to "artificial intelligence", which is a whole different question. When you start saying that you're going to build a higher-level map of the program, determine what type of nodes are valid at any given point within the map, manage interrelationships between the results of operations at those nodes, etc, this is not "writing a program through random mutations and a filter" any more, this is building a lot of intelligence into the "mutator". – Coupler 21/4, 2011 at 15:40

@Coupler It's no higher level of the program than the compiler. DNA is a great example: A only connects with T and C only connects to G, i.e. each DNA "node" can only connect to a specific set of nodes, it's the same with a node in a GP. This might help you: technically you can express any computer program with a minimal set of CPU instructions: ADD, OR, XOR. You can pretty much guarantee that any combination of those will always "compile," but even if it does compile you can't measure if it's closer to, say, opening a file or not (i.e. a file can't be 50% open or 75% open). – Billetdoux 21/4, 2011 at 16:28

@Jay, basically... see Jivlain's answer, he hits it right on the nail. – Billetdoux 21/4, 2011 at 16:31

@Link Okay, you could bypass the compiler and write machine code directly. If your computer implements every possible combination of bits as an executable opcode, then it will run. Then your mutator could randomly change or insert bits in the program. But then you're even less likely to get a program by chance mutations that does anything useful. There isn't even the higher level structure of a compiler language to impose some rationality. There's a big difference between "the program runs" and "the program produces meaningful output." – Coupler 26/4, 2011 at 16:17

..and why would you choose that silly representation? Typing text is programming for humans. There's way more efficient 'coding techniques' for machines, just look into GP with linear, tree or graph genomes (to name just a few). -another Jay – Coupler 17/12, 2011 at 9:7

another @Jay: Sure, that's what I said in an earlier post: You could have a set of rules to constrain changes to something valid. I don't doubt you could build a system where every mutation compiles. Enough constraints could perhaps ensure that it does not crash. But the more constraints you add, the less this is random mutation and the more it is artificial intelligence. And you still have to figure out how to decide that the program is "approaching the correct solution" without intelligent analysis. – Coupler 26/12, 2011 at 22:9

I want to just give you a suggestion. I don't know how successful you'd be, but perhaps you could try to evolve a core war bot with genetic programming. Your fitness function is easy: just let the bots compete in a game. You could start with well known bots and perhaps a few random ones then wait and see what happens.

Cabinetwork answered 21/4, 2011 at 3:12 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags