I am trying to design a Genetic Programming (GP) algorithm that analyses a sequence of characters and assigns a value to that sequence. Below is a made-up example set in which every line represents one data point. The target values are real-valued.
Example: for the word ABCDE, the algorithm should return 1.0.
Example dataset:
ABCDE : 1
ABCDEF : 10
ABCDEGH : 3
ABCDELKA : 50
AASD : 3
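To make the task concrete, here is a minimal sketch of how the example lines could be loaded into (sequence, target) pairs and how a candidate model could be scored. The names parse_dataset and mean_abs_error are made up for illustration, and mean absolute error is only an assumed fitness measure, not something the problem prescribes.

```python
# The made-up example dataset from above, one "SEQUENCE : value" pair per line.
RAW = """\
ABCDE : 1
ABCDEF : 10
ABCDEGH : 3
ABCDELKA : 50
AASD : 3
"""


def parse_dataset(text):
    """Turn 'SEQ : value' lines into a list of (str, float) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        seq, _, value = line.partition(":")
        pairs.append((seq.strip(), float(value)))
    return pairs


def mean_abs_error(model, pairs):
    """Assumed fitness: average absolute deviation of model(seq) from the target."""
    return sum(abs(model(seq) - target) for seq, target in pairs) / len(pairs)


DATASET = parse_dataset(RAW)

if __name__ == "__main__":
    # A trivial baseline that always predicts 0.0, just to show the scoring call.
    print(mean_abs_error(lambda seq: 0.0, DATASET))
```

Any callable that maps a string to a float can be passed as the model, which is the interface the GP would have to fill in.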
The dataset can be as large as needed, since it is all made up. Let's assume the rule that the GP should figure out is not too complicated and that it is explained by the data.
What I would like the algorithm to do is approximate the values from my dataset when given the input sequence. My problem is that each sequence can consist of a different number of characters. I would prefer not to have to write fancy descriptors myself, if possible.
How can I train my GP (preferably using tinyGP or Python) to build this model?
Since there was so much discussion here, and a diagram says a thousand words: what I want to do is take a data point and put it into a function. Then I get a value, which is my result. Unfortunately I do not know this function; I just have a dataset with some examples (say 1000, just as an example). Now I use the Genetic Programming algorithm to find an algorithm that is able to convert my data point into a result. This is my model. The problem I have in this case is that the data points are of differing lengths. For a fixed length I could just specify each of the characters in the string as an input parameter, but I do not know what to do if I have a varying number of input parameters.
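The fixed-length case and the point where it breaks can be illustrated with a short sketch. Both the character encoding (A -> 0, B -> 1, ...) and the expression inside fixed_arity_model are assumptions made up for illustration; the latter only stands in for whatever a GP might evolve over five input terminals.

```python
def encode(seq):
    """Map each character to a number (A -> 0, B -> 1, ...). Assumed encoding."""
    return [ord(c) - ord("A") for c in seq]


def fixed_arity_model(x0, x1, x2, x3, x4):
    """Hypothetical stand-in for an evolved expression with exactly five inputs."""
    return x0 + x1 * x2 - x3 + x4


if __name__ == "__main__":
    # Five characters fit the five input parameters.
    print(fixed_arity_model(*encode("ABCDE")))

    # Six characters do not: the evolved expression has no slot for the sixth.
    try:
        fixed_arity_model(*encode("ABCDEF"))
    except TypeError as exc:
        print("cannot evaluate a 6-character sequence:", exc)
```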
Disclaimer: I have run into this problem multiple times during my studies, but we could never work out a solution that worked well (we tried using a window, descriptors, etc.). I would like to use a GP because I like the technology and would like to try it out; at university we also tried this with ANNs and other methods, but to no avail. The problem of the variable input size remains.
for char in input
should do the trick. – Leicester
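One possible reading of the comment above is to let the GP evolve a step function with a fixed number of inputs (a running accumulator plus the current character) and apply it once per character, so the sequence length no longer affects the arity. The sketch below assumes this interpretation; run_model and example_step are hypothetical names, the character encoding is an assumption, and example_step is only a hand-written placeholder for an evolved expression.

```python
def run_model(step, seq, initial=0.0):
    """Fold a two-input step function over the characters of seq."""
    acc = initial
    for char in seq:
        x = float(ord(char) - ord("A"))  # assumed encoding: A -> 0.0, B -> 1.0, ...
        acc = step(acc, x)
    return acc


def example_step(acc, x):
    """Hypothetical placeholder for a GP-evolved expression over (acc, x)."""
    return acc * 0.5 + x


if __name__ == "__main__":
    # The same step function handles sequences of any length.
    for seq in ["ABCDE", "ABCDEF", "AASD"]:
        print(seq, run_model(example_step, seq))
```

With this framing the GP always sees two terminals, acc and x, regardless of how long the input is. Whether such a fold can express the hidden rule depends on the rule itself.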