Permutation job scheduling with partial available machines

Asked 30/11, 2014 at 14:47 Answered 6/12, 2014 at 23:47

algorithm search genetic-algorithm evolutionary-algorithm

I'm looking for a suitable algorithm to solve a time scheduling problem. First i will outline the problem itself, then in a second part i will give the direction i was thinking towards for a solution. I'm trying to solve this problem because i have an interest in these kinds of problems and also because the same kind of problem can be solved later with more variables and a bigger setup.

problem

I would like to do some tests on batteries to see how they respond when connected to a load. And perform these tests in the shortest amount of time possible to complete all tests. The two important variables here are:

State-of-Charge (SoC) the amount of energy left in the battery from 100% to 0%. We will test 99%, 75%, 50% and 25% (4 variations). (explained later why 99% and not 100%). We will assume the SoC lost when relaxing is 0.
Relaxation the amount of how much the battery has relaxed in hours. We know that theoretically 24 hours should be enough, so this is the maximum. We will test different times like: 5min, 15min, 30min, 1 hour, 2 hour, 6 hour, 24 hour (7 variations).

Total combinations: 4 x 7 = 28 for one battery

The order in which the test should proceed is the following: Charge to 100%, discharge to wanted SoC, relax, discharge to a new SoC while measuring

Example: we want to see how the battery reacts while discharging from 75% to 50% while having relaxed for 2 hours

Battery has unknown SoC (measurement methods are not accurate enough)
Recharge to 100%
Discharge to 75%
Relax 2 hours
Discharge while measuring, stop at 50%

The battery can now relax again and start its measure from 50% to 25%. It does NOT have to be recharged to 100% again.

situations / states

Now i will outline some situations which can occur and what has to be done in such case.

initialization

The problem can be initialized with already performed tests (this is important because we might want to reschedule halfway through). If the batteries have a known state (SoC/relax) we can use that. If the SoC is unknown then the battery has to be recharged. If the relaxation is unknown but the SoC is known then the battery has to be relaxed for at least 24 hours.

recharge

Putting the battery in the recharger has to be done manually. Leaving the battery in the recharger is not a problem. Recharging takes about 2.5 hours. Each battery has it's own dedicated charger, but in the future we might have more batteries then chargers so the algorithm needs to be able to take a variable amount of chargers.

relaxation (relax)

Relaxation can simply be done by not connecting the battery to anything, so it does not need any special equipment. Before the relaxation time period can start the battery has to be stressed (= connected to the discharger). We don't know for sure how long the stress period will take, but we assume that the period it takes to discharge the battery 1% will be enough. 99% will therefor be the first SoC where we can accurately determine the relaxation time.

discharging

There is only one discharger at the moment, but the algorithm should be able to take a variable amount of dischargers. Putting the battery in the discharger has to be done manually (also taking it out). HOWEVER putting the battery in the discharger does not necessarily discharge the battery right away. A time can be set to start at a certain time. And the discharger can automatically stop when enough energy has been discharged. An estimate of the discharging time can be estimated from a lookup table. This is not linear so 75% to 50% does not have to take the same amount of time as from 25% to 0%. The lookup is fairly accurate (about 5 minute difference on 2.5 hours).

waiting

The battery can wait if all dischargers are taken, but waiting for a discharger raises the relaxation time. So if the relaxation time gets higher than the relaxation time needed for the measurements that have to be performed then it either has to discharge to a lower level of charge and relax again, or it has to be charged again.
The battery can wait if all chargers are taken safely, there is no penalty/disadvantage here other then loosing some time for having to wait.

constraints

The things that have to be done manually can only be done during office hours (monday-friday 8:30-17:00). So for example putting the battery in the discharger has to be done manually. Then at a set time in the night (after the battery has relaxed enough) the discharger can be started on a timer, then next morning when arriving at the office the battery can be put in the charger.

thoughts for a solution

I'm not sure if i'm thinking in the right direction here, because i don't have the working solution yet. So anything in the part might be wrong..

The sequence of tasks matter because a different sequence might introduce more or less waiting time then another sequence. So for just one battery with 28 tests that will be a permutation of 28! which is quite big number. Therefor an exhaustive search of the problem space is not feasible. The only type of algorithm that i know that can give a fairly good result on these kinds of problems is the genetic algorithm. Though with all the constraints and possibilities i can not just use a classic genetic algorithm. I've read some (research) papers and eventually the description of the Permutation Flowshop Scheduling Problem (PFSP) resonated the most (various sources). Although the mentioned Extended Job-Shop Scheduling Problem (EJSSP) here was also interesting.

The biggest problem i see is the office hours constraint. If it wasn't for that the scheduling could be similar to just fitting blocks into time slots (even though the slots would be of dynamic size). I'm not sure what is the best way to deal with this constraint. Either i could model the machines (discharger) as two separate machines that are each active at different moments, or i could introduce fake jobs so that the machines can not be taken by the normal jobs.

This is just speculation at this point, because of my lack of experience. I'm more of a pragmatic programmer than an academic and i have a real hard time to figure out which of the possible algorithms are suitable and what the caveats are. I'm happy to do the implementation, but right now im still stuck at:

which algorithms are suitable for this type of problem?
how do i set the special conditions on the algorithms?
how do i can i make a crossover/selection/mutation function?
do i need to break this problem up in sub problems and incorporate that into a bigger algorithm? Which sub-problems are optimal to solve first?
how would the pseudo-code look like?

Pokorny answered 30/11, 2014 at 14:47 Comment(10)

Do you need to be present at the start and/or end of measuring? If not, is it possible to program a series of discharge and relax times into the discharger? E.g. in your example, could you have manually put the freshly charged battery into the discharger, and programmed it to discharge to 75%, then relax 2 hours, then continue discharging to 50% while it measures, then relax 5 minutes, then discharge to 25% while it measures? – Composition 4/12, 2014 at 0:58

How many batteries? It would also be helpful to know roughly how long a full (100% to 0%) discharge takes. And regarding "If the relaxation is unknown but the SoC is known then the battery has to be relaxed for at least 24 hours": does that mean that a battery with known SoC but unknown relaxation cannot simply be recharged right away? – Composition 4/12, 2014 at 1:23

I think asking the question in CS Theory section is better, since stack overflow is mainly technical – Unbuild 4/12, 2014 at 8:42

@Composition the order is this: put battery in discharger, start measurement, end measurement, take battery out of discharger. I need to be present for the 1st and the 4th step. Your proposol to keep the SAME battery in the discharger and have multiple discharge/relax series is POSSIBLE. There are 2 batteries at this moment, but this should be a variable. "with known SoC but unknown relaxation cannot simply be recharged right away?" that's right, you need to wait 24 hours. – Pokorny 4/12, 2014 at 9:2

@khaledAKhunaifer yes i was doubting between here and CS. After the bounty period i can ask a moderator to move the question or i can delete and reask. – Pokorny 4/12, 2014 at 9:3

OK. Still slightly confused about this 24-hour relaxation: I presume a 24-hour relaxation is also required before a recharge even if you know the relaxation level, but it exceeds the level you need? E.g. if the battery has relaxed for 5 hours already, but your next planned test is for a 2-hour relaxation, you would need to let it relax another 19 hours (5+19=24) before recharging, right? – Composition 4/12, 2014 at 17:30

Also roughly how long does a full (100% to 0%) discharge take? – Composition 4/12, 2014 at 19:23

@Composition if you know the relaxation level, and it is more hours than you need then you need to recharge/discharge again. If it is less than you need then you simply have to wait until enough time has passed. So the battery relaxed 5 hours, gets a reset by charge/discharge then starts counting up from 0 to 2 hours while relaxing. The discharge is estimated at 12 hours, but this length should not determine the type of algorithm. – Pokorny 4/12, 2014 at 22:40

Not an answer to your question, but please don't delete the question now.. someone has already put a lot of effort trying to answer the question. Both Question and Answer are interesting to me at least :) – Abiosis 6/12, 2014 at 4:35

is it using genetic algorithm mandatory ? is it ok to use different meta heuristic algorithm ? – Thorp 6/12, 2014 at 23:11

Overview

Although you've explicitly tagged evolutionary-algorithm, I'd suggest you to have a look on a set of algorithms summarized under the name Approximate Dynamic Programming (ADP). In my opinion a good introductory book is the one of Warren B. Powell. It contains a lot of such ressource allocation problems, as well as several other practically relevant stuff which often goes under the name of Optimal Control (example: controlling a trucking company having several thousands of trucks).

One advantage ADP has over directly applying evolutionary-algorithms, simulated-annealing and so on is that it doesn't present a specific algorithm, but rather a framework for modeling time-dependent Markov decision problems. Within these framework, one is free to employ a variety of appropriate algorithms.

In order to apply ADP, one key is the correct mathematical modelling of the given problem. Of central importance is the state of the system, the actions one can take in a given state as well as the cost these actions require and which one wants to minimize -- your costs here are given by the duration of the test. Given one has constructed an appropriate model, the task is then to (approximately) solve the Bellman equation, for which several algorithms exist.

In the following, I'll give an example of how one can proceed here. This is naturally not a ready-to-use model, as such might take a rather long time to build -- actually I think this would be a fine problem for a masters or even a PhD thesis. However, I'll try to keep it exhaustive in a first step, such that one can introduce approximations later on.

Modelling the state of a single battery

First we are going to set up a rough model for a single battery. Here it is helpful that your problem is as deterministic as you described, i.e. that there are no stochastic components here (for example, that all durations are fixed and not randomly drawn from some statistical distribution).

Single state: As you wrote, one battery is given by a state
```
S = {SoC, Relax}      where SoC   \in {UNKNOWN, 0%, 25%, 50%, 75%, 100%}
                      and   Relax \in {UNKNOWN, 0m, 5m, 15m, 30m, 1h, 2h, 6h, 24h}
```
I've added 0% and 0m for convenience, although they are maybe not really needed here.

Note that I've already made a large simplification here, as the State-of-Charge can also be 79%, for instance. The assumption which justifies this is the following: once you start your experiment for the first time (or also anew after a long time), all batteries are reasonably in the state {UNKNOWN,UNKNOWN}. Then, according to your description, the first one has to do is to do a full recharge, which sets all to the state {100%,0m} (and which costs 2,5h). From here, one only does qualified state changes -- discharging is done only to specific SoC's, and recharching only to 100% (this I assumed based on your description). Note that this becomes harder in a more natural stochastic framework, where for instance the batteries' SoC is not that well-behaved.
Actions and costs: for specific single-battery states, one has associated a set of feasible actions (plus their corresponding costs). Let's collect them in the following:
```
State                  Possible Actions                       Cost
-------------------------------------------------------------------------------
{UNKNOWN, Relax}  ->   RECHARGE TO {100%,0m}                  2,5h

{SoC, 0m}         ->   RELAX TO {SoC,5m},                     5m          

{SoC, Relax}      ->   RECHARGE TO {100%,0m},                 2,5h           
                       DISCARGE TO {SoC-1},                   C(SoC, SoC-1)
                       DISCHARGE-MEASURE TO {SoC-1},          C(SoC, SoC-1)
                       RELAX TO {SoC, RELAX+1}                C(Relax, Relax+1)

{SoC, UNKNOWN}    ->   RECHARGE TO {100%,0m},                 2,5h
                       DISCARGE TO {SOC-1,0m},                C(SoC,SoC-1)   
                       RELAX {SoC, 24h}                       24h
```
SoC-1 here stands for the next feasible state, whereas C(SoC, SoC-1) means "time to go from SoC to SoC-1, eg. from 75% to 50%. It's your turn here to check this table whether it meets your model. If not you have to correct or extend it.

Note that I've made again a simplification by allowing only transitions to the next feasible state SoC-1 (e.g., from 75% to 50%). This is reasonable as the cost is assumed as additional. For example, when you go from 75% to 25%, it takes the same time as when going first to 50% and then to 25%.

Further, all RECHARGE and DISCHARGE actions are feasible only in the office hours, which has not been accounted for in the above table (but which needs to be incorporated later in the model).

Combine the above to a model of the complete system

Now let us assume you have N batteries, M rechargers and K dischargers (where we can assume M<=N and K<=N), all of which are identical. Further, let's say the goal is to perform each test only once.

Test state: The test state Test is a vector of dimension 4*7 of 0's and 1's containing the information whether a specific test has already been performed. Note that this corresponds to 2^28 possible states, so one definitely has to introduce an approximation here.
Multi-Battery state: The combined state of all batteries B is the cartesian product of the single-battery state, i.e. it is in the space {SoC,Relax}^N. That means simply that one needs to consider N single-battery states.
```
B={SoC_1, Relax_1}, ..., {SoC_N, Relax_N}
```
Again, the size of this space is is going to be a very large number for moderate numbers N.
Office time: Further, we need to incorporate the time of the day T. Doing it exactly, one ends up with a number of 24*60m / 5m = 288 possible five-minutes slots.
Multi-Battery actions: Similarly, the multi-battery actions are given by an N-dimensional cartesian product of the one dimensional actions. RECHARGE and DISCHARGE is only feasible for T in the offcie hours and if enough Re- and Dischargers are available (the sum of all RECHARGE/ DISCHARGE must not exceed M/K).

Summarizing, the complete state S is given by the combination

S = {Test, T, B}

The dimension of the state space is about 2^28 * (6*9)^N * 288 which quickly becomes huge.

Further, for each state there is a corresponding set of allowed actions which should be clear by now.

So now that the model of the system has been specified more or less (correct it if needed!), we can go on by trying to solve it.

The Bellman equation

The Bellman equation is the central equation of (approximate) dynamic programming. For a nice introduction have a look at the book of Sutton and Barto which is freely available on the net.

It's idea is rather simple: for each possible state, introduce a value function V(S) which tells you how good it is to be in state S. This value function in principle contains the time needed to finish the tests once you are in state S. In order to determine this value for a finite-horizon problem as this is, one starts from the end state and recurses until the beginning -- that is, at least if the size of the problem allows it. But let's do this schematically in the following and see:

The final state is the one where Test contains only ones, i.e. all 28 tests have been performed. Set V(S)=0 for all these states (regardless of T and the multi-battery state), as you do not need any more time.

One step back: Now consider all states where Test has only 27 ones and one zero, i.e. one test still needs to be performed. For each possible multi-battery state B and time point T one can automatically spot the quickest alternative. Set V(S) equal to this cost.

Another step back: Next one considers all states where Test has only 26 ones and two zeros (two tests still to be made). Now, for each possible multi-battery state B and time point T one chooses the action a such as to minimize the cost of the action plus the value of the state to which the action leads. In terms, you have to choose a such as to minimize C(a) + V(S'), where a leads from S to S'. If you found this action, set the state equal to V(S) = C(a) + V(S').

And so on. One does this for all possible states and stores each optimal value V(S) obtained in this way -- for small number of batteries N, this might even be feasible in practice.

Once you're ready with this, you get the optimal action in each state. For this, if one is in state S, one follows the same recipe as above and always picks the action a which minimizes C(a) + V(S'). This can also be done only once when the best action for each state is stored.

And then you're done -- you completely solved your problem. Or say, at least theoretically, because in practice the problem size requires too much effort and storage to do the above backward recursion and store it in a table (for the problem as specified above, I'd say this regime begins when N~3 and larger). One therefore needs to introduce approximations.

Approximations

In using approximations, in general one sacrifices the "optimal solution" for a "good-working solution". As this can be done in several ways, this is where art begins. Or, to cite Powell, chapter 15.7.: "A successful ADP algorithm for an important problem class is, we believe, a patentable invention". Due to this, I will only sketch some possible steps that can be done here.

One class of approximation methods called aggregation uses a simplified model of the state variable. For instance, instead of including the time T in 5m chunks in the state variable (288 states), one can use 1h chunks or even a boolean value which indicates whether it's office time. Or you can use 5m chunks only during office time plus one state out-of-office-time. And so on ... lot's of possibilities. Here one always gains a smaller tabular representation of the
Another class of approximation methods uses a parametrized representation of the value function, say a linear regression model or a neural network. In the iteration scheme above, the value functions are not stored in a table, but they are rather used as input to fit the parameters. This one replaces the often huge tabular representation by a much smaller number of parameters, but a drawback is that the fitting procedure is usually more sophisticated. (Note that in this step evolutionary algorithms can naturally be applied).
In another method, one uses basis functions which capture important states of the system. For example, in a tic-tac-toe game, you do not need all possible game states, but basis states which indicate who occupies the center and the occupied number of edges are sufficient.
Next, instead of trying to perform a full iteration over all states, one can use Monte-Carlo methods to randomly explore many but not all possible states. This is the more efficient the better heuristics exist, which let the algorithm explore meaningful states.

For other ideas as well as their practical application, consult the books mentioned above.

Ok, that became lengthy but I hope it helps to give you some idea of one possible approach. I would suggest you to design a small model using only one battery, say, and try to implement the above backward-iteration by yourself. Alternatively, in the both cited books you find several toy problems where you can get familiar with these kind of problems. Good luck with the batteries!

Tantamount answered 5/12, 2014 at 1:13 Comment(4)

It was a very interesting read, the level of explanation is precisely right. Not too abstract/academic that i don't know how to implement it, and leaving out some implementation details of which i think i can figure them out by myself. I'm glad you considered something else than a genetic algorithm. You meet criteria 1 partially, by reviewing one alternative algorithm other than the proposed one. Of course this answer also fulfil criteria 2 which is to model a solution. I will try to implement this within a few days and get back to you. – Pokorny 5/12, 2014 at 9:58

@Flip: just stumbled over this again ... have you solved your problem back then? And if yes with which method? – Tantamount 14/6, 2017 at 5:51

I tried to implement this myself but had to give up at the time, however i cherish this answer and will certainly be needing it in the future. I will this accept the answer once i verified it. – Pokorny 14/6, 2017 at 11:11

@Flip: it's not about the acceptance, I was just curious. If you need further assistance, just let me know. – Tantamount 14/6, 2017 at 20:7

As this is StackOverflow, and as you are going to spend fkn 500 of your hardly earned points, and as I am greatly interested in this stuff myself, I wrote you some example code which implements the recipe I presented a few days ago. Although it is a toy model which is solved there, it contains most of the issues of your original problem, and -- given enough computer power -- might be readily extended to solve the latter.

Model problem

Let us consider the following model problem:

N=2 batteries, M=2 Rechargers, K=1 Discharger.
Time T proceeds in multiples of 6 hours: 0:00, 6:00, 12:00, 18:00. Office hours are all except 0:00 in the night.
Battery states are:
- SoC in {0%,50%,100%}
- Relax in {0h,6h,12h,18h,24h}
Batteries are initially in the state {0%,0h}, i.e. uncharged at the start of the test conduction (Relax is unimportant as one anyway needs to recharge first).
Available actions are:
- RELAX: do nothing. Increases Relax by 6 hours.
- RECHARGE: increases SoC to the next state and sets Relax to 0h.
- MEASURE: decreases SoC to the previous state and also sets Relax to 0h.
Each action "costs" 6 hours.

Note that MEASURE behaves almost identically to what you called DISCHARGE. Particularly, if MEASURE is applied to a battery state which is not a test case, it is just a DISCHARGE. However, and that is the only difference, when MEASURE is applied to one of the test cases also the Test state is updated.
Batteries need to be measured once for each state {SoC,Relax} in {{50%,6h},{100%,6h},{50%,24h},{100%,24h}}. That makes a Test vector of dimension four containing either zeros or ones.

Once you got familiar with the implementation below, you can easily change all these choices.

Model description

I'm going to use the system description as presented in my previous answer. In particular, the state of the system is given by a multi-vector

{Test, T, {SoC_1,Relax_1}, {SoC_2,Relax_2}}

This is in fact not the best state variable (--a state variable should always be as compact as possible). For example, it is not accounted for the symmetry of the batteries. In "reality-hard" models, problems as this should be avoided whenever possible.

Further, I simplified things by assuming all happens in time-steps of 6 hours. By this one can simply increase the T variable by 6 hours after each action. If there were actions with a duration of, say, 12 hours, one would need to add further variables to the state variable which indicate whether the battery is blocked and when it will be available again. Not to speak if there would be an action which lasted 5 hours, for instance. So you see, the time is treated rather basic here.

Implementation

I've used C++ for the implementation and I hope you're comfortable with that. The code itself is already rather lengthy, but I tried to comment it at least a bit. Here are the main points implementation:

First, regarding the basic elements: the more elementary ones are given by enums such as
```
enum struct StateOfCharge
{
    P0=0,
    P50,
    P100,
    END_OF_LIST
};
```
Integers would have also worked well, but I guess it becomes easier to read using enums. For these enums, I've also provided operator<< for screen output as well as operator++ and operator-- for increasing/decreasing the state. The two latter operators are function templates accepting any enum (--at least if it contains a state END_OF_LIST).
The more advanced elements such as states and actions are classes. Particularly the State class contains a lot of logic: it is able to tell whether a given action is allowed via its member
```
bool isActionAllowed(Action const&) const
```
Further it can give you the next state for a given action via
```
State nextState(Action const&) const
```
These functions are central in the value iteration descibed next.
There is the class BatteryCharger, which performs the actual dynamic programming algorithm. It contains a container std::map<State,int> to store the value of a given state (remember that the value here is simply the time required to finish, which naturally shall be minimized). In order for the map to work, there is also an operator< comparing two State variables.
Finally, a few words on the Value Iteration scheme. In the answer above, I wrote one can do it by backward-iteration. This is true, but it can become rather complicated because you need to find a way back such that all possible states are covered (and at best only once). Although this would be possible here, one can also simply iterate over all states in an arbitrary way. This needs to be done iteratively, however, because otherwise you draw on state-values which you haven't optimized yet. The iteration here ends when all values are converged.

For more on value iteration, see again the book of Sutton and Barto.

Code

Here is the code. It's not particularly well written (--global variables etc.), but it works and can be easily extended:

#include<iostream>
#include<array>
#include<map>
#include<type_traits>
#include<algorithm>

template<typename T, typename = typename std::enable_if<std::is_enum<T>::value >::type>
bool isLast(T const& t)
{
    return t == static_cast<T>(static_cast<int>(T::END_OF_LIST) - 1);
}

template<typename T, typename = typename std::enable_if<std::is_enum<T>::value >::type>
bool isFirst(T const& t)
{
    return t == static_cast<T>(0);
}



template<typename T, typename = typename std::enable_if<std::is_enum<T>::value >::type>
T& operator++(T& t)
{
    if (static_cast<int>(t) < static_cast<int>(T::END_OF_LIST) - 1)
    {
        t = static_cast<T>(static_cast<int>(t)+1);
    }
    return t;
}

template<typename T, typename = typename std::enable_if<std::is_enum<T>::value >::type>
T operator++(T& t, int)
{
    auto ret = t;
    ++t;
    return ret; 
}


template<typename T, typename = typename std::enable_if<std::is_enum<T>::value >::type>
T& operator--(T& t)
{
    if (static_cast<int>(t) > 0)
    {
        t = static_cast<T>(static_cast<int>(t)-1);
    }
    return t;
}


template<typename T, typename = typename std::enable_if<std::is_enum<T>::value >::type>
T operator--(T& t, int)
{
    auto ret = t;
    --t;
    return ret;
}



const int Nbattery = 2;
const int Nrecharger = 2;
const int Ndischarger = 1;
const int Ntest = 4;

enum struct StateOfCharge
{
    P0=0,
    P50,
    P100,
    END_OF_LIST
};

//screen output
std::ostream& operator<<(std::ostream& os, StateOfCharge const& s)
{
    if(s==StateOfCharge::P0) os << "P0";
    else if (s==StateOfCharge::P50) os << "P50";
    else if (s == StateOfCharge::P100) os << "P100";
    return os;
}


enum struct StateOfRelax
{
    H0=0,
    H6,
    H12,
    H18,
    H24,
    END_OF_LIST
};

//screen output
std::ostream& operator<<(std::ostream& os, StateOfRelax const& s)
{
    if(s==StateOfRelax::H0) os << "H0";
    else if (s == StateOfRelax::H6) os << "H6";
    else if (s == StateOfRelax::H12) os << "H12";
    else if (s == StateOfRelax::H18) os << "H18";
    else if (s == StateOfRelax::H24) os << "H24";
    return os;
}


struct SingleBatteryState
{
    //initialize battery as unfilled
    StateOfCharge SoC;
    StateOfRelax Relax;
    SingleBatteryState(StateOfCharge _SoC = static_cast<StateOfCharge>(0), StateOfRelax _Relax = static_cast<StateOfRelax>(0))
        : SoC(_SoC)
        , Relax(_Relax)
    {}


    //loop over state
    bool increase()
    {
        //try to increase Relax
        if (!isLast(Relax))
        {
            ++Relax;
            return true;
        }
        //if not possible, reset Relax
        else if (!isLast(SoC))
        {
            ++SoC;
            Relax = static_cast<StateOfRelax>(0);
            return true;
        }

        //no increment possible: reset and return false
        SoC = static_cast<StateOfCharge>(0);
        Relax = static_cast<StateOfRelax>(0);
        return false;
    }
};

std::ostream& operator<<(std::ostream& os, SingleBatteryState const& s)
{
    os << "("<<s.SoC << "," << s.Relax << ")";
    return os;
}


bool operator<(SingleBatteryState const& s1, SingleBatteryState const& s2)
{
    return std::tie(s1.SoC, s1.Relax) < std::tie(s2.SoC, s2.Relax);
}

bool operator==(SingleBatteryState const& s1, SingleBatteryState const& s2)
{
    return std::tie(s1.SoC, s1.Relax) == std::tie(s2.SoC, s2.Relax);
}

//here specify which cases you want to have tested
std::array<SingleBatteryState, Ntest> TestCases = { SingleBatteryState{ StateOfCharge::P50, StateOfRelax::H6 }
                                                  , SingleBatteryState{ StateOfCharge::P50, StateOfRelax::H24 }
                                                  , SingleBatteryState{ StateOfCharge::P100, StateOfRelax::H6 }
                                                  , SingleBatteryState{ StateOfCharge::P100, StateOfRelax::H24 } };


// for a state s (and action MEASURE), return the entry in the array Test
// which has to be set to true
int getTestState(SingleBatteryState const& s)
{
    auto it = std::find(std::begin(TestCases), std::end(TestCases), s);

    if(it==std::end(TestCases))
        return -1;
    else
        return it - std::begin(TestCases);
}


enum struct SingleAction
{
    RELAX = 0,
    RECHARGE,
    MEASURE,
    END_OF_LIST
};

std::ostream& operator<<(std::ostream& os, SingleAction const& a)
{
    if(a== SingleAction::RELAX) os << "RELAX";
    else if (a == SingleAction::RECHARGE) os << "RECHARGE";
    else if (a == SingleAction::MEASURE) os << "MEASURE";

    return os;
}


enum struct TimeOfDay
{
    H0 = 0,
    H6,
    H12,
    H18,
    END_OF_LIST
};


//here set office times
std::array<TimeOfDay,3> OfficeTime = {TimeOfDay::H6, TimeOfDay::H12, TimeOfDay::H18};

bool isOfficeTime(TimeOfDay t)
{
    return std::find(std::begin(OfficeTime), std::end(OfficeTime),t) != std::end(OfficeTime);
}

//screen output
std::ostream& operator<<(std::ostream& os, TimeOfDay const& s)
{

    if(s==TimeOfDay::H0) os << "0:00 h";
    else if (s == TimeOfDay::H6) os << "6:00 h";
    else if (s == TimeOfDay::H12) os << "12:00 h";
    else if (s == TimeOfDay::H18) os << "18:00 h";
    return os;
}


struct Action
{
    SingleAction& operator[](int i) { return A[i]; }
    SingleAction const& operator[](int i) const { return A[i]; }

    std::array<SingleAction, Nbattery> A{};

    bool increase()
    {
        for (int i = Nbattery - 1; i >= 0; --i)
        {
            if (!isLast(A[i]))
            {
                ++A[i];
                return true;
            }
            else
            {
                A[i] = static_cast<SingleAction>(0);
            }       
        }
        return false;
    }
};


//screen output
std::ostream& operator<<(std::ostream& os, Action const& A)
{
    os << "(";
    for (int i = 0; i < Nbattery-1 ; ++i)
    {
        os << A[i] << ",";
    }
    os << A[Nbattery-1] << ")";

    return os;
}



struct State
{
    std::array<bool, Ntest> Test{};
    TimeOfDay T = TimeOfDay::H0;
    std::array<SingleBatteryState, Nbattery> B{};

    State()
    {
        for (int i = 0; i < Ntest; ++i)
        {
            Test[i] = true;
        }
    }

    bool isSingleActionAllowed(SingleAction const& a) const
    {
        if ( !isOfficeTime(T) &&  a != SingleAction::RELAX)
        {
            return false;
        }

        return true;
    }

    bool isActionAllowed(Action const& A) const
    {
        //check whether single action is allowed
        for (int i = 0; i < Nbattery; ++i)
        {
            if (!isSingleActionAllowed(A[i]))
                return false;
        }

        //check whether enough Rechargers and Dischargers are available
        int re = 0;
        int dis = 0;

        for (int i = 0; i < Nbattery; ++i)
        {
            //increase re counter
            if (A[i] == SingleAction::RECHARGE)
            {
                ++re;
            }
            //increase dis counter
            else if (A[i] == SingleAction::MEASURE)
            {
                ++dis;
            }

            //check whether ressources are exceeded
            if (re>Nrecharger || dis > Ndischarger)
            {
                return false;
            }
        }

        return true;
    }

    //loop over state
    bool increase()
    {
        //increase time
        if (!isLast(T))
        {
            ++T;
            return true;
        }
        else
        {
            T = static_cast<TimeOfDay>(0);
        }

        //if not possible, try to increase single battery states
        for (int i = Nbattery-1; i >= 0; --i)
        {
            if (B[i].increase())
            {
                return true;
            }
        }

        //if also not possible, try to increase Test state
        for (int i = Ntest-1; i >=0; --i)
        {
            if (Test[i])
            {
                Test[i] = false;
                return true;
            }
            else
            {
                Test[i] = true;
            }
        }

        return false;
    }


    // given a single action and a single-battery state, calculate the new single-battery state.
    // it is assumed the action is allowed
    SingleBatteryState newState(SingleBatteryState s, SingleAction const& a) const
    {
        if (a == SingleAction::RELAX)
        {
            ++s.Relax;
        }
        else if (a == SingleAction::RECHARGE)
        {
            ++s.SoC;
            s.Relax = static_cast<StateOfRelax>(0);
        }
        else if (a == SingleAction::MEASURE)
        {
            --s.SoC;
            s.Relax = static_cast<StateOfRelax>(0);
        }
        return s;
    }

    // calculate new complete state while assuming the action s allowed
    State newState(Action const& a) const
    {
        State ret = *this;

        //increase time
        if (isLast(ret.T))
        {
            ret.T = static_cast<TimeOfDay>(0);
        }
        else
        {
            ret.T++;
        }

        //apply single-battery action
        for (int i = 0; i < Nbattery; ++i)
        {
            ret.B[i] = newState(B[i], a[i]);

            // if measurement is performed, set new Test state.
            // measurement is only possible if Soc=50% or 100%
            // and Relax= 6h or 24h
            if (a[i] == SingleAction::MEASURE
                && getTestState(B[i]) != -1)
            {
                ret.Test[getTestState(B[i])] = true;
            }
        }
        return ret;
    }

    int cost(Action const& a) const
    {
        if (std::find(std::begin(Test), std::end(Test), false) == std::end(Test))
        {
            return 0;
        }
        return 1;
    }

};

//screen output
std::ostream& operator<<(std::ostream& os, State const& S)
{
    os << "{(";
    for (int i = 0; i < Ntest-1; ++i)
    {
        os << S.Test[i]<<",";
    }
    os << S.Test[Ntest-1] << "),"<<S.T<<",";

    for (int i = 0; i < Nbattery; ++i)
    {
        os << "(" << S.B[i].SoC << "," << S.B[i].Relax<<")";
    }
    os << "}";

    return os;
}

bool operator<(const State& s1, const State& s2)
{
    return std::tie(s1.Test, s1.T, s1.B) < std::tie(s2.Test, s2.T, s2.B);
}




struct BatteryCharger
{
    bool valueIteration()
    {
        // loop over all states with one specified Test state
        State S;

        int maxDiff=0;
        do
        {
            int prevValue = V.find(S)->second;
            int minCost = prevValue;

            // loop over all actions
            // and determine the one with minimal cost
            Action A;
            do
            {
                if (!S.isActionAllowed(A))
                {
                    continue;
                }

                auto Snew = S.newState(A);
                int newCost = S.cost(A) + V.find(Snew)->second;

                if (newCost < minCost)
                {
                    minCost = newCost;
                }
            }
            while (A.increase());

            V[S] = minCost;

            maxDiff = std::max(maxDiff, std::abs(prevValue - minCost));
        }
        while (S.increase());

        //return true if no changes occur
        return maxDiff!=0;
    }


    void showResults()
    {
        State S;
        do
        {
            auto Aopt = getOptimalAction(S);
            auto Snew = S.newState(Aopt);
            std::cout << S << "   " << Aopt << "   " << Snew << "   " << V[S] << "   " << V.find(Snew)->second << std::endl;
        }
        while (S.increase());
    }

    Action getOptimalAction(State const& S) const
    {
        Action Aopt;

        Action A;
        int minCost = std::numeric_limits<int>::max();
        do
        {
            if (!S.isActionAllowed(A))
            {
                continue;
            }

            auto Snew = S.newState(A);
            int newCost = S.cost(A) + V.find(Snew)->second;

            if (newCost < minCost)
            {
                minCost = newCost;
                Aopt = A;
            }
        }
        while (A.increase());

        return Aopt;
    }

    BatteryCharger()
    {
        State S;
        do
        {
            int ad = 0;
            for (int i = 0; i < Ntest; ++i)
            {
                if (!S.Test[i])
                    ad += 100;
            }
            V[S] = ad;
        }
        while (S.increase());
    }

    std::map<State, int> V;
};





int main(int argc, char* argv[])
{
    BatteryCharger BC;

    int count = 0;
    while (BC.valueIteration())
    {
        ++count;
    };
    std::cout << "Value Iteration converged after " << count << " iterations\n"<<std::endl;

    //start at 6:00 with no tests at all performed
    State S;
    S.Test[0] = false;
    S.Test[1] = false;
    S.Test[2] = false;
    S.Test[3] = false;
    S.T = TimeOfDay::H6;

    //get sequence of optimal actions
    auto Aopt = BC.getOptimalAction(S);
    while (BC.V[S] != 0)
    {
        std::cout << S << "    " << Aopt << "   " << BC.V[S] << std::endl;

        S = S.newState(Aopt);
        Aopt = BC.getOptimalAction(S);
    }
    std::cout << S << "    " << Aopt << "   " << BC.V[S] << std::endl;

    return 0;
}

Here is a DEMO to play around.

Results

With the above code, one obtains the following results printed on the screen:

Value Iteration converged after 8 iterations

{(0,0,0,0),6:00 h,(P0,H0)(P0,H0)}    (RELAX,RECHARGE)   10
{(0,0,0,0),12:00 h,(P0,H6)(P50,H0)}    (RECHARGE,RECHARGE)   9
{(0,0,0,0),18:00 h,(P50,H0)(P100,H0)}    (RECHARGE,RELAX)   8
{(0,0,0,0),0:00 h,(P100,H0)(P100,H6)}    (RELAX,RELAX)   7
{(0,0,0,0),6:00 h,(P100,H6)(P100,H12)}    (MEASURE,RELAX)   6
{(0,0,1,0),12:00 h,(P50,H0)(P100,H18)}    (RELAX,RELAX)   5
{(0,0,1,0),18:00 h,(P50,H6)(P100,H24)}    (RELAX,MEASURE)   4
{(0,0,1,1),0:00 h,(P50,H12)(P50,H0)}    (RELAX,RELAX)   3
{(0,0,1,1),6:00 h,(P50,H18)(P50,H6)}    (RELAX,MEASURE)   2
{(1,0,1,1),12:00 h,(P50,H24)(P0,H0)}    (MEASURE,RELAX)   1
{(1,1,1,1),18:00 h,(P0,H0)(P0,H6)}    (RELAX,RELAX)   0

One sees that the value iteration converges after 8 iterations. On my PC and with Visual Studio in Release mode this takes roughly half a second. So you can safely make the problem more detailed and still expect a feasible exact answer.

In the lines below, you see an example of an optimized test process. Here, the initial state is {(0,0,0,0),6:00 h,(P0,H0)(P0,H0)}, i.e. the complete test is started at 6am with unused batteries (P0 here means 0%, 6:00 h means 6am or the german "6:00 Uhr"). The next column the gives you the optimized action {RELAX,RECHARGE} (which leads to the state in the next line). The number in the third column is the time (in units of 6 hours) which is still required to finish the tests. For this example case, a total of 60 hours are required.

There you are, the model problem is solved. In order to check different initial states (they have all been optimized!), just change the following lines in main:

//start at 6:00 with no tests at all performed
State S;
S.Test = {false,false,false,false};
S.T = TimeOfDay::H6;

And now, have fun! In the best case, you let us know how you proceeded and how successful it were.

Tantamount answered 6/12, 2014 at 23:47 Comment(5)

I reviewed the results, and to be frank they don't look good. I would have to spend the first 24 hours just relaxing/recharging the battery and not getting measurements done. I have my doubts whether the bellman equation is suitable for this problem, because it assumes that the action with the shortest duration is most valuable. Can you proof that there won't be a situation where choosing a slower action will be more beneficial given the constraints (i.e. office time)? I think that the value of an action can only be determined after the entire timeline has been calculated. – Pokorny 7/12, 2014 at 16:49

@Flip: provided I made everything correct in the implementation (--which I can't exclude although I did it carefully) these results are in fact the optimal ones. The Bellman works with the general idea of "costs" which are to be minimized. These costs can be everything (e.g. a combination of time and money), but here I assumed it such that the total duration becomes minimized. And this is exactly why there are no measurements in the first day: "slower" actions like RECHARGE and RELAX are favorized over quick measurements. By this, the measurements can be done in turn from 100% to 0%. – Tantamount 7/12, 2014 at 18:16

@Flip: one more: "I think that the value of an action can only be determined after the entire timeline has been calculated." That is right, and that is also exactly what is done here. The way to do so is dynamic programming: optimize the whole problem in small steps. It is, so to say, identical to a shortest path problem to from the (0,0,0,0) to the (1,1,1,1)-town where each connection takes 6 hours. Finally, regarding the example results: to me they look reasonable. It is however just a simple model. Using more realistic rules, things might be different and maybe easier to believe. – Tantamount 7/12, 2014 at 18:25

I took a second look at the results and indeed they do checkout. The code you added is nice C++, but since it leaves out many things, it's too much simplified and i can't tell if it will be able to deal with the nitty gritty: weekends, longer/shorter actions (you have fixed 6 hour actions), how to deal with consecutive relax actions. I'm still enthousiastic about your first answer and i'm working on my own implementation. It's going to tough to finish it quickly, but only then it can be proven that the bellman method works for this problem. – Pokorny 8/12, 2014 at 11:57

@Flip: you're right the code implements only a toy model. Nevertheless, the Bellman equation is surely an appropriate way to go and it will work also for much more complicated stuff. Some options: Weekend: add a day counter. Longer/shorter actions: set the minimal duration equal to the the shortest action. For longer actions, add a "blocker" variable which states when the battery can be reused. Etc. Obviously, by doing so there comes a point when it's inevitable to use approximations. I'd be interested how far my simple code can be extended... – Tantamount 8/12, 2014 at 14:5

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++