As this is StackOverflow, and as you are going to spend fkn 500 of your hardly earned points, and as I am greatly interested in this stuff myself, I wrote you some example code which implements the recipe I presented a few days ago. Although it is a toy model which is solved there, it contains most of the issues of your original problem, and -- given enough computer power -- might be readily extended to solve the latter.
Model problem
Let us consider the following model problem:
N=2
batteries, M=2
Rechargers, K=1
Discharger.
Time T
proceeds in multiples of 6 hours: 0:00
, 6:00
, 12:00
, 18:00
. Office hours are all except 0:00
in the night.
Battery states are:
SoC
in {0%,50%,100%}
Relax
in {0h,6h,12h,18h,24h}
Batteries are initially in the state {0%,0h}
, i.e. uncharged at the start of the test conduction (Relax is unimportant as one anyway needs to recharge first).
Available actions are:
RELAX
: do nothing. Increases Relax
by 6 hours.
RECHARGE
: increases SoC
to the next state and sets Relax
to 0h
.
MEASURE
: decreases SoC
to the previous state and also sets Relax
to 0h
.
Each action "costs" 6 hours.
Note that MEASURE behaves almost identically to what you called DISCHARGE. Particularly, if MEASURE is applied to a battery state which is not a test case, it is just a DISCHARGE. However, and that is the only difference, when MEASURE is applied to one of the test cases also the Test
state is updated.
Batteries need to be measured once for each state {SoC,Relax}
in {{50%,6h},{100%,6h},{50%,24h},{100%,24h}}
. That makes a Test
vector of dimension four containing either zeros or ones.
Once you got familiar with the implementation below, you can easily change all these choices.
Model description
I'm going to use the system description as presented in my previous answer. In particular, the state of the system is given by a multi-vector
{Test, T, {SoC_1,Relax_1}, {SoC_2,Relax_2}}
This is in fact not the best state variable (--a state variable should always be as compact as possible). For example, it is not accounted for the symmetry of the batteries. In "reality-hard" models, problems as this should be avoided whenever possible.
Further, I simplified things by assuming all happens in time-steps of 6 hours. By this one can simply increase the T
variable by 6 hours after each action. If there were actions with a duration of, say, 12 hours, one would need to add further variables to the state variable which indicate whether the battery is blocked and when it will be available again. Not to speak if there would be an action which lasted 5 hours, for instance. So you see, the time is treated rather basic here.
Implementation
I've used C++ for the implementation and I hope you're comfortable with that. The code itself is already rather lengthy, but I tried to comment it at least a bit. Here are the main points implementation:
First, regarding the basic elements: the more elementary ones are given by enum
s such as
enum struct StateOfCharge
{
P0=0,
P50,
P100,
END_OF_LIST
};
Integers would have also worked well, but I guess it becomes easier to read using enums. For these enums, I've also provided operator<<
for screen output as well as operator++
and operator--
for increasing/decreasing the state. The two latter operators are function templates accepting any enum (--at least if it contains a state END_OF_LIST
).
The more advanced elements such as states and actions are classes. Particularly the State
class contains a lot of logic: it is able to tell whether a given action is allowed via its member
bool isActionAllowed(Action const&) const
Further it can give you the next state for a given action via
State nextState(Action const&) const
These functions are central in the value iteration descibed next.
There is the class BatteryCharger
, which performs the actual dynamic programming algorithm. It contains a container std::map<State,int>
to store the value of a given state (remember that the value here is simply the time required to finish, which naturally shall be minimized). In order for the map
to work, there is also an operator<
comparing two State
variables.
Finally, a few words on the Value Iteration scheme. In the answer above, I wrote one can do it by backward-iteration. This is true, but it can become rather complicated because you need to find a way back such that all possible states are covered (and at best only once). Although this would be possible here, one can also simply iterate over all states in an arbitrary way. This needs to be done iteratively, however, because otherwise you draw on state-values which you haven't optimized yet. The iteration here ends when all values are converged.
For more on value iteration, see again the book of Sutton and Barto.
Code
Here is the code. It's not particularly well written (--global variables etc.), but it works and can be easily extended:
#include<iostream>
#include<array>
#include<map>
#include<type_traits>
#include<algorithm>
template<typename T, typename = typename std::enable_if<std::is_enum<T>::value >::type>
bool isLast(T const& t)
{
return t == static_cast<T>(static_cast<int>(T::END_OF_LIST) - 1);
}
template<typename T, typename = typename std::enable_if<std::is_enum<T>::value >::type>
bool isFirst(T const& t)
{
return t == static_cast<T>(0);
}
template<typename T, typename = typename std::enable_if<std::is_enum<T>::value >::type>
T& operator++(T& t)
{
if (static_cast<int>(t) < static_cast<int>(T::END_OF_LIST) - 1)
{
t = static_cast<T>(static_cast<int>(t)+1);
}
return t;
}
template<typename T, typename = typename std::enable_if<std::is_enum<T>::value >::type>
T operator++(T& t, int)
{
auto ret = t;
++t;
return ret;
}
template<typename T, typename = typename std::enable_if<std::is_enum<T>::value >::type>
T& operator--(T& t)
{
if (static_cast<int>(t) > 0)
{
t = static_cast<T>(static_cast<int>(t)-1);
}
return t;
}
template<typename T, typename = typename std::enable_if<std::is_enum<T>::value >::type>
T operator--(T& t, int)
{
auto ret = t;
--t;
return ret;
}
const int Nbattery = 2;
const int Nrecharger = 2;
const int Ndischarger = 1;
const int Ntest = 4;
enum struct StateOfCharge
{
P0=0,
P50,
P100,
END_OF_LIST
};
//screen output
std::ostream& operator<<(std::ostream& os, StateOfCharge const& s)
{
if(s==StateOfCharge::P0) os << "P0";
else if (s==StateOfCharge::P50) os << "P50";
else if (s == StateOfCharge::P100) os << "P100";
return os;
}
enum struct StateOfRelax
{
H0=0,
H6,
H12,
H18,
H24,
END_OF_LIST
};
//screen output
std::ostream& operator<<(std::ostream& os, StateOfRelax const& s)
{
if(s==StateOfRelax::H0) os << "H0";
else if (s == StateOfRelax::H6) os << "H6";
else if (s == StateOfRelax::H12) os << "H12";
else if (s == StateOfRelax::H18) os << "H18";
else if (s == StateOfRelax::H24) os << "H24";
return os;
}
struct SingleBatteryState
{
//initialize battery as unfilled
StateOfCharge SoC;
StateOfRelax Relax;
SingleBatteryState(StateOfCharge _SoC = static_cast<StateOfCharge>(0), StateOfRelax _Relax = static_cast<StateOfRelax>(0))
: SoC(_SoC)
, Relax(_Relax)
{}
//loop over state
bool increase()
{
//try to increase Relax
if (!isLast(Relax))
{
++Relax;
return true;
}
//if not possible, reset Relax
else if (!isLast(SoC))
{
++SoC;
Relax = static_cast<StateOfRelax>(0);
return true;
}
//no increment possible: reset and return false
SoC = static_cast<StateOfCharge>(0);
Relax = static_cast<StateOfRelax>(0);
return false;
}
};
std::ostream& operator<<(std::ostream& os, SingleBatteryState const& s)
{
os << "("<<s.SoC << "," << s.Relax << ")";
return os;
}
bool operator<(SingleBatteryState const& s1, SingleBatteryState const& s2)
{
return std::tie(s1.SoC, s1.Relax) < std::tie(s2.SoC, s2.Relax);
}
bool operator==(SingleBatteryState const& s1, SingleBatteryState const& s2)
{
return std::tie(s1.SoC, s1.Relax) == std::tie(s2.SoC, s2.Relax);
}
//here specify which cases you want to have tested
std::array<SingleBatteryState, Ntest> TestCases = { SingleBatteryState{ StateOfCharge::P50, StateOfRelax::H6 }
, SingleBatteryState{ StateOfCharge::P50, StateOfRelax::H24 }
, SingleBatteryState{ StateOfCharge::P100, StateOfRelax::H6 }
, SingleBatteryState{ StateOfCharge::P100, StateOfRelax::H24 } };
// for a state s (and action MEASURE), return the entry in the array Test
// which has to be set to true
int getTestState(SingleBatteryState const& s)
{
auto it = std::find(std::begin(TestCases), std::end(TestCases), s);
if(it==std::end(TestCases))
return -1;
else
return it - std::begin(TestCases);
}
enum struct SingleAction
{
RELAX = 0,
RECHARGE,
MEASURE,
END_OF_LIST
};
std::ostream& operator<<(std::ostream& os, SingleAction const& a)
{
if(a== SingleAction::RELAX) os << "RELAX";
else if (a == SingleAction::RECHARGE) os << "RECHARGE";
else if (a == SingleAction::MEASURE) os << "MEASURE";
return os;
}
enum struct TimeOfDay
{
H0 = 0,
H6,
H12,
H18,
END_OF_LIST
};
//here set office times
std::array<TimeOfDay,3> OfficeTime = {TimeOfDay::H6, TimeOfDay::H12, TimeOfDay::H18};
bool isOfficeTime(TimeOfDay t)
{
return std::find(std::begin(OfficeTime), std::end(OfficeTime),t) != std::end(OfficeTime);
}
//screen output
std::ostream& operator<<(std::ostream& os, TimeOfDay const& s)
{
if(s==TimeOfDay::H0) os << "0:00 h";
else if (s == TimeOfDay::H6) os << "6:00 h";
else if (s == TimeOfDay::H12) os << "12:00 h";
else if (s == TimeOfDay::H18) os << "18:00 h";
return os;
}
struct Action
{
SingleAction& operator[](int i) { return A[i]; }
SingleAction const& operator[](int i) const { return A[i]; }
std::array<SingleAction, Nbattery> A{};
bool increase()
{
for (int i = Nbattery - 1; i >= 0; --i)
{
if (!isLast(A[i]))
{
++A[i];
return true;
}
else
{
A[i] = static_cast<SingleAction>(0);
}
}
return false;
}
};
//screen output
std::ostream& operator<<(std::ostream& os, Action const& A)
{
os << "(";
for (int i = 0; i < Nbattery-1 ; ++i)
{
os << A[i] << ",";
}
os << A[Nbattery-1] << ")";
return os;
}
struct State
{
std::array<bool, Ntest> Test{};
TimeOfDay T = TimeOfDay::H0;
std::array<SingleBatteryState, Nbattery> B{};
State()
{
for (int i = 0; i < Ntest; ++i)
{
Test[i] = true;
}
}
bool isSingleActionAllowed(SingleAction const& a) const
{
if ( !isOfficeTime(T) && a != SingleAction::RELAX)
{
return false;
}
return true;
}
bool isActionAllowed(Action const& A) const
{
//check whether single action is allowed
for (int i = 0; i < Nbattery; ++i)
{
if (!isSingleActionAllowed(A[i]))
return false;
}
//check whether enough Rechargers and Dischargers are available
int re = 0;
int dis = 0;
for (int i = 0; i < Nbattery; ++i)
{
//increase re counter
if (A[i] == SingleAction::RECHARGE)
{
++re;
}
//increase dis counter
else if (A[i] == SingleAction::MEASURE)
{
++dis;
}
//check whether ressources are exceeded
if (re>Nrecharger || dis > Ndischarger)
{
return false;
}
}
return true;
}
//loop over state
bool increase()
{
//increase time
if (!isLast(T))
{
++T;
return true;
}
else
{
T = static_cast<TimeOfDay>(0);
}
//if not possible, try to increase single battery states
for (int i = Nbattery-1; i >= 0; --i)
{
if (B[i].increase())
{
return true;
}
}
//if also not possible, try to increase Test state
for (int i = Ntest-1; i >=0; --i)
{
if (Test[i])
{
Test[i] = false;
return true;
}
else
{
Test[i] = true;
}
}
return false;
}
// given a single action and a single-battery state, calculate the new single-battery state.
// it is assumed the action is allowed
SingleBatteryState newState(SingleBatteryState s, SingleAction const& a) const
{
if (a == SingleAction::RELAX)
{
++s.Relax;
}
else if (a == SingleAction::RECHARGE)
{
++s.SoC;
s.Relax = static_cast<StateOfRelax>(0);
}
else if (a == SingleAction::MEASURE)
{
--s.SoC;
s.Relax = static_cast<StateOfRelax>(0);
}
return s;
}
// calculate new complete state while assuming the action s allowed
State newState(Action const& a) const
{
State ret = *this;
//increase time
if (isLast(ret.T))
{
ret.T = static_cast<TimeOfDay>(0);
}
else
{
ret.T++;
}
//apply single-battery action
for (int i = 0; i < Nbattery; ++i)
{
ret.B[i] = newState(B[i], a[i]);
// if measurement is performed, set new Test state.
// measurement is only possible if Soc=50% or 100%
// and Relax= 6h or 24h
if (a[i] == SingleAction::MEASURE
&& getTestState(B[i]) != -1)
{
ret.Test[getTestState(B[i])] = true;
}
}
return ret;
}
int cost(Action const& a) const
{
if (std::find(std::begin(Test), std::end(Test), false) == std::end(Test))
{
return 0;
}
return 1;
}
};
//screen output
std::ostream& operator<<(std::ostream& os, State const& S)
{
os << "{(";
for (int i = 0; i < Ntest-1; ++i)
{
os << S.Test[i]<<",";
}
os << S.Test[Ntest-1] << "),"<<S.T<<",";
for (int i = 0; i < Nbattery; ++i)
{
os << "(" << S.B[i].SoC << "," << S.B[i].Relax<<")";
}
os << "}";
return os;
}
bool operator<(const State& s1, const State& s2)
{
return std::tie(s1.Test, s1.T, s1.B) < std::tie(s2.Test, s2.T, s2.B);
}
struct BatteryCharger
{
bool valueIteration()
{
// loop over all states with one specified Test state
State S;
int maxDiff=0;
do
{
int prevValue = V.find(S)->second;
int minCost = prevValue;
// loop over all actions
// and determine the one with minimal cost
Action A;
do
{
if (!S.isActionAllowed(A))
{
continue;
}
auto Snew = S.newState(A);
int newCost = S.cost(A) + V.find(Snew)->second;
if (newCost < minCost)
{
minCost = newCost;
}
}
while (A.increase());
V[S] = minCost;
maxDiff = std::max(maxDiff, std::abs(prevValue - minCost));
}
while (S.increase());
//return true if no changes occur
return maxDiff!=0;
}
void showResults()
{
State S;
do
{
auto Aopt = getOptimalAction(S);
auto Snew = S.newState(Aopt);
std::cout << S << " " << Aopt << " " << Snew << " " << V[S] << " " << V.find(Snew)->second << std::endl;
}
while (S.increase());
}
Action getOptimalAction(State const& S) const
{
Action Aopt;
Action A;
int minCost = std::numeric_limits<int>::max();
do
{
if (!S.isActionAllowed(A))
{
continue;
}
auto Snew = S.newState(A);
int newCost = S.cost(A) + V.find(Snew)->second;
if (newCost < minCost)
{
minCost = newCost;
Aopt = A;
}
}
while (A.increase());
return Aopt;
}
BatteryCharger()
{
State S;
do
{
int ad = 0;
for (int i = 0; i < Ntest; ++i)
{
if (!S.Test[i])
ad += 100;
}
V[S] = ad;
}
while (S.increase());
}
std::map<State, int> V;
};
int main(int argc, char* argv[])
{
BatteryCharger BC;
int count = 0;
while (BC.valueIteration())
{
++count;
};
std::cout << "Value Iteration converged after " << count << " iterations\n"<<std::endl;
//start at 6:00 with no tests at all performed
State S;
S.Test[0] = false;
S.Test[1] = false;
S.Test[2] = false;
S.Test[3] = false;
S.T = TimeOfDay::H6;
//get sequence of optimal actions
auto Aopt = BC.getOptimalAction(S);
while (BC.V[S] != 0)
{
std::cout << S << " " << Aopt << " " << BC.V[S] << std::endl;
S = S.newState(Aopt);
Aopt = BC.getOptimalAction(S);
}
std::cout << S << " " << Aopt << " " << BC.V[S] << std::endl;
return 0;
}
Here is a DEMO to play around.
Results
With the above code, one obtains the following results printed on the screen:
Value Iteration converged after 8 iterations
{(0,0,0,0),6:00 h,(P0,H0)(P0,H0)} (RELAX,RECHARGE) 10
{(0,0,0,0),12:00 h,(P0,H6)(P50,H0)} (RECHARGE,RECHARGE) 9
{(0,0,0,0),18:00 h,(P50,H0)(P100,H0)} (RECHARGE,RELAX) 8
{(0,0,0,0),0:00 h,(P100,H0)(P100,H6)} (RELAX,RELAX) 7
{(0,0,0,0),6:00 h,(P100,H6)(P100,H12)} (MEASURE,RELAX) 6
{(0,0,1,0),12:00 h,(P50,H0)(P100,H18)} (RELAX,RELAX) 5
{(0,0,1,0),18:00 h,(P50,H6)(P100,H24)} (RELAX,MEASURE) 4
{(0,0,1,1),0:00 h,(P50,H12)(P50,H0)} (RELAX,RELAX) 3
{(0,0,1,1),6:00 h,(P50,H18)(P50,H6)} (RELAX,MEASURE) 2
{(1,0,1,1),12:00 h,(P50,H24)(P0,H0)} (MEASURE,RELAX) 1
{(1,1,1,1),18:00 h,(P0,H0)(P0,H6)} (RELAX,RELAX) 0
One sees that the value iteration converges after 8 iterations. On my PC and with Visual Studio in Release mode this takes roughly half a second. So you can safely make the problem more detailed and still expect a feasible exact answer.
In the lines below, you see an example of an optimized test process. Here, the initial state is {(0,0,0,0),6:00 h,(P0,H0)(P0,H0)}
, i.e. the complete test is started at 6am with unused batteries (P0
here means 0%
, 6:00 h
means 6am or the german "6:00 Uhr"). The next column the gives you the optimized action {RELAX,RECHARGE}
(which leads to the state in the next line). The number in the third column is the time (in units of 6 hours) which is still required to finish the tests. For this example case, a total of 60 hours are required.
There you are, the model problem is solved. In order to check different initial states (they have all been optimized!), just change the following lines in main
:
//start at 6:00 with no tests at all performed
State S;
S.Test = {false,false,false,false};
S.T = TimeOfDay::H6;
And now, have fun! In the best case, you let us know how you proceeded and how successful it were.