How is Monte Carlo Tree Search implemented in practice
Q

1

11

I understand, to a certain degree, how the algorithm works. What I don't fully understand is how the algorithm is actually implemented in practice.

I'm interested in understanding what optimal approaches would be for a fairly complex game (maybe chess). i.e. recursive approach? async? concurrent? parallel? distributed? data structures and/or database(s)?

-- What type of limits would we expect to see on a single machine? (could we run concurrently across many cores... gpu maybe?)

-- If each branch results in a completely new game being played, (this could reach the millions) how do we keep the overall system stable? & how can we reuse branches already played?

Quickel answered 21/3, 2018 at 2:31 Comment(1)
I understand that this may be too broad, but would appreciate any links/references before this would be flagged.Quickel
I
10

recursive approach? async? concurrent? parallel? distributed? data structures and/or database(s)

  • In MCTS, there's not much of a point in a recursive implementation (which is common in other tree search algorithms like the minimax-based ones), because you always go "through" a game in sequences from current game state (root node) till game states you choose to evaluate (terminal game states, unless you choose to go with a non-standard implementation using a depth limit on the play-out phase and a heuristic evaluation function). The much more obvious implementation using while loops is just fine.
  • If it's your first time implementing the algorithm, I'd recommend just going for a single-threaded implementation first. It is a relatively easy algorithm to parallelize though, there are multiple papers on that. You can simply run multiple simulations (where simulation = selection + expansion + playout + backpropagation) in parallel. You can try to make sure everything gets updated cleanly during backpropagation, but you can also simply decide to not use any locks / blocking etc. at all, there's already enough randomness in all the simulations anyway so if you lose information from a couple of simulations here and there due to naively-implemented parallelization it really doesn't hurt too much.
  • As for data structures, unlike algorithms like minimax, you actually do need to explicitly build a tree and store it in memory (it is built up gradually as the algorithm is running). So, you'll want a general tree data structure with Nodes that have a list of successor / child Nodes, and also a pointer back to the parent Node (required for backpropagation of simulation outcomes).

What type of limits would we expect to see on a single machine? (could we run concurrently across many cores... gpu maybe?)

Running across many cores can be done yes (see point about parallelization above). I don't see any part of the algorithm being particularly well-suited for GPU implementations (there are no large matrix multiplications or anything like that), so GPU is unlikely to be interesting.

If each branch results in a completely new game being played, (this could reach the millions) how do we keep the overall system stable? & how can we reuse branches already played?

In the most commonly-described implementation, the algorithm creates only one new node to store in memory per iteration/simulation in the expansion phase (the first node encountered after the Selection phase). All other game states generated in the play-out phase of the same simulation do not get any nodes to store in memory at all. This keeps memory usage in check, it means your tree only grows relatively slowly (at a rate of 1 node per simulation). It does mean you get slightly less re-usage of previously-simulated branches, because you don't store everything you see in memory. You can choose to implement a different strategy for the expansion phase (for example, create new nodes for all game states generated in the play-out phase). You'll have to carefully monitor memory usage if you do this though.

Isodimorphism answered 21/3, 2018 at 9:29 Comment(4)
Thank you for this answer! Very verbose. Last thing, am I right in assuming that the overall systems performs a new game at each node, which can then generate other new & complete games (though nothing in stored in memory) -- all in all, just to satisfy the optimal performance in the first gameQuickel
@rambossa Every node corresponds to just a single game state, not a complete game (this is, assuming you have a deterministic game, like chess). When adding nodes to the tree, you can choose to also store the corresponding game state in the node, or you can choose not to. If you store them, subsequent Selection phases going through the same nodes will be faster since they no longer need to (re-)compute those game states. However, earlier simulations may be slower, since you have to copy game states before applying moves (to avoid also modifying game state objects stored in parent nodes)Isodimorphism
You write: "In MCTS, there's not much of a point in a recursive implementation...The much more obvious implementation using while loops is just fine". But MCTS explanations I saw describe the algorithm in a recursive way. The while approach is not obvious to me. Because in MCTS we have to do the backprop, the only non-recurisve implementation that comes to my mind requires keeping a "stack" vector that stores nodes (or node indices) that will be updated. Is that the best way?Artois
@Artois Usually I have every node object also explicitly keeping a pointer/reference back to its parent node, such that I can follow the chain of nodes back up through that. Only the root node would have a null parent. I must say I never really thought too much about the performance implications of this. I usually work with domains where the domain itself (e.g., computing legal moves / applying moves to game states in game contexts) is the performance bottleneck anyway, such that these kinds of tiny low-level details inside the MCTS itself don't end up really mattering.Isodimorphism

© 2022 - 2024 — McMap. All rights reserved.