How many threads are okay to use for tic-tac-toe using minimax?
Asked Answered
H

6

7

Let's take a 5x5 tic-tac-toe as an example. Let's say it's my AI's turn. Then,

  • I make 25 moves (at each cell basically, of course, if it's a legal move),
  • create a thread for each move (25 threads total (at most)),
  • call a minimax function on each made move,
  • then when all results come from each thread,
  • compare the scores and choose the move with the best score.

Here are my questions:

  • Is it efficient to use 25 threads? What does using 25 threads mean?

  • Is it 25 times faster (most likely not)? What it depends on? On the computer, of course, but how can I know how many threads are okay to use based on the computer's resources?

  • What happens if I use too many threads (nothing I guess...)?

Is my idea good? Thanks.

Heath answered 24/11, 2013 at 0:10 Comment(4)
But as soon as any one of your 25 threads attempts to "call a minimax function" it is going to need to consider about 24 possible responses. Are you going to fork a new thread for each of these? If you do, each of those new threads will need to consider about 23 AI second moves, and each of those 23 will fork 22 more, and each of those will fork 21, et cet. You will eventually attempt to create 25! = 15,511,210,043,330,985,984,000,000 threads. But don't be discouraged: if you can just convince Intel to produce a 16,000,000,000,000,000,000,000,000 core Celeron, your AI will be invincible.Headless
Just for a first move.Heath
Also notice than in the initial stage of the game, a lot of moves might be symmetric. Having a good algorithm that detects this (along with prunning, killer-moves, etc) will give you more "power" than just spawning threads.Vermin
There are only six distinct first moves; so why do you want to check 25?Tecu
G
12

For a typical compute-bound application, a good rule of thumb is to use as many threads as you have hardware cores (or hyperthreads). Using more threads than cores won't make your application go faster. Instead, it will cause your application to use more memory than is necessary. Each thread typically has a 0.5 to 1Mbyte stack ... depending on your hardware and the Java version. If you create too many threads, the extra memory usage will result in a significant performance hit; i.e. more threads => slower program!

Another thing to consider is that Java threads are expensive to create on a typical JVM. So unless a thread does enough work (in its lifetime) there is a risk that you spend more time creating threads than you gain by using multiple cores in the computation.

Finally, you may find that the work does not spread evenly over all threads, depending on your minmax algorithm ... and the game state.


If I was trying to implement this, I'd start by implementing it as a single threaded application, and then:

  • benchmark it to figure out how long it takes to calculate a more when run serially,
  • profile it to get rid of any bottlenecks
  • re-benchmark to see if it is fast enough already.

If and only if it needs to go faster, I would then examine the code and (if necessary) add some monitoring to see how to break the computation down into large enough chunks to be executed in parallel.

Finally, I'd use those results to design and implement a multi-threaded version.

I'd also look at alternatives ... like using Java 7 fork/join instead of threads.


To answer your direct questions:

Is it efficient to use 25 threads?

Probably not. It would only be efficient if you had that many cores (unlikely!). And even then you are only going to get a good speedup from using lots of threads if you gain more by running things in parallel than you lose due to thread-related overheads. (In other words, it depends how effectively you use those threads.)

What does using 25 threads mean?

I assume that you mean that you have created and started 25 Threads, either explicitly or using some existing thread pool implementation.

But the bottom line is that if you have (say) 4 cores, then at most 4 of those 25 threads can be executing at one time. The other threads will be waiting ...

Is it 25 times faster (most likely not)? What it depends on? On the computer, of course, but how can I know how many threads are okay to use based on the computer's resources?

The primary factor that limits performance is the number of cores. See above.

What happens if I use too many threads (nothing I guess...)?

Too many threads means you use more memory and that makes your application run slower because of memory bandwidth competition, competition for physical memory pages, extra garbage collection. These factors are application and platform dependent, and hard to quantify; i.e. predict or measure.

Depending on the nature of your application (i.e. precisely how you implement the algorithms) too many threads could result in extra lock contention and thread context switching. That will also make your application slower.

It is a impossible to predict what would happen without seeing your actual code. But the number of cores gives you a theoretical upper bound on how much speedup is possible. If you have 4 cores, then you cannot get more than a 4-fold speedup with multi-threading.

Gavial answered 24/11, 2013 at 0:52 Comment(1)
definitly a case for fork/join, since it is about parallelism, not concurrencyMontenegro
H
3

So, the threading answers given are ok, but it seemed to me they overlooked the alpha-beta pruning feature of minimax search.

If you launch a thread for each "next move" from your current position, then having those thread talk to each other is slow and painful to write correctly. But, if they cant talk to each other, then you don't get the depth boosting that comes from alpha-beta pruning, until one level further down.

This will act against the efficiency of the result.

For general cases of improve computation time, the best case tends to be 1 thread per core, with either a simple assignment of tasks to thread if they are all similar time (eg matrix multiplication), or having a "set" of tasks, with each thread grabbing the next un-started one whenever it finishes its current task. (this has some locking tasks, but if they are small compared to resolution cost it is very effective).

So, for a 4 core system, and ~25 natural tasks you can hope for a speed up in the range of 3.5-4x. (you would do 4 in parallel ~5 times, then finish messily). But, in the minimax case you have lost the alpha-beta pruning aspect, which I understand is estimated to reduce "effective breadth" from N to about sqrt(N). For ~25 case, that means a effective branching factor of 5. this means using 4 cores and skipping pruning for the first level might actually hurt you.

So, where does the leave us?

  1. Give up on going mutli-thread. or,
  2. thread based on available cores. Is up to 4 times faster, while also being up to sqrt(25)==5 times slower. or,
  3. go multi-thread, but propagate the betas across your threads. This will like require some locking code, but hopefully that wont be too costly. You will reduce the effectiveness of the alpha-beta pruning, since you will be searching sub-trees you wouldn't search in a strict left->right pass, but any thread that happens to be searching redundant areas is still little worse than having a core doing nothing. So, superfluous searches should be more than offset by additional useful work done. (but this is much harder to code then a simple task<-> thread mapping). The real issue here may be the needing to be/find someone who really groks both alpha-beta pruning and multi-threading for speed. It doesn't strike me as code I would trust many people to write correctly. (eg I have in my time writtin many multi-threaded programs and several minimax searches but I don't know of the top of my head if you will need to propagate betas or alphas or both between threads for the search from the top node).
Hyoscyamine answered 30/11, 2013 at 11:52 Comment(6)
The multithreading is only used at the first move so it wont affect alpha-beta pruning that much. Multithreading is practically impossible to be used lower level because no of threads would be large.Rottweiler
I know its only used for the first move, but the first layer is really important. Giving up on even for just one level it will be like adding an extra half level to the total cost. Given a branching factor approaching 25, that's a lot of extra cost (my reading of the literature guesses that cost factor to be about 5x).Hyoscyamine
but how about multithreading for only 4 branches at a time & then updating alpha at that level . wouldnt that give some speed up ? Because your branching factor here can be said as 25/4 with sqrt(4)=2 penalty which is 2 times faster than without multithreading.Rottweiler
Yep, absolutely. That is similar to what i envision with option3 in my answer. The code is a little harder, and the speed up is not the 3-4x you might expect, but there should definitely be some speed up.Hyoscyamine
It might be helpful to explain what you mean in somewhat more complex terms. By my understanding, what you're saying is that a thread which assumes Player One's first move is to square 1 and a thread that assumes his first moves are to square 2 may both end up evaluating many of the cases where Player One's first two moves are to squares 1 and 2, even though because of pruning some such positions may only get evaluated by one thread or the other and thus neither thread can "assume" any position will get covered by the other.Fescue
"the alpha-beta pruning feature of minimax search". Alpha-beta prunning is not part of the basic minimax algorithm.Vermin
P
3

As all my friends said that use as many threads as your machine has capacity.

but by adding them to you should go with improving algorithm as well.

for example in 5x5 tic tac toe both will get 12 or 13 moves. so number of posible moves are as nCr(combination equation) base 25C12 = 5,200,300. so now you have decrease number of thread now going to best selection you have best way to find best solution are only 12 (wining position) & 12 to lose worst condition all other are draw condition. so now what you can do is simply check those 12 condition from threads & leave extra combination with calculation that you need to create 12! * 12 no of threads which is very low compare to 25!.

hence your number of thread is going to decrease you can further think on it to decrease your number of thread.

when as your moves goes more & more you can go with alpha-beta pruning so that you can improve your algorithm as well.

Publicist answered 2/12, 2013 at 5:47 Comment(0)
R
1

If you are using Threads then to prevent memory wastage just use them for first calls of mini-max and then combine the result of the thread to get the output. It is a wastage if you use 25 threads or something number so big because the available cores are way less than that so what you can do is schedule only no of threads equivalent to available cores at a time on different states and combine all the results at the end.

Here is pseudo code:-

int miniMax(State,Player,depth) {


// normal minimax code


}


State ParaMiniMax(State,Player) {

   int totalThreads = Runtime.getRuntime().availableProcessors());

   NextStates = getNextStates(State);

   while(NextStates.size()>0) {

     k = totalThreads;


     while(k>0 && NextStates.size>0) {

         //Schedule thread with nextState. with run calling miniMax with other player
         //Store (score,state) in Result List  
        k--;
        NextStates.removeTop();
     }

     wait(); // waits for threads to complete



   }

   if(player==max) {

     return(maxScore(Result).State);
   }

   else return(minScore(Result).State);


}
Rottweiler answered 27/11, 2013 at 4:57 Comment(0)
D
0

You should only use a number of threads equal to the number of cores the machine has. Scheduling tasks onto those threads is a different thing.

Damondamour answered 27/11, 2013 at 13:18 Comment(0)
T
0

Consider the symmetry of your problem. There are actually only a very limited number of "unique" initial moves - the rest are the same but for reflection or rotation (therefore of identical strategic value). The unique moves for a 5x5 board are:

xxx..
.xx..
..x..
.....
.....

Or just 6 initial moves. Bam - you just reduced the complexity by >4x with no threads.

As others said, more threads than you have cores usually doesn't help in speeding up unless individual threads spend time "waiting" - for inputs, memory access, other results. It might be that six threads would be a good place to start.

Just to convince you about the symmetry, I am marking equivalent positions with the same number- see if you agree

12321
24542
35653
24542
12321

This is the same when you rotate by any multiple of 90 degrees, or reflect about diagonal or left-right, up-down.

PS I realize this is not really answering the question you asked, but I believe it very directly addresses your underlying question - "how do I efficiently solve 5x5 tic-tac-toe exhaustively". As such I won't be upset if you select a different answer but I do hope you will take my advice to heart.

Tecu answered 3/12, 2013 at 0:9 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.