Tic Tac Toe and Minimax - Creating an imperfect AI on a microcontroller
Asked Answered
H

1

10

I have created a Tic-Tac-Toe game on a microcontroller, including a perfect AI (perfect meaning that it doesn't lose). I did not use a minimax algorithm for that, just a little state machine with all possible and optimal moves. My problem now is that I wanted to implement different difficulties (Easy, Medium and Hard). The AI so far would be the hard one. So I've thought about how to do this the best way and ended up wanting to use the minimax algorithm but in a way that it calculates all the scores for all game positions so that I can also sometimes pick the second best score instead of the best. Since I can't always do all of these calculations on the microcontroller itself, I wanted to create a little program that I can run on my computer which gives me arrays of all possible board states (with respect to symmetry, ect to minimize the storage used) and their according scores. For this I firstly tried to implement the minimax algorithm itself, regarding the depth in order to properly calculate the scores of each state. It was then supposed to give me back all of the optimal moves (for now) in an array. However, it does not seem to work that well. I have tried to debug it with some printf lines. Here is the code so far of both the minimax function as well as my main function:

    static int minimax(int *board, int depth)
{
    int score;
    int move = -1;
    int scores[9];
    int nextDepth;

    printf("\n----- Called Minimax, Depth: %i -----\n\n", depth);

    if(depth%2 ==1){
        player = -1;
    } else {
        player = 1;
    }

    printf("Player: %i\n---\n", player);

    if(isWin(board) != 0){
        score = (10-depth)*winningPlayer;

        printf("Player %i won on depth %i\n", winningPlayer, depth);
        printf("Resulting score: (10-%i)*%i = %i\nScore returned to depth %i\n---\n", depth, winningPlayer, score, depth-1);

        return score;
    }

    score = -20;
    nextDepth = depth+1;

    printf("Next depth is %i\n---\n", nextDepth);

    int i;
    for(i=0; i<9; i++){
        if(board[i] == 0) {

            if(nextDepth%2 ==0) {
                player = -1;
            } else {
                player = 1;
            }

            printf("Found vacant space at position %i\n", i);
            printf("Set value of position %i to %i\n---\n", i, player);

            board[i] = player;
            int thisScore = minimax(board, nextDepth);

            printf("Value of the move at position %i on next depth %i is %i\n---\n", i, nextDepth, thisScore);

            scores[i] = thisScore;
            if(thisScore > score){

                printf("New score value is greater than the old one: %i < %i\n---\n", thisScore, score);

                score = thisScore;
                move = i;
                g_moves[nextDepth-1] = move;

                printf("Score was set to %i\n", thisScore);
                printf("Remembered move %i\n---\n", move);

            }
            board[i] = 0;

            printf("Value of position %i was reset to 0 on next depth %i\n---\n", i, nextDepth);

        }
    }

    if(move == -1) {

        printf("Game ended in a draw.\n Returned score: 0\n---\n");

        return 0;
    }

    printf("Move at position %i was selected on next depth %i\n", move, nextDepth);
    printf("Returning score of %i to depth %i\n---\n", score, depth);


    return score;
}

The main is:

int main(int argc, char **argv)
{   
    memcpy(board, initBoard, sizeof(board));
    int score = 0;
    int depth = getDepth(board);
    score = minimax(board, depth);
    printf("\n--- finished ---\n\n");


    printf("Moves with the highest score: ");
    int i;
    for(i=0; i<9; i++){
        printf("%i | ", g_moves[i]);
    }
    printf("\n");

    printf("The score is %i\n", score);

    printf("The best next board is: \n|----|----|----|\n");

    for(i=0; i<3; i++){
        printf("| %-2i ", board[i]);
    }
    printf("|\n|----|----|----|\n");
    for(i=3; i<6; i++){
        printf("| %-2i ", board[i]);
    }
    printf("|\n|----|----|----|\n");
    for(i=6; i<9; i++){
        printf("| %-2i ", board[i]);
    }
    printf("|\n|----|----|----|\n");

    return 0;
}

Furthermore, i have some variables:

//1  = Beginning Player
//-1 = second Player
static int player;
static int winningPlayer = 0;
static int g_moves[9];

/* 0 1 2
 * 3 4 5
 * 6 7 8
 */
int initBoard[9] = {
    0, 0, 0,
    0, 0, 0,
    0, 0, 0,
};

int board[9];

As well as my winning function:

int isWin(int *board)
{
    unsigned winningBoards[8][3] = {
        {board[0], board[1], board[2],},
        {board[3], board[4], board[5],},
        {board[6], board[7], board[8],},
        {board[0], board[3], board[6],},
        {board[1], board[4], board[7],},
        {board[2], board[5], board[8],},
        {board[0], board[4], board[8],},
        {board[2], board[4], board[6],},
    };

    int i;
    for(i=0; i<8; i++){
        if( (winningBoards[i][0] != 0) &&
            (winningBoards[i][0] == winningBoards[i][1]) &&
            (winningBoards[i][0] == winningBoards[i][2])){
                winningPlayer = winningBoards[i][0];
                return winningPlayer;
            }
    }
    return 0;
}

For some reason, the last time the minimax returns from depth 7 step-by-step to depth 1, it overwrites my array g_moves with all 0s thus resulting in the following lines in my printed output (only the last 70 lines):

...
----- Called Minimax, Depth: 7 -----

Player: -1                                                                                                                                                                                                                                                                     
---                                                                                                                                                                                                                                                                            
Player 1 won on depth 7                                                                                                                                                                                                                                                        
Resulting score: (10-7)*1 = 3                                                                                                                                                                                                                                                  
Score returned to depth 6                                                                                                                                                                                                                                                      
---                                                                                                                                                                                                                                                                            
Value of the move at position 2 on next depth 7 is 3                                                                                                                                                                                                                           
---                                                                                                                                                                                                                                                                            
Value of position 2 was reset to 0 on next depth 7                                                                                                                                                                                                                             
---                                                                                                                                                                                                                                                                            
Move at position 0 was selected on next depth 7                                                                                                                                                                                                                                
Returning score of 3 to depth 6                                                                                                                                                                                                                                                
---                                                                                                                                                                                                                                                                            
Value of the move at position 3 on next depth 6 is 3                                                                                                                                                                                                                           
---                                                                                                                                                                                                                                                                            
Value of position 3 was reset to 0 on next depth 6                                                                                                                                                                                                                             
---                                                                                                                                                                                                                                                                            
Move at position 0 was selected on next depth 6                                                                                                                                                                                                                                
Returning score of 3 to depth 5                                                                                                                                                                                                                                                
---                                                                                                                                                                                                                                                                            
Value of the move at position 4 on next depth 5 is 3                                                                                                                                                                                                                           
---                                                                                                                                                                                                                                                                            
Value of position 4 was reset to 0 on next depth 5                                                                                                                                                                                                                             
---                                                                                                                                                                                                                                                                            
Move at position 0 was selected on next depth 5                                                                                                                                                                                                                                
Returning score of 3 to depth 4                                                                                                                                                                                                                                                
---                                                                                                                                                                                                                                                                            
Value of the move at position 5 on next depth 4 is 3                                                                                                                                                                                                                           
---                                                                                                                                                                                                                                                                            
Value of position 5 was reset to 0 on next depth 4                                                                                                                                                                                                                             
---                                                                                                                                                                                                                                                                            
Move at position 0 was selected on next depth 4                                                                                                                                                                                                                                
Returning score of 3 to depth 3                                                                                                                                                                                                                                                
---                                                                                                                                                                                                                                                                            
Value of the move at position 6 on next depth 3 is 3                                                                                                                                                                                                                           
---                                                                                                                                                                                                                                                                            
Value of position 6 was reset to 0 on next depth 3                                                                                                                                                                                                                             
---                                                                                                                                                                                                                                                                            
Move at position 0 was selected on next depth 3                                                                                                                                                                                                                                
Returning score of 5 to depth 2                                                                                                                                                                                                                                                
---                                                                                                                                                                                                                                                                            
Value of the move at position 7 on next depth 2 is 5                                                                                                                                                                                                                           
---                                                                                                                                                                                                                                                                            
Value of position 7 was reset to 0 on next depth 2                                                                                                                                                                                                                             
---                                                                                                                                                                                                                                                                            
Move at position 0 was selected on next depth 2                                                                                                                                                                                                                                
Returning score of 5 to depth 1                                                                                                                                                                                                                                                
---                                                                                                                                                                                                                                                                            
Value of the move at position 8 on next depth 1 is 5                                                                                                                                                                                                                           
---                                                                                                                                                                                                                                                                            
Value of position 8 was reset to 0 on next depth 1                                                                                                                                                                                                                             
---                                                                                                                                                                                                                                                                            
Move at position 0 was selected on next depth 1                                                                                                                                                                                                                                
Returning score of 5 to depth 0
---

--- finished ---

Moves with the highest score: 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 
The score is 5
The best next board is: 
|----|----|----|
| 0  | 0  | 0  |
|----|----|----|
| 0  | 0  | 0  |
|----|----|----|
| 0  | 0  | 0  |
|----|----|----|

If you need any other information in order to help me, I'll be glad to give them to you if I have them myself.

Thanks in advance.

EDIT: So I rewrote my minimax function so it now prints all the possible board states on a .txt file using the console (cmd: ./NAME_OF_FILE > DEST_NAME.txt in the according folder). The code is the following:

int minimax(int *board, int depth)
{
    g_node++;
    int player;
    int move = -1;
    int score = -20;
    int thisScore = -20;
    int i;

    if(isWin(board) != 0){
        printf("\nNode: %i\n", g_node);
        printf("Board state:");
        for(i=0;i<9;i++) {
            if((i%3) == 0)
                printf("\n");
            printf("%2i ", board[i]);
        }
        printf("\n");
        printf("has a score of %i\n", (10-depth)*winningPlayer);
        return (10-depth)*winningPlayer;
    }


    if(depth%2 ==1){
            player = -1;
        } else {
            player = 1;
        }
    for(i=0; i<9; i++){
        if(board[i] == 0){
            board[i] = player;
            thisScore = minimax(board, depth+1);
            if(thisScore > score){
                score = thisScore;
                move = i;
            }
            board[i] = 0;
        }
    }

    printf("\nNode: %i\n", g_node);
    printf("Board state:");
    for(i=0;i<9;i++) {
        if((i%3) == 0)
            printf("\n");
        printf("%2i ", board[i]);
    }
    printf("\n");

    if(move == -1){
        printf("has a score of 0\n");
        return 0;

    }
    printf("has a score of %i\n", score);
    return score;
}

My next step is to print out the board with the max score of each move at the according position.

Example:
10  8 10
 8  7  8
10  8 10
for the empty board at the beginning.

EDIT 2: I now added another function called printScoredBoards which basically is supposed to do what I discribed above in my last edit, however there is a problem to it. Since it is always possible to win after the 5th move if your opponent plays dumb enough and since the minimax tries out all possibilities, including those, with the following code I get a scored board of all 15s for the empty board.

void printScoredBoards(int *board, int depth)
{
    int player;
    int scoredBoard[9] = {0,0,0,0,0,0,0,0,0,};
    int i;
    if(isWin(board) == 0){
        if(depth%2 ==1){
            player = -1;
        } else {
            player = 1;
        }

        for(i=0; i<9; i++){
            if(board[i] == 0){
                board[i] = player;
                scoredBoard[i] = minimax(board, depth+1)+10;
                printScoredBoards(board, depth+1);
                board[i] = 0;
            }
        }
        printf("Scored board:");
        dumpTable(scoredBoard);
        printf("\n");
    }
}

This is although the corners should be worth more and the center is the least valuable. Does anyone happen to know a work-around for this?


EDIT: I've set up a new minimax algorithm and posted it in another post. You can find the post in the "Linked" section on the right or here. Now all I'm doing is implementing it in the microcontroller code and creating a function which can pick the best/second best move out of all scored moves as well as randomize it if there are multiple moves with the same score. This post can thereby be closed.

Hengel answered 20/7, 2016 at 11:34 Comment(6)
For such a very small state space as tic-tac-toe, you might just keep the perfect AI and 'roll a dice' at each time to decide if you choose the optimal step or another, random step. Additionally, medium could have a check to prevent other player's winning moves when it can.Concubine
That actually is not such a bad idea and I guess I'll just do that if this thing turns out to be a dead end but I'd still like to know where I'm making a mistake. It's probably something obvious that I'm just constantly missingHengel
I would also second lorro, I would just guess that the computational power is sufficient to do minmax or at least minmax with alpha-beta-pruning. What kind of microcontroller is it? I would like to get a rough idea of its computational power.Pianissimo
It's got an MB91F464AB chip on itHengel
~Edited my original post~Hengel
~Edited my original post again~Hengel
V
1

I think trying to get the second best move with a full depth analysis is overdoing it. Don't explore the whole tree by limiting the depth of your minmax (2 move ahead permits to win but the AI is still strong), or just a use a random move for a really imperfect AI.

Virology answered 20/7, 2016 at 11:49 Comment(9)
What do you mean by limiting the depth exactly?Hengel
You don't explore the whole game tree but only 2 moves ahead. (int depth = getDepth(board); become int depth = 2;)Virology
So what criteria should I use to score each move then?Hengel
@Lithimlin: You might simply do Lose < (notFinished / Draw) < Win.Audwin
~Edited my original post~Hengel
This answer is a chat with OP, not real answer so it does have no value for other people.Canalize
~Edited my original post again~Hengel
If you have multiple equal score you can take the first ot the last move depending on how your minmaxing (> or >=) without doing anything, or choose at random between the best score if you don't want to have the computer always play the same way.Virology
I think you want to make the evaluate function too perfect. It is supposed to play at random the first turn because otherwise it'll play the best move and it will be undefeatable. Later on your AI will play the best move because depth 2 is sufficient for that, but it may have already lost.Virology

© 2022 - 2024 — McMap. All rights reserved.