How I taught sklearn algorithm to play Tic-Tac-Toe.

4 min readNov 7, 2019

What is Tic-Tac-Toe?

Tic-Tac-Toe or X & O is a game that is played by 2 players (X and O) that are placing their marks on a 3x3 game board. The first player that places his marks (X or O) on a diagonal, vertical or horizontal axes is the winner.

Being one of the most simple game, it was used in teaching purposes for developing sport principles and some branches of Artificial Inteligence. There is a total of 26830 viabile games, which makes possible the training of some algorithms to play this game in a shortly enough amount of time, being comparable to chess (10¹²⁰, or the Sannon’s number) or GO (10⁷²⁰).

Used Logic

During this project, there where tested 8 algorithms at playing Tic-Tac-Toe. They were playing in a virtual environment created in Python 3 programming language, that apply the rules of this game on a 3x3 matrix, that is supposed to be the game board. The algorithms learned to play this game using a database of randomly generated gemes. The generator algorithm generated 5000 games, which was split based on the winner of the game (X or O), these games were split on turns, that represent the samples in the project’s databese, 9279 turns per every dataset.

You can find the game generator algorithm and the game environment code on GitHub repository:

Every player was built up from 2 algorithms — one was predicting the column, another was predicting the row where the new sign was placed. The entry data is the game board. The output are the coordinates of the cell where the new sign should be placed. But Idiscovered that very often algorithms were making illegal moves, it happened because of the absence of probability. To eliminate these moves, I tried to use probability. Each algorithm was predicting 3 probabilities that created a likelihood that placing the new sign on this column and this row the player will win. These 2 vectors of probabilities were after that multiplied (one of them of course were transposed before that), getting in such way a matrix of probabilities, every cell had the probability of winning by placing the new sign on this cell. Applying this matrix on game board, the environment extracts only the probabilities that are related to empty game board cells. After that, by finding the highest probability the player places the new sign. In such way I eliminated illegal moves.

One more important point is the fact that every game starts from a random move. These feature is based on 2 reasons:

The same input generates the same output. These would generate only one game every time.
This feature very well virtualizes an average statistical player, being how practically each of us is starting the Tic-Tac-Toe game.

Chosen algorithms

For this project I chosen 8 algorithms. Each algorithm must fulfill the next criteria:

It should be implemented in the Python Machine Learning package — sklearn.
It must have the possibility to return the probabilities as an output.

The chosen algorithms are: Decision Tree, Random Forest, Gradient Boosting, Ada Boost, Gaussian Naive Bayes, Bernouli Naive Bayes, Logistic Regression and Support Vector Classifier (Gaussian kernel).

The research process

The algorithms played with each other 100 games, the winner was the algorithm that won the majority of games. If the scores were between 40 or 60 the result is a tie.

The interpetation of results

From all used algorithms in this research the most victories are taken by TreeDecisionClassifier. The next place in the rankings were also taken by CARTs (Classification and Regression Trees) — Random Forest, SVC (exception, it’s not a CART), Ada Boost and Gradient Boosting.

I think that these results are due the fact that CARTs are bulding an if-else structure of the game, that without any notice, they are building a game-tree, going through the entire game, choosing the needed cells to win the game.

GitHub repository: https://github.com/ScienceKot/Tik-Taker.git

Thank you