Just a moment, though... do you have any idea how they work? How is it possible that in a mere instant they reliably analyse an enormous number of variations? I myself, after 20 years of competition, still didn’t have a clue about the mechanism by which these monsters work. I was bitten by the curiosity bug and did some research on the subject. In this article we’re going to try to understand how engines function and the way in which they think. That means talking about some mathematical concepts, principally algorithms.
Don’t be afraid, though! We’re not going to go too deeply into the topic and everyone will be able to grasp it.
Let’s get started:
Chess is a so-called “zero-sum game”, which simply means a game in which one player’s gain is exactly the other player’s loss. As a consequence, total wins minus total losses equals zero, which is where “zero-sum” comes from.
For example: in the diagram position White has just captured on c6, winning a knight.
White’s winnings: 1 knight
Black’s losses: 1 knight
Wins – losses = 0
In order to determine the score in a zero-sum game after a certain number of moves, you can use the Minimax algorithm, in which the best play possible is assumed according to an evaluation function.
Minimax is a very beautiful word, but how does the algorithm work?
In what’s called a 1-ply search i.e. where only one move ahead is examined, the side playing (called the max player) simply looks at the evaluation after each possible move. The move with the best evaluation is chosen. However, in a 2-ply search, where the opponent (the min player) also moves, it’s more complex. He or she also chooses a move according to the best evaluation.
In short, it’s about choosing the best move for yourself, while assuming that your opponent will in turn choose the move that is worst for you.
Consider the example in the following diagram:
Imagine it’s White to play, and he has to choose between Rc7, Nc6 or a4. For each possible choice Black can choose between three replies. The numbers below indicate the evaluation of the position (a positive number favours White, a negative one favours Black). As Minimax suggests, we have to look at the worst-case scenario and assume that Black will reply with the most damaging move. That’s why in the first phase of the algorithm, the most damaging move as a reply is calculated, which will be the minimum in each of the cases (-8, 0, 3). Finally, the maximum is calculated (3). Therefore the move the machine will make is Nc6.
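This two-ply decision can be sketched in a few lines of Python. Only the per-move minima (-8, 0 and 3) come from the example; the individual reply scores, and which of Rc7/a4 gets -8 or 0, are invented for illustration:

```python
# Two-ply minimax on the example: for each White move, assume Black picks
# the most damaging reply (the minimum), then White picks the move whose
# minimum is highest. Only the minima -8, 0 and 3 match the example in the
# text; the individual reply scores are invented.
replies = {
    "Rc7": [-8, 2, 5],   # min = -8
    "Nc6": [3, 7, 4],    # min =  3
    "a4":  [0, 1, 6],    # min =  0
}

def minimax_choice(tree):
    return max(tree, key=lambda move: min(tree[move]))

print(minimax_choice(replies))  # -> Nc6
```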
As you can see, from the computational point of view it would take two subroutines, one for the min player and one for the max player. They can be joined using a version of the Minimax algorithm called Negamax, which is based on the following mathematical relationship (the positive evaluation for one side equals the negative evaluation for the other):
max(a, b) = -min(-a, -b)
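A minimal negamax sketch built on that identity, using a tiny hypothetical game tree; leaf scores are always from the point of view of the side to move there:

```python
# Negamax: one routine for both players. Each level negates the child
# scores, using max(a, b) = -min(-a, -b), so scores are always from the
# point of view of the side to move. The tiny tree below is invented.
def negamax(node, depth, evaluate, children):
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    return max(-negamax(kid, depth - 1, evaluate, children) for kid in kids)

tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
leaf_scores = {"a1": -1, "a2": 4, "b1": 0, "b2": -3}

value = negamax("root", 2,
                evaluate=lambda n: leaf_scores[n],
                children=lambda n: tree.get(n, []))
print(value)  # -> -1
```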
Anyone who has stopped to think a little about Minimax will have realised that using brute force is quite inefficient; in fact it has a computational cost of the order of b^d,
where b is the so-called branching factor and d is the depth. b is usually about 35 in a middlegame position (35 legal moves on each turn) and d could be up to 100 plies, as there are on average 50 moves in each game. Therefore 35^100 ≈ 2.5×10^154, which means that it would take many lifetimes of the universe to reach the end of the search tree for a single game.
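You can check that arithmetic directly, since Python handles arbitrarily large integers natively:

```python
# Brute-force tree size: branching factor b = 35, game length d = 100 plies.
b, d = 35, 100
nodes = b ** d  # Python integers are arbitrary precision
print(len(str(nodes)))  # 155 decimal digits, i.e. on the order of 2.5e154
```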
We therefore need a solution for this problem. One of the most commonly employed is limiting the search depth. At a certain depth, we simply cut things off and perform a static evaluation of the position, as seen in the diagram below.
How do machines evaluate each position on the board? This issue is critical, since if the evaluation is incorrect then it doesn’t matter if you manage to reach a depth of many moves.
These evaluations are produced by what is called an evaluation function, which can be expressed as follows:
f(x) = w1*f1(x) + w2*f2(x) + w3*f3(x) + ...
An evaluation function is usually a linear combination of “characteristics” f1(x), f2(x), ..., with weightings w1, w2, ... that determine the importance of each characteristic. The most basic evaluation function simply counts the pieces remaining on the board:
f1(x)= nº of white queens - nº of black queens
f2(x)= nº of white rooks - nº of black rooks
f3(x)= nº of white bishops - nº of black bishops
f4(x)= nº of white knights - nº of black knights
f5(x)= nº of white pawns - nº of black pawns
where the weightings assigned are the values we all learn as beginners: 9 for the queen, 5 for the rook, 3 for bishop and knight, and 1 for the pawn.
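A sketch of this basic material count, with the piece counts passed in as plain dictionaries purely for illustration:

```python
# Basic material-only evaluation: weighted piece-count differences,
# with the familiar values 9, 5, 3, 3 and 1 as weightings.
WEIGHTS = {"Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}

def material_eval(white, black):
    # white/black map a piece letter to how many of that piece remain.
    return sum(w * (white.get(p, 0) - black.get(p, 0))
               for p, w in WEIGHTS.items())

# Example: queen, bishop and pawn vs two rooks, bishop, two knights, four pawns.
print(material_eval({"Q": 1, "B": 1, "P": 1},
                    {"R": 2, "B": 1, "N": 2, "P": 4}))  # -> -10
```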
Clearly this simple evaluation function leads to mistakes, so it’s not sufficient for the engines. As an example, take the following diagram:
where the evaluation function mentioned would give the result:
f(x)= 9*(1-0) + 5*(0-2) + 3*(1-1) + 3*(0-2) + 1*(1-4) = -10
i.e. a decisive black advantage. Nothing could be further from the truth since White will play Qh6 and Black will be able to resign. For this reason the machines use an advanced evaluation where different techniques are employed:
1. Evaluation by game phase
It’s interesting to vary the weights of the functions according to the current phase of the game. For example, we want our king to be far from the centre in the middlegame. However, as we all know, it’s a key piece in endings and then it’s better for it to be in the centre. In order to gauge the current phase of the game the engines can, for instance, use the number of pieces remaining on the board.
2. Bishop pair
You can add a small bonus for the bishop pair (and another if the bishops cover many squares on the board).
3. Piece-square tables
There are easy ways of assigning values to specific pieces on specific squares. For example, in the opening pawns get a small bonus if they occupy central squares.
4. King safety
This is very important. For example, you can gauge this by calculating the number of pawns surrounding the king, or whether there’s a rook nearby.
5. Mobility
You normally prefer positions in which you have more options, for example with open diagonals for the bishops. This can be gauged, for instance, by using the number of legal moves available in a position as a mobility score.
6. Pawn structure
Doubled pawns can be punished slightly, as can isolated pawns in endgames, since we all know how easy they are to attack.
7. Rooks on open files
It’s well-known that this is a positive, as is having a rook on the 7th rank. There are many more factors that we can take into account, but these seven give an idea of the advanced evaluation that you can add to the evaluation function.
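As a rough illustration of how several of the terms above can be combined, here is a toy evaluation mixing material, a piece-square bonus for central pawns, the bishop pair and mobility. Every weight and bonus here is invented for the sketch, not tuned like a real engine’s:

```python
# Toy "advanced" evaluation, in centipawns. All constants are illustrative.
PIECE_VALUES = {"Q": 900, "R": 500, "B": 300, "N": 300, "P": 100}
CENTRAL_PAWN_BONUS = {"d4": 20, "e4": 20, "d5": 20, "e5": 20}  # tiny PST
BISHOP_PAIR_BONUS = 50
MOBILITY_WEIGHT = 2  # centipawns per extra legal move

def evaluate(white_pieces, black_pieces, white_mobility, black_mobility):
    # *_pieces: list of (piece letter, square) pairs; *_mobility: move counts.
    score = 0
    for pieces, sign in ((white_pieces, +1), (black_pieces, -1)):
        bishops = 0
        for piece, square in pieces:
            score += sign * PIECE_VALUES[piece]
            if piece == "P":
                score += sign * CENTRAL_PAWN_BONUS.get(square, 0)
            elif piece == "B":
                bishops += 1
        if bishops >= 2:
            score += sign * BISHOP_PAIR_BONUS
    score += MOBILITY_WEIGHT * (white_mobility - black_mobility)
    return score

print(evaluate([("P", "e4"), ("B", "c4"), ("B", "g5")],
               [("P", "e5"), ("N", "c6")],
               white_mobility=30, black_mobility=25))  # -> 360
```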
Now let’s talk about the horizon effect, something which occurs due to limiting the depth of the search. Imagine that the engine is searching at a low depth and there’s a trapped rook. In order to avoid a loss the engine may sacrifice a piece in order to delay the loss of the rook “beyond the horizon”. As a result, it ends up losing two pieces, since the loss of the rook was inevitable in any case.
Suppose that in the example illustrated above it’s White to move and the engine’s depth is 2. The rook on a5 is lost, but the machine would play Bxf7 in order to avoid the loss of the rook within the horizon. The result is disastrous and after Rxf7 White would lose both pieces.
The solution to this is known as “quiescence search” and consists of not limiting the depth of analysis to a simple number but analysing variations until the position is “quiet”. What do we mean by a quiet position? One in which, for instance, captures are not possible.
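A quiescence search might be sketched as follows; the position representation and the `evaluate`/`capture_moves`/`make_move` helpers are stand-ins for whatever the engine provides, and the toy capture tree at the end is invented for illustration:

```python
# Quiescence search sketch: rather than cutting off at a fixed depth, keep
# following captures until the position is "quiet" (no captures available).
# Scores are from the point of view of the side to move (negamax convention).
def quiescence(position, evaluate, capture_moves, make_move):
    best = evaluate(position)  # the "stand pat" score if we stop here
    for move in capture_moves(position):  # only noisy moves are extended
        after = make_move(position, move)
        best = max(best, -quiescence(after, evaluate, capture_moves, make_move))
    return best

# Toy capture tree, invented for illustration.
captures = {"start": ["QxP", "NxB"]}
successor = {("start", "QxP"): "afterQxP", ("start", "NxB"): "afterNxB"}
static = {"start": 0, "afterQxP": -5, "afterNxB": 2}

score = quiescence("start",
                   evaluate=lambda p: static[p],
                   capture_moves=lambda p: captures.get(p, []),
                   make_move=lambda p, m: successor[(p, m)])
print(score)  # -> 5: QxP leaves the opponent at -5, i.e. +5 for us
```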
The Alpha-beta algorithm is a major improvement over the Minimax search algorithm. It was proposed by John McCarthy around the time of the 1956 Dartmouth conference. It substantially reduces the number of possibilities to be analysed in the search tree, using a branch-and-bound technique: when analysing the possible replies to a move, it’s sufficient to find a single refutation. Perhaps there are stronger refutations, but it’s not necessary to analyse them, since one alone is enough to discard the option. This seems simple, but when the analysis is very deep it’s quite complex, so let’s, as always, try to understand it with a simple example.
Imagine it’s White to move and we’re searching at a depth of 2, which means that we consider all the moves for White and all the possible replies to each of those moves by Black. First, choose one of the possible moves for White, which we’re going to call “move 1”. We consider all the replies on the part of Black and as a result of that analysis the position is found to be equal if White decides to go for “move 1”.
After that first analysis, we consider another possible move for White – “move 2”. Imagine that when analysing possible replies for Black the first of those results in Black winning a bishop! We can say that “move 2” has been refuted without the need to look any deeper in the rest of the possible replies for Black. Perhaps it’s even worse and Black wins the queen or gives mate immediately, but the loss of the bishop is already enough for us to discard “move 2”, saving a lot of computational time. Therefore “move 1” prevails over “move 2”. In short, the full analysis of “move 1” provides a limit i.e. we know there’s at least one move that gives an equal position, so we can discard any move that leaves White clearly worse.
Of course, when the depth of the analysis is 3 or greater the complexity increases substantially, because both players can now choose options that affect the move tree. The strategy then is to maintain lower and upper limits, known as alpha and beta. The function of the lower limit is to avoid considering any move that is too bad. We also establish an upper limit, since if a move at a depth of 3 or higher leads to a continuation that is too good, the other player won’t allow it, because there are many ways to avoid that situation in the tree of possibilities. The lower limit of one player is the upper limit of the other.
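The idea can be sketched in negamax form. The two-ply game tree and leaf scores below are invented, and the `visited` list simply records which leaves get evaluated, so that the pruning is visible:

```python
# Alpha-beta in negamax form: alpha is the score the side to move can already
# guarantee, beta the score the opponent will not allow. A child whose score
# reaches beta refutes the move, and its remaining siblings are skipped.
def alphabeta(node, depth, alpha, beta, evaluate, children):
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    for child in kids:
        score = -alphabeta(child, depth - 1, -beta, -alpha, evaluate, children)
        if score >= beta:
            return beta           # refutation found: prune the rest
        alpha = max(alpha, score)
    return alpha

# Invented two-ply tree; `visited` records which leaves are evaluated.
tree = {"root": ["m1", "m2"], "m1": ["r1", "r2"], "m2": ["r3", "r4"]}
leaf_scores = {"r1": 0, "r2": -2, "r3": -5, "r4": 1}
visited = []

def evaluate(n):
    visited.append(n)
    return leaf_scores[n]

INF = float("inf")
value = alphabeta("root", 2, -INF, INF, evaluate, lambda n: tree.get(n, []))
print(value, visited)  # -> -2 ['r1', 'r2', 'r3']  (r4 is never evaluated)
```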
Reduction in the computational cost:
The saving when using this algorithm is significant. Suppose that a Minimax search tree has x nodes. With well-written code, the Alpha-beta algorithm can get that down to around the square root of x nodes. How many nodes do we actually save? Well, it depends on how well-ordered the search tree is. If you always search the best possible move first you eliminate the majority of the nodes. Of course, you don’t always know what the best move is, or there wouldn’t be any need to search for it.
Conversely, if you always examine the worst moves before the better ones no nodes are saved at all! For that reason, a good search order is very important and that’s one of the main parts of a good chess program. As Levin noted in 1961, assuming a constant b moves for each node visited and a depth of n, the maximum number of leaves in Alpha-beta is equivalent to Minimax, b^n. If we always consider the best move first, it’s
b^ceil(n/2) + b^floor(n/2) - 1
where ceil and floor are the functions that round a real number up or down to the nearest integer, respectively. The following table shows the number of leaves in the worst and best cases (for b = 40):
depth   worst case   best case
0                1           1
1               40          40
2            1,600          79
3           64,000       1,639
4        2,560,000       3,199
...            ...         ...
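Both columns can be reproduced directly from the formulas:

```python
import math

# Worst case: no useful move ordering, b**n leaves.
# Best case (best move always searched first, Levin 1961):
#   b**ceil(n/2) + b**floor(n/2) - 1
def leaves_worst(b, n):
    return b ** n

def leaves_best(b, n):
    return b ** math.ceil(n / 2) + b ** math.floor(n / 2) - 1

for n in range(5):
    print(n, leaves_worst(40, n), leaves_best(40, n))
```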
Now let’s look at other techniques employed to reduce the computational cost:
The first is known as null-move pruning. In order to explain this technique we’re going to use an analogy with boxing: we give our opponent a “punch”, or move, for free, and if even then he can’t knock us out, we must be doing very well. The implementation is as follows: before making a complete search in a position, you first make a search at a lower depth in which you pass and it’s your opponent to move.
If the result is that the “score” or evaluation is still at or above the “beta” bound, then you don’t have to continue searching in this subtree (if even a free move doesn’t help the opponent, this line can be cut off). This saves time because it reduces the depth. Of course, if you’ve paid a little attention you’ll see that this technique fails in zugzwang positions! As you know, in those positions any move you make loses and you’d like your opponent to have the move, so this technique is not applied in those positions.
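A sketch of the null-move test; `alphabeta`, `make_null_move`, `in_check` and `few_pieces` are assumed engine routines, and the depth reduction R = 2 is a common but illustrative choice:

```python
# Null-move pruning sketch: "pass", search the resulting position at reduced
# depth, and if the opponent still can't bring the score below beta, prune.
R = 2  # depth reduction; a common but illustrative value

def null_move_prunes(position, depth, beta, alphabeta, make_null_move,
                     in_check, few_pieces):
    # The trick is unsound in zugzwang-prone positions (few pieces left)
    # and illegal when in check, so skip it there.
    if depth <= R or in_check(position) or few_pieces(position):
        return False
    # Give the opponent a free move and search shallower, from their side.
    score = -alphabeta(make_null_move(position), depth - 1 - R, -beta, -beta + 1)
    return score >= beta  # still at or above beta: prune this subtree

# Toy stubs just to exercise the control flow of the sketch.
they_are_losing = lambda pos, depth, a, b: -50  # score from opponent's view
print(null_move_prunes("pos", 5, 30, they_are_losing,
                       lambda p: p, lambda p: False, lambda p: False))  # -> True
```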
Another technique is known as late move reductions (LMR). Normally in our search the moves are ordered from better to worse. It’s therefore interesting to search the top moves in the list at full depth for each node and the later moves at a reduced depth. In this way computing time is saved and the effective branching factor can even be reduced to around 2.
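A sketch of that reduction scheme; the cut-off between “top moves” and “later moves” and the size of the reduction are illustrative constants, and `alphabeta`/`make_move` are assumed engine routines:

```python
# Late move reductions sketch: moves are assumed ordered best-first; the
# first few are searched at full depth, later ones a ply shallower, with a
# re-search at full depth if a reduced move unexpectedly improves the score.
FULL_DEPTH_MOVES = 3   # how many early moves escape the reduction
REDUCTION = 1          # how much shallower the later moves are searched

def search_with_lmr(position, depth, alpha, beta, moves, alphabeta, make_move):
    best = alpha
    for i, move in enumerate(moves):
        child = make_move(position, move)
        d = depth - 1 if i < FULL_DEPTH_MOVES else depth - 1 - REDUCTION
        score = -alphabeta(child, d, -beta, -best)
        if score > best and d < depth - 1:
            # The reduced search beat expectations: verify at full depth.
            score = -alphabeta(child, depth - 1, -beta, -best)
        best = max(best, score)
        if best >= beta:
            break  # cutoff: the opponent won't allow this line anyway
    return best
```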
Finally, to conclude this article, let’s focus on how a chessboard is “represented” in all of these algorithms.
The first problem encountered by the developers of chess engines was how to represent the chessboard and pieces in as simple and economical a way as possible. Initially the hardware capacity was very limited, so efficiently representing the game was a very important factor.
Bitboards were a great advance in the representation of the board, pieces and moves. Also known as bitsets or bitmaps, they’re used to represent the board within the program in a “piece-centric” way. Essentially, they consist of finite sets of up to 64 elements, with one bit per square.
In addition to modelling the board with the location of the pieces, some additional information is necessary in order to completely specify a chess position, for example whose move it is, whether it’s possible to castle or not, possible en passant captures and the number of moves without making progress (no captures or pawn movements) in order to handle the 50-move draw rule. In this article we’re only going to mention the structures in order to represent the pieces and the board.
A “piece-centric” representation is a list, array or set of all the pieces still active on the board, together with the information indicating which squares they occupy. A common way of achieving this is the so-called “set-wise bitboard” method, where each piece type is associated with a 64-bit word, with one bit set for each square occupied by a piece of that type. You can use so-called “piece-lists”, “piece-sets” or “bitboards”.
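A minimal bitboard sketch (Python integers are arbitrary precision, so a plain `int` stands in for the 64-bit word here); the a1 = bit 0 layout used below is one common convention among several:

```python
# Bitboard sketch: a 64-bit word with one bit per square.
# Layout convention (one of several): bit 0 = a1, bit 7 = h1, bit 63 = h8.
def square_index(square):
    file = ord(square[0]) - ord("a")   # a..h -> 0..7
    rank = int(square[1]) - 1          # 1..8 -> 0..7
    return rank * 8 + file

def bitboard(squares):
    bb = 0
    for sq in squares:
        bb |= 1 << square_index(sq)
    return bb

white_pawns = bitboard(["a2", "b2", "c2", "d2", "e2", "f2", "g2", "h2"])
black_pawns = bitboard(["a7", "b7", "c7", "d7", "e7", "f7", "g7", "h7"])
print(hex(white_pawns))                           # -> 0xff00 (rank 2 set)
print(bin(white_pawns | black_pawns).count("1"))  # -> 16: set-wise union
```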
Well, this has been a journey through the world of chess machines! I’ve left some links that will be of interest for anyone who wants to delve deeper into the topic.