General | Dec 6, 2017 | 4:39 PM | by Colin McGourty

DeepMind’s AlphaZero crushes chess

20 years after Deep Blue defeated Garry Kasparov in a match, chess players have awoken to a new revolution. The AlphaZero algorithm developed by Google and DeepMind took just four hours of playing against itself to synthesise the chess knowledge of one and a half millennia and reach a level where it not only surpassed humans but crushed the reigning World Computer Champion Stockfish 28 wins to 0 in a 100-game match. All the brilliant stratagems and refinements that human programmers used to build chess engines have been outdone, and like Go players we can only marvel at a wholly new approach to the game.

After DeepMind's AlphaZero the chess engine world, and the chess world, will never be quite the same again 

Only five days ago, in a more innocent world, Ian Nepomniachtchi could say after Round 1 of the London Chess Classic at Google Headquarters:

I hope there will be some big history of cooperation between Google and chess. It’s not about creating an AlphaGo, an AlphaChess, which will kill chess, but maybe in some friendly mode.

There were worrying signs, though, as AlphaGo, the program that defeated the human World Champion, had just been surpassed by AlphaGo Zero, which learned the game merely by playing itself.

DeepMind co-founder Demis Hassabis is a former chess prodigy, and while his team had taken on the challenge of defeating Go, a game where humans were still in the ascendency, there was an obvious temptation to try and apply the same techniques to chess as well. We’ve long recognised our human inferiority, but we could take comfort from the fact that the chess engines that beat us were also the works of human ingenuity and effort. That was about to change.

Now we know why Demis Hassabis, on Magnus Carlsen's shoulder, was in such a good mood when the London Chess Classic players began their games in the Google Headquarters | photo: Lennart Ootes, Grand Chess Tour

The bombshell came in a quietly released academic paper published on 5 December 2017: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm


The contents are stunning. The DeepMind team had managed to prove that a generic version of their algorithm, with no specific knowledge other than the rules of the game, could train itself for four hours at chess, two hours in shogi (Japanese chess) or eight hours in Go and then beat the reigning computer champions – i.e. the strongest known players of those games. In chess it wasn’t just a beating, but sheer demolition.

Stockfish is the reigning TCEC computer chess champion, and while it failed to make the final this year it went unbeaten in 51 games. In a match with the chess-trained AlphaZero, though, it lost 28 games and won none, with the remaining 72 drawn. With White AlphaZero scored a phenomenal 25 wins and 25 draws, while with Black it “merely” scored 3 wins and 47 draws. It turns out the starting move is really important after all! 
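The match arithmetic is easy to check with a few lines of Python. This is only an illustrative sketch: the function name `points` is our own, and it assumes the standard scoring of one point for a win and half a point for a draw.

```python
def points(wins, draws, losses):
    """Return (score, games) using 1 point per win and 0.5 per draw."""
    games = wins + draws + losses
    return wins + 0.5 * draws, games

# AlphaZero's results against Stockfish as reported above
white = points(wins=25, draws=25, losses=0)
black = points(wins=3, draws=47, losses=0)
total = (white[0] + black[0], white[1] + black[1])

for label, (score, games) in [("White", white), ("Black", black), ("Total", total)]:
    print(f"{label}: {score}/{games} = {score / games:.1%}")
```

That works out to 37.5/50 with White, 26.5/50 with Black and 64/100 overall.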

In the paper DeepMind share 10 of the wins against Stockfish.

The games themselves are fascinating, and have already drawn huge praise from chess observers. For instance, in the first AlphaZero with Black decides to play with the bishop pair despite Stockfish having four pawns for a bishop:

Needless to say, the bishop pair won! In the last game AlphaZero decides not to defend the knight on h6 after 18...g5:

Instead it opted for 19.Re1!? with more deep sacrifices to follow. There's something for everyone:

We'll be returning to the chess soon!

How did they do it?

First, perhaps the most remarkable thing is how they didn’t do it:

Instead the algorithm lived up to its name by starting from zero apart from the rules of the game. Then it began to play games using a Monte-Carlo algorithm, where initially random moves would be tried out until a neural network began to learn which options were likely to be more promising. It was only a couple of months ago that the former Top 10 player Alexander Morozevich could comment:

Until 2015 that was the only intellectual game in which professionals were stronger than machines, and only in the last year or year and a half have the first harbingers appeared saying that yes, the end of Go has come. For now it’s not quite formalised, but gradually, I think, they’ll follow the same path that we followed in chess. Machines, of course, will take up an absolutely dominant position, despite the fact that of course the calculating algorithms, the evaluation algorithms are quite different. As far as I understand it the algorithm used by AlphaGo, the most successful program, is a Monte Carlo algorithm. That was also one of the main computational approaches in chess, but it didn’t become common. Machines reached a maximum of 2400 with that. After all, our game is about more direct selection, while there it was possible even to use that algorithm, which is quite interesting.  

It turns out the approach may have been right after all, though the game-changer is perhaps phenomenal hardware. 

They used to say you needed 10,000 hours of deliberate practice to master something...

During training AlphaZero had access to “5,000 first generation TPUs to generate self-play games and 64 2nd-generation TPUs to train the neural networks”. TPUs, or tensor processing units, aren’t even publicly available, since they were developed by Google specifically to handle the kind of calculations demanded by machine learning. The trained algorithm, meanwhile, was run on a single machine with four TPUs, and DeepMind stress the efficiency of their approach, with AlphaZero searching just 80,000 positions per second compared to 70 million for Stockfish. How does it achieve that efficiency?

AlphaZero compensates for the lower number of evaluations by using its deep neural network to focus much more selectively on the most promising variations – arguably a more “human-like” approach to search, as originally proposed by Shannon. Figure 2 shows the scalability of each player with respect to thinking time, measured on an Elo scale, relative to Stockfish or Elmo with 40ms thinking time. AlphaZero’s MCTS scaled more effectively with thinking time than either Stockfish or Elmo, calling into question the widely held belief that alpha-beta search is inherently superior in these domains.
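To give a rough feel for how a network can steer the search, here is a minimal sketch of the kind of selection rule used in this family of Monte Carlo tree search: each candidate move gets a score that combines its average value so far with an exploration bonus weighted by the network's prior. The names (`Edge`, `select_move`, `c_puct`), the constant 1.5 and the toy numbers are all our own assumptions for illustration, not DeepMind's code or settings.

```python
import math
from dataclasses import dataclass

@dataclass
class Edge:
    prior: float            # network's prior probability for the move (assumed input)
    visits: int = 0         # how many times the search has tried the move
    value_sum: float = 0.0  # sum of evaluations backed up through the move

    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select_move(edges, c_puct=1.5):
    """Pick the move maximising mean value plus a prior-weighted exploration bonus."""
    total_visits = sum(e.visits for e in edges.values())

    def score(edge):
        bonus = c_puct * edge.prior * math.sqrt(total_visits) / (1 + edge.visits)
        return edge.mean_value() + bonus

    return max(edges, key=lambda move: score(edges[move]))

# Toy position: made-up priors and statistics for three candidate moves
edges = {
    "e4": Edge(prior=0.6, visits=10, value_sum=5.5),
    "d4": Edge(prior=0.3, visits=2, value_sum=1.0),
    "a3": Edge(prior=0.1, visits=0),
}
print(select_move(edges))
```

The bonus shrinks as a move is visited more often, so a move with a decent prior but few visits can temporarily outrank the most-visited one, while moves the network considers unlikely get little attention. That is one way to picture how far fewer positions per second can still be spent where they matter.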

So as time increases from the one minute per move of the games above we can expect AlphaZero to make bigger gains in strength than traditional “brute-force” approaches, while the authors also note there’s no reason they couldn’t implement some of the traditional tricks of chess engines as well:

It is likely that some of these techniques could further improve the performance of AlphaZero; however, we have focused on a pure self-play reinforcement learning approach and leave these extensions for future research.

What have we learned?

Generic machine-learning algorithms are game-changers, and not just for chess but the world around us. If we do manage to create some kind of very basic consciousness and intelligence – the true meaning of AI – it’s possible that on the same day or very soon afterwards reinforcement learning will have transformed that into the most intelligent entity in our known universe. Meanwhile, though, it’s gratifying to see that the computer has justified hundreds of years of chess development, since the program, entirely by itself, has ended up playing some of the best known human openings:

The stats under the chessboards refer to another 1200 "opening themed" games vs. Stockfish. The total score was 290 wins, 886 draws and 24 losses for AlphaZero, or 733:467.
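As a back-of-the-envelope check of those figures, the sketch below recomputes the points total and converts the score fraction into an approximate rating gap using the standard logistic Elo expectation formula. The function names are ours, and the conversion ignores draw rates and sampling error, so treat the result as a rough indication only.

```python
import math

def match_points(wins, draws, losses):
    """Points for each side, counting a draw as half a point for both."""
    return wins + 0.5 * draws, losses + 0.5 * draws

def elo_gap(score_fraction):
    """Rating difference implied by an expected score under the logistic Elo model."""
    return 400 * math.log10(score_fraction / (1 - score_fraction))

ours, theirs = match_points(wins=290, draws=886, losses=24)
fraction = ours / (ours + theirs)
print(f"{ours:g}:{theirs:g}, score {fraction:.1%}, roughly {elo_gap(fraction):+.0f} Elo")
```

The points do add up to 733:467, and under this simple model that score corresponds to a gap of somewhat under 100 rating points.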

The graphs are fascinating to study, since you can see how certain openings became popular in the algorithm’s training games – such as the French Defence and the Caro-Kann – before dropping off in popularity as its strength increased. Note it looks as though there’s a reason for the popularity of the Queen’s Gambit at the very highest level, and that of another somewhat notorious opening...

Where do we go from here?

What happens next will depend largely on how keen DeepMind are on keeping their chess-trained algorithm active. Will it be “dismantled” like Deep Blue, or will it instead become available, freely or at a price, for chess players? You can imagine that the elite grandmasters, desperate for any edge they can find, would dearly love to get their hands on it. Will it soon be possible to use the “engine” alongside existing software to give evaluations and potential moves in games?

And where do traditional chess programmers go from here? Will they have to give up the refinements of human-tuned evaluation functions and all the existing techniques, or will the neural networks still require processing power and equipment not easily available? Will they be able to follow in DeepMind’s footsteps, or are there proprietary techniques involved that can’t easily be mastered?

There’s a lot to ponder, but for now the chess world has been shaken!

What happens next will be an intriguing story to follow in the coming days, weeks and months.
