Hacker News new | past | comments | ask | show | jobs | submit login
Compressing chess moves for fun and profit (mbuffett.com)
179 points by thunderbong 10 months ago | hide | past | favorite | 135 comments



The optimum compression seems to be an index into the list of all legal moves.

But one can do better on average, by assuming that moves are not uniformly distributed. Let an engine assign each legal move m some probability p of being played. The worse the move, the lower the probability we assign to it. Then a more optimal code for move m is -log p bits, corresponding to an entropy (expected surprise) of sum -p log p bits for the distribution over legal moves.

Related: accurately estimating the number of legal positions of chess [1].

[1] https://github.com/tromp/ChessPositionRanking


Though, that engine evaluation is going to be absurdly expensive, when you're compressing/decompressing billions of games.

I think everyone's optimizing the wrong thing! The size of a useful, real-world chess database would be dominated by its search indexes—indexes that allow fast, random-access lookup of every position that occurs in every game. (Or even better: fuzzy searching for nearby, "similar" positions). This is the difficult, open-ended algorithmic problem. Disk space is cheap!

(Out of curiosity, does anyone here actually know how to solve the "fuzzy search of chess positions" problem?)


Agreed. Ultimately, the most common reason for needing to store a bunch of games is you want a chess database so you can look up a given position.

So probably some kind of tree structure so you can look up games by sequence of moves since opening. And something like zobrist hashing for looking up a given position (regardless of sequence).

Some of these ultra-small encodings would be counterproductive for this purpose.


- "And something like zobrist hashing for looking up a given position"

Which gets particularly good if you combine it with some hack like a Bloom filter. There's a trick, I don't remember if it has a name, where you can effect a time/space tradeoff by partitioning a search set and building a separate Bloom filter for each partition. A lookup is a constant-time query of each Bloom filter, plus a linear scan of each partition that signaled a match.

(This is a reasonable thing to do, because (0) the size of chess databases can get extremely large and (1) the queries are human UX interactions that need to be responsive).


I mean, chess engines use transposition tables to store encountered positions. In this case you can use the zobrist hash as the index to a hash table and it's an O(1) lookup for a given position.

Typically that's going to be the best way to index into full games. Since you can point to an entry via a game tree search (eg; each node of the tree is a position that you can use as an index to a big transposition table).

You can probably use a bloom filter to check if a given position has ever been encountered (eg; as a layer on top of the transposition table) though. For something like storing 10 billion positions w/ a 10^-6 false positive probability (w/ reasonable number of hashes) you're looking at 30+GiB bloom filters.


Right! The hash table you're describing is O(1) lookup, but its memory size is a constant multiple larger than the Bloom filter. A Bloom filter doesn't tell you anything about where the needle is–only whether the needle exists; so, if used for lookup, it reduces to O(n) brute-force search. In between the two, you can create an array of M partitions of the haystack, each with their own Bloom filter: this structure has something like O(M + n/M) lookup, but with the smaller memory size of the Bloom filter.

When I implemented this, I deliberately avoided constructing the hash table you're describing, because (to my recollection) it was larger than my remaining disk space! :)


> Out of curiosity, does anyone here actually know how to solve the "fuzzy search of chess positions" problem?

A classification based indexing would help here. The data points of the index would be not just the position of the pieces, but their relative as well. Things like the squares under attack, pinned weaker pieces, and maybe some evaluation of control of the board (again maybe via some attacking based metric).

This seems like the kind of thing that could be done by analyzing a larger dataset of games to create a set of functional categorizations.

Once the data points themselves have been identified, actually indexing them is the easy part.

Solving this more generally for the “find me games like X…” is much trickier without locking down that X to something concrete.


For example: "this position, but a few of the pieces can be in different places". (Maybe penalizing pawn moves with a higher weight—a pawn move is in practice a "larger change" to a position than a piece move).

This version is easy to formalize: it's just querying a bitstring x \in {0,1}^N, a bitboard vector representation of a chess position, and asking if there's any other needles within a short Hamming distance. I haven't a clue how to solve that (or if it's solvable at all).


I wonder if shingle hashes might help a lot here...

https://en.wikipedia.org/wiki/MinHash


You just need to enumerate for all possible states, the possible transitions and evaluate the probabilities for them. How many could there possibly be? Store in a small database and look em up on the fly. Easy peasy.


I know you’re joking, but it wouldn’t really be that hard in terms of a lookup for played positions which is what OP asked for - you could have Bitboards for the relevant info so finding a state would be a refined search starting with the busy/occupied Bitboard. Since you’re only storing games that have been played rather than all positions, the size of the index shouldn’t be outrageously large compared with other things already being stored.


- "and evaluate the probabilities for them"

I don't know any good, fast heuristics for predicting this (likelihood of a human choosing a specific chess move). Do you? Chess engine evaluation is computationally quite heavy.


You don’t need to make perfect predictions. There could actually be a lookup for a few common board states (say a couple million, to cover the first couple moves). Beyond that just use a simple scoring function, for example the value of your vs opponent pieces, whether any of your pieces (notably king) are in danger, whether u won, etc. This means u get better scores for the more likely winning moves, for capturing valuable pieces and for avoiding the loss of your own.

U could also play two or so turns of minimax, or perhaps use a neural network to evaluate the various reachable board states.

So for a given state, enumerate possible transitions, score the resulting states, and map to some sort of probability distribution, then use some prefix code (think Huffman tree[+]) based on the distribution of probabilities to encode the transition.

It’s perhaps not super fast, and not super accurate but if you can weed out the 50% of dumb moves, that already saves a bit.

[+] an easier and better approach is to map the probabilities into an interval between 0 and 1, and keep using fractional bits to „zoom in“ until one of the subintervals is uniquely defined, then recurse on that. Some of the common compression algorithms uses that (but I don’t remember the specifics, those intro courses were a long time ago).


Parent was making a joke. The best upper bound found for the number of chess positions is 4.8x10^44.

https://tromp.github.io/chess/chess.html


Realizing this is a joke, however you only move one piece at a time and there are limited valid moves. Standard chess notation depends on this.


I had built a demo like that a few months ago, it relied on pg_tgrm GIST indexes of the position's FEN string encoding and only a few bits of metadata per position. You could push it by "learning" some representation and perhaps relying on bloom filters or summat clever like that maybe...


> Though, that engine evaluation is going to be absurdly expensive, when you're compressing/decompressing billions of games.

It only has to be accurate enough to be beneficial though. Even a knowledge about two sets of more frequent and less frequent moves would be much better than the baseline.


Yeah unfortunately the blog post didn't really explain the use case well enough. There has to be a reason they want to waste space storing the redundant indicators of check and checkmate for instance.


If you turn an arbitrary byte string into a list of indexes into legal chess moves, then you can encode arbitrary data as chess games!

(My implementation: https://incoherency.co.uk/chess-steg/ and explanation at https://incoherency.co.uk/blog/stories/chess-steg.html )


That's really cute.

I'm surprised stalemate detection is harder than checkmate detection. Aren't they the same except checkmate has the extra step of detecting check?


It's just because of the chess.js API. It tells you if any given move is a checkmate, so you know to ignore those moves without playing them, but to find out if a move is a stalemate, you need to "play it" on a cloned game and then see if the opponent has any legal moves.


Not quite - stalemate is no legal move whatsoever for any of your pieces, and checkmate is king is in check and no legal move stops check.


But given that a legal move by definition ends with the king not in check, the distinction often kind of goes away. In implementation terms, I think you basically have to just look at every possible move of every piece and see if any of them end without being in check.

Stalemate can and almost always does involve pieces that can't move because they would create check.


Yeah, that's true, I see what you're saying. I guess knowing the king is already in check allows for optimisations, since you have to either move the king, capture the attacker or block the line of attack, and nothing else is legal. And if you're double-checked you have to move the king, no matter what. As opposed to stalemate where you really do have to check the moves of every piece.


I guess that could be true if you have ready access to things like "the attacker" and "pieces that are capable of moving to block the line of attack".



That's neat, so it looks like ln2(216) =~ 7.7 bits to encode any valid move, and then compresses to even better!


Yep. There's a paper somewhere claiming that the lower bound is ~5.1 bits/move, based on an exhaustive analysis of existing games.

https://www.chessprogramming.org/Encoding_Moves sort-of mentions it (search for 5.11), but I can't for the life of me find the actual analysis.


You might be interested in the author's follow up post where they reach 3.7 bits/move after compression. https://mbuffett.com/posts/compressing-chess-moves-even-furt...


Sometimes i wonder, how i got so far im my software developer career...


"Software", like "science", is a broad term. Just like there are many kinds of scientists and you could be, say, a biologist or a physicist, there are many entirely disparate fields of software engineering. At the end of the day, software itself is just a means to an ends: solving a problem. Solving some of these problems like the one that GP discussed have more to do with math than programming. The programming is mostly the implementation of the math. So if you're not versed in that math, of course you wouldn't understand the code that's being written to implement it.

Building machine learning systems is vastly different from building operating systems which is vastly different from embedded systems which is vastly different from networking which is vastly different what most of us do (myself included), which is building CRUD apps. We're just solving different problems. Of course there are common themes to writing good code, just like there are common themes to performing science, but the code and solutions will look almost nothing alike.


Because there are so many fields of software development, and you don’t need to be good in all of them. Don’t put yourself down for not understanding, you can always put some time into algorithms again if you need to :).


most of the optimizations discussed are actually kind of obvious if you know how chess notation works in the first place. like the reason the starting move "Nf3" makes any sense is because only 1 knight of all 4 on the board can possibly move to f3 on the first move. what the OP is doing is trying to take that kind of information density and trying to represent it, first by assigning bits to each part of a possible move and then by paring it down where the space of legal moves excludes entire categories of possibilities. there's another article that goes much further into how much optimization is possible here: https://lichess.org/@/lichess/blog/developer-update-275-impr...


Indeed, compression is just the dual problem of prediction, so the better you are at predicting the next move, the better you can compress.


So, Huffman encoding but for chess moves?


Kinda, but with a different dynamically determined dictionary for each successive position.


This is the kind of discussion that will get us micro-cybernetics in that ~1 year window.


Im confused, how does probability help here when you need to encode the possibility of all legal moves?


You use variable-length encodings—shorter encodings for more likely moves, longer encodings for ones that happen infrequently.

This is the thing the parent comment was referring to,

https://en.wikipedia.org/wiki/Huffman_coding


Using the probability distribution of moves at any given position, so that more likely moves take less space, will result in saving space on average.

This is the same as Huffman coding, just applied to a different domain.

For that matter, it might also resemble LLM compression of input text, although LLM compression is intentionally lossy and the above schemes are not.


Look up Huffman codes on Wikipedia.

In short, you put the most likely or most common values at the front of the dictionary. Values 0-255 take up one byte, 256-65535 take two bytes, etc. The lower your index, the fewer bits are needed to represent the corresponding state.


You're missing an important detail of Huffman codes: they need to be prefix-free so you can actually tell when one value ends and another begins.

A simplified example: suppose you have four letters in your alphabet that you want to represent in binary. You can't just represent these as A=0, B=1, C=10, and D=01, since it's impossible to tell whether "01011" is meant to represent ABABB or DDA or ACBB.

(Hamming codes on the other hand are intended for error correction, where you want maximally distant code words to minimize the chance that bitflips cause decoding errors.)


I don't really think that's too relevant to the core concept of using fewer bits for the most common codes.

That's really a problem inherent to binary streams in general, not just Huffman encoding.


It was a minor point re: this claim:

> Values 0-255 take up one byte, 256-65535 take two bytes

If you wanted to encode more than 256 values, then at best you'd be able to specify values 0-254 taking one byte, e.g. if you used 0xff as a special prefix to represent "use another byte for all other values", but that particular encoding means that you'd only be able to encode another 256 values with a second byte.


I think you mean Huffman codes, not Hamming codes.


Yup.


With a different engine for weaker players, whose moves would generally need a longer representation to encode. For most use cases where you're encoding a ton of PGNs, you also encode some information about the playing strength of the players, so you get that for free.


This probably isn't necessary. Anyone past beginner is still sometimes playing one of the best moves for each move, and when they don't it is probably not bad in a predictable way.


> Let an engine assign each legal move m some probability p of being played.

This assumes you're storing moves of players who are good at chess!


On the flipside, if the player is not good, the moves might take more space but there might be fewer of them :)


No, it just saves more space when the players are good. (Note the engine assigns each legal move a probability.)


They said one can do better on average this way. That conclusion does rest on an assumption.

Anyway, I wrote it in the form of a joke, but I was trying to suggest that for optimal compression, you may need to model how a player actually plays, not how they should ideally do it.


Yep, make an LLM but for chess moves.


what you're describing is arithmetic coding, correct?


FTA: “But they don’t account for the majority of what I’m storing, which are PGNs.”

If they’re storing chess games, why do they try to compress individual moves?

If you compress games, you can get much better compression. For example, in any board position, deterministically order all legal moves (they’re already ignoring some illegal moves in their setup, so I think ignoring all is within the rules of the game), and write down the number of the move made.

At game start, that will be an integer between 1 and 20 (16 different pawn moves, 4 different knight moves). Black’s reply similarly will be an integer between 1 and 20. For white’s second move, the maximum will depend on the first move.

To compress an entire game, use arithmetic encoding (https://en.wikipedia.org/wiki/Arithmetic_coding). If there are, on average, n legal moves per position, that will use 2log n bits per move.

https://www.chessprogramming.org/Chess_Position: “The maximum number of moves per chess position seems 218”, so that will give you less than 8 bits per move. In real life, it probably will be less than 6.

The only reason I see why this wouldn’t be an option is performance. Decoding would require move generation and thus be more complex.

To improve on that, make a deterministic predictor of how likely moves are and use that to improve on the arithmetic encoder. That probably isn’t worth it, though because of the increased decoding complexity.


I did end up doing something like this: https://mbuffett.com/posts/compressing-chess-moves-even-furt...

You’re spot on about the performance reason I didn’t want to do this originally, but I did some testing and turns out move generation in the library I use is Blazing Fast and wouldn’t be a bottleneck


I just added your blog to my RSS reader and noticed that your blog title in the RSS is a bit weird (in case you were unaware):

    <title>Posts on A blog</title>
    <link>https://mbuffett.com/posts/</link>
    <description>Recent content in Posts on A blog</description>


You don't seem to have a public repository for the code described in your posts (presumably a part of Chessbook?), but you might be using a textbook arithmetic coding algorithm which maintains both lower and upper bounds and does a renormalization by comparing topmost bits. If it's the case rANS would be much simpler to write and more efficient [1]. If your AC code is already well-optimized, there is not much reason to switch though.

[1] Fabian Giesen's sample implementation is already as good as is, and also contains a good alias table implementation if you want multi-symbol inputs: https://github.com/rygorous/ryg_rans/blob/master/rans_byte.h


It's also that PGN allows you to encode illegal moves, which is important if your dataset contains over-the-board games.

That you only got to about 10-12 bits per move is actually kind of sad in a way, because it means you're not doing substantially better than the approach where you record a move as just (delete-piece-at ____, recreate-that-piece-at ____) 12-bit pairs, where castles are implicitly only recorded by moving the king further than usual and underpromotion has to be explicitly recorded with an extra 12-bit move that is otherwise semantically impossible.


Such games should be infrequent enough that you don't need to optimize for them. A single unused code (or for arithmetic coding, a single symbol with a very low but non-zero frequency) can be used as an escape for a full non-optimized move format.


> At game start, that will be an integer between 1 and 20 (16 different pawn moves, 4 different knight moves).

And offer draw.

And resign.

Just to complete that thought.


Related follow-up from same author:

Compressing Chess Moves Even Further, To 3.7 Bits Per Move

https://mbuffett.com/posts/compressing-chess-moves-even-furt...


Awesome. Amusingly they didn't actually mention the final move size anywhere except the title.


Not sure the author's use case the naive solution seems like just using 6 bits for the piece's original position, 6 bits for the destination, and 2 bits to designate promotion type, for 14 total bits. Everything else can be found from the board state, and I can't imagine a scenario where you have a move, want to convert to algebraic notation, and also don't know what the board is.


I had very similar thoughts but I think one can do much better, even.

Each move only needs four bits to identify the piece because each side only has 16 pieces at maximum. The game replayer would need to keep track of each pawn's starting file though (including after promotions).

Variable-length encodings of the move options help a lot. Pawns only need two bits because there are never more than four legal moves for a pawn - except for promotions, but just handle those specially for those exact cases (if a pawn moves to the last rank then encode the promotion with two bits, otherwise encode the next move). Knights and kings each need three bits for the move - encode each king-castling option as a move to the appropriate rear diagonal (normally off the board so not otherwise legal). Bishops and rooks need four bits (rank and direction). Queens only need five (three bits for rank, two for direction).

This way you can get down to between six and nine bits per move.


> The game replayer would need to keep track of each pawn's starting file though (including after promotions).

That seems unnecessary. Don’t index the pieces based on their type and where they started — index them based on their current location. So the piece the lexicographically first (rank, file) is piece 0, and 4 bits trivially identifies a piece once you know the color.


True, I hadn't thought of that.


The problem is that the board state requirement makes searching for specific moves hard.

But if you allow board state, you can do less than 14. 3 bits for the type of piece, 6 for the destination, 2 extra bits. (Promotion if destination is last/first line, disambiguation for pawns in en passant)

You're still at about twice the theoretical minimum, I think. I vaguely recall an upper bound of 6 bits?


I was thinking the same thing. Often both squares will be close to each other, so I'm sure that a generic compression (eg zstd) can even reduce it even further by taking advantage of that proximity.


It's kinda complex.

I think my initial stab would be to encoding the source position (6 bits) and end position (6 bits) for a constant 12 bits per move.

You don't need to store whether it's a capture, you can figure that out from the game state. You don't need to store disambiguations, there are none in that format. You don't need to store check/mate, you can figure that out from the game state.

The only wrinkles (that I can tell) are castles and promotions. But you can get around this by the fact that kings and pawns have limited movement, so their legal destination squares are highly constrained.

So you could signal a promotion by encoding the destination with opposite rank, and using the file to encode which piece. Promoting to a rook on c8 gets its destination encoded as a1 - "a" for a rook, and "1" to indicate a promotion.

Similarly, you could encode a castle by encoding the king's move with opposite destination rank. Castling the king to f1 gets encoded as f8, and to g1 as g8.


> Similarly, you could encode a castle by encoding the king's move with opposite destination rank. Castling the king to f1 gets encoded as f8, and to g1 as g8.

You are overcomplicating this. For castling just record it as the King's source and destination. E.g., White kingside castling is e1g1, White castling queenside is e1c1, Black castling kingside is e8g8, and white castling queenside is e8c8.

All king moves other than castling move the king at most one file, so when you see a king on e1 and the move is e1g1 or e1c2 which is a move of two files you can infer that it must be castling.

For promotion, I suggest splitting it into two cases: promotion to a queen and promotion to something else. I saw a post once on Reddit from someone who analyzed promotions from all games in the Lichess database, and 98.7% were to queens, so we'll make that case the simplest.

I suggest that pawns that promote to a queen are simply recorded as moving to the promotion square. It is implicit that they promote to a queen. For example a pawn at b7 that moves straight forward and promotes to a queen would be recorded as b7b8. A pawn at b7 that captures on a8 and promotes to a queen would be recorded as b7a8.

For pawns that promote to a rook, record the move as a move to a square one rank back from the promotion square. For example b7b8 promoting to rook would be recorded as b7b7, and b7a8 promoting to a rook would be recorded as b7a7.

Similarly for promotions to bishops. Record the destination two ranks back. So b7b8 promoting to bishop would be b7b6. Similar for knights but three ranks back, so b7a5 for a pawn at b7 that captures on a8 and promotes to a knight.


> All king moves other than castling move the king at most one file, so when you see a king on e1 and the move is e1g1 or e1c2 which is a move of two files you can infer that it must be castling.

Yeah. For some reason I had a brain fart and thought that the two castles moved the king 1 and 2 files, instead of 2 and 3 files, and that made me think you needed to disambiguate a 1 file castle with a 1 file move.

Which is clearly dumb. I blame insufficient coffee.


Both O-O and O-O-O castles move the king by two squares. It gets more interesting if you want to also store fisher random chess, because the casting can involve a king move by any number of swuares or not moving at all. But even in this case the casting could be unambiguously recorded as a move by 2 squares(even if the king doesn't move at all)


As for promotion: in a valid chess position you could promote a single pawn at three files simultaneously. One by moving forward and then left and right capture. Additionally, you could have two pawns promoting on the same square, again with capture. And you can have a combination of both.

Encode a white's move exd8=N with white pawns on e7 and c7 and three black queens on f8, d8 and b8.


> you could promote a single pawn at three files simultaneously. One by moving forward and then left and right capture.

Ooh, didn't think of that, thanks. Still there's enough constraint in the legal destination squares to work around that. There's at least half the board that's inaccessible to a pawn about to promote, which should be enough to encode any of the 3 legal destination squares and 5 possible promotion targets.

Edit: maybe keep the destination file as-is, and use the destination rank to encode the promotion piece?

> Additionally, you could have two pawns promoting on the same square

The source square should disambiguate between those though, right?


You can encode a promotion as the destination file plus rank 1-4 for the 4 different kinds, since pawns can't move backwards


Promotions could be compressed to

Source position -> 6 bits destination position as ( forward left / forward / forward right -> 2 bits Target piece info Queen/Bishop/Knight/Rook 2 bits )


Use StockFish to predict the next move, and only store the diff between the actual move and the prediction.

Or, better, get a ranked list of all possible moves from stockfish, and use a variable-length integer to encode the position in the list. Then the best move takes ~1 bit, and worse moves take more bits. (And we can do fun things like compute how bad a player is by how big their game is after compression.)


The more advanced your predictor is, the slower your compressor gets. OP has 600 Million moves to encode, how long does it take to ask Stockfish for its opinion on 600M board states? (and then again, every time you want to decompress) (not a rhetorical question btw, I know little about chess engines)

I suspect the sweet spot here would be to use a much worse chess engine for the predictions, giving faster compression/decompression at the expense of the compression ratio.


Your intuition here is right on the money. I wrote a subsequent post with pretty much that exact approach: https://mbuffett.com/posts/compressing-chess-moves-even-furt...


I almost agree, but as other comment says, it's probaby better to use the StocFish points to generate a probability and then use Huffman encoding.


You'd probably want to use something like CABAC rather than Huffman, since it deals better with dynamic probabilities. https://en.wikipedia.org/wiki/Context-adaptive_binary_arithm...


It looks like a very interesting comparison. I still not sure how to tranform StockFish points to probabilities. (Perhaps proportional to exp(-difference/temperature), where temperature is a magical number or perhaps it's like something/elo??? There are too many parameters to tweak.)


Given OP's dataset, you could probably figure out how to map them, experimentally (e.g. by making a histogram)


I was imagining downloading the lichess database and trying different strategies to compress it, It's too much work for me, but I'd love to read a blog post if somsome does that.


> Use StockFish to predict the next move, and only store the diff between the actual move and the prediction.

This ties the algorithm down to one specific version of Stockfish, and configured identically (stuff like the hashtable size etc.), because all such factors will have an impact on Stockfish's evaluations. One factor changes, and you can't decompress the backup.


Related, compressing bingo cards:

https://github.com/diku-dk/openbanko

Among other things, contains:

Theoretically optimal compression of excessive amounts of bingo cards

GPU-accelerated massively parallel bingo simulation using Futhark

I suspect the research was triggered by a late professor of Computer Science who unironically calculated how many (Danish) bingo cards there are: https://sprutskalle.dk/blog/wp-content/uploads/bankoplader.p... -- to see the mathematical rigour played out on such a casual question is nothing but inspirational.


I would love to see this benchmarked against just running LZW compression on a list of the moves. I think the repeated nature of the moves means the entropy is actually rather low. It should be possible to come up with a compression dictionary which can save a lot more space than just a tight packing like this.


LZW would do very badly on a list of binary chess moves.

Huffman (and even more so arithmetic encoding or ANS (asymmetric numeral systems [0]) would be significantly better, if you're careful how you encode data.

[0]: https://en.wikipedia.org/wiki/Asymmetric_numeral_systems


Or a trained zstd dictionary.


> the repeated nature of the moves

On a single game? I don’t think the goal is to pack all plays together and unpack them every time. Every game must be individually accessible as I understand.


I went through this process many years ago and documented it here https://triplehappy.wordpress.com/2015/10/26/chess-move-comp...

Basically I demonstrate how you can compress to much less than one byte per move, but settle instead on a one byte per move scheme that is also very performant, something you'd have to sacrifice for optimal compression. I used this one byte representation to good effect in my chess GUI Tarrasch https://triplehappy.com


There was also a lengthy off-topic discussion about the compressed board representation this year [1].

[1] https://news.ycombinator.com/item?id=39065595


Lichess's equivalent blog post, using an even fancier technique.

https://lichess.org/@/lichess/blog/developer-update-275-impr...


Position to position. 12 bits. Plus occasionally, an extra 12 bits for promotion. Promotion found by impossible position prefix. Whether it’s a capture, check, castle, what piece, etc. all interpreted by a reader program. What am I missing here?


+1 You're not missing anything. And there are plenty of very simple ways to just use a few "special destinations" to encode any move including possible promotions in a constant 12 bits per move and be very fast to encode/decode. The proposal in the article is just silly.

But really the whole thing is silly. 100 million games is a lot of chess, but at 1KB per "inefficient" PGN you are talking a whopping 100GB--big deal. (At 12 bits and, say, 80 half-moves on average, you are talking ~12GB.) Plus the article says that he is "IO-constrained" but shrinking game size isn't going to help with random lookups--a PGN is already below the page size.

If you really did care about best compression you would simply assign all legal moves an index and use a heuristic (chess engine) to figure out which moves were likely (i.e. good). In chess, it's not uncommon that good players are usually picking between only a few reasonable candidate moves. I wouldn't be surprised if good compression of human games yielded something like 2-3 bits per move.


If you actually want good compression, this is definitely not the proper way to go.

The proper way to go is to compute all available moves on a given position, assign them a probability distribution and then perform arithmetic coding using it.

If you want simplicity, assign an uniform distribution. For optimal compression, use an engine such as Stockfish to evaluate how strong the moves are, and then apply a statistical model that converts move strength to weights to make a probability distribution for; also use an opening books for the openings and tablebases for the endgames.

My wild guess is that it will probably result in something like 3-4 bits per move on average.

If you instead want a simple encoding, then encode in 4 bits the piece that moved based on any order on the chessboard squares, then in 6 bits the destination square for 10 bits per move.


PGN is a highly redundant format, but it has the inherent advantage of being human readable. The problem is interesting, but I think it falls on the side of "fun" more than "profit". Storage is cheap, and PGN files are still small. An average PGN is still below 1 kilobyte. So one movie in BlueRay quality = about 20 million games. That's a lot. The practical problem is not storage, it's computation. Basically, querying the game database quickly. Compression gets in the way of that.

For example, I've just played a game, now I want to go through the opening and fetch all games from the database that went through the same initial moves/positions (that's not the same thing, as a game may arrive at the same position through a different order of moves; AKA transposition). Let's say, all the way until move 15 or 20, because it will only be at that point that a decent game finally becomes unique by deviating from all the recorded games in the database (AKA a novelty was played).

Or I want to find all games where an endgame of a Queen and a pawn against a lonely Queen occurred. There is actually a query language for that, named (surprise, surprise) Chess Query Language: https://www.chessprogramming.org/Chess_Query_Language

I feel that whatever a superior alternative to PGN might be, its strength would likely be better queryability rather than higher storage efficiency as such.


The problem I’m facing with storing roughly 600 million shorter PGNs is that the database is 100GB or so, and I’m grabbing thousands of them sort of at random. This makes the query IO bound, even though the finding the pages they’re on is virtually instant with the indexes. So a smaller database means less pages read when I do these large reads, ideally. I also have other ideas on ordering the database in a smarter way, but hoping this part helps.


Sure, I see your point. Obviously a wasteful format is also getting in the way of queryability. My point is that the main goal should be to improve for queryability, which inherently requires some optimizing for storage, but that's secondary. As opposed to optimizing exlusively for data size.

Because in the former case it may still be best to accept some compromise (in the form of redundancy/simplicity) to hit the sweet spot.

Especially in the context of many comments that seem to have taken an extremely "code golf"-like approach towards the problem.


What I haven’t seen discussed here much is optimising for developer maintenance. I think the author’s solution is great for this; it’s easy to understand and each move has a bit pattern. Easy to debug. Somebody taking over in the future will understand this compression.

If on the other hand you can squeeze another 10% storage from Huffman encoded inverse tree lookup tables that only neckbeards understand, you’re limiting your pool of people able to do maintenance on this system in the future when the author is long gone.


Hopefully you just use a library for the Huffman tables and there's not much to maintain.


It depends what the goal is. If i's to display moves to humans, then this approach seems correct (not sure if optimal). It doesn't require implementing a full chess engine just to code/decode a move, as others have suggested.

If the goal is to replay the moves, instead, you need to keep track of the position anyway. In this case I would suggest a simple scheme: piece-destination. Each player only has 16 pieces at most at any moment in a game, so 4 bits will suffice. Each piece in a given square can only move to a fixed number of destination squares. A Queen can move to 27 other squares (7 horizontally, 7 vertically, 13 diagonally when in one of the four central square). All other pieces have less freedom: Kings have 8 natural moves, plus 2 castles; Pawns have 2 captures, 2 normal moves forward, and 4 possible move-and-promote; and so on. So the 27 for a Queen is the upper limit, and that needs 5 bits. In total 9 bits per move. Unfortunately that's more than one byte, so you still need some bit-wise processing...


Some thoughts I had while reading this, probably not very original: the fact that chess moves are not random and usually adhere to opening theory to some degree means you could get some value out of using something like a trie of some small finite depth, trained on a database of games, to compress common opening lines much better, at the cost of making things like 1. a4 a5 2. h4 h5 that aren't in the trie much costlier to represent. A practical way to actually do this would likely look like applying arithmetic coding to all of the possible length-N prefixes of the game.

The mid- or lategame are also far from random and could probably be compressed in small "chunks" via a coder that would (effectively) learn to predict patterns of e.g. capture being followed by a recapture and cause such chunks to require fewer bits to represent.

I'm not very knowledgeable about compression algorithms, though; I'm sure others will be able to provide corrections or references.


That's exactly what the author seems to have done and describes in their subsequent post: https://mbuffett.com/posts/compressing-chess-moves-even-furt... (published yesterday)


You can reference specific move numbers from specific master games to compress. But you still need to compress how you store those master games. What you're describing for mid/late game might make more sense like this: you use a fixed version of stockfish to suggest moves (depth and version should be locked to static values so it reproduces the same engine move each time). If it's within the first 8 suggestions, you can flag that it's an engine move with 4 bits. Decompression time is significantly larger, but it maximizes the space for cpu trade-off.


Only tangentially related, but I think the mid/late game is a much more interesting/useful/hard avenue for chess tools.

Chess.com's opening explorer is pretty good, which it seems like this is kind of trying to compete against.

What I struggle with is the post game review, where I make a dubious move but I don't actually understand the reason why it's bad. Sure, the engine shows better moves and the refutation to my move, but I often don't know why. The automatically generated explanation is usually pretty poor.

I wish I had even a poor copy of Danya giving me commentary.

Perhaps I should just lean in and try to implement something like this, but focus it more on coaching than commentary.

https://arxiv.org/pdf/2212.08195.pdf


I think that AI, overhyped as it may be, will truly get to shine when it comes to this aspect of chess - basically, tutoring.


You could check to see if one of his games has that position using https://fenfinder.com (site I built). Of course it's a long shot given the huge possibility-space of unique positions, but hey, one can hope :).


I would be curious how this compares to a more-or-less off-the-shelf text compression algorithm like gzip. My guess is that over the entire database, this would be more efficient than the OP's ad-hoc implementation or any alternative mentioned here.


Unlikely. Gzip and the like will do well with getting rid of the inherent redundancy of ascii coding but it's a general algorithm and can't take advantage of known structure.


That’s my first thought as well. Plain old gzip should do pretty well and provides a baseline to beat.


Was discussed recent-ish-ly (Sep 2023) — this was positions, not moves:

https://news.ycombinator.com/item?id=37525348

Site linked there seems to be offline, so here's the archive copy: https://web.archive.org/web/20230928072950/https://www.ezzer...


Right now I think you could compress chess games for fun and profit .

Since chess databases have come along higher level chess players can look pretty far ahead. We are reaching the point where if you study your opponent you can pregame it but studying what openings they make and prepping for openings.

But now we have come to the point if you can memorize large portions of chess databases you know optimal play given the database because if anyone has played the game and gets any advantage out of it people will play the line. It would be interesting to take the chess databases see their compression ratio and then how long would it take to lose against swordfish.


Thinking about this has made me realize I really don’t know anything about compression other than which arguments to pass to tar to get it to compress, haha.

How would a typical compression algorithm do, losslessly compressing the whole game?

Can you lossily compress the game? If so, you would end up, after decompressing, with a game that had some ambiguous moves, right? But, only one is valid all the way through. Why not lossily compress and then step through each game, checking if each move is valid? Is that even still considered lossy?

Hypothetically you could end up with multiple valid games I guess… but they are just games, haha, who cares if a few are wrong?


It's not lossy if you can always recover the correct data.

Rather than use a general algorithm, your compressor would have to reason: I'd have to spend a bit to disambiguate x & y, but only x is a valid board state, so I won't bother and the decompressor also knows I won't bother so it will all work out.

This sort of implicit coding can work, but it is fragile.


> I run a site that stores a ton of chess lines, something like 100 million in total.

Because there are so many, one approach would be to sort the list of lines lexicographically to group similar ones together, then compress the result with a sliding window compression algorithm.

The sliding window compression will avoid storing repeated parts a second time, and the first part of the lines will be repeated a lot because of the sorting. There may also be some other repeated sequences.

This assumes that it's OK to sort the lines, though, which might or might not be true.


Funny, I was just working on this same thing last week. I ended up using 16 bits per move. As other comments here note, you could do it in fewer bits, but everything is a trade-off: with speed, size, and what the goal of your compression is. For me, fixed-size, two bytes per move seems good.

Also, I don't bother with storing capture, or check, because those you can infer from the game state. Again, depends on how you be using the result - I just wanted to be able to re-play the game state.


Has anyone tried columnar compression of full game states instead of bit packing moves?

64 squares * (Empty | (Black | White) * (King | Queen | Bishop | Knight | Rook | Pawn)) + (Black | White) * (Can castle queenside | Can castle kingside)

It could be encoded as 64 4-bit int columns + 4 bool columns = 260 bits, most of which don't change between moves. Normal columnar encoding and compression strategies could probably reduce that to a few bits per board state.


I have actually, alongside my work of compressing moves. My compression scheme averages about 150 bits per position, or about 35% of the standard text notation in EPD format.

The thing I optimized for is that there’s very often repeated blank spaces or repeated pawns on the same rank.

Also instead of storing castling status separately, it’s stored as a separate piece on the appropriate rook.

These take advantage of the fact that there’s 6 pieces, so 3 bits to encode them leaves two options remaining. One is PawnRepeating and the other is RookCastleAvailable, in my scheme.

There’s probably improvements to be made. I’ll write a post on it when it’s finalized.


Using the extra available piece indexes for "rook that can castle" and "pawn that moved two squares" is a great idea. So e.g. moving the king would not only change the squares where the king moved from and to, but would also convert both of the rooks from "rook that can castle" to "rook that can't castle". Same for a pawn that moved two squares and can be captured by en passant.

I guess you also need some way to encode which player's turn it is. Though maybe you could eliminate that by flipping the board on black's move and always encoding from the perspective of the last player's move?

I'm curious about whether a naive columnar encoding scheme could beat a more complex encoding scheme, after compression. Not columnar in the sense of ranks and files, but columnar in the sense of data science storage formats (e.g. parquet), where each 'column' is the state of a specific square across different board states. Given 64 such columns, a game state is a single row. The hypothesis being that if you're encoding all the game states of a large number of games (say 1000), after RLE and compression etc you would see a net compression better than more complex encoding schemes. Given a big block of game states like this, then a single 'game' would be a list of offsets into the game states. This would probably also compress very nicely in columnar format.

Now I want to try it...


> If you're into chess, I've made a repertoire builder embeds the link chessbook.com, It's a great way to learn and practice chess.


I had my own idea compressing chess moves, using one byte per move, because I think the number of legal moves can never exceed 256.

Further compression may be possible due to the number of legal moves may be much less, so some of the previously numbers may be used for common sequences of moves.

(It would also be possible to use some of the unused numbers to indicate e.g. resigning, agreement of a draw, and unfinished games.)


With a better coding you don't need more than 8 bits for a move (less for some).

The implementation should store a state machine for the game, a struct for each pieces, and also the board with index to the pieces (occupation).

There is only 16 pieces for each player, the two players move one another.

The Naive would be that you index the pieces (4 bits) and store the movement after that,

Pawns 2 bits for movement (2 forward, 2 diagonal)

King 4 bits for movement (normal move 1 + 3 bits, castle 1 + 1 bits)

Knight 3 bits for movement (8 possible move)

Rook 4 bits for movement ( 1 bit orientation horizontal/vertical 3 bit new position )

Bishop 4 bits for movement ( 1 bit orientation left/right diagonal 3 bit new position)

Queen 5 bits for movement ( 2 bit orientation horizontal/vertical/ left/right diagonal 3 bits for new position)

This way you need

Pawn 4 (index) + 2 (movement) = 6 bits

King 4 (index) + 4 (movement) = 8 bits

Knight 4 (index) + 3 (movement) = 7 bits

Rook, Bishop 4 (index) + 4 (movement) = 8 bits

Queen 4 (index) + 5 (movement) = 9 bits

But you can encode the pieces index in another way, that the Queen got 3 bit index and other pieces got longer index (Pawn 5 bits)

index bits + movement 000 + 5 movement Queen -> 8 bits

00100 + 3 movement King Normal -> 8 bits

00101 + 1 movement King castle -> 6 bits

0011x + 3 movement Knight (2) -> 8 bits

01xxx + 2 movement Pawn (8) -> 7 bits

10x + 4 movement Bishop (2) -> 7 bits

11x + 4 movement Rook (2) -> 7 bits

Also other bit allocations are possible, I don't know if it's worth it to make the Pawns index 1 bit longer, but in this way the max 8 bits required for a movement. (You need the state machine for the "decompression", also have to track that the black and white Pawns are moving in the opposite direction. )

There can be further compression, if you check the possible move for the piece. ex. the first possible Knight movement is only 1 place, so no need to store the movement. But this need more calculation.


Anyone know of a discussion of say, 8 bit chess computers and their techniques and tricks?


Is this what you are looking for?

https://news.ycombinator.com/item?id=36431917 - Video Chess disassembled and commented (2023-06-22)

https://nanochess.org/video_chess.html

Video Chess for Atari 2600 worked with just 128 bytes of memory.


SCID (the open source chess database) has a very efficient scheme.

https://scidvspc.sourceforge.net/doc/Formats.htm


For optimal compression, look into Huffman coding: https://en.wikipedia.org/wiki/Huffman_coding


I mention Huffman in my follow-up post: https://mbuffett.com/posts/compressing-chess-moves-even-furt... , but chose arithmetic encoding instead.


I think Huffman would be good, but I don't think it would be optimal (as in, the smallest possible amount of bits)


Huffman requires minimum one bit IIRC whereas Arithmetic coding or ANS ( https://en.wikipedia.org/wiki/Asymmetric_numeral_systems ) can use fewer if there's a skewed distribution.

Arithmetic coding has been avoided in practical situations due to patents, but many of those have expired ( see the patent section of https://en.wikipedia.org/wiki/Arithmetic_coding or maybe not, lawyers tend to advise not even reading about patent specifics )


The core tech patents have expired. They did have some interesting ones around hardware implementations, iirc.

Overall the chilling effect was interesting - I found that many people doing academic work around compression didn't really know much about it, didn't teach it, just because it was a mess.

The core idea is elegant, and easily implemented.


I wonder how this compares to using zstd with a dictionary.


I feel like zstd trained on a randomized sampling of games from the database would come very close to (or better than) this with nearly zero effort.


Just Huffman code the text.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: