How some common material imbalances affect your win-rate (lichess.org)
163 points by Luc on June 13, 2023 | 90 comments



Honestly, two rooks for a queen being a disadvantage is kind of a relief.

I’ve always been taught the two rooks are better, and who can argue with 5 + 5 > 9? But I’ve also lost almost every game where I’ve had the rooks. I always thought that I just needed to be a better player to take advantage of it. Glad to know it wasn’t just me.

Just goes to show that these shortcuts, like the point system, are only heuristics, and pretty shallow ones at that. Knowing that being up a bishop gives you slightly more of an advantage than a knight is better than nothing. But learning in which sorts of positions a knight is actually better than a bishop will give you a much deeper understanding (and correspondingly more wins).


Yeah, agreed. 2 rooks vs queen is much easier to play for the queen in most cases, I've found.

The queen can be good as a single piece whereas the rooks have to coordinate, which is hard when there's a queen on the board ready to fork. Often the rooks end up having to defend each other, and if this happens suboptimally they can become stuck on a useless file or rank while the queen flies around the board attacking stuff at will.

Like you said, material imbalance is very difficult to evaluate and understand. I generally recommend not incorporating it into one's play until ~1700 FIDE Elo. Though sometimes you're forced into it, of course.

It's also a matter of style. I personally love materially imbalanced positions, and I'm pretty good at them, so I incorporate a fair bit of exchange sacrifices into my play. Because then I often get positions I understand better than my opponent and that makes up for the material on its own, I find.

But other players just prefer a different style of play and maybe shouldn't go for it even if objectively it's the best move, because they'll end up misplaying the position.


As someone low in rating, there are few things in chess more satisfying to me than when the engine confirms I was correct in sacrificing an exchange or a full piece for some sort of positional advantage. The opportunity itself comes up very rarely for me, so recognizing it just makes me giddy.


Yeah, there's nothing that feels better in chess than that. I still remember my first win in the Norwegian championships (in the noob category, not bragging, they let anyone play in it) almost 20 years ago (yikes): I found, correctly calculated and played a rook sacrifice leading to mate (it was like a 3-move combination, all checks, nothing too fancy, but I did consciously threaten it the move before). I also remember walking into mate in 1 in another game that tournament; the rest is a blur.

And in fact most of the wins I remember the most clearly involved some kind of well-found sacrifice. The others tend to blend together too much in my mind. And most of the losses I remember most clearly involved horrible blunders.


edit: my comment was silly, removed.


I don't think being up a full piece counts as material imbalance.


You are absolutely right, I deleted my comment as it was misleading.


The author doesn't list which time control he studied (maybe a mishmash of bullet, blitz and rapid?), but I'm thinking that shorter time controls favor the queen.


The pieces definitely vary in value by position. Chess engines that people have no chances of beating will occasionally sacrifice a rook to take an enemy knight on a good outpost square (past ability to get attacked by enemy pawns and attacking enemy space). Hard to argue they don't know what they're doing when they can beat all humans.


This thread makes me want to build a simple chess-practice tool that works like this:

1) Get a big database of positions

2) Use StockFish to evaluate all the positions

3) Pick a position at random, show it to the user, and ask for their evaluation

4) User gets a score based on the difference b/t their eval and StockFish's.

The idea being that this could allow you to rapidly hone your position-evaluation skill. That might be a faster way to improve at chess than just grinding through many games.
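A minimal sketch of steps 3 and 4, assuming the engine evaluations from step 2 have already been computed and stored. The names `POSITIONS`, `score_guess` and `pick_position` are hypothetical, and the eval numbers are made up for illustration rather than real Stockfish output:

```python
import random

# Hypothetical pre-computed data: (FEN, engine eval in pawns, White's point of view).
# The eval numbers are invented for illustration, not real Stockfish output.
POSITIONS = [
    ("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1", 0.3),
    ("rnbqkbnr/pppp1ppp/8/4p3/4P3/8/PPPP1PPP/RNBQKBNR w KQkq - 0 2", 0.2),
]


def pick_position(rng=random):
    """Step 3: pick a random (fen, engine_eval) pair to show to the user."""
    return rng.choice(POSITIONS)


def score_guess(user_eval, engine_eval, cap=10.0):
    """Step 4: return a score from 0 to 100; 100 means the guess matched the engine."""
    # Clamp both values so that huge evals (e.g. forced mates) do not dominate the score.
    u = max(-cap, min(cap, user_eval))
    e = max(-cap, min(cap, engine_eval))
    return round(100 * (1 - abs(u - e) / (2 * cap)), 1)


if __name__ == "__main__":
    fen, engine_eval = pick_position()
    print(fen)
    print(score_guess(0.0, engine_eval))
```

The linear penalty and the cap of 10 pawns are arbitrary design choices; a real tool would probably want something that cares more about errors near 0.0 than about the difference between +6 and +8.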


Stockfish eval is often misleading because some positions are very sharp: in the most pathological case, you’re ahead if you see the mate in 15 and far behind otherwise.


You could do some basic filtering on e.g. top 3 moves are all within 0.5 or something
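Roughly what that filter could look like, as a sketch. It assumes you already have the engine's evals for the candidate moves (in pawns, from the side to move's point of view); `is_quiet` and its parameters are made-up names, not part of any engine API:

```python
def is_quiet(top_move_evals, tolerance=0.5, top_n=3):
    """Return True if the best `top_n` moves are all within `tolerance` pawns of each other."""
    best = sorted(top_move_evals, reverse=True)[:top_n]
    if len(best) < top_n:
        # Too few candidate moves to judge, so treat the position as sharp.
        return False
    return best[0] - best[-1] <= tolerance
```

For example, `is_quiet([0.4, 0.3, 0.1, -2.0])` keeps the position, while `is_quiet([5.0, -1.0, -1.2])` rejects it because only one move holds the evaluation.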


I think the problem then is a position where there is only one winning move and the win is 8+ moves away. If one move is mate in 10 and the others are mate in 3 for your opponent, how do you reconcile those into one evaluation?


Not to rain on your fun, but this already exists. I watched Hikaru Nakamura play it online once. It just gives a random position and you have to guess the centipawn loss of the position. Kinda like Geoguessr for chess.


Do you know what the site is?


There are endless tactics trainers out there.

But it takes far more to build something useful for humans than just asking stockfish to give you the best move.

For one thing you don't want to learn to play like machines. Often the moves that machines make are terrible human moves. Machines walk into extremely dangerous positions all the time; positions where you need to make a series of only moves to gain a small advantage. Humans shouldn't be doing this.

Machines also rely on tactics heavily. Unless you understand the tactics in play it will make your play much worse. It might appear to you that a piece can't be taken, but Stockfish says it can. OK? Unless you understand the reason why, you aren't going to gain anything from this exercise.


I wonder how true this will be over the next decade or so when the top humans are people who have trained with machines their whole career.


Sort of like this (but without guessing the best move):

https://makotoe.github.io/guess-the-eval/


The problem I see with this is that it’s not very useful for people to know if something is a +3 or +4.1 position. The qualitative descriptions that underlie that rating are what’s important. How does my pawn structure compare to my opponent’s? What is the balance of minor pieces (knight and knight vs bishop and knight, BB vs NN, etc.)? That’s how you get to understand chess.


Yeah, this is probably better for a game like Go where evaluating raw point differentials is a common thing that professionals (and even strong amateurs) do.


I often just play 10-minute games on lichess, then look at all the mistakes I made and try to memorize simple rules of thumb for better moves.



A guy on reddit made a website that does exactly that, but I cannot find it at the moment.


Chess engines already do this.


Am I misunderstanding something or is there a glaring statistical error here? I think this analysis works if the matches are all between two players of truly identical skill that will roughly win 50% of the time against the other, but that's not really the case here. They attempt to do that by controlling for Elo but I'm decently sure that you could pick any two players of the same Elo on lichess and one will win the majority of games.

Now if we assume there is often a slightly better player in these games, the better player will more likely get into an advantageous position early in the game and win more often, but not entirely because they got into the more advantageous position. What I'm trying to say is that someone who is up a rook will win partially because they're up a rook and partially because they are likely a better player in the first place and will continue to play better than their opponent.

I think to do this study correctly you'd need to place new players in a random position drawn from the dataset and actually evaluate the win rate without the confounding factor of having gotten into that position in the first place.


> I'm decently sure that you could pick any two players of the same Elo on lichess and one will win the majority of games.

Interesting hypothesis, but my intuition is that this effect is small. It should be easy to measure. Just take random matches, and study winrate when a player has beaten their opponent once before.


Players can also swap back and forth between winning a majority of games, so it depends on the amount of games we're talking about.


That's exactly what Elo controls for: all things being equal, what is the skill component? Now there could be a second-order effect in which a higher Elo player is even better at winning from an advantageous position, but given large enough samples this should be in the mix.


Yeah I guess my claim is that two people of the same Elo on lichess are likely to not be perfectly equal. If the average winrate of any random two players with the same Elo is something like 60/40 (random speculation) then you'd bias the results in the article enough to create the kind of graphs shown.

The problem is that this doesn't really average out with more samples.


Over a long series of games, two players with equal Elo playing the same number of games as white will have a 50/50 score.
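For reference, this is the standard Elo expected-score formula behind that claim, as a small sketch. Note that lichess ratings are Glicko-2 rather than plain Elo, so this only illustrates the idea; `expected_score` is just an illustrative name:

```python
def expected_score(rating_a, rating_b):
    """Expected score (win = 1, draw = 0.5, loss = 0) of player A under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))
```

With equal ratings this gives exactly 0.5, and a 100-point favourite is expected to score about 0.64. It is a statement about the model's prediction, which is a different thing from saying any specific pair of equally rated players will actually split their games evenly.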


I mean I agree that's the ideal but not necessarily true in practice. I bet if you took two random players of the same Elo and pit them against each other many times, it would not be 50/50.


True, but that error will average out across a large section of players, and lichess has probably the largest database of players in the world.


I accept that players with the same ELO can have unequal strength.

But I don't see how that wouldn't average out over lots of games?


The probability of player A being up a rook over player B is higher when player A is better than player B, even if their Elo is the same. So the author has picked scenarios where player A is already more likely to be better than player B but assumed they were of equal skill, which will cause the results to be biased.

If I have time later I'll try to whip up a small coin-tossing demo in Python that demonstrates what I mean, I'm definitely not the best explainer!
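A minimal sketch of what such a coin-tossing demo might look like. Everything here is an assumption made for illustration: the hidden 60/40 edge (the "random speculation" figure from upthread), the two-toss model of a game, and the function name. The key point is that the second toss ignores the first, so in this toy model being "up a rook" has zero causal effect on the result:

```python
import random


def up_material_win_rate(n_games=200_000, edge=0.10, seed=0):
    """Win rate of whoever went 'up a rook' when the rook itself is worth nothing."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_games):
        # Same rating on paper, but one side is secretly the 60% favourite.
        p_a = 0.5 + rng.choice([edge, -edge])
        # Toss 1: who ends up "up a rook"? The better player, more often than not.
        a_is_up = rng.random() < p_a
        # Toss 2: who wins? Independent of toss 1, so material has no effect here.
        a_wins = rng.random() < p_a
        wins += (a_is_up == a_wins)
    return wins / n_games


if __name__ == "__main__":
    print(up_material_win_rate())
```

Analytically the rate is 0.6 * 0.6 + 0.4 * 0.4 = 0.52, so the simulation should land near 52% even though the material is worthless by construction. That is the selection bias, and it does not shrink with more games. It is also only about 2 points with a 60/40 skill gap, which fits the intuition that the effect is real but small.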


OK, that's a reasonable argument at least.

My gut feel is that the effect is minimal at best, but I don't have an argument for that.


Usually a chess game is started and finished by the same player, so why does this matter? The question you want to answer is "I'm up a rook, how likely am I to win?", which this analysis answers.


> I'm decently sure that you could pick any two players of the same Elo on lichess and one will win the majority of games.

I think that's true at 2000+ but at my rating (500) it's just a comedy of blunders


I'm surprised that in the higher ELO ranges, being up a clean pawn has only a comparatively minor impact - I thought that one clean pawn ahead and no other major positional or structural disadvantages was more or less "GG" among top players.


I suspect the author only counts raw material. So players who lose a pawn but gain compensation reduce the perceived impact of losing a pawn. Including many openings where you temporarily give up a pawn (queen's gambit, benko, etc).


I was thinking the same. It could be true for higher ELO players being down a pawn is a good thing, since your opponent took the bait.


Just a friendly pedantic point, it's "Elo", not "ELO". Rather than an acronym, it's a person's name; that being the rating system creator, Arpad Elo.


If you want to be fully pedantic, it's Élő, not Elo - the way in which you pronounce the former is dramatically different from the latter.


To be correctly pedantic, it's named after Élő but it is called the Elo rating system


If you want to be fully, fully pedantic, it's the Glicko-2 rating system on Lichess, while FIDE uses Élő.


Rather, players have a FIDE rating within FIDE, and a unique rating in each system, because no organization follows the exact, original Elo system.


For years the guy calculated the FIDE ratings by hand...


Many pawn-up endgames are still technically a draw (king and pawn vs. king [1] in the most extreme example)

[1] - https://en.wikipedia.org/wiki/King_and_pawn_versus_king_endg...


Take care: not all king and pawn versus king endings are drawn. The result depends on the relative position of the kings.

A nice example of a more extreme imbalance is that King+Queen vs King+Pawn is sometimes drawn: that is when the pawn is on the seventh rank of a bishop or rook file with the defending king next to it, while the attacking king is not close enough to force checkmate.

See: https://en.wikipedia.org/wiki/Queen_versus_pawn_endgame#Quee...


Yeah. More commonly rook endings are notoriously hard to win.

I gained a lot of rating once I started really studying the ins and outs of all the different rook and pawn vs rook endings. Because no matter what the tablebase might say, you have to be very precise with or without the pawn.

So I started getting a lot of wins out of equal positions, and got a lot better at saving a draw in bad positions.

Pawn and minor piece vs minor piece is also fraught with drawing chances, especially with bishops. Because you can just sac the piece for the pawn and there's insufficient material for mate.

Queen and pawn vs queen is basically a 3 result game at the club level. So many ways to blunder your queen. And at GM level it tends towards a draw I think. Though it's so stupidly complicated I never even tried to properly study it.


> Yeah. More commonly rook endings are notoriously hard to win.

Oh, so it's not just me.


There's a famous saying in chess: "all rook endings are drawn".

Of course, this isn't actually true. But it's a useful rule of thumb to just assume it's a draw unless you're sure it's a win, because it probably is, fucking somehow.


That depends both on the position of the pawn and the king...


You are correct, but this uses Lichess ratings, which are a bit inflated, and probably a variety of time controls including blitz. At levels of play below super-grandmaster with full time control the accuracy of play can be rather low, and picking a suboptimal move will often cost you half a pawn (or more) in engine evaluation. Add this up over an entire game and a one pawn advantage is easily lost in the variability of both players making compounding errors.


Neat results! I am very much a rank amateur, but it looks like these results roughly align with the traditional 1-3-5-9 point values for the pieces, with a couple exceptions. Advantages of a single pawn, minor piece, or rook have proportionate values similar to what you'd expect. In particular, bishops and knights are very close with bishops being worth a tiny bit more. Winning the exchange is a sizeable advantage. Trading a rook for two minor pieces is an advantage except at the lowest skill levels.

Two rooks turn out to be significantly worse than a queen instead of slightly better. But the most surprising thing to me is that having the two bishops seems to be worth almost nothing -- close to 50% odds across skill level. A single pawn advantage is more valuable! (The article says "4-5% more likely to win" at Elo 1200-1400, but that doesn't match the graph.) These surprising results were also more consistent across skill level, while the well-known advantages are worth significantly more to skilled players.
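To make the 1-3-5-9 arithmetic concrete, here is a small sketch that counts material from the board part of a FEN string. The function name and the example position are my own; this is plain point counting, not whatever the article's author actually ran:

```python
# Traditional point values; the king is given 0 because it is never traded.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def material_balance(fen):
    """Return White's material minus Black's material in traditional points."""
    board = fen.split()[0]  # first FEN field is the piece placement
    balance = 0
    for ch in board:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # digits (runs of empty squares) and '/' rank separators
        # Uppercase letters are White's pieces, lowercase are Black's.
        balance += value if ch.isupper() else -value
    return balance
```

For a bare two-rooks-vs-queen position such as `3qk3/8/8/8/8/8/8/R3K2R w - - 0 1` this returns +1 for the rooks, which is exactly the kind of point-count verdict the article's win rates contradict.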


> In particular, bishops and knights are very close with bishops being worth a tiny bit more.

I don’t think so, and I would like to see the author look in the data for 2 knights vs 2 bishops. My gut feeling is that knights are hard to block and can attack every square, while bishops can only attack their color, so half the board. They can’t protect each other and rely on pawns for this.

I’m happy to trade my knights for the pawns, but I would like to see if I’m wrong.


Bishops have a great deal more mobility than knights. A knight in the wrong corner takes several moves to catch up to a distant passed pawn. A bishop needs far fewer.

Generally the trade off is a bishop pair, where you do have all the squares covered, vs the slower knights, who are also the only pieces that can threaten a piece without being threatened back.

The advantages are highly position dependent, with closed games largely going to the knights and open games going to the bishops pair.

But most games turn into open games at some point, so statistically the bishops end up better.


Really nice insights.

Though

"Even at lower Elo ranges like 1200-1400, you're 4-5% more likely to win if you have the only bishop pair. 1200 Elo players don't know how to take advantage of having the bishop pair, and yet it helps them win all the same [...] A 1000-1200 Elo player is only about 8% more likely to win when up two minors for a rook."


Nice article!!

But this is not what you're testing. You're not determining the effect of imbalance alone, but of imbalance plus a particular position of the pieces. To determine the effect of imbalance you should study how not having a certain piece from the beginning affects the result of the game.

Also, being a pawn up in the opening is very different from being a pawn up in the endgame. It would be interesting to redo this analysis but splitting by opening, middlegame, and endgame.


I am amazed that you think the author doesn't already understand that.


Well, I don't know if the author understands that or not, but I guess they do. I'm just pointing out that the analysis can be misleading, and I shared a proposal that I think would make it more interesting :)


If you take into account all variables aren’t you just writing a chess engine at that point?


Maybe I'm just a choke artist but if I win the exchange I'm probably losing the game. Opponent always seems to "turn on" and I just get blown away


I wonder how this would change if you only looked at highly rated chess engines. For example, is a rook pair really not as good as a queen or is it just that _humans_ aren't as good at using the rooks effectively?


Would be interesting to see material imbalances as a function of plies into the game. As in, we know an extra pawn is more important in the endgame than in the opening, but how much more?


"1200 Elo players don't know how to take advantage of having the bishop pair."

How exactly does one take advantage of having a bishop pair? Why is that not obvious to players under 1200?


Generally the advantage of a bishop over a knight is its increased mobility. The bishop pair is more mobile than N+N or B+N and can control both color complexes. To exploit the mobility advantage, good players tend to try to "open up" the position more to increase the scope of the bishops. A classic example of a strategic goal if you want to play for an open position is to keep the pawn structure fluid (as opposed to locked up), which can be too subtle for weaker players.


I'd love to see the winrate data for Stockfish against Stockfish, for each data set. To get a sense of what the objective winrate is.


Once a computer gets to an advantage of +/-1.0 or so, the win is virtually assured in all positions, barring a few weird theoretical draws and misevaluations of the score of the position (e.g. closed positions with no way to make progress), although the latter problem has gotten a lot better with NN engines.

(I used to play computer correspondence chess.)

Edit: I should add that +/-1.0 is roughly a pawn's worth of material. So a computer up a knight or bishop -- nominally 3 points of material -- should virtually always win.


In case the author sees this, the chessbook discord invite isn't working (at least not for me). bummer... this looks great!


Thanks for mentioning this, has been fixed!


What does "clean" mean in this context? What makes you up a pawn "cleanly"?


I agree, this is the first annoying question to ask. As a seasoned club player, I believe "clean pawn up" means roughly what you think it means, chess-wise, namely: a) you are in a position where it really does make sense to count pawns (a so-called "quiet" position would be one of those); b) your opponent has no compensation for the pawn of any kind, not even in the form of a nuisance. I personally learned this expression from GM Nakamura. Clean pawns were unknown before the "golden era of chess"; you could use the phrase "ceteris paribus" instead, which was used by the old German vice world champion Tarrasch.

Hence it should be clear to anyone that without engine assistance it is totally impossible to determine for 10^6 games whether a certain material difference was "clean" or not. Most likely he just checked for stability over n consecutive moves; at least this is the usual way. Others have noted more problems, which might help put his highly deviating, washed-out results in order.


Contrary to the definitions in the other answers, I'd assume that this refers to selecting positions from the database where that's the only difference in material - so "clean up a pawn" means that all the other figures are the same, that there isn't also a bishop vs knight difference, for example.

I really doubt if there was any attempt to check if there's some "strategic compensation" - how would you do that at a large scale? I doubt that even running a solid chess engine evaluation on all these positions is feasible, you need something where you can simply/cheaply filter positions from the database and then just count the winrate.


Apart from what WJW said, it's not an objective criterion but it essentially means that you don't lose out in terms of initiative or position. A counterexample might be accepting the queen's gambit with black (i.e. 1. d4 d5 2. c4 dxc4): that way you win a pawn but you arguably give white an advantage in development (and in many cases white will be able to recapture the pawn as a result).


"Cleanly" in chess parlance means you're up a pawn, and there is no "compensation" for the other player. I.e there are no other advantages in the opponents position that make up for the fact they're down a pawn(things like an attack on your king, piece activity, pawn structure).

In simple mathematical terms a clean material advantage is:

Overall eval >= material eval

If overall eval < material eval, then compensation(other player) > 0 = "the opponent has some compensation for the pawn"
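The same definition as a tiny sketch, assuming both numbers are in pawns from the point of view of the side that is up material. The function names are mine, and in practice `overall_eval` would have to come from an engine:

```python
def compensation(overall_eval, material_eval):
    """How much of the material edge the opponent has 'bought back' positionally."""
    return max(0.0, material_eval - overall_eval)


def is_clean(overall_eval, material_eval):
    """A material advantage is clean when the overall eval is at least the material eval."""
    return overall_eval >= material_eval
```

So a pawn up with an engine eval of +1.2 counts as clean, while a pawn up with an eval of +0.3 means the opponent has about 0.7 pawns of compensation.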


I agree with this question. How was clean determined in the dataset?

Knowing how the word is used in chess is not enough to know where the line was drawn in the data here, which is the basis for the entire analysis shown.


I guess you put the game position right after the pawn capture into a good chess engine and check if you're one point up as a result.


This would mean that, in the case of "up 2 rooks for a queen," the board would evaluate to a player's advantage even though that player would tend to lose at the highest level. I think something weird must be going on.


From the other imbalances listed (like being up "the exchange"), I take it that they mean being up a pawn without having to give up anything in return. For pawns that is kinda obvious, but for more valuable pieces like a rook you can win it "cleanly" or maybe you take it with a pawn and then immediately lose the pawn.

You can of course give up material for positional advantages as well, but from the article I don't think the author analyzed that. It would be difficult to accurately measure that anyway, some positions can be up +5 points according to the computer but only if you find 20 perfect moves in a row. Needless to say, most humans would not find those moves especially in the lower ELO brackets that the article analyzed.


Should have been defined (along with more information about the data set), but I assume it means that the opponent has no tactic that wins them material. For instance if I just take their protected bishop with my bishop then I'm briefly up a bishop. So it would not include that situation or where the trade is more complicated.


I think it means that the opponent does not gain a strategic advantage from an exchange. You both have good enough positions and you are simply a pawn up.


(Please Lichess, this is not ELO)


Honestly "Elo" is commonly used to denote most Elo-like systems. I don't think it's confusing here.

Also, it's "Elo" not "ELO".


One of my favourite things on the internet is when someone tries to make a pedantic correction and someone else comes along with something that's even more pedantic.

Thanks for giving me a good start to the morning!


"The Elo system was invented as an improved chess-rating system over the previously used Harkness system" - I can use a spoon to repair my coffee machine. It still wasn't invented for that.


This is, if I see correctly, not the official Lichess blog, but the personal blog of marcusbuffett (hosted on lichess). Lichess has offered personal blogs for some months now.


What does this mean?


Quoting Wikipedia because I'm lazy:

> The Elo rating system is a method for calculating the relative skill levels of players in zero-sum games such as chess. It is named after its creator Arpad Elo, a Hungarian-American physics professor.

https://en.m.wikipedia.org/wiki/Elo_rating_system

There are other rating systems that are similar, but people commonly call any similar rating system "Elo". In this case, Lichess uses https://en.m.wikipedia.org/wiki/Glicko_rating_system, so the GP is arguing against calling it "Elo". I think it's kind of a pedantic point given that more people will understand "Elo" than "glicko-2", but I suppose "rating" would be clearer than either term.

https://en.m.wikipedia.org/wiki/Chess_rating_system


lichess uses glicko 2


OP is making an unimportant pedantic point.



