Does GTO Strategy Create the Potential for an Evolutionary Stable Strategy?

Posted in Gen. Poker

Let X and Y represent different GTO strategies for NLHE that are at Nash Equilibrium with one another. Would, for example, a 6 max table of 4 people playing X and 2 playing Y result in X being more profitable than Y?

38 Comments


GameTheory 8 years, 5 months ago

This question makes no sense at all.

First off, the only sensible definition of GTO strategy is a Nash Equilibrium definition. So using both terms in the same sentence is just silly.

Second, a Nash Equilibrium at a 6-max table means that six(!) players are at equilibrium. Not just two.

Third, positions matter in 6-max NLHE. A strategy is a function that assigns an action to every potential point in the decision tree. Clearly these decision points are different for every player at every decision, because it matters how many players have folded before him. As a result, the Nash Equilibrium strategies will be different for every player/position.

Lastly, a new hand means that all players are in a different position. They are not committed in any way to their strategy from the previous hand or any subsequent hand. So if one player is very loose on the BTN, he can be very tight in the CO and vice versa.

MuckMyNuts 8 years, 5 months ago

I don't disagree with anything you say. I guess I didn't word my question well. I'm wondering if this theoretical situation can occur at a poker table, specifically what is described in the section "Nash equilibria and ESS."

https://en.wikipedia.org/wiki/Evolutionarily_stable_strategy

I'm wondering if this concept applies to something as complex as poker. It's purely hypothetical and not of any strategic value, but it makes me wonder, if you were forced to play at a horrible table with 5 other players playing exactly the same GTO strategy for every imaginable situation, would you have no choice but to emulate their strategy, or automatically become the biggest fish in the game?

In nature, my understanding is that there are certain things that can disrupt ESS states that have existed for generations, often due to human shortsightedness (e.g. the introduction of foreign species or bacteria/viruses). It got me thinking about whether there could be a strategy that would break an ESS poker table, if such a table could even exist, and what that strategy might look like. My mind goes to strange places when I am trying not to go crazy around my family (Canadian Thanksgiving).

antihero 8 years, 5 months ago

From the wiki article:

Evolutionarily stable strategies are motivated entirely differently. Here, it is presumed that the players' strategies are biologically encoded and heritable. Individuals have no control over their strategy and need not be aware of the game.

Interesting article, but obviously this has no relevance to poker at all.

Also by definition, in a Nash equilibrium no player has an incentive to change their strategy to achieve a higher payoff. So there can't be an exploitative strategy that is +EV against a whole table that is in a NE state.

MuckMyNuts 8 years, 5 months ago

It was this part of the article that made me particularly curious:

Some games may have Nash equilibria that are not ESSes. For example, in Harm thy neighbor both (A, A) and (B, B) are Nash equilibria, since players cannot do better by switching away from either. However, only B is an ESS (and a strong Nash). A is not an ESS, so B can neutrally invade a population of A strategists and predominate, because B scores higher against B than A does against B. This dynamic is captured by Maynard Smith's second condition, since E(A, A) = E(B, A), but it is not the case that E(A,B) > E(B,B).
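Those two conditions can be checked mechanically. Here is a minimal sketch with a hypothetical payoff matrix (the numbers are invented, chosen only to satisfy E(A,A) = E(B,A) while E(A,B) < E(B,B), mirroring the quoted example):

```python
# Hypothetical 2x2 symmetric game: payoff[(s1, s2)] is what s1 scores vs s2.
# Invented numbers that mirror the quoted structure: E(A,A) = E(B,A), so both
# (A,A) and (B,B) are Nash equilibria, but E(B,B) > E(A,B).
payoff = {
    ("A", "A"): 2, ("A", "B"): 1,
    ("B", "A"): 2, ("B", "B"): 2,
}

def is_ess(s, strategies, payoff):
    """Maynard Smith: s is an ESS if for every mutant m != s, either
    E(s,s) > E(m,s), or E(s,s) == E(m,s) and E(s,m) > E(m,m)."""
    for m in strategies:
        if m == s:
            continue
        strictly_better = payoff[(s, s)] > payoff[(m, s)]
        tie_but_punishes = (payoff[(s, s)] == payoff[(m, s)]
                            and payoff[(s, m)] > payoff[(m, m)])
        if not (strictly_better or tie_but_punishes):
            return False
    return True

strategies = ["A", "B"]
print(is_ess("A", strategies, payoff))  # False: B can neutrally invade A
print(is_ess("B", strategies, payoff))  # True
```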

If you look deeper into game theory there are more situations where Nash Equilibrium can be disrupted by adding additional actors. Specifically for poker:

http://blog.gtorangebuilder.com/2014/03/gto-poker-outside-of-heads-up-what-it.html

If you scroll down to the section entitled "GTO 3-handed", the author argues that even if two players are playing different strategies at Nash Equilibrium, adding a third player who plays less than optimally and makes mistakes can cause one of the original players' strategies to become -EV.

Nothing about the definition of a Nash Equilibrium guarantees that the button's change in strategy won't also decrease the hero's ev.

If you imagine that the Big Blind player is a smart reactive player it can get even worse!

It is important to note that the above are not due to ICM, they appear even in cash games. In SnG situations where ICM is a factor there are even bigger and more obvious instances where the presence of a fish can make a nash strategy -EV.

The author only analyses instances of push/fold but it seems to me his reasoning is sound and would extend to other poker situations. I'm curious if other people have gone beyond his study of GTO in these situations, considering this post was made in 2014.

GameTheory 8 years, 5 months ago

The fact that you can deviate from the Nash Equilibrium 3-handed and hurt one player, while gaining no EV yourself shouldn't be a surprise. But this has nothing to do with invading a population with a better strategy to gain any EV for yourself.

For instance, in a satellite that pays 2 tickets, when you and another player have 20bb and one player has 2bb, you can call the shove of the 20bb player with 33. You hurt yourself and you hurt the other bigstack, and the 2bb player is very happy.

GameTheory 8 years, 5 months ago

I'm wondering if this concept applies to something as complex as poker. It's purely hypothetical and not of any strategic value, but it makes me wonder, if you were forced to play at a horrible table with 5 other players playing exactly the same GTO strategy for every imaginable situation, would you have no choice but to emulate their strategy, or automatically become the biggest fish in the game?

By definition, if you have five players and six GTO strategies (X@UTG, X@UTG+1, X@CO, X@BTN, X@SB, X@BB), and those five players each play the strategy for their respective position, then since the complete sextuple of strategies is a Nash Equilibrium, joining as the sixth player means you must emulate the strategy for your position. You can make some deviations in spots where you are indifferent, but by definition you cannot gain any EV.

However, if you join with two players, this is no longer true. It is (theoretically) possible that by two players deviating from GTO, at least one GTO player will lose EV while a deviating player gains EV.

jonna102 8 years, 5 months ago

There's something that doesn't make sense to me here. You're both talking about the best strategy as if it were some fixed unique strategy that a player can choose in isolation. But if you change the strategies involved, it should be obvious that the best counter-strategies also may change. I can't imagine that the equilibrium(s) for 6 players would indicate the exact same strategies as the equilibrium(s) for 5 players. If somebody plays in a game with changing conditions but without adapting to those changes, that has nothing to do with game theory. That's just plain silly and irrational.

The other thing is that you're suggesting that each individual hand of poker is an asymmetric game because of position. That's of course true, and there are several more asymmetries involved as well. But it becomes a symmetric game if we assume the same players to continue playing an infinite number of hands. That's clearly just a theoretical model, but I think it's the one more commonly used when considering poker from a theoretical perspective.

Of course, it always makes sense to choose the model with respect to what we want the model to achieve. I don't quite see what benefit an ESS perspective would give. But the plain nash equilibrium is equally a fairly poor solution concept for many things that it's used for in poker, so it certainly makes sense to consider refinements.

MuckMyNuts 8 years, 5 months ago

Thanks for this. Interesting to me that in three-handed you could break a Nash Equilibrium but when six-handed it would not be so. I guess it does stand to reason that unless the Nash players diverged from their strategy to play an exploitative strategy to account for your presence, you would just slowly (or quickly) lose money based on your play. If one did diverge to exploit you, then I guess all players could benefit from diverging, but otherwise they would just all happily collect the dead money in the game while sticking to their pre-established strategy.

I imagine that while 3-handed the presence of a player making mistakes can make one player's GTO strategy -EV relative to the other GTO player's, simply by relative positioning, in a 6-handed game this effect would be diminished: even if one or more players' EV dropped relative to the other GTO players, the player playing the new strategy would have by far the lowest EV, and the other changes in EV would be minuscule.

Funny that you made this post (which inadvertently answered some of my further thoughts) while I was writing the above post.

jonna102 8 years, 5 months ago

I don't think there's a lot of practical knowledge about multi-player equilibria in poker in general. It's easy enough to say things about the existence, but beyond that I don't believe it's even known if there are efficient algorithms to calculate such equilibria. I know of some tools that can still calculate simple ones, but none of them are poker solvers. This is something I don't hear mentioned a lot when people use 2-person solvers in order to draw conclusions for multi-player games.

antihero 8 years, 5 months ago

The thing is, a Nash equilibrium doesn't mean "best strategy". It just describes a game state, in which none of the players has an incentive to switch to a different strategy. Think of two players playing rochambeau and each of them plays exactly (1/3 rock, 1/3 paper, 1/3 scissors). The players are always gonna break even long-term but there is just no possible way to switch to a strategy with a higher payoff even in full knowledge of your opponent's strategy. If you know that your opponent plays (1/2 rock, 1/2 paper) then ofc you wanna switch to a 100% paper strategy but then you're no longer in a NE (as now your opponent has an incentive to switch his strategy to exploit yours, etc.).
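The rock-paper-scissors claim is easy to verify directly. A quick sketch (+1 = win, 0 = tie, -1 = loss):

```python
from fractions import Fraction

# beats[x] is the move that x defeats
beats = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(a, b):
    """+1 if a beats b, -1 if b beats a, 0 on a tie."""
    if a == b:
        return 0
    return 1 if beats[a] == b else -1

def ev_vs_mix(move, mix):
    """Expected payoff of a pure move against a mixed strategy."""
    return sum(p * payoff(move, m) for m, p in mix.items())

# Against the (1/3, 1/3, 1/3) equilibrium, every move has EV exactly 0,
# so no deviation can gain anything:
uniform = {m: Fraction(1, 3) for m in beats}
print({m: ev_vs_mix(m, uniform) for m in beats})

# Against the unbalanced (1/2 rock, 1/2 paper) opponent, pure paper is
# the best response with EV +1/2:
skew = {"rock": Fraction(1, 2), "paper": Fraction(1, 2)}
print(max(beats, key=lambda m: ev_vs_mix(m, skew)))  # paper
```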

We could see one orbit of poker as a symmetrical game, then the NE for the whole game would be playing the NE in each subgame (hand). We could also assume poker is an infinitely repeated game. In some examples in game theory, equilibria in repeated games can differ from the single-stage NE. But this is mostly because of cooperation/punishment or discounting, both of which shouldn't play a role when considering poker.

But the plain nash equilibrium is equally a fairly poor solution concept for many things that it's used for in poker

It's absolutely true that the NE won't tell us what to do in a certain spot against a certain player. I think you can still learn a lot from studying "pseudo equilibria" (i.e. solver solutions).

Edit: This was meant as reply to your first post jonna... just took me 30min to write haha

jonna102 8 years, 5 months ago

The thing is, a Nash equilibrium doesn't mean "best strategy". It just describes a game state, in which none of the players has an incentive to switch to a different strategy.

Sure, this is the point I want to make, but maybe I didn't express it clearly. People talk about "the GTO strategy" as if it were the best strategy. We clearly know that there is no such guarantee. But why would we ever want to play anything but the best strategy in a game like poker? I think this confusion is where a lot of the silliness comes from.

We may still disagree on what the "best" strategy is. Some would think it's the max exploit strategy, but there's a cost/benefits tradeoff that also needs to be considered. But that's a whole different thread.

In some examples in game theory, equilibria in repeated games can differ from the single-stage NE.

Repeated prisoner's dilemma is perhaps the classical example of this. PD also shows how NE can be inefficient, which we may or may not be interested in when it comes to poker. But one thing I do find interesting is that you also mention collusion. Even just 3-player equilibria allow for some situations that could very well lead to implicit collusion in practice. It can also be argued that poker players do not necessarily represent rational decision makers (think tilt), and suddenly the foundation for NE becomes a lot more shaky.

I still agree with you that we learn a lot by studying equilibria, but we also need to remember that solvers and solutions are quite possibly much more complex than we usually give them credit for. So much so that some are already coming up with ways to exploit "GTO players" who have misunderstood the concepts.

jonna102 8 years, 5 months ago

It's easy enough to say things about the existence, but beyond that I don't believe it's even known if there are efficient algorithms to calculate such equilibria

Just a clarification, I had to look it up. It looks like there is a 2005 paper that proves that such algorithms are indeed known. I don't believe that paper deals with games with imperfect information, but another paper from 2009 seems to settle that as well.

MuckMyNuts 8 years, 5 months ago

With regards to the algorithms, would they be along the lines of solving for multiple actors in a Bayesian game? I only took an introductory game theory class as part of an econ degree so I don't really have a firm grasp on how to learn about more advanced topics. Do the formulas on Wikipedia hold up?

https://en.wikipedia.org/wiki/Bayesian_game

A Bayesian Nash equilibrium is defined as a strategy profile and beliefs specified for each player about the types of the other players that maximizes the expected payoff for each player given their beliefs about the other players' types and given the strategies played by the other players.

If the variables could be accurately defined (which would be infinitely hard), it seems like this is, in theory, a formula for maximum exploitative play. Would this be the exact opposite of GTO play (in that it accounts for our beliefs about each player) or are the two relatively closely linked?

jonna102 8 years, 5 months ago

I don't see any formulas on that page, so I'm not quite sure what you are referring to? I also don't quite know what you mean by "exact opposite of GTO play". I don't see how beliefs would cause something that we'd perceive as the exact opposite to occur. But what's the exact opposite of an equilibrium? It's not immediately obvious to me how you would define that.

MuckMyNuts 8 years, 5 months ago

I can't copy the formula here (it's just a bunch of HTML code), but they define a Bayesian game as G=(...), and then define a Bayesian Nash Equilibrium as Gprime=(...). In theory, if we could calculate the Bayesian Nash Equilibrium it might give us a more accurate/detailed model of a poker decision than a simple Nash Equilibrium. The example of the sheriff game at the bottom, as well as the example of the chicken game in the related article about correlated equilibrium, illustrate situations where simple Nash Equilibria achieve less than optimal results.

I again didn't word what I was saying very well, or think it through well enough. People in the community always seem to compare GTO play and exploitative play as if they were polar opposites. My original understanding of a Bayesian Equilibrium was that it described a state where we were maximally exploiting our opponent, but of course this is wrong because for it to be an equilibrium our opponent would need to be maximally exploiting us, and therefore it is actually just a typical formula for Nash Equilibrium when we have imperfect information. So I hastily jumped to some incorrect conclusions.

What is quite interesting to me now is the idea of Bayesian-optimal mechanisms. They are typically used as an approach for setting prices with imperfect information about the market or potential consumers.

If we assume that our opponent's willingness to call a bet is represented by a single-parameter utility function (fancy speak for he will call anything less than a certain amount and fold to anything more than said amount), this approach could be a useful way to look at bet sizing. I haven't actually played around with it, and it may be quite clunky and not the most efficient way to study/build a strategy, but always fun to find new ways to think about poker decisions.
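As a toy version of that idea: suppose we hold the nuts and the opponent's calling threshold t is uniform on [0, T] (all numbers here are invented for illustration, not from any real solution). Then EV(b) = b · P(t ≥ b) = b(1 − b/T), which peaks at b = T/2 — essentially the "monopoly price" from the pricing analogy:

```python
def ev_of_bet(b, T):
    """EV of betting b with the nuts when the opponent calls any bet <= t,
    where the threshold t is uniform on [0, T]: EV = b * P(t >= b)."""
    p_call = max(0.0, 1.0 - b / T)
    return b * p_call

T = 100.0  # hypothetical ceiling on what this opponent will ever call
bets = [float(i) for i in range(0, 101)]
best = max(bets, key=lambda b: ev_of_bet(b, T))
print(best, ev_of_bet(best, T))  # 50.0 25.0 -- the optimum sits at T/2
```

Of course a real opponent's threshold depends on his hand and the board, so this is only a first stab at the single-parameter model, not a bet-sizing tool.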

jonna102 8 years, 5 months ago

Hmm, so what you're calling formula is just the specification of the game? Those specifications just give names to things that need to be formally reasoned about. They're not giving any other interesting information really.

Sheriff's Dilemma is of course the classical example to describe bayesian games. But you're going to run into some trouble if you try to apply this to poker. You'll need some idea of what the beliefs are, but you'll quite likely end up with way too many solutions and not really have gained much unless you can also say something more meaningful about the beliefs.

People in the community always seem to compare GTO play and exploitative play as if they were polar opposites.

Well they are probably not game theorists then :) This is a common misconception. An equilibrium is just what happens when two (or more) players max exploit each other and both make proper adjustments. For some reason this has been named GTO play, but I think it's a misleading term and not one that I use a lot.

MuckMyNuts 8 years, 5 months ago

Yay! People smarter than me are talking. :D

So much so that some are already coming up with ways to exploit "GTO players" who misunderstood concepts.

Yeah, I think GTO strategy is misunderstood by most who try to practice it, and thus they can be manipulated if you can figure out their individual misunderstandings and where those create repetitive errors. If I were to attempt to base my play on some GTO strategy, having not put in any work with CREV/PIO/GTORB beyond watching videos, I would just become a huge fish, easy to exploit by any semi-competent player, even those with no concept of GTO play.

I do agree that it is a very useful concept for building our strategy around, and am planning to purchase PIO to do some work once I can wrap my head around the $250+ price tag. I'm a nit with my life bankroll.

GameTheory 8 years, 5 months ago

Thanks for this. Interesting to me that in three-handed you could break a Nash Equilibrium but when six-handed it would not be so.

Wrong. Think about the satellite that gives two tickets. You shove with 25bb, I call with 25bb and 33 again, the other 4 stacks have 2bb. Clearly I deviate from GTO, and I cost both myself and you EV. All the other players gain EV.

Also there is no 'breaking' of Nash Equilibria three-handed.

MuckMyNuts 8 years, 5 months ago

Also there is no 'breaking' of Nash Equilibria three-handed.

Wrong. If the presence of a player making mistakes provides incentive to diverge from a previous Equilibrium strategy, then by definition you have broken the equilibrium.

GameTheory 8 years, 5 months ago

Wrong. If the presence of a player making mistakes provides incentive to diverge from a previous Equilibrium strategy, then by definition you have broken the equilibrium.

If one player persists in diverging from the equilibrium then he will not reach a new equilibrium. To reach a new equilibrium, the diverging player must have no incentive to change his strategy. When no new equilibrium is attained, it is wrong to talk about the 'old' equilibrium being broken; the 'old' equilibrium is and always will be an equilibrium.

Consider 400bb HU NLHE push/fold, the equilibrium is to push AA,KK,AK and to call AA,KK. Now if you start pushing any two cards in this game, do you really break the equilibrium just because you give me an incentive to deviate from only calling with AA,KK?

ZenFish 8 years, 5 months ago

Antihero: The thing is, a Nash equilibrium doesn't mean "best strategy". It just describes a game state, in which none of the players has an incentive to switch to a different strategy.

True, but when you study solver solutions (been doing a lot of pre flop work in Pio lately) you get a real eye opener when you see how much loose-aggressiveness you can get away with (unexploitably) in that game state.

Take that understanding to your game and apply it in a pseudo GTO'ish fashion in HU spots (say, 3-betting SB vs BB) and you will probably be perceived by many to overdo the aggression, but you are not. And they can not punish you for it.

For most players, especially those at small stakes who are used to rather passive play, miles away from an appropriate level of aggression, mimicking GTO pre flop frequencies will lift their game immensely and actually exploit a passive field without trying (I'm talking to you, struggling Zoom players).

Here's an example:

You struggle in SB vs an aggressive BB who 3-bets your (rather tight) 45% open range 20% of the time, and you feel exploited. Guess what, he can do so unexploitably (run the pre flop sims and you will see), even if your average aggro small stakes opponent is rarely pushing it beyond 15% 3-betting in that spot. If you are not willing or able to defend your 45% open properly vs 20% 3-betting (and 45% open is definitely not overly loose), you are letting BB's GTO(ish) strategy exploit your sloppiness without him even trying.

Your choices are to tighten up your opens (he gains pre flop with zero effort), or defend proper frequencies pre flop without being able to play your ranges properly post flop (he gains postflop with little effort), or you can grit your teeth, figure out the GTO'ish pre flop defence vs his 3-bets and learn to play those ranges properly post flop (you will still lose to his 3-bets, but as little as possible).
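The "exploiting your sloppiness without trying" part falls straight out of minimum-defence arithmetic. A rough sketch for a blind-vs-blind spot (the sizes are illustrative — say SB opens to 3bb and BB 3-bets to 9bb — and this deliberately ignores post-flop play and card removal):

```python
def bb_breakeven_fold_freq(open_to, threebet_to, bb_blind=1.0):
    """Fraction of the time SB must fold for a BB 3-bet bluff to break even
    immediately: BB risks (threebet_to - bb_blind) extra chips to win the
    pot of (open_to + bb_blind)."""
    risk = threebet_to - bb_blind
    reward = open_to + bb_blind
    return risk / (risk + reward)

f = bb_breakeven_fold_freq(open_to=3.0, threebet_to=9.0)
print(f)      # 8 / 12 = 0.667: BB profits with any two if SB folds more often
print(1 - f)  # so SB must continue with at least ~33% of his opening range
```

So if your "rather tight" 45% open only continues 0.33 × 45% ≈ 15% of hands against the 3-bet, every extra fold beyond that is pure profit for BB before a flop is ever seen.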

DirtyD 8 years, 5 months ago

True, but when you study solver solutions (been doing a lot of pre flop work in Pio lately) you get a real eye opener when you see how much loose-aggressiveness you can get away with (unexploitably) in that state.

I find this is especially tilting for many players, who assume you are out of line and they should be able to easily exploit you. Because your strategy is actually sound, the easy solution they're looking for doesn't exist, and they become more and more frustrated.

antihero 8 years, 5 months ago

@ZenFish: I agree that studying the solver equilibria is eye-opening at times. I'm using only the basic version so far so only postflop for me... but what you're doing sounds extremely interesting... wish I had some more time to dive deeper into analysis.

Would also be interested to hear if anybody has looked into the PioCloud pack and if it's recommendable

IamIndifferent 8 years, 5 months ago

I have Pio Basic. Basic allows you to browse preflop solutions, whether from PioCloud or from Zenfish or others. PioCloud has a sample you can download and browse in PioSolver Basic.

ZenFish 8 years, 5 months ago

Solution packs are probably a good investment for those who are hellbent on getting strong pre flop ranges from computation. The alternative is to run Pio in the cloud.

For pre flop calculations simulating over a biggish weighted flop set (say, the 100 flop set), you will need ~60 GB of RAM, preferably more for wide-range scenarios. Amazon EC2 charges $3/hour for the biggest configuration I use there (c4.8xlarge = 36 virtual CPUs, 60 GB RAM) and that is just about enough RAM to do SB vs BB after choosing an open range for SB (throwing away all the folds) and chopping away the worst hands from BB's range (stuff BB would never play like small offsuit rags, and you can do a smaller simulation first to identify those).

Amazon EC2 works very well, but there are other options for cheaper (like OVH). What I like about Amazon is that you can resize your virtual machine in an instant according to need, and you only pay for what you use. Not sure if snap-sizing up/down is possible with OVH, but I might check it out in the future.

Since you will burn much more money (lot of CPU hours are spent figuring things out along the way) on renting computing power than what you would pay for a pre flop solution pack, buying it makes good sense. And you can save yourself the Edge license. However, if you have money to spare, you will appreciate the learning that comes from crunching ranges yourself. There is no substitute for the hands-on approach when you want to learn something.

And if you want to study how to adjust your pre flop strategy to exploit someone maximally, you simply have to get Edge and run your own pre flop calculations vs model opponents. I recommend Pio Edge strongly for those who are dedicated and willing to put in a lot of time and effort. It's mind-blowing to explore pre flop play with Edge. If you're not that type, the investment probably won't pay off.

IamIndifferent 8 years, 5 months ago

Zenfish, have you simmed SB vs BB with SB able to either limp or raise? I think the PioCloud solution pack only has raising, but one of PIO's vids shows limping, and it changes the solution dramatically.

ZenFish 8 years, 5 months ago

Limping + raising makes the game tree huge, and I don't think it's worth the effort to run those sims. I would rather buy them (I think they are part of the PioCloud package).

The solution will be very complex and hard to implement well, so I'll abstain for now. Also, if you run simulations with rake, you will see a pattern where strategies get rewarded for reducing the amount of post flop play. For example, if you push a 3-bet-or-flat strategy towards 3-bet-or-fold, which results in more pots won without seeing a flop, you may increase EV even if you end up folding a 1/3 chunk of hands that could have been played as +EV flats (but cannot be now, since your best hands are always 3-bet, leaving any flatting range very weak).

Rake effects are complicated, but I have seen that pattern pop up in several spots. For one sim I ran, changing a 3-bet range a bit away from Nash made it win more by getting exploited. In a rake free game, deviating from Nash will cause a loss against a maximally exploiting opponent, but the change I did made the Nash opponent switch to a pure 4B/fold defence, which reduced rake for both. So both players gained EV when one deviated from Nash and the other exploited it.
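The mechanism is simply that poker with rake is no longer zero-sum: the two players' EVs sum to minus the expected rake, so any strategy pair that pays less rake has more total EV to share. A toy illustration with invented numbers (EV in bb/100; these are not from my sims, just chosen to show the effect):

```python
def net_evs(pre_rake_ev_p1, expected_rake_each):
    """Two-player game that is zero-sum before rake; assume each player pays
    the same expected rake. Returns (net EV of P1, net EV of P2)."""
    return (pre_rake_ev_p1 - expected_rake_each,
            -pre_rake_ev_p1 - expected_rake_each)

# Hypothetical: at the old strategy pair, many pots go postflop and get raked.
old_p1, old_p2 = net_evs(pre_rake_ev_p1=50, expected_rake_each=40)

# After P1 deviates and P2 max-exploits with a pure 4-bet/fold defence, fewer
# flops are seen: P1's pre-rake edge shrinks, but both pay far less rake.
new_p1, new_p2 = net_evs(pre_rake_ev_p1=40, expected_rake_each=10)

print(old_p1, old_p2)  # 10 -90
print(new_p1, new_p2)  # 30 -50 -- both players gained from the deviation
```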

Raise/fold from SB is probably superior to a mixed limp/raise strategy if you play small stakes with high rake (and the average player tends to under-defend BB anyway, which makes raising even better).
