Breaking apart misconceptions about GTO

I approve this post.

We're playing a game, and the assumption is that we want to win. Extrapolating a strategy that can't lose, such as picking at random in rock-paper-scissors is not the same.

But "optimal" confused people since it seems like an optimal strategy should be one that is as profitable as possible.

I don't get this. You're equating this to a Nash equilibrium, which makes sense. It seems like you're saying that GTO isn't playing to win, and instead it's playing to not lose. How could that be optimal?

I would argue that the people who thought optimal meant we're trying to be as profitable as possible got it right.

"It isn't just the best strategy to play as informed by a study of game theory."

Right, it is the best strategy.

This all comes from the problem: since we don't know, and probably can never know what a true Nash equilibrium strategy would be in big bet poker, how do we work towards creating an optimal strategy?

When reading about the evolutionary process of a Nash equilibrium, it seems that we should be trying to win the most, and that this will evolve into an "optimal" strategy. Defensive strategy is important as well, but it is not the entirety of what all equilibrium strategies would look like.

willt 11 years, 10 months ago

So, think about it like the word "work" in the physics community. It has a well-defined, technical meaning: force times distance. Sometimes this definition does not agree with people's intuition about what the word should mean. For example, if you carry a weight across a room, such that the force you exert on the weight is aways vertical (to support the weight) but the direction you move it is always horizontal (across the room), then you have done 0 work on the weight.

But it was heavy, and you carried it all the way across the (long) room! Doesn't matter. It turns out that "force times distance" is a very useful concept in physics, and someone gave it the word work, and that was that. Now, whenever we say work in physics, we're refering to the very specific concept that may or may not always match the colloquial use of the term. You definitely shouldn't try to make up your own definition in physics for the word "work" which already has a widely-used and precisely-defined meaning.

In The Mathematics of Poker (and some but not all game theory literature), optimal means Nash equilibrium. As this thread proves, the terminology confuses people in the poker community. So, "game theory optimal" is useful, since it's less likely than with "optimal" that people will see it and think they understand what the writer means when they really don't.

But anyway, Nash equilibrium has a very specific meaning: a set of strategies such that no player can increase his EV by unilaterally deviating. This is a cold, hard, dry definition. There's no touchy feely qualitative stuff like "defensive", "playing not to lose", etc. But it turns out that, in the case of a 2-player zero-sum game like HUNL, this is enough to guarantee some pretty strong properties: in particular, at least break-even expectation when our EV is averaged over both positions.

The Nash equilibrium may not have all the properties you want for your strategy. In that case, tell us what properties you want, and then, hopefully, show that such a strategy exists and show us how to find it. And give it a name. But not GTO, because that name's already taken. :)

optimal means Nash equilibrium

I agree.

But anyway, Nash equilibrium has a very specific meaning: a set of strategies such that no player can increase his EV by unilaterally deviating.

I haven't come across this definition, and it contradicts with the meaning I've understood from reading about Nash equilibriums. Could you please share where you got it from?

willt 11 years, 10 months ago

First two sentences of http://en.wikipedia.org/wiki/Nash_equilibrium :

In game theory, the Nash equilibrium is a solution concept of a non-cooperative game involving two or more players, in which each player is assumed to know the equilibrium strategies of the other players, and no player has anything to gain by changing only his own strategy unilaterally.[1]:14
If each player has chosen a strategy and no player can benefit by
changing strategies while the other players keep theirs unchanged, then
the current set of strategy choices and the corresponding payoffs
constitute a Nash equilibrium.

edit: where do you see it otherwise?

Thank you.

In game theory, the Nash equilibrium is a solution concept of a non-cooperative game involving two or more players, in which each player is assumed to know the equilibrium strategies of the other players, and no player has anything to gain by changing only his own strategy unilaterally.

The problem here is that no one is playing a Nash equilibrium, and instead we're working towards it. I can concede that what I'm saying doesn't fit with a pre-made GTO strategy, but the same can be said of anything anyone says. It's all theory of what a possible GTO strategy would look like (GTOT?).

We also wouldn't be able to know the other players' equilibrium strategies, and have no idea if there are any points where there is anything to gain by unilaterally changing strategies.

willt 11 years, 10 months ago

Well, it's true we don't know the actual equilibrium of the whole game, but that doesn't mean it's not a useful concept in theory.

As far as practice goes, there's no way to find a perfect maximally exploitative strategy with any certainty either. We have to make simplifying assumptions, approximations, etc, and do our best. Similarly, we can find equilibria of approximate games which are simplified by various assumptions and can often gain a lot of insight by studying these.

In fact, the question of whether we have anything to gain by changing our strategy is the same as the question, are we currently playing as exploitatively as possible?

willt:

I agree, thank you for your input :)

edit: where do you see it otherwise?

in your own post, when you wiki it, you cant find word optimal anywhere

willt 11 years, 10 months ago

Michael: np, glad to help :)

Aleksandra: yea, that was the Wikipedia page for Nash Equilibrium. To see that the authors of MoP use "optimal" to mean the same thing, see the top of page 103 in the book. For a discussion on this terminology (to which Jerrod Ankenman contributes), see the terminology sticky in the Poker Theory subforum on 2p2.

I approve that GT approve posts on rio ( not that im aloud to approve but ...we shouldnt go into so many details :D )

There´s a contradiction imho. You can´t simultaneously exploit your opponents and stay unexploitable yourself. Once you start to attack the weakness of your opponent you have to let shields go down.

It´s like the boxer who can´t fully cover himself AND attack his opponent at the same time.

Or - with RPS - when you want to play unexploitable, you have to play 1/3 - 1/3 - 1/3, perfectly randomly. Once you recognize that your opponent prefers rock - you deviate from the optimal path and exploit that by not choosing scissors anymore and choose paper more often instead. HOWEVER - now you´re exploitable yourself and your opponent´s best strategy becomes to not choose rock anymore and instead choose scissors more often. That means you´re not playing "optimally" (in the GTO sense) anymore.

You can´t simultaneously exploit your opponents and stay unexploitable yourself. Once you start to attack the weakness of your opponent you have to let shields go down.

While this is true, it would be possible that they don't know you're exploitable, or that they don't know how to exploit the weakness you're making available to them.

Or - with RPS - when you want to play unexploitable, you have to play 1/3 - 1/3 - 1/3, perfectly randomly.

"Perfect randomness" is not possible. We can get close enough to it that it would be indistinguishable though.

It would be possible for an exploitative strategy to either look random to your opponent, or for whatever they do to try and exploit your strategy to hurt themselves in the end.

There's a difference between being unexploitable, and being effectively unexploitable. Kind of like the nuts, vs. "the nuts". They're not the same, but to your opponent they would be.

It's impossible to look for and catch every possible pattern in someone's play. Because of that, it's "effectively unexploitable" to allow undetectable patterns to show up in your play because they will still look random to your opponent. To them, you would still be playing "perfectly unexploitable".

What is randomness?

Effectively, randomness is unpredictable and perfectly distributed. There will appear to be patterns, but they will appear as often as they're supposed to, which means they're effectively noise. If there is less of a certain pattern than there "should" be, that would also show a lack of randomness.

Randomness can't be proven or disproven. We can look at a given "random" output and look for how probable that given output is to be random.

If I were to get a list of 10 random numbers, 1 to 100 and present them in a list, here are some examples:

1, 2, 3, 4, 5, 6, 7, 8, 9, 10

53, 91, 5, 91, 32, 19, 2, 77, 43, 21

11, 11, 11, 11, 11, 11, 11, 11, 11, 11

In effect, all of these lists are just as likely as each other, or any possible list to be random. The presence of obvious patterns in the first and third make it extremely unlikely that these are random. However, over enough samples, this would eventually come out randomly.

What I'm getting to is that it's not randomness we're after, but perceived randomness. Your play could be perceived to be random, and at the same time be actively working to exploit your opponents weaknesses.

I think tho there seems to be contradiction, if you are start expoitative strategy and your opponent doesnt adjusts to it ( for example 3 betting him any 2 cards and he doesnt lets say in theory start defending wider ) as long as he doesnt adjust in any way to your expoilting him, you are having some sort of merge of GTO and exploitative same time

Its like boxer attacking and his opponent doesnt see where he is open to counterattack

If you are able to do readjust yourself faster then opponent finds an answer, you got a perfect play that merges optimal and exploitative

No doubt about the following points:

- maximally profitable play against non-optimal-playing opponents incorporates GTO - AND exploitative strategies

- there´s no chance nowadays (and probably for at least the next 100 years ;->) to solve the nash equilibrium for big stack poker

Nevertheless - GTO or "optimal play" or Nash Equilibrium (which "optimal" play is based on according to the GTO-term) is a very technical and very defined construct. It´s not sth. that is "perceived" in any way because we´re imperfect, it´s the "perfect" balance between two (or more) best response strategies to each other.

So, when you say, there´s some misconception about GTO in the sense that GTO is actually a winning strategy and incorporates non-optimal, exploitative lines as long as the opponent doesn´t recognize, you´re not strictly talking about "GTO" (in the technical term) anymore.

I disagree, could you please show how I'm wrong by definition?

imawhale26 11 years, 7 months ago

- maximally profitable play against non-optimal-playing opponents incorporates GTO - AND exploitative strategies

My understanding would be that maximally profitable play is 100% exploitative. Vs a non-adjusting/non-optimal opponent there is absolutely no reason to construct any GTO ranges in theory. In practice we might want to construct some GTO ranges because even though we know that villain is exploitable, we don't know exactly where he is making mistakes and/or how to exploit them.

So in theory the most profitable strategy against a non-GTO opponent will always be 100% exploitative, but in the real world we may want to approximate some GTO strategies in spots where we are unsure of our opponent's strategy.

computerscreen 11 years, 10 months ago

"Let's take rock-paper-scissors. An unbeatable strategy would be to
throw each of your three options that has no detectable patterns to it
so that your opponent can't beat you. So GTO would be to just pick one
of your options are random every time, right? The problem with the idea of picking everything at random is that you will also never "win". You can't win, and you can't lose."

You'll win if your opponent doesn't randomize their response.

You'll win if your opponent doesn't randomize their response.

If you keep picking your choice at random, you would breakeven over the long run.

computerscreen 11 years, 10 months ago

for instance villain chooses rock 100%, and you are playing 1/3 of P, S, and R. You'll win 66%.

You would win, lose and tie 33% each

YUP thats it i knew i missed out something

in that case opponent will pick the thing that kill rock 100 percent of time and u will insta lose 100 percent of time..those are deviations of GTO and in poker they are i guess reffered as adjusting and exploitative strategies

not sure 100 percent~ i had it all sorted nicely in my head and now this tread kinda confused me

Yeah, that's how I felt before I started looking into this more... I'm hoping that we can all learn something by discussing this. I'm less sure about what GTO means as time goes on.

Hehe the good old "GTO is breakeven in RPS, hence it is breakeven in poker" discussion.

I'm not saying that GTO is breakeven, I'm saying that exploitative play is a part of GTO.

Sauce123 11 years, 10 months ago

There are tons of dominated strategies in poker, and they are non obvious. These dominated strategies lose tons of EV to GTO play.

The piece linked in my most recent blog post explains the point more thoroughly: http://www.leggopoker.com/blogs/sauce123/

(normally I hate linking my own blog but I think in this case it's actually helpful and not just attention whoring!)

Thanks Ben, I enjoyed reading this!

I have some comments/questions after reading. You mention that everyone still plays dominated strategies to some extent, yourself included. I agree that reducing these is good, and is a step towards GTO.

I guess the question I see now is: can all dominated strategies be completely removed from someone's game? (theoretically...)

And a follow-up question: when you see a dominated strategy in your opponent's game, do you try to exploit it?

My main reason for writing this is that I believe exploitative strategies are a necessary part of GTO, and the evolution towards GTO, and that no one plays (and probably ever could play... but for now I'll leave in that it could be possible) a truly GTO game.

I agree with your assessment that exploitative play has been over-used, and that its effectiveness isn't the same as it used to be. What I'm getting at is that I still think it's an essential part of the games of many people. Is this reasonable, or do you see yourself avoiding exploitative play every chance you get?

Thanks Ben, that's a great text. It leaves me wondering though if there is really all that much difference between the two sides of the fence. If an old school player is seeking to maximize EV in a certain spot, doesn't that also automatically become the best GTO play in that spot? Maybe this is what's been discussed elsewhere in the thread. It just seems to me like the differences are more in how you approach the game, and you may still arrive at somewhat similar strategies in the end. The GTO approach may well get you to those strategies faster or closer to the theoretical EV max, which would obviously be a good thing.

I'm assuming here that GTO play must adapt to opponent's mistakes, just like in the tic-tac-toe example, but maybe I'm getting that wrong. I think you indicate in the text that a GTO strategy would not take opponent tendencies into account.

I'm a bit confused, but it may well be a confusion of terminology since everyone seems to mean slightly different things when they say GT or GTO.

Not that i wanna mess around, but i think GTo is exploitative also, because if you re playing perfect GTO and opponent is unbalanced u gonna win, hence exploit him

Yes, I agree. And it may be more +EV to make your strategy exploitable to further take advantage of your opponents' weaknesses.

When I first started learning about this, I thought there were two ways to play poker. One was GTO, and said that you want to be unbeatable. The other was exploitative, and said you want to beat your opponents. The "real" GTO uses both. You are both trying to exploit your opponents' weaknesses, and also not be able to be exploited. That is the goal, and the way that you would be able to create a true game theory optimal strategy.

Can you give an exact definition of what you mean in non contradicting terms, (and prove that such a strategy always exists)?

I'm going to claim that the current "proof" of game theory doesn't apply to poker because it is infinite. Timing, psychology (basically anything that people do), come into play that make poker a game of infinite possibilities, which isn't covered in Nash's proof.

The problem is that this can't be proven either way. Nash's theories give us an evolutionary framework to work with though, in that to find the optimal strategy, we must be trying to optimize every decision.

Another problem is that current ideas of GTO strip away valuable information in a search for a solution. The game of poker has much more to it than just the cards, ranges, etc.

I'd love to hear your definition of GTO, how you know it will work, and how it takes into account information about people.

I approved the first post of this thread. Talking about 'personal' definitions of what GTO means is silly, right now it just means Nash equilibrium.

Your definition seems to be the strategy that will maximally exploit every opponent based on all information available.

This is not a formal definition, for instance it depends on the model that you use. Suppose that in the Nash equilibrium of 1000bb HU NL you should flat AA against a button minraise with 17%, and the first two times that AA is observed it was flatted. Your opponent could still be playing a Nash equilibrium but there are models in which it is now assumed that AA gets flat >99%.

The model will only work if the assumptions are correct. Also you have to be extremely careful to not get 'fooled'.

For instance if I could give up 5bb by playing my range bad in one spot and make sure your 'GTO' notices this and it will give me 10bb in the next spot that comes up by deviating from GTO, I now have a strategy that will win 5bb in expectation against your 'GTO' strategy.

I approved the first post of this thread. Talking about 'personal' definitions of what GTO means is silly, right now it just means Nash equilibrium.

Fair enough. I'm essentially explaining my take on the Nash equilibrium, how I find that it applies to poker, how I see other people seeing it, and what I think the difference is.

Your definition seems to be the strategy that will maximally exploit every opponent based on all information available.

Close. I would say it is to make the most +EV decision at every possible opportunity. This also includes decisions to say or not say things. This includes timing. This includes being aware of all information made available to you. This includes everything.

How do you decide which 17% to pick? Is it truly random? Are you picking times that you think are "better"?

The model will only work if the assumptions are correct. Also you have to be extremely careful to not get 'fooled'.For instance if I could give up 5bb by playing my range bad in one spot and make sure your 'GTO' notices this and it will give me 10bb in the next spot that comes up by deviating from GTO, I now have a strategy that will win 5bb in expectation against your 'GTO' strategy.

Yes, exactly! Your example is inherently exploitative. Do you consider playing your range bad to get future rewards GTO? If so, then we're saying the same thing.

If you can "fool" me by exploiting me, that would be a deeper level of strategy. If you have the opportunity to do this to me, and you don't do it, then your decision wasn't the most +EV that it could be.

Im gonna post cause i need to by writing to somehow clean lil mess in my head

In this case player computerscreen with opponent adjusting to his desicion to pick rock started to win 100 percent of time which is massive EV plus compared to what doll said is GTO ~ play answer which would bring only break even to player if he choses 1/3 of time of each, hence why ppl i guess think GTO is bad play

In reality , player computerscreen ( i chose this just cause he posted , sorry ) will soon realise that picking rock 100 percent of time with attempt to win more is actually losing him money 100 percent so he wil adjust and in time he will be back to perfect GTO play making him at best break even, because no matter what strategy he picks he will be just counter exploited

so we got Point of Game theory optimal is to find perfect strategy for player to maximise his wins, and maximum is break even in game RPS

and i think thats where comes highstakes play and balance, and that players who have balance actually profit from players who are unbalanced and are trying to find some exploitative strategy that will work.

RPS isnot so comarable to poker but game theory trying to solve it came out with GTOptimal

In regards to poker its far complicated game and i dont think terms exploitative and optimal should clash so much , and IMO merge of 2 in one concept is much better , balance wont make u as much as explotative play can when u have opponent not adjusting properly and is making a mistake ~ as dolle said to GT *If you can "fool" me by exploiting me, that would be a deeper level of strategy. If you have the opportunity to do this to me, and you don't do it, then your decision wasn't the most +EV that it could be, and on other hand i think thats where comes highstakes poker and i dont think you can find as xploitable spots in highstakes poker ( i might be mistaken ) and where if you re balanced you will encounter many opponents that will try to use lowstakes exploitative strategies against you and where having balance is what will win you the money

In GTO ~ optimal is what makes most perfect strategy for both sides in perfect play pair, , in game RPS game is easily solvable and i think ppl should stick to meaning of the word ~ optimal~ most perfect strategy, so if it opens room for you to exploit someone i do think its optimal :S even if you go out of optimal balanced play ( because balanced play is intended for you to play against perfect play pair that plays back perfect optimal balanced strategy against you~ when this isnot a case , you start to do exploitative strategy~ which in meaning of the word is OPTIMAL for you

I dont think meaning of the word optimal and GTO as such should be narrow and as strict when it comes to poker, i dont think exploitative contradicts with it, and i dont think as well that playing close to GTO is not ev plus, i think more sophisticated your opponents are, more balance yu have will result in better results and more win, so making the GTO ev plus play ( since its not as easy game as RPS ) and by winning by playing balanced GTO based poker game we may say you re exploiting your opponents ..so kinda not sure why ppl argue that much abt this 2 being in so much contradiction

What seems to me that perfect poker play is merge of GTO and exploitative strategies and adjusting timely to your opponents, and i dont find this 2 contradict each other but complement each other....:S

OK sorry i had to talk to myself i got mess in my head i tried to put it in order :)

Thanks for the reply Aleksandra! I agree with you, and have a few things I'd like to share:

Computerscreen asked in post above what if i pick rock 100 percent of time, and got my low stakes player brain answer him immidately with opponent will chose thing that kills rock ( i forgot what kills what except scissors kill paper ) which may be viewed as exploitative strategy since player computerscreen isnot playing optimallyIn this case player computerscreen with opponent adjusting to his desicion to pick rock started to win 100 percent of time which is massive EV plus compared to what doll said is GTO ~ play answer which would bring only break even to player if he choses 1/3 of time of each, hence why ppl i guess think GTO is bad play

If a player is predictable in RPS, that allows us to exploit their strategy. Against someone throwing rock every time, the optimal play would be to throw paper every time assuming our opponent won't adjust to a better strategy. You bring up a great point that if I throw paper every time, it will be easy to figure out that throwing rock every time isn't ideal.

What if my adjustment is more subtle though? What if I throw paper half of the time, and still throw rock and scissors 25% each? Now I win 50%, tie 25%, lose 25%. This isn't ideal against someone that won't adjust, but it also could mean that you're less likely to notice that I'm attempting to exploit your strategy. This would be even more useful in poker where the "same" spot doesn't come up all of the time. (eg. you fold too much to 3bets, how often should I 3bet you? should I do it every time? you'll likely notice that. what if I do it more than normal, which exploits your high fold to 3bet, but it flies under your radar so that you aren't able to separate my additional 3bets from random variance?)

Another thing I'd like to bring up here is that you mention that the person playing rock every time would switch to a "random" GTO based strategy. I don't think that's the case. I'd expect that person to do "better" by not throwing rock every time, but I would still expect them to have predictable tendancies, although not quite so obvious as 100% rock.

I agree, and from my understanding exploitative play is a part of GTO... although that seems to have started a bit of a debate :)

What seems to me that perfect poker play is merge of GTO and exploitative strategies and adjusting timely to your opponents, and i dont find this 2 contradict each other but complement each other

Yes, I agree :) My take on what I've learned about game theory is that if an exploitative play is more +EV, then that is a part of the evolution towards GTO.

Since no one can truly know what a GTO strategy would look like at this point, it's all speculation until then. I felt it was important to bring this up because it seems like exploitative play has been separated from GTO, and I believe it should be integrated instead.

To add i tried google , wiki etc and various game theory definitions, and actially cant find word optimal definied anywhere so i think proclaiming GTO to be this and that is farfetched, game theory optimal is simply ..

Game theory is i believe part of maths that tries to solve different games and find optimal strategies for each game, so if we re discussing poker, optimal should be what is EV most plus?

Not sure , im still v confused

You're saying what I'm thinking... and I'm also confused :)

:) i think maybe willts reffering to is helpful

quote *It turns out that "force times distance" is a very useful concept in physics, and someone gave it the word work, and that was that. Now, whenever we say work in physics, we're refering to the very specific concept that may or may not always match the colloquial use of the term*

I think GTO has words with meaning that seem to me and dolle can comprehend by its meaning exploitative as well cause its ev plus at some points and we re holding to that word optimal in GTO shortcut and EV plus is by all means optimal to each of us

Im not well math educated, and it might be that we re missing out something that is definied by math, or some GTO definition that we missed out somewhere so is confusing us

I dont understand why would i exclude exploitative strategies out of game theory optimal in poker

POker is unsolved game of what i know, so optimal can be so many things??

And , meaning of the word GTO might be misleading , and as word work has 1 meaning in physics that doesnt always match colloquial, maybe there is meaning of GTO that has narrow definition or has other meaning that we missed, someone that understand or knows be nice to post, meanwhile i ll think exploitative is optimal same time and reverse

Sauce123 11 years, 10 months ago

This isn't a question of "x's belief on game theory", or "y's understanding of game theory", or "z's interpretation of game theory." It's just math guys ! The names for the referents of the math in game theory are conventional- that's why they're names. If they are confusing to you you can substitute other names which are more comfortable- but when posting publicly it's considerate to use conventions we all agree on.

In most discussions on the forums GTO means nash equilibrium. Sometimes we only talk about one part of the equilibrium- for example we might look at OOP's strategy at equilibrium even if IP is almost certainly not playing at equilibrium (or maximally exploitatively of OOP). When we try to find equilibria for one player it's usually in order to make the conclusions of the thread broadly applicable, or it's as a hedge against getting leveled by our opponent.

It's just math guys !

I disagree. I guess in a way, everything is math, but it seems like you're implying to just throw out psychology and anything else that makes poker a game of understanding people.

If they are confusing to you you can substitute other names which are more comfortable- but when posting publicly it's considerate to use conventions we all agree on.

I would love to find a way to understand what these terms mean. There are also a problem with this line of thinking, in that the meaning of terms can change over time. I find it weird that when talking about GTO/Nash equilibriums, no one ever quotes John Nash.

In most discussions on the forums GTO means nash equilibrium

I agree, but then the question is, what is a Nash equilibrium for poker? I've been working on this question a lot lately. Here's what I've got now:

- No one plays or will ever be able to play a fully GTO style of play in non-trivial poker games

- All we can do is make assumptions of what that will look like, and work to evolve poker strategy in that direction

I agree that a lot of theories related to GTO make a lot of sense, but that's a long way from them being GTO.

The big thing coming out of this is the question: What is our strategic goal? Are we aiming to be unexploitable, or are we aiming to maximize EV in every way we can? Is one of those GTO, and not the other? Why? What is it that makes one strategy GTO, and not another?

From my research into what a Nash equilibrium is, it seems to mean to make the most +EV play at each opportunity. It seems like the general consensus is that the Nash equilibrium for poker seeks to be unexploitable. In theory that could be the same thing. I believe that trying to exploit our opponents is a necessary part of being unexploitable. How big of a role that plays is up for debate, and I'd also like to hear if someone has good reasons as to why that's the wrong way to look at it.

"It's just math guys !I disagree. I guess in a way,
everything is math, but it seems like you're implying to just throw out
psychology and anything else that makes poker a game of understanding
people."

I read Ben's post as saying that as game theory is just maths then it isn't open to opinion, what someone says is either logically consistent with the theory or not. I don't think he's implying to throw out psychology etc and never play exploitatively.

"From my research into what a Nash equilibrium is, it seems to mean to
make the most +EV play at each opportunity. It seems like the general
consensus is that the Nash equilibrium for poker seeks to be
unexploitable. In theory that could be the same thing. I
believe that trying to exploit our opponents is a necessary part of
being unexploitable. How big of a role that plays is up for debate, and
I'd also like to hear if someone has good reasons as to why that's the
wrong way to look at it."

Trying to make the most +EV play with each hand against some kind of super opponent who can out adjust you perfectly (the Nemesis) leads to the nash equilibrium. Deviating from the nash equilibrium with a certain hand in your range in an attempt to increase your EV leads the nemisis to exploit your new strategy there by reducing it's EV to be at most the EV of the nash equilibrium in that spot.

Trying to exploit your opponents who are trying to exploit you back eventually leads to an equilibrium if you are both equally good at it. However trying to exploit your opponent's strategy is not a necessary part of being unexploitable and in fact is the opposite!

computerscreen 11 years, 10 months ago

I should not make posts after grinding ;p I even thought about it twice before I said 'rock'.

I'm not sure about GTO, I've heard it talked about in pokerz, and the idea is you make your opponent indifferent to calling or folding. So their strategy doesn't matter.

Nash equilibrium is when neither player (players) can gain by deviating.

What if they just pick 'paper'?

I only know short stack poker, the NE is important to know because we (should) use it as our back bone to adjust from. Many players have already adjusted shove charts but I reckon it to building a house on a sloped foundation.

if they only pick paper, or always fold their bluff catchers, which are indifferent to calling or folding, then we can deviate and gain an advantage

However if you're playing against a robot who can't deviate from gto then always folding your 0ev bluff catchers or always playing paper won't change your expectation

@Michael: I´m actually confused why you´re insisting that GTO incorporates exploitable / exploitative strategies. Ben nailed it, imho.

NASH EQUILIBRIUM = strategy-set where every (!) player´s strategy is the best (possible) response to every other player´s strategy

If we agree on that definition and agree that GTO / "optimal" means Nash, then it gets pretty clear that GTO doesn´t incorporate any exploitative plays. Because once every (see def.) player takes the best response to his opponents there is no exploit possible anymore.

And here´s the point: It doesn´t even matter if we win or lose! Every player maximizes his potential profit - not potential in terms of exploiting the other player´s leaks but potential from the given rules and circumstances of the game. "Optimal" could even mean a slight loss if that´s the best we could hope for.

If in RPS one player chooses Rock all the time - he´s not playing "optimally" (as we know). Once he´s not playing "optimally", our best response strategy is to deviate from optimal play as well and exploit him. By doing that we play maximally profitable - but (and please pay attention now, w/o any irony intended!) we DON´T PLAY OPTIMALLY anymore!!

That said, we just should not mix up terms.

OPTIMAL = unexploitable = maximizing potential profit against the best possible strategy of Villain!!

MAXIMALLY PROFITABLE = exploitable = maximizing own profit with the risk of offering Villain a counterstrategy to take advantage and exploit us (in which case we had to counteradjust etc.)

One last attempt to convice you. :) A chess computer which works in brute force mode uses GTO as well. In the first step he simulates every legal move he has. Then he simulates every (!) legal countermove for his opponent on every move he was allowed to make. That way he creates a tree: 1 - 1, 1 - 2, 1 ...21, 2 - 1, 2 - 2, ..., 2 - 35, ...

He does this for a certain number of moves (until the tree grows too big) and then stops and looks at the board. Then he goes backwards and while he assumes that his opponent (you) ALWAYS chose the best possible countermove he´ll choose the one single move that gives him the biggest advantage against this best possible move of his opponent. And that way he works himself backward to his next move.

Here you can see what happens: no "regular" chess computer (not even the biggest / sophisticated ones, afaik) will EVER exploit your weaknesses. He simply does not recognize any weaknesses in your game. He´s numb. He just assumes you´re playing perfectly against him. And he tries to maximize his "profit" against your optimal lines. If you play sub-optimal, the computer will win (quicker). But that does not mean that he would exploit you. How could he? He doesn´t know about exploits. He just plays "optimal" and let the rest care about itself (= letting you hang yourself ;)).

Raphael Cerpedes 11 years, 10 months ago

Great post, I agree with everything here.

I didnt read forums and ones i read werent having any accordance on what GTO exactly is, if its public accordance GTO is NAsh equilibrium then ppl should go with some definitions, i did try google and wiki it really didnt have word GTO for nash just GT so it was confusing

Even more is confusing definition of Nash Equilimbrium itself that bigfish gave above

It says : NASH EQUILIBRIUM = strategy-set where every (!) player´s strategy is the best (possible) response to every other player´s strategy

and when someone starts to think abt it may say i can chose something else as strategy set, exploitative instead optimal because it is best possible response to other players strategy` so word meaning in GTO letter O optimal is very misleading, and logical empty space in definition leaves room for someone to think it can incorporate exploitative in it

Apparently most confusion comes from ppl not having set of definitions, and what means what, and ppl would like to solve their own game in best possible way, which is not met by any ONE strategy by itself but by set of strategies or one yet not found, and in that search ppl search Game Theory solutions

What i guess we want is Game theory of Perfect Poker play :) that incorporates set of strategies since 1 by itself so far didnt accomplish perfect play neither in theory nor in practice

And other thing is suggested by Ben, and it is : but when posting publicly it's considerate to use conventions we all agree on...is very good point

So if we talk we can agree what means what, i dont have poroblem to accept wording, but to add trouble is i dont read same forums as Ben and i guess many ppl don't so they will come to same confusing point, which i think after this tread we may conclude that all misunderstanding comes from misconception of definition by itself, yet, there is only couple of us or so reading this, in some forums they probably already have accordance, but in many they dont and with so many ppl to come and join poker community i guess they will run into same question

Maybe someone puts a post on wiki :) so ppl come by will get a definition that is convention

Tom Willetts 11 years, 10 months ago

For the nash eqilibrium strategy to be best usually reqires your opponent to be playing optimally. If they aren't then it's likely you will do better by adjusting your strategy to play a maximally exploitive strategy, which in turn makes you exploitable.

Rake makes poker a negative sum game, so the optimal strategy is simple - don't play.

ZenFish 11 years, 10 months ago

1) Against any player:

Most profitable play = maximally exploitative play

2) Against a "nemesis" (a player who will exploit us maximally whatever we do):

Most profitable play = maximally exploitative play under the constraint that we immediately will get maximally exploited in return = Nash equilibrium strategy.

In both cases we're maximizing profit and playing each individual hand as profitably as possible. Against the nemesis, this process leads us to a Nash equilibrium strategy with ranges that are balanced everywhere, but we don't play balanced for balance's sake, we do it because it maximizes our expectation against the nemesis.

It often comes up in discussion that GTO/Nash equilibrium strategies are defensive and designed not to lose. It's not incorrect to state this, but it's more useful to to think about them as designed to maximize expectation against a perfectly playing opponent. It just so happens that when we're playing against a nemesis, our maximized expectation is zero, but against most opponents we will win.

Against an exploitable opponent who doesn't adjust, we will win the most by removing the assumption that we always get counter-exploited. Then we can make more extreme exploitative moves and increase our profit further.

In poker the term "exploitative play" is usually associated with making money from opponent mistakes. We're stuck with that term, but I like "maximizing expectation" better, because it's generally applicable and because this is what we always try to do, whether our opponent is making mistakes or not. Against weak players, we achieve this by using extreme strategies that can be exploited in return. Against the nemesis, we achieve this by playing a Nash equilibrium strategy.

That sums it up perfectly :)

Benovi 11 years ago

Agreed. You got it! To be unexploitable you have to be constantly exploiting. That's the definition of an equilibrium...constantly changing to cancel out the force of the other side, so in theory GTO is when you are perfectly exploiting all of your opponents in real time. A "nemesis" would be perfectly exploiting you at the same time and therefore create a perfect equilibrium or zero sum game.

In poker a few interesting things questions/ideas I have would be.

Could edge in poker could be defined as a difference in time between you understanding the villain and the villain understanding you? The faster you recognize a players tendencies, which are constantly changing, the more edge you have. If you adjust every game and your opponent adjusts every other game you will be ahead of the equilibrium and therefore be a winning player.

GTO should be in a constant flux and therefore poker is extremely difficult to solve. I believe poker is similar to a chemical equilibrium and is never static. When extreme actions are introduced the math becomes extremely complicated. A true GTO poker bot would have to account for the possibility that the opponent has changed his/her preflop range and bet sizes randomly because they are on tilt. Chess and other GTO games don't have the same human element. Even though we are simply algorithms, albeit extremely complex algorithms.

Teddy 11 years, 10 months ago

I think I´ll go back to thinking about Shania..

ps. Really hate R/P/S as a GT strategy example when talking about poker.

@Aleks: "[...] so word meaning in GTO letter O optimal is very misleading, [...]"

I agree with it and there was some really controversial discussion right away after the book "Mathematics of Poker" was published, because the term is indeed misleading, as "optimal" does not implicitly means "best" (like the "colloquial" definition of "optimal" implies).

Just imagine two bots that are perfect in exploiting each other´strategies. And those bots are capable of playing like 10 million hands per minute against each other. They each start with a (randomly chosen) strategy. Every minute, after having played 10 million hands against each other, each bot starts to refine his strategy by adjusting to the other bot´s observed leaks. This is commonly defined as "exploiting". After the next hour, new refinements get in place, as both players changed their strategies. This goes on and on and on ...

Now, 23 years later we see two bots playing each other with 99.99% identical strategies. They almost reached the Nash Equilibrium. So, "maximally exploitative play" (= most profitable / best play) ultimately converged to "optimal play" - while still being the maximally exploitative (= most profitable / best) play.

Agree?

So, "optimal" ultimately gets equal to "best" once the dust of two players adjusting to each other by exploiting their strategies settled down and the opponents arrive at the Nash. Once this happens, the optimal strategies simultaneously get "best" strategies.

I believe that exploitative strategy is a necessary part of GTO, and what you're saying seems to agree with that.

The only thing I would disagree with is that these two bots may not be playing the same strategy. Nash proved that there is at least one equilibrium for all finite games. We don't know if there is one, or more than one. I'd also argue that poker isn't finite, which also changes things (although for bots, it would be finite because the human element, and the game surrounding the game has been removed).

Total agree, , and as zen pointed out nicely, point is maximum exploit , only it just happens against perfect nemesis that maximizing our expectation equals zero and is definied by nash equilibrium, yet against non nemesis players, once that dont have such capabilities of perfection maximizing expectation includes exploits and as such they re optimal and best , so i really still feel GTO as such doesnt exclude exploit as such by any definition, except GTO final solution does exclude them in perfect play pair of nash equilibrium, but that includes perfection unmatched in reality, so i think its fair to say GTO of every day play against imperfect opponents may include exploits as well, because GTO Nash equilibrium is based on perfect play pair and whole theory is evolved around it, and in reality we dont have perfect playing pair, so i think GTO can be used to solve situational play of imperfect couples as well, Optimal and best being different from Nash in that case, just based on different starting assumptions ( perfect play pair, imperfect play pair)

Grrrrrr .... :D

There´s no "final solution of GTO". GTO is the final solution. There´s not a bit pregnant either. ;) It´s all or nothing.

The "O" in GTO means "optimal" in the sense that we reached the Nash Equilibrium.

Adjusting to Villain´s leaks to maximize our profit is a "best response" and is another term of GT (= "Game Theory") but it is not GTO.

So, I suggest to talk about best strategy (for maximally exploitive strategy) and optimal strategy (for GTO-strategy) and they´re unequal as long as we didn´t reach the Nash.

okok agreed` GTOptimal =Nash , GTB = best strategy (in situaional play, if its against nemesis its same as GTO if against any other..various exploitative strategies that imo go close to nash more perfect player opponent is and more towards exploitative more leaks player has )

Guys, didn't war games the movie teach us that against a nemesis, the best strategy is not to play?

;-)

i think we shouldnt worry till we re to meet nemesis ;-)

First, thanks for the responses everyone :)

Questions:

How can it be known how many Nash equilibrium strategies there are?

Is it possible that at least one of them is focused on exploitative play?

How can we know the answer to these questions without a true complete GTO strategy that will "never" exist?

Can any strategy be called GTO without a final solution?

I'm still not sold on ideas that separate exploitative play from GTO, which is why I have so many questions... and I think this is a great topic for discussion. Thanks for all of the input!

"Is it possible that at least one of them is focused on exploitative play?"

I don't think it can be (at least in heads up pots) as it would by definition not be an equilibrium state as one party could improve their EV by changing their strategy to exploit the exploiter.

"How can it be known how many Nash equilibrium strategies there are?"

There could be more than one *i think* but they will all have the same expected value.

"How can we know the answer to these questions without a true complete GTO strategy that will "never" exist?"

A lot can be shown to be incorrect in theory even if it is difficult to show what is correct in theory.

"I'm still not sold on ideas that separate exploitative play from GTO,
which is why I have so many questions... and I think this is a great
topic for discussion. Thanks for all of the input!"

Exploitative play is based on adjusting to opponents tendencies to increase our EV. Equilibrium is reached when adjusting back and forth yields no further increase in EV for either party.

I don't think it can be (at least in heads up pots) as it would by definition not be an equilibrium state as one party could improve their EV by changing their strategy to exploit the exploiter.

I don't think it means a change in strategy, but rather a deeper strategy that accounts for more possibilities.

Exploitative play is based on adjusting to opponents tendencies to increase our EV. Equilibrium is reached when adjusting back and forth yields no further increase in EV for either party.

Equilibrium is reached when the strategy is "complete", and doesn't change anymore. Adjusting could be a part of that strategy (and I would expect that it has to be).

"Equilibrium is reached when the strategy is "complete", and doesn't
change anymore. Adjusting could be a part of that strategy (and I would
expect that it has to be)."

Adjusting is a change in strategy from one strategy to another not part of a strategy. Once both parties are at a Nash equilibrium then they are no longer incentivised to deviate. If one player is no longer playing GTO then the other party can adjust to exploit him by changing his strategy but he is now no longer employing the Nash GTO strategy (which does not change).

"Is it possible that at least one of them is focused on exploitative play?"

I guess it all comes down to how you define "exploitative play". The Nash Equilibrium (per definition) means that both players (if we talk about two players) are maximally exploiting EACH OTHER.

That said you´re right that GTO is focused on exploitative play, but I guess that special form of exploitation is not what you mean, right?

If you´re talking about a Villain that uses sub-optimal strategies, we start exploiting, but then - by definition - we left the Nash.

That said, the answer to the above-mentioned question depends on the definition / idea of exploitative play. If we´re talking about "exploiting leaks", then no, there can´t be any NE that is focused on that, because it couldn´t exist. If we´re talking about exploitative play as choosing the best possible response to our opponent´s strategy - then yes, the Nash Equilibrium is focused on 'exploiting' our opponent´s strategy, yet, as he´s doing the same at the same time, we break-even.

I hope that doesn´t sound too weird and clears things up a bit?

Or maybe let´s stop talking about theoretical concepts and take a simple example. We´re at the river and have one psb left. We have 10 nut-combos and 10 air-combos. Villain has a bluffcatching-range that beats our air-hands. Given our ratio of value:bluffs, if the hands got checked down, everybody would take 50% of the pot longterm.

Now, let´s assume, we have seen Villain folding any single time on the river so far. We start exploiting him by betting the river 100%. Villain folds 100%. Our EV in this case is 1*p (which is twice our equity share!), EV of Villain is 0*p.

Now, Villain starts recognizing that we have bet any single river so far - and he starts to feel like Jack Ass. So he starts calling every single time. What happens? Our EV goes down to 0.5p and Villain´s EV goes up to 0.5p, so everybody takes his fair share on the pot.

Is that a Nash Equilibrium - as we both get the same EV now - namely exactly the equity share our ranges have in the pot?

NO!!

It can´t be a Nash, as we have a simple strategy to exploit Villain´s "new leak" (his stationary tendency). We simply stop bluffing. Our EV when we never bluff but valuebet 100% (and get called 100%) is (0.5*2p) + (0.5*0) = 1p. Villain´s EV is (0.5 * -1) + (0.5 * 1) = 0. So we´re at the starting point again.

So, by now Villain knows that he can´t fold 100% (because then we simply bluff 100% - and taking his entire equity share) and he knows he can´t call 100% (because then we simply bluff 0% - and again take his entire equity share). That means, both players draw circles around each other by adjusting to each other´s strategy. Where will they land? How often should Hero bluff - and how often should Villain call?

Let´s start with Villain. He has to call exactly often enough that we don´t make money with our bluffs (we´re indifferent on bluffing or not, so we can´t win money from either bluffing more or bluffing less). As we´re getting 1:1 on our bluffs, we make money (and have incentive to bluff 100%) if Villain folds > 50%. We have incentive to never bluff and only valuebet, if Villain folds < 50%. If Villain calls exactly 50% - we can either bluff 100% or 0% or anything in between. It doesn´t matter.

Now, if Villain calls exactly 50% (to keep us honest) - how often should we bluff now? Similar answer: we should bluff often enough that Villain is indifferent on calling, meaning he has no incentive to bluffcatch more or less than 50%. As we offer Villain pot-odds of 2:1 after us shoving one psb, Villain has to be good 1 out of 3 times. That means, if we would bluff >33% of the times we bet, Villain would have incentive to call 100%, because he makes money every time he calls, if we bluff <33%, Villain would have incentive to fold 100%, because he loses money everytime he calls. That means, we have to bluff exactly 33% of all the times we bet and valuebet 67%. So, our value:bluff-ratio is 2:1 (10 valuecombos and 5 bluffcombos) which is exactly the odds we offer Villain on the call.

OK, summarized:

Hero bets 75% (10 nut-combos and 5 air-combos) and checks 25% (5 remaining air-combos).

Villain calls 50% and folds 50%.

The EV for Hero is:

EV (Hero) = (0.5 * 0.5 * 2p) + (0.5 * 0.5 * p) + (0.25 * 0.5 * -p) + (0.25 * 0.5 * p) + (0.25 * 0) = 0.75p

The EV for Villain is:

EV (Villain) = (0.75 * ((0.5 * 0.67 * -p) + (0.5 * 0.33 * 2p)) + 0.25 * 1 = 0.25p

Et voilà: Nash Equilibrium! Even though, Villain gets less than his "fair share" of the pot, there´s nothing he can do without giving us the chance to increase our profit, up to 1p. Same goes for Hero, if Hero bluffs more or less, Villain can always maximize his profit - and lower Hero´s profit, down to 0.5p.

I hope, everything´s explained in an understandable manner (and I didn´t make any blatant mistakes ;-D).

willt 11 years, 10 months ago

Looks good to me.

In addition to the "neither player can deviate to improve his EV" definition of equilibrium, there's an equivalent, possible more intuitive way to say the same thing -- equilibrium is when both players are playing maximally exploitative at the same time. And BigFizch did a good job showing how both players' attempts to play as profitably as possible actually led to balanced/equilibrium play in this simple river situation.

So, of course GTO/unexploitable play arises out of the desire to make as much money as possible. That's the point of poker, so a theory of how to play the game that didn't try to do that at some level would be a pretty crappy theory. GTO play is your most profitable strategy when your opponent is playing as profitably as possible also.

op·ti·mal
adj.
Most favorable or desirable; optimum.

op•ti•mum
n., pl. -ma (-mə), -mums, n.
1. the most favorable point, degree, or amount of something for obtaining a given result.
2. the most favorable conditions for the growth of an organism.
3. the best result obtainable under specific conditions.
adj.
4. most favorable or desirable; best.

Not hard to understand why there is confusion when optimal is used for plays that don't maximise EV.

IT is always to maximise EV, it is just special circumstance of reaching the maximised EV by both perfect nemesis equals zero ..this tread has gone out of control i think :)

"this tread has gone out of control i think :)"

I hope you didn´t refer to my last post. :-D

no i certainly wasnt :)

I've updated the OP with my revised GTO theory:

This thread has generated some interesting thoughts, and I'm going to revise some of what I've said.

"Game Theory Optimal" is an answer to one of the following questions (and depends on who you ask):

- How do I maximize my EV to the greatest extent that is possible?

- How do I play unexploitably?

First, let's start with something simpler. What is the game of poker? That's the key to understanding why proof is not possible.

This is all fine. We could argue whether intangibles are part of the game or not. We could argue that they would be part of ANY game between human beings and not just poker. (or even any human interaction also outside of games) But where does all of this actually lead? I'm a lot more interested in what I am to learn from this in terms of actual approach to play.

Let's say you raise AQ UTG and get called by CO and BB. Flop comes Q73, you bet 2/3 pot, CO raises big and BB folds. You know to a high degree of certainty that CO only raises with two pair or better.

What I hear from GTO/balance people is that you can't always fold because you're too close to the top of your range. To me this is a strange application of GTO theory. Why would that matter at all if your "close to top of your range" is still always beat? So folding AQ here is certainly exploitable, no question about that. But if he's not exploiting you...

My main interest here is what I can learn from the GTO approach, and how much of it I should bring into my own game against the actual opponents I meet. In the example above... I don't feel like I learn a whole lot from GTO until CO starts raising a more balanced (weaker) range on the flop. The whole terminology discussion is sort of interesting on some theoretical level, but I'm mostly in poker for the practical application of acquired skills.

We could argue whether intangibles are part of the game or not.

Not sure what you mean here... reading people is a part of poker, and I don't think anyone would argue that it isn't.

Yeah, I agree with this completely.

The whole terminology discussion is sort of interesting on some theoretical level, but I'm mostly in poker for the practical application of acquired skills.

I agree. The reason for working to understand GTO is to better know what it is we're working towards.

>> We could argue whether intangibles are part of the game or not.

> Not sure what you mean here... reading people is a part of poker, and I don't think anyone would argue that it isn't.

Yes. But you could say the same for football. Or rock-paper-scissors. Or Formula 1. Or some multi-player video games. Or another long list of games and sports. In fact, a big reason why people think you can learn things about other parts of life from poker are these skills to read people and put together a reasonable decision from incomplete information.

Very often, the game is understood as the system within which we play a structured game. That's how you can argue whether mind games are in the actual system or not, but it's not really a very important distinction imo.

I also don't think the set of intangibles is infinite as such, but it's probably large enough to be practically infinite.

I agree! But that begs a follow up question:

If the part of the game that is "human" (e.g. like reading people/giving off tells) is not a part of GTO, then does that mean that someone could theoretically play a perfectly GTO game, and still lose, because they are giving off information "outside of the game" through their human body?

This is great, thank you :)

Yeah.

I like to think of poker as a mind game between 2-9 people, with cards being mostly just the medium. And when playing the cards, it's often not even our own cards that we play, but our opponent's cards. I just find that approach useful, and I think a lot of people do.

What I think the GTO crowd is saying is that they are ignoring the whole mind game thing. They know it's there, but they're going to ignore it, because if they play the "optimal" style, in theory the mind games won't apply to them.

WM2K 11 years, 10 months ago

What I think the GTO crowd is saying is that they are ignoring
the whole mind game thing. They know it's there, but they're going to
ignore it, because if they play the "optimal" style, in theory the mind
games won't apply to them.

I think this is overly simplifying it. I strongly disagree with the idea of "I m a GTO player" or "I m a feel/exploitative player" and if you put see yourself in either camp you are doing yourself a disfavor. Game theory is a tool and I personally use it kind of like a plumb line. Its a reference point. The better you know what balanced plays and ranges would be theoretically the better you can deviate from those strategies in order to maximally exploit your opponent.

Its also immensely useful imo for when you are facing multiple bets on locked ish boards and your opponent has a fairly defined value range but a effectively infinite bluffing range. When you understand your own personal range and his range well your able to better understand at what thresholds do you just close your eyes and click call. As much as I d love to just grind out HU with fish all day we all have to deal with tough regulars who are very capable of mixing it up quite well. This serves as a way to be able to deal with them and not get totally ruined every time your in a pot with them.

Game theory can also be used as a weapon to attack rather then a defensive strategy. When you see someone showdown and play a key hand you are able to glean a lot of info about their whole strategy. The better you know what is approximately "optimal" in a situation they better you are able to attack the holes in their ranges.

I ve been semi following this discussion and it overall just seems kind of silly. Obv every good player is looking to as maximally exploitative as possible. The reason why Game Theory Optimal is a interesting is because its an extremely useful tool for finding holes in your ranges as well as your opponents. It also makes it much easier to figure how you should play to maximally exploit.

OK, I´ll stop arguing about GT(O) at this point ...

Your arguments are understandable and correct, yet it makes no sense to me to use a well-known and accepted term (that has it´s roots in science and mathematics - and not in poker) and "redefine" it because it has no "final practical use" for the game of poker.

If you wanna do that and discuss that - I had no problem with that, but you should invent a new name, for example "best possible poker strategy" or anything that sounds hotter and cooler, but not reuse a term that stands for sth. different, that just leads to unnecessary confusion.

It would be the same as if I´d say "a circle is not a precisely round form because no human can draw a perfect circle, so from now on let´s call anything that has no angles a circle". ;-)

Lol awesome :)

I'm not redefining anything, I'm looking for the actual definition...

willt 11 years, 10 months ago

You've been told the actual definition of GTO a bunch of times ITT.

vanity02 11 years, 10 months ago

It would be the same as if I´d say "a circle is not a precisely round form because no human can draw a perfect circle, so from now on let´s call anything that has no angles a circle.

Just for fun :)

Ye id agree use of new name is needed, since as many people do view GTO as Nash, so...GT ppls, invent us new name , pls :-)

BPS~ best poker strategy~ seems cute bigfish

Fair enough. I'm essentially explaining my take on the Nash equilibrium, how I find that it applies to poker, how I see other people seeing it, and what I think the difference is.

Ok lets talk about Micheal Dolle Optimal from now on, or MDO strategy in short.

Close. I would say it is to make the most +EV decision at every possible opportunity. This also includes decisions to say or not say things. This includes timing. This includes being aware of all information made available to you. This includes everything.

Ok that is very wide. Lets consider the example of two opponents that will play 1000 hands of rakefree 100bb cap HU NL. Both players are unable to show cards without showdown, give timing tells or chat etc, the information they have comes from the observed actions and hands that are shown. Are you able to answer the following questions:

1. Does there exists a MDO strategy for the player (you) that starts with the small blind?

2. If 1. is answered with a yes. Does this MDO strategy have the property that if your opponent knows your strategy and is able to find a full strategy for the 1000 hands that yields the maxmimum expected winrate in big blinds against your MDO strategy that this winrate is always negative or zero?

3. Is 2. is answered with a yes. Should your MDO strategy exploit the following Maximally Aggressive Optimal (MAO) strategy: Pick an arbitrary Nash equilibrium strategy for 100bb cap HU NL. At each decision point with each hand, play the option that puts the most money in the pot - ie. when the Nash equilibrium mixes between calling and folding, call; when the Nash equilibrium mixes between betting or raising and folding or calling, bet or raise; when the Nash equilibrium mixes between multiple bet or raise sizes (or calling and/or folding) pick the largest size.

4. Does your MDO strategy exploit my MAO strategy?

How is what I've said different from what you think the definition of game theory optimal is? What is your take on what it means for something to be game theory optimal?

Please answer my questions one by one.

You've created something new. Although you named it after me, this MDO is your creation.

I'm still speaking of GTO/Nash equilibriums. If you think I'm not, I'd love to hear what you think I'm saying that goes against your understanding of GTO/Nash equilibriums so that I can understand why you disagree.

Both players are unable to show cards without showdown, give timing tells or chat etc, the information they have comes from the observed actions and hands that are shown.

I'm not interested in this game.

MDO is defined by you, it is exactly what you mean in this thread. That is why I called it MDO. It is unknown to me what you exactly mean by it, that is why I asked these questions for clarification.

I'm not interested in this game.

You're not interested in this game where one player can still 3-bet 100% or checkraise any flop or play 3/3 from the big blind or open all his hands to 7x over a significant sample.

I strongly believe that any interesting non contradicting definition or description of what an optimal poker strategy should look like should have a lot to say about this game. To me it seems that we can only agree to disagree on this subject.

"I'm still speaking of GTO/Nash equilibriums."

No, you aren´t - but I guess I understand now what you´re meaning and I hope I´ll be able to explain why you´re still wrong (if what I assume is correct).

Let´s take the odd RPS-game again (and pretend it would be comparable, the differences don´t matter for that case). Both players play their perfect Nash strategy of randomly chosing R, P and S, each with a perfect 1/3 distribution.

Now you recognize a tell in your opponent, say he nods his head everytime he´s going to play rock. Suddenly, your Nash strategy isn´t best anymore, even though your opponent still plays the optimal Nash Strategy (because he first chooses randomly in his mind and THEN give away the tell, shortly before acting, so you have just enough time to adjust).

Is that what you mean?

If so, then there´s a small flaw in the thought process. By giving away hints on "his actual holding", Villain is narrowing his range. AND - this is important - that means he deviates from the optimal strategy. The point is, it´s not relevant if Villain followed his Nash strategy when he first chose his option, as soon as he narrows his range, it´s as if he had chosen from a narrowed range immediately at the first choice.

Thus, when Villain does not play optimally (anymore) - it´s obvious that we´re not talking about Nash anymore - which shows that your "adjusted" definition does not fit.

Ok, I think I get the problem now, thanks!

So the assumption is that to play GTO, I must not give away any information about my hand from anything "human". Then, by definition, my opponent is doing the same thing. This is why psychology and anything "human" can be eliminated from the solution, and that it becomes "just math".

I think that's very difficult for most people, even before you start looking at what makes something "mathematically correct", which is a ton of work.

So, to play GTO, poker must be solved for every possible outcome, AND that person would also need to never give away information about the strength of their hand, potentially becoming very robotic? Does that sound about right? And on a side note, does the idea of being like that appeal to people?

If 1 of the subjects isnt playing optimally, does it mean we exited nash? and why would that be?

maximum exploit is same time object of Nash theory, except in his theory both opponents play perfectly without any faults, which isnt a case

Thus may we indirectly conclude nash may includes exploits? if starting assumptions change, like perfect pair became imperfect, does that mean that theory cant be applied to it and if that is a case, why we ever talk of nash cause no living creature is playing perfectly anyway...

grrr..this tread was supposed to make things understandable

If 1 of the subjects isnt playing optimally, does it mean we exited nash? and why would that be?

Yes. By definition, a Nash equilibrium requires both (all?) players to be playing optimally. They could still be playing different strategies though, as we don't know how many equilibriums there are in poker.

maximum exploit is same time object of Nash theory, except in his theory both opponents play perfectly without any faults, which isnt a case

I believe that it contains exploits, and weaknesses, but that they would be "balanced", so that you and your opponent would both be exploiting each other the same amount. That's just my theory though, and it seems a lot of people disagree with my thoughts on this.

You're right though. No one will ever play perfect. Although it can be a goal to work towards perfect play, it's good to recognize that there's more to poker than that perfect play.

grrr..this tread was supposed to make things understandable

Haha, yeah, I know. We're getting there :)

LOL..it doesnt look like we ever gonna get there..like total nash never come to reality

Mike87 11 years, 10 months ago

I have read the thread and I thank you all for the interesting discussion.

I have some questions.

First, if I understand this right, playing GTO means that you'll never change your strategy based on opponent's tendencies, and it will be impossible to show a profit playing against someone that plays perfect GTO.

Then, I want to refer to bigfiszh example about the river play, but let's change something. There's one person playing GTO, and that's the one facing the river bet, with a bluff catching range, against villain's polarized range.

This means that the person playing GTO will call the river bet 50% of the time. But let's say villain decides to never bluff the river and only valuebet, how can we not change our strategy and be unexploitable? Villain could play GTO in all the other areas and only vbet this river for value and his change in strategy would now give him and edge over the other person.

This brings another question for me, that might answer the last one: Is there any range stronger than the opponent's range in GTO, at any time in a hand, if we suppose both players are playing perfectly against each other? If there is no polarized vs. bluff catching range on the river, then there is no problem anymore.

Thanks

willt 11 years, 10 months ago

First, if I understand this right, playing GTO means that you'll never change your strategy based on opponent's tendencies,

Yea, but there's nothing deep here. Playing GTO means playing a particular strategy.

and it will be impossible to show a profit playing against someone that plays perfect GTO.

Yea, in heads-up play at least, two GTO players will break even against one another on average over both positions. Obviously there's no reason to expect to break even if we're just look at one spot in a vacuum (like BigFizch's polar versus bluff-catchers river situation) or even one hand (since being out of position is bad). But on average over both positions, two GTO players facing each other expect to break even, and if one of the players stops playing GTO, he can only decrease his expectation by doing so.

But let's say villain decides to never bluff the river and
only valuebet, how can we not change our strategy and be unexploitable?
Villain could play GTO in all the other areas and only vbet this river
for value and his change in strategy would now give him and edge over
the other person.

No, this is fuzzy thinking -- go through it more carefully, and you can see that if Villain began only betting this river for value, it would not increase his EV. His value hands have the same EV since they still bet and still get called the same amount. And his air hands have the same EV when they just give up since Hero was calling the right amount to make them indifferent between bluffing and just giving up in the first place. So, overall, Villain's EV does not change.

This brings another question for me, that might answer the last one: Is
there any range stronger than the opponent's range in GTO, at any time
in a hand, if we suppose both players are playing perfectly against each
other? If there is no polarized vs. bluff catching range on the river,
then there is no problem anymore.

There may or may not be any pure polar vs bluff-catcher situations in the GTO play of the real game, but it's still a good situation to be familiar with if for no other reason than that it's a close approximation of real river situations that come up a lot. Anyway, it's the PvBC river situation is well-defined game that we can talk about solving, and I think the problem you refer to stems from a misunderstanding anyway.

Mike87 11 years, 10 months ago

Thanks for the answer. There's one thing I don't get:
No, this is fuzzy thinking -- go through it more carefully, and you can
see that if Villain began only betting this river for value, it would
not increase his EV. His value hands have the same EV since they still
bet and still get called the same amount. And his air hands have the
same EV when they just give up since Hero was calling the right amount
to make them indifferent between bluffing and just giving up in the
first place. So, overall, Villain's EV does not change.

I agree on the part that the value hands have the same EV. But the caller will only face bets with hands that beat him, and he shouldn't call at all if an adjustment is made (it will no longer be GTO tho).

Villain would increase his EV imo, since instead of betting some of his bluffs and getting called half the time, he checks all of them which represents 0EV. He gains EV in his overall betting range since he gets value from his value range and never get caught bluffing with his bluffs.

willt 11 years, 10 months ago

Nah, at the equilibrium, Villain's bluffs have the same EV whether they bet or check because Hero is bluff-catching just the right frequency to make it so.

That is, bluffing the size of the pot and checking are both "0EV" for Villain's air hands when Hero is calling 50% of the time. So Villain can't improve his EV by changing just his strategy. If he switches to always checking (or always bluffing for that matter) and Heros strategy stays fixed, then Villain's air hands stay "0EV".

First, if I understand this right, playing GTO means that you'll never change your strategy based on opponent's tendencies,

Yea, but there's nothing deep here. Playing GTO means playing a particular strategy.

No, it doesn't. At least, we can't know that. Assuming poker is finite, we know there is at least one Nash equilibrium. We can't assume that there would be only one.

GTO says that we don't change our strategy, or at least it wouldn't be +EV to do so. GTO doesn't say that our strategy isn't one that adapts to our opponent.

willt 11 years, 10 months ago

Michael,

You seem to still be confused. Certainly many games have multiple equilibria. Are you implying that unexploitable play might involve switching from one equilibrium strategy to another in order to exploit our opponent?

Also, what do you think the word "strategy" means, precisely?

Will

You seem to still be confused.

So do you.

Certainly many games have multiple equilibria. Are you implying that unexploitable play might involve switching from one equilibrium strategy to another in order to exploit our opponent?

What I'm getting at is a deeper level of strategy that involves the appearance of switching strategies, but really it was a part of the strategy itself.

Also, what do you think the word "strategy" means, precisely?

A process for making decisions.

willt 11 years, 10 months ago

What I'm getting at is a deeper level of strategy
that involves the appearance of switching strategies, but really it was a
part of the strategy itself.

Well, that hand-wavy mystical stuff is all well and good, but don't lose track of what actually matters -- players' frequencies. So you mean to say that unexploitable strategies can involve changing your frequencies in certain spots over time?

Well, that hand-wavy mystical stuff is all well and good, but don't lose track of what actually matters -- players' frequencies.

Your comment is going towards insulting... and while we may have different viewpoints of what is mystical, your intent here is to put down my argument by attaching it without merit to something that a lot of people here are likely to look down upon.

I'm using logic here, so I ask, how is logic mystical? Player frequencies are great and all, but you can never know what player frequencies really are. By the time you could "know" them, they could've already changed... leaving your knowing to only cover the past.

Another problem with frequencies is that they become generalized so that we have a useful sample size. If I see your fold to Cbet is 50%, what does that mean? It means that over the hands we've been at the same table, you've folded 50%. Now we're in a hand together. I open the button, and you call out of the small blind. You check to me, and I cbet. Should I assume that you fold 50% here? Does your fold to cbet change based on who you're playing against? How about the board texture? Position? All of those things complicate the situation, making it impossible to have a true frequency of any action.

So you mean to say that unexploitable strategies can involve changing your frequencies in certain spots over time?

I don't know. No one has ever come up with an unexploitable strategy for poker, and I don't think anyone ever will... all we can do is speculate. It makes sense to me that it would be possible, but to say "yes" or "no" would be misleading without having a true GTO strategy to compare to.

The only GTO "proofs" that show up are for a simplified sub-game of poker. There is no GTO for a big bet game of poker, and I'm pretty sure there isn't one for limit yet either.

When you say player frequencies are important, I agree with that. And how do you expect to know what someone's frequencies are? By their history? The problem with that is that your opponent's frequencies could change. OR, your opponent may be seeing the situation differently than you, causing you to misread their frequencies. An example of this would be seeing someone with a fold to 3bet frequency of 85% on my HUD, but vs. me their fold to 3bet is 45%. The 85% would be misleading, and I would never be able to know the true frequency... although if I know what I'm doing I can get a close approximation.

I would say that truth is important. Saying a strategy is GTO is great, if it's true... but proving a small sub-game of poker is "unexploitable" doesn't show us proof for the entire game.

What I'm getting at is that anyone claiming to have a GTO strategy is being deceptive. I'm not saying that "GTO" math isn't useful... it is. I'm just saying that it's not really GTO, and that it's misleading to tell everyone that one strategy is GTO, and another isn't without knowing... which no one can do.

willt 11 years, 10 months ago

Michael,

I really don't mean to be insulting.

I believe all of your comments about figuring out players' frequencies are besides the point. If a player plays a certain strategy, he has certain frequencies. For example, in the PvBC river situation BigFizch described above, maybe the bluff-catching player's strategy was "call 50% of the time, randomly, and fold the other 50%". Certainly figuring out opponents' frequencies is challenging at the tables for a variety of reasons, but that is a complete non-issue in solving for GTO play.

I believe your objection to people claiming to have a GTO strategy to the full game is a strawman argument. I don't know of anyone making such a claim. If somone is, I agree that he is wrong.

That said, the fact that no equilibrium of a big-bet game is known does not mean that anyone is free to say anything they want about what it might look like. In fact, we can confidently say a good many things about it, starting from the definition and some properties of the game and applying logic.

I hope you will consider taking a step back and spending some time learning about how simple games are solved. They are much easier to work with than the full game, and it might give you a better feel for some of the concepts involved. You will be able to see why GTO is defined the way it normally is. And some of the results even provide insights useful for real play.

Thanks again for starting this thought-provoking thread.

Will