aAtlas

2 points

Ahh, I see what you're saying, BigFiszh. Yes, you are correct. I'm aware that solvers use a guided adjustment method and not a random one (to be honest, random is quite inefficient), and the raise/call/fold frequencies in the example you gave are roughly consistent with the regret matching / CFR algorithm that is actually used in solvers and poker AIs. I wanted to avoid talking about that and give a simplified conceptual description, but in the end I oversimplified and ended up spreading a bit of misinformation on the topic.
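For anyone curious what regret matching looks like, here is a minimal sketch on rock-paper-scissors rather than poker. Everything in it is an assumption for illustration: the names (`PAYOFF`, `strategy_from_regrets`, `solve_rps`), the iteration count, and the small head start given to one player so the run is not trivially stuck at uniform play. It is plain regret matching on a one-shot game, not CFR over a game tree and not what PioSolver does internally.

```python
# Payoff to the player choosing action a against action b.
# Actions: 0 = rock, 1 = paper, 2 = scissors. The game is symmetric,
# so the same table serves both players.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No action has positive regret: fall back to an even mix.
    return [1.0 / len(regrets)] * len(regrets)


def action_values(opponent_strategy):
    """Expected payoff of each pure action against a mixed opponent."""
    return [sum(PAYOFF[a][b] * opponent_strategy[b] for b in range(3))
            for a in range(3)]


def solve_rps(iterations=20000):
    # Assumption: player 0 starts with a bias toward rock so that the
    # strategies actually have to adjust.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            values = action_values(strategies[1 - player])
            expected = sum(s * v for s, v in zip(strategies[player], values))
            for a in range(3):
                # Regret: how much better action a would have done
                # than the mix that was actually played.
                regrets[player][a] += values[a] - expected
                strategy_sums[player][a] += strategies[player][a]
    # The average strategy over all iterations is what approaches equilibrium.
    return [[s / iterations for s in sums] for sums in strategy_sums]


if __name__ == "__main__":
    for player, avg in enumerate(solve_rps()):
        print(player, [round(p, 3) for p in avg])
```

The adjustment here is guided, not random: each action's frequency moves according to how much better that action would have done than the current mix, and it is the averaged strategy that settles near an even split.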

Thanks for all the responses. I appreciate that someone is keeping my answers in check!

Cheers

Feb. 12, 2020 | 2:42 p.m.

Hm, maybe my comment did not explain this clearly.

I completely agree with you that the direction in which the solver ultimately adjusts is the direction of highest EV, but the part of my comment that you are referring to is not incorrect. What I was getting at with that part of my explanation was the heart of how some of these optimization algorithms work: how exactly does the solver know which direction to adjust in to get the highest EV? These optimization algorithms are not bestowed with the knowledge that raising the nuts has a higher EV than folding the nuts. What the solver must do first is try every adjustment randomly (even increasing the folding frequency of the nuts), then check the EV of every adjustment. Once that is done, the solver goes "Ah! Increasing the raise frequency of the nuts results in the highest EV change!", then it finally adjusts in that direction and repeats the process. It is easy to see how this is analogous to being blindfolded on a mountain and trying to find the peak.

My comment about random adjustments was simply elaborating on how the solver finds the gradient of EV across all strategic adjustments.
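To make the "try each adjustment, then check the EV" idea concrete, here is a small sketch of probing one hand's strategy. The per-action EVs in `ACTION_EV` are made-up numbers, and treating them as fixed ignores the opponent entirely, so this only illustrates the conceptual description above and is not how a real solver evaluates a strategy. The names `strategy_ev` and `probe` are hypothetical.

```python
# Hypothetical EVs (in big blinds) of each action when holding the nuts.
ACTION_EV = {"raise": 10.0, "call": 6.0, "fold": 0.0}


def strategy_ev(freqs):
    """EV of a mixed strategy under the fixed toy action EVs."""
    return sum(freqs[action] * ACTION_EV[action] for action in freqs)


def probe(freqs, eps=0.01):
    """Shift a small amount of frequency toward each action in turn
    and record the resulting change in EV."""
    base = strategy_ev(freqs)
    changes = {}
    for action in freqs:
        # Scale every frequency down, then give the freed mass to `action`,
        # so the trial strategy still sums to 1.
        trial = {a: f * (1.0 - eps) for a, f in freqs.items()}
        trial[action] += eps
        changes[action] = strategy_ev(trial) - base
    return changes


if __name__ == "__main__":
    even = {"raise": 1 / 3, "call": 1 / 3, "fold": 1 / 3}
    changes = probe(even)
    print(changes)
    print("best direction:", max(changes, key=changes.get))
```

Starting from an even mix, the probe toward folding lowers EV and the probe toward raising gains the most, which is the "Ah!" moment described above.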

If you keep reading a few sentences after the sentence that you quoted above, you will find that my comment is in line with your explanation.

It makes small adjustments to the frequencies of every action randomly (takes a step in a bunch of different directions), and then checks the change in EV with every adjustment (checks the change in elevation). The solver will keep adjusting in the direction that causes the highest EV increase

I apologize if my original comment was slightly convoluted and required further explanation to be made clear.

In any case, my description is only an oversimplified / conceptually similar process to what a solver like PioSolver actually does algorithmically. If I recall correctly, in the original 2p2 post for the release of Pio, punter11235 stated that older, pre-release versions of Pio were using a type of CFR algorithm but they have since moved on to faster algos.

Feb. 11, 2020 | 8:24 p.m.

Hi BigFiszh,
Apologies, it seems I did not read your post before commenting. Would you mind telling me exactly which part of my comment is incorrect? Thanks.

Feb. 11, 2020 | 2:25 p.m.

WarLoGhE, it seems you are familiar with coding to some degree given your last post. If that is the case, then you can think of how solvers approach the Nash Equilibrium strategy as conceptually similar to how gradient descent works in Machine Learning.

Think of it like this: Imagine that you are blindfolded and placed randomly on a mountain. How do you go about finding the peak? The answer is to take one step in many different directions and with each step in a given direction, check the change in elevation. Once you determine which direction causes the largest increase in elevation, move one step in that direction and repeat the process. You will reach the peak when a step in any direction results in a drop in elevation.

The solver starts with a random strategy (typically an even distribution in frequency of raise / call / fold). It makes small adjustments to the frequencies of every action randomly (takes a step in a bunch of different directions), and then checks the change in EV with every adjustment (checks the change in elevation). The solver will keep adjusting in the direction that causes the highest EV increase, and it reaches Nash Equilibrium (the peak of the mountain) when any adjustment to the current strategy results in a drop in EV (elevation).

In reality, we can never obtain true Nash Equilibrium (after all, a solver only works on a simplified toy-game representation of the full game tree). But the solver will keep making adjustments that yield ever smaller increases in EV until you tell it to stop at a point that is acceptably close to equilibrium.
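Since the mountain analogy is easier to follow in code, here is a minimal sketch of it. The `elevation` function is a made-up smooth hill with its peak at (3, -1), and `climb`, `step` and `min_step` are names and values chosen only for this example. It probes four fixed directions instead of random ones to keep the run repeatable, and it is a picture of the analogy, not of a poker solver.

```python
def elevation(x, y):
    """Hypothetical smooth mountain with a single peak at (3, -1)."""
    return -(x - 3.0) ** 2 - (y + 1.0) ** 2


def climb(x, y, step=0.5, min_step=1e-6):
    """Probe one step in each direction, move to the highest point found,
    and shrink the step when no direction goes uphill."""
    directions = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    while step > min_step:
        here = elevation(x, y)
        candidates = [
            (elevation(x + dx * step, y + dy * step), x + dx * step, y + dy * step)
            for dx, dy in directions
        ]
        best_height, best_x, best_y = max(candidates)
        if best_height > here:
            # Some direction goes uphill: take that step.
            x, y = best_x, best_y
        else:
            # Every direction goes down or stays level: look more finely.
            step /= 2
    return x, y


if __name__ == "__main__":
    print(climb(0.0, 0.0))
```

The `min_step` cutoff plays the role of telling the solver to stop once it is acceptably close: the search never proves it is exactly at the peak, it just stops when the remaining improvements are too small to matter.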

Hope this helps.

Feb. 10, 2020 | 9:55 p.m.
