Humans vs AI: who won statistically?
Posted by IamIndifferent
Posted in High Stakes
Prof. Tuomas Sandholm claims the challenge was a statistical tie. BrainsvsAI
On the twoplustwo thread, one of the developers, Noam Brown, claimed they calculated the 95% confidence interval based on the 80,000 mirrored hands that were played, and it was +/- 10.35 bb/100. The pros won by 9.16 bb/100.
But isn't Noam wrong? They should have used a one-sided test, based on the a priori hypothesis that the humans are better than the AI. Under a one-sided test, the humans won at the 95% confidence level.
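To make the one-sided versus two-sided point concrete, here is a rough sketch of the arithmetic. It assumes the quoted +/- 10.35 bb/100 is the half-width of a two-sided 95% interval under a normal approximation; that is my assumption, not something the developers confirmed. The function names are made up for illustration. Under those assumptions the z-score comes out around 1.73, which is above the one-sided 5% cutoff (about 1.645) but below the two-sided one (about 1.96).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def z_from_interval(observed, half_width, level=0.95):
    """Recover the z-score of an observed win rate from the half-width
    of a two-sided confidence interval (normal approximation assumed)."""
    z_crit = STD_NORMAL.inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    std_err = half_width / z_crit                 # implied standard error
    return observed / std_err


def p_values(z):
    """Return (one_sided, two_sided) p-values.
    One-sided tests 'humans are better'; two-sided tests 'humans differ'."""
    one_sided = 1 - STD_NORMAL.cdf(z)
    two_sided = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return one_sided, two_sided


# Figures quoted in the post, both in bb/100.
z = z_from_interval(observed=9.16, half_width=10.35)
one_sided, two_sided = p_values(z)
print(f"z = {z:.2f}, one-sided p = {one_sided:.3f}, two-sided p = {two_sided:.3f}")
```

Whether the one-sided test is legitimate depends on the hypothesis being fixed before the match started, not chosen after seeing who was ahead.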
Secondly, how does one statistically account for the mirrored hands? The 80,000 hands are not an independent sample; there are only 40,000 independent deals.
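One common way to handle this kind of pairing is to treat each mirrored deal as a single observation: average the humans' result over the two copies of the deal, then compute the standard error over the 40,000 pair averages. I do not know whether the developers did it this way. The sketch below only shows the mechanics; `paired_interval` is a hypothetical helper, results are assumed to be in big blinds per hand from the humans' side, and the four deals at the bottom are invented toy numbers, not match data.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_interval(side_a, side_b, level=0.95):
    """Confidence interval in bb/100 treating each mirrored deal as one sample.

    side_a[i] and side_b[i] are the humans' results (in bb) on the two
    mirrored copies of deal i. Normal approximation assumed.
    """
    if len(side_a) != len(side_b):
        raise ValueError("every deal needs both mirrored results")
    # One number per independent deal: card luck largely cancels in the average.
    pair_means = [(a + b) / 2 for a, b in zip(side_a, side_b)]
    n = len(pair_means)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    centre = mean(pair_means)
    half_width = z_crit * stdev(pair_means) / sqrt(n)
    # Convert from bb per hand to bb per 100 hands.
    return 100 * centre, 100 * half_width


# Toy numbers, invented purely to show the call; far too few deals for a
# real normal approximation.
toy_a = [12.0, -3.0, 5.0, -20.0]
toy_b = [-10.0, 4.0, -1.0, 22.0]
centre, half_width = paired_interval(toy_a, toy_b)
print(f"{centre:.1f} +/- {half_width:.1f} bb/100")
```

The design point is that the big swings within a pair (12 against -10, -20 against 22) mostly cancel, so the spread of the pair averages can be much smaller than the spread of the individual hands, even though the sample count drops from 80,000 to 40,000.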
I do not think you need to be well educated in statistics to tell that the humans massively outperformed the program.
80k hands is a big enough sample size to make these assumptions, and beating the AI by almost a double-digit bb/100 win rate is crushing, to my knowledge.
The AI should also not be affected by tilt (nor should the selected human players, because there was no actual money to win or lose in the challenge), but given some hands posted in the forums, it seems Claudico has anger issues or gets bored too easily. A bit more bot care is advised in that matter.
We still have to keep in mind that the program played against well-versed and highly experienced HU specialists, so a mediocre player might not perform that well.
"But isn't Noam wrong as they should have used a one-sided test based on the apriori hypothesis that the Humans are better than the AI" Not necessarily, if the hypothesis was "the humans' performance/winnings differ from the bot's performance/winnings over a long stretch of hands". Then you cover the whole range of possible outcomes of the experiment (bots = humans, bots < humans, bots > humans).
If you followed the stream, it was charming and amusing to hear Doug Polk explain some game-theoretical concepts, such as very common poker terms like "cooler", to the bystanding programmers.