Can anyone elaborate on what “search” means in this context? It seems they are trying to determine the most probable Nash equilibrium, but I'm not sure what this means when applied to RL.
The "search" here refers to the idea that, in principle, you could search the entire space of possible hands of cards and exhaustively compute the optimal action by considering every possibility. However, as in the game of Go, this is computationally intractable, so instead they use machine learning to "guide" the search toward more promising "moves". In AlphaGo (and here) this learning happened as part of a reinforcement learning pipeline.
Imagine searching the entire game tree - billions of nodes, given the branching factor of chess.
Now use RL to help decide where to search and which branches to prune.
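To make the idea concrete, here's a minimal sketch (not the paper's actual algorithm) of search pruned by a learned prior, in the spirit of AlphaGo-style guided search. The game, `policy_prior`, and `value_estimate` below are toy placeholders standing in for learned networks:

```python
def policy_prior(state, moves):
    """Stand-in for a learned policy: score each legal move.
    This toy version just prefers larger moves; a real system uses a net."""
    return {m: m for m in moves}

def value_estimate(state):
    """Stand-in for a learned value function at the search horizon."""
    return state

def legal_moves(state):
    """Toy single-player game: from total n you may add 1, 2, or 3, capped at 10."""
    return [m for m in (1, 2, 3) if state + m <= 10]

def apply_move(state, move):
    return state + move

def guided_search(state, depth, branch_limit=2):
    """Depth-limited search that only expands the top `branch_limit`
    moves suggested by the policy, instead of the full branching factor."""
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return value_estimate(state)
    scores = policy_prior(state, moves)
    pruned = sorted(moves, key=lambda m: scores[m], reverse=True)[:branch_limit]
    # Take the best outcome among the branches that survived pruning.
    return max(guided_search(apply_move(state, m), depth - 1, branch_limit)
               for m in pruned)
```

With `branch_limit=2` each node expands two children rather than three, so the tree shrinks exponentially with depth; that's the whole trick, just with a learned network doing the scoring instead of a toy heuristic.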
This JPEG link downloaded a WebP file to my computer, and when I try to open it, it asks if I'm sure I want to open it. WebP seems to be an image format, but is it safe to open?
When I was in grad school, I was working on general game playing AI. Unfortunately, I was in a "pure logic" research group, founded on the old-school AI principles that believed AI could be derived from deterministic logic.
Of course, this limited the games that we could simulate to purely deterministic games (checkers, chess, go, etc.). Any game that included an aspect of chance required a hack like a "dice player" or a "deck player" that injected the random aspects of the game. This led to other problems, though, since the engines would try to calculate the current state of the game based on the "optimal" play of the random player.
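A toy illustration of that failure mode, in Python (the game and payoffs are made up): modeling a die roll as a "player" whose moves get maximized, versus averaging over the chance node as expectimax does.

```python
def payoffs(choice, roll):
    """Hypothetical game: pick 'safe' (fixed payoff of 3) or 'risky'
    (payoff depends on a fair six-sided die)."""
    if choice == "safe":
        return 3
    return 6 if roll >= 5 else 1  # risky pays off only on a 5 or 6

def best_choice_assuming_optimal_dice():
    """The bug: treat the die as a player that rolls whatever is best for us."""
    return max(("safe", "risky"),
               key=lambda c: max(payoffs(c, r) for r in range(1, 7)))

def best_choice_expectimax():
    """The fix: average over the chance node's outcomes."""
    return max(("safe", "risky"),
               key=lambda c: sum(payoffs(c, r) for r in range(1, 7)) / 6)
```

The buggy version picks "risky" (it imagines the die always rolls high), while expectimax picks "safe" (expected value 3 vs. about 2.67), which is exactly the kind of miscalculation a "dice player" treated as an optimizing agent produces.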
This is a much more interesting approach, and I imagine will prove to be far more useful.
Underlying paper here. Originally published several months ago and updated with more information last week.