The problem is that you are deciding which Game B to use based on a rounding of the balance across the million tries. The choice of which Game B to play must not depend on the other experiments.
I'm not so sure; when you read the description of Game B it should be clear that its odds rely on your balance, an external factor.
Since that is something that can be manipulated, it isn't a major leap, I think, to the idea that combinations of the two games would produce different outcomes - and that some of those outcomes would be positive.
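A minimal sketch of that combination effect, assuming the textbook Parrondo parameters (a bias of ε = 0.005 and a mod-3 rule for Game B - the post's exact numbers may differ): Game A and Game B each lose on their own, but because Game B's odds depend on the balance, randomly mixing the two wins.

```python
import random

EPS = 0.005  # house edge; standard textbook value, assumed here

def play(choose_game, steps=200_000, seed=0):
    """Run one session; choose_game picks 'A' or 'B' each step."""
    rng = random.Random(seed)
    balance = 0
    for _ in range(steps):
        if choose_game(rng) == "A":
            p_win = 0.5 - EPS        # Game A: slightly unfair coin
        elif balance % 3 == 0:
            p_win = 0.10 - EPS       # Game B, "bad" branch (balance divisible by 3)
        else:
            p_win = 0.75 - EPS       # Game B, "good" branch
        balance += 1 if rng.random() < p_win else -1
    return balance

only_a = play(lambda rng: "A")
only_b = play(lambda rng: "B")
mixed = play(lambda rng: rng.choice("AB"))
print(only_a, only_b, mixed)  # A and B drift down; the random mix drifts up
```

The drift rates work out to roughly -0.01 per step for A, -0.009 for B, and +0.016 for the random mix, so the sign difference is unmistakable over a long run.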
Whilst fun, I rather preferred the latter part of the post - it reminded me of a brilliant book I had as a kid with many such puzzles in it :)
If you pick C, and your opponent picks B, you lose 58% of the time. What is optimal about that?
> If Gates had chosen first, then whichever die he chose, Buffett would have been able to find another die that could beat it (that is, one with more than a 50% chance of winning).
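The cycle is easy to verify exactly with one classic nontransitive set (an illustrative choice on my part; the dice in the article may differ): each die beats the next in the cycle with probability 5/9, so whoever picks second always has a favorable die available.

```python
from fractions import Fraction
from itertools import product

# A classic nontransitive triple: A beats B, B beats C, C beats A.
A = (2, 2, 4, 4, 9, 9)
B = (1, 1, 6, 6, 8, 8)
C = (3, 3, 5, 5, 7, 7)

def p_beats(x, y):
    """Exact probability that die x rolls strictly higher than die y."""
    wins = sum(1 for a, b in product(x, y) if a > b)
    return Fraction(wins, len(x) * len(y))

print(p_beats(A, B), p_beats(B, C), p_beats(C, A))  # 5/9 each, a cycle
```

Since each matchup is 5/9 in the second player's favor, the first player to commit to a die loses 5/9 ≈ 56% of the time against best response.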
It seems like you're thinking about this outside of the relevant context, which is a math problem, loosely inspired by real life games.
A hypothesis was presented: using higher numbers will give 5-15 more resource cards over the course of a game. Then, conclusions were drawn that standard statistical techniques are deficient for not being able to detect the bias.
Where are the results of the actual experiment, say of 10 games played using the biased strategy, against a control opponent? Perhaps, standard statistical techniques are correct.
Without empirical tests of the hypothesis, where is the science? Perhaps, the dice are only loaded for a short time, in which time all the skew occurred. Without testing of the hypothesis, it is impossible to know.
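For a sense of whether 10 games could even detect the claimed bias, here is a back-of-the-envelope power sketch. The per-game standard deviation of 15 resource cards is entirely my assumption, not a number from the post; the point is only that the required sample size depends heavily on where in the claimed 5-15 card range the true effect sits.

```python
import math

def games_needed(effect, sd=15.0, z_crit=1.96):
    """Smallest n where a two-sample z-test on means clears a 5% threshold.

    effect: assumed mean card advantage per game (hypothetical).
    sd: assumed per-game standard deviation (hypothetical).
    """
    n = 1
    while effect / (sd * math.sqrt(2.0 / n)) <= z_crit:
        n += 1
    return n

print(games_needed(5))   # -> 70 games at the low end of the claimed bias
print(games_needed(15))  # -> 8 games at the high end
```

So under these assumptions, 10 games would suffice only if the effect is near the top of the claimed range - which is exactly why running the actual experiment matters.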
The explanation that made most sense to me is to treat the second choice as a completely independent game (like a coin toss). You're given the choice to stick with your one-in-a-hundred chance, or you can instead go for a coin toss where the chances are obviously much higher.
You're changing the rules and shifting the goalposts. Look, it's simple: if the objective is to win then you want to avoid giving your opponent any chance to beat you. That means you never choose your die first.
> The point is that there are appropriate tools for reasoning about such games correctly - utility, decision theory, etc. - and they beat minimax.
Perhaps this is true but you have not convinced me that this is the case here. Moreover, the OP claims to have empirical results showing that AB-pruning does better than MC-search for the game of 2048.
I'll try to write up a simulation this weekend. I'm not entirely sure where we are disagreeing, though. I'm not saying that you have 1/2 chance of winning the game. Only if you have the chance to swap, you win half of the time if you do. That seems to be what you said in a sibling thread.
You're using a mathematical model that doesn't apply. The ahead-of-time simulations invalidate the idea that your decision can't affect the outcome, despite the final decision ultimately occurring afterwards.
An analogy would be asserting that you can't possibly shoot yourself in the back of the head when firing into the distance, and sticking to that position even after finding out you're in a pac-man-style loop-around world.
A much closer but more technical analogy is that you can't solve imperfect information games by recursively solving subtrees in isolation. Optimal play can involve purposefully losing in some subtrees, so that bluffs are more effective in other subtrees.
The fact that you are doing worse by two-boxing, leaving with a thousand dollars instead of a million, despite following logic that's supposed to maximize how well you do, should be a huge red flag.
Also, if you allow mixed (randomized) strategies, the postulated Newcomb's problem is idiotic, because the other player is not an oracle. That's why they included a dirty fix: "if the Predictor predicts that the player will choose randomly, then box B will contain nothing".
> There isn't one game, there are infinitely many, and different people will have success at different games. But there is also the meta-game: The game of being able to win the highest variety and largest amount of different games.
I am less certain that the meta-game is inherently about broadening the variety of status measurements, and more that it is about setting the terms by which status is measured. A person who can win a large variety of such measurements might trend toward a broadening strategy such as you describe (since this is likely to favor their odds of winning any given contest), but ultimately the problem of a "split decision" between multiple comparisons means we should expect any given contest to only ever use one means of status measurement.
Therefore the meta-game is about winning the selection process for that measurement (with the broad strategy being just one among many possibilities)....
Player preferences aren't within the bounds of the rules of the game, though.
You can have 100% knowledge of the bounds of the game's rules and still have a tremendous amount of difficulty in ascertaining player preference. The article makes it clear that the process of iterative playing of computationally complex decision games does not assure very significant approximation of preference.
In other words, you need to do something more than just play in order to reliably reach a place close to equilibrium.
Reality: the people playing more frequently played numbers don't have a firm enough grasp of statistics to be able to calculate their plays in a game theory scenario.
That only makes sense if you restrict Bob's decision making to consider each game in isolation, without regard to how rejecting an offer in one game might lead to a higher offer in a future game. At that point, I would call it a different problem entirely, so there isn't much point in comparing the results. I certainly wouldn't label that version of the problem "rational" and the version where Bob considers future games "irrational".
On further reflection, you're right, I'm conflating it.
My correction is still valid, though. You're not handling step ten properly: you didn't work over the information sets, didn't solve the actual graph that is the game, and didn't handle the under-specified policy function.
To try to show you that your solution isn't the actual solution: fine, both options have the same EV, so I choose to switch every time - why not? As you are no doubt aware, I never get any EV at all, because I'm constantly swapping. The sixty is a mirage. For my policy choice, the answer was undefined or zero, depending on how you write it down. But you told me the two options had the same EV - so if they did, why did my choice not produce that EV? Ergo, your solution only appears to be giving you the EV.
Think about that for a while and you'll start to see why I homed in on specifying a recurrence relation with the terminal keep node, and why I'm so eager to escape the trap of their flawed problem model.
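To make the "same EV" part concrete - this is my reading of the thread, assuming the standard two-envelope setup, which may not be exactly the problem under discussion - the terminal policies (keep, or switch exactly once) do realize the same average, while an "always switch" policy never reaches a terminal node and so never realizes anything:

```python
import random

def trial(rng):
    """One round: amounts x and 2x, shuffled into (held, other)."""
    x = rng.uniform(1, 100)        # smaller amount; the distribution is arbitrary
    envelopes = [x, 2 * x]
    rng.shuffle(envelopes)
    return envelopes

rng = random.Random(42)
keep = switch = 0.0
n = 100_000
for _ in range(n):
    held, other = trial(rng)
    keep += held      # terminal policy: keep the held envelope
    switch += other   # terminal policy: switch once, then stop
print(keep / n, switch / n)  # both near 1.5x the mean of x; no 25% gain
```

The apparent per-step gain from switching only materializes for a policy that terminates; that's the sense in which the advertised EV is a mirage.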
Let me start with where the idea for this project came from, because I find that one of the most interesting things about any project.
The idea came from reading "Thinking, Fast and Slow".
I created a simple game to demonstrate how biased we often are when making choices.
If you are interested, you can dig deeper by reading this paper:
Choices, Values, and Frames: http://web.missouri.edu/~segerti/capstone/choicesvalues.pdf
and this article: https://en.wikipedia.org/wiki/Loss_aversion
The game consists of choosing between two different bets.