Summary
I'm proposing inductive decision theory (IDT), a descriptive decision theory (an algorithm for predicting people's actions in order to decide what action to take) that aims to fix imperfections of the standard game theory frameworks.
The decision theory is based on Solomonoff induction. It predicts the players' actions, assuming that those actions are generated by some rule (program), and it assigns higher prior probability to simpler programs.
Additionally, it assigns higher prior probability to actions that are optimal, where:
- An optimal action is the action that gives the highest expected utility.
- The other players' actions are predicted using inductive decision theory.
- Other players whose strategy is to select the optimal action also use inductive decision theory to predict the other players' actions.
For now, this is just a general idea. I will improve it and go into greater detail if/when I have time.
Theory
I'm proposing inductive decision theory (IDT), a descriptive decision theory (an algorithm for predicting people's actions in order to decide what action to take) that aims to fix imperfections of the standard game theory frameworks.
To the best of my knowledge, there is no game theory framework that takes into account all of the below facts:
- Humans don't always take the action that maximizes expected utility.
- Humans are learning agents that dynamically and empirically learn what the other players' strategies are.
- To some extent, we can predict how one human will behave based on how other humans behaved in the past when they were in a similar situation.
This decision theory is based on Solomonoff induction (hence the "inductive" part of the name). It predicts the players' actions, assuming that their actions are generated by some rule (program), and it assigns higher prior probability to shorter programs (analogous to how Solomonoff induction assigns higher probability to shorter programs when predicting the next number in a sequence).
The rationale is that if Solomonoff induction is a correct way to predict reality, then it is also a correct way to predict human actions, because humans and their actions are part of reality.
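As a toy sketch of this simplicity-weighted prediction: the rule set, the complexity scores, and the all-or-nothing likelihood below are illustrative assumptions, not part of the theory's definition, and true Solomonoff induction enumerates all programs and is incomputable.

```python
# Hypothetical rule set: each rule maps an action history to a predicted next
# action. "Complexity" stands in for program length; the prior is 2^-complexity.
RULES = [
    ("always_C",    1, lambda hist: "C"),
    ("always_D",    1, lambda hist: "D"),
    ("tit_for_tat", 2, lambda hist: hist[-1] if hist else "C"),
    ("alternate",   3, lambda hist: "D" if len(hist) % 2 else "C"),
]

def predict(history):
    """Posterior over the next action: prior 2^-complexity, keeping only the
    rules that reproduce the observed history exactly (deterministic rules;
    a fuller version would allow noisy rules)."""
    weights = {}
    for name, complexity, rule in RULES:
        consistent = all(rule(history[:i]) == a for i, a in enumerate(history))
        if consistent:
            action = rule(history)
            weights[action] = weights.get(action, 0.0) + 2.0 ** -complexity
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}
```

After observing `["C", "C", "C"]`, only the rules consistent with that history survive, and all of them predict "C" next; on an empty history, "C" is still favored because the simple "always C" rule shares its weight with tit-for-tat.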
This decision theory assumes that all players' actions are generated by the same program. This differs from, for example, Bayesian Fictitious Play, which tries to infer a program (strategy) for each player separately.
The rationale is that humans are similar to some extent: if one human (player) acted in a certain way, it is reasonable to assign higher probability that another human will act in the same way. If we assumed that each player has a different program, we could not draw conclusions about one player's actions from the actions of previous players. That would be limiting.
But that one program can return different actions for different players. For example, it can contain something like: if player_taking_action = player X, then ... else .... So even though the theory assumes that all actions are generated by one program, it does not have to assume that every player plays the same strategy.
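A minimal illustration of such a shared program (the player names and the particular branches are hypothetical):

```python
def shared_program(player, history):
    """One program generates every player's action, but it may branch on the
    identity of the player currently acting."""
    if player == "X":
        return "D"  # the program singles out player X: X always defects
    # Everyone else mirrors the most recent move, cooperating by default.
    return history[-1] if history else "C"
```

Because a single program covers all players, evidence about how player X behaves still updates the posterior over programs, and therefore sharpens predictions about every other player as well.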
Additionally, the theory assigns higher prior probability to actions that are optimal, where:
- An optimal action is the action with the highest expected utility for the player taking it.
- When determining the optimal action, the other players' actions are predicted using inductive decision theory.
- Other players whose strategy is to select the optimal action also use inductive decision theory to predict the other players' actions.
The rationale is that humans aim to take the action that gives them the highest expected utility, so the action they actually take is more likely to be that optimal action.
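One way to realize this optimality boost is to reweight the simplicity prior toward the best response to IDT's own prediction of the opponent. The payoff matrix (a Prisoner's Dilemma) and the boost factor below are illustrative assumptions:

```python
# Row player's payoffs in a Prisoner's Dilemma (illustrative).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
BOOST = 2.0  # how much extra weight the optimal action receives (assumption)

def optimality_boosted(prior, opponent_prediction):
    """prior and opponent_prediction are dicts action -> probability.
    Returns the prior reweighted toward the expected-utility-maximizing action."""
    eu = {a: sum(p * PAYOFF[(a, b)] for b, p in opponent_prediction.items())
          for a in prior}
    best = max(eu, key=eu.get)
    boosted = {a: prior[a] * (BOOST if a == best else 1.0) for a in prior}
    total = sum(boosted.values())
    return {a: w / total for a, w in boosted.items()}
```

Starting from a uniform prior and a prediction that the opponent cooperates, "D" is the best response (payoff 5 vs. 3), so its probability is boosted from 1/2 to 2/3 while "C" drops to 1/3.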
The problem with this theory is that the optimal actions are not computable, because:
- There is an infinite number of programs to consider.
- Some programs fall into infinite loops.
- Sometimes the optimal action of player A depends on the optimal action of player B, which depends on the optimal action of player A, and so on, indefinitely.
The solution is:
- To consider only some of the most probable programs instead of all programs.
- If a program executes more than X operations, to assume that it returns a random action.
- For the third problem, once the recursion reaches a certain depth, to simply assume that the player takes a random action.
Areas for improvement
- There is a precedent in human history: when humans can communicate and commit to a certain strategy profile, they do follow that strategy profile, as long as it is beneficial to them. We know this, for example, from experiments with the Stag Hunt game: if people can communicate, they are able to agree on choosing "Stag" (the Pareto-optimal equilibrium) and follow through. We can therefore assign a high prior probability to a program/rule that says "if players can communicate and commit to a certain Pareto-optimal strategy profile, then they will act according to that strategy profile".
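Such a communication rule could enter IDT's program space as one more high-prior program. The Stag Hunt action names and the no-communication default below are hypothetical:

```python
def communication_rule(player, committed_profile, history):
    """High-prior rule (assumption): a player who has communicated and
    committed to a strategy profile plays their committed action."""
    if committed_profile is not None:
        return committed_profile[player]  # honor the agreed profile
    return "Hare"  # illustrative default when no agreement was made
```

Under IDT this rule would compete with the other candidate programs, but observing even a few kept commitments would quickly concentrate posterior weight on it.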
