
One doubt about superrationality:

(I guess similar discussions must have happened elsewhere, but I can't find them. I am new to decision theory and superrationality, so my thinking may very well be wrong.)

First, a rough (and not fully accurate) summary of what I want to say:

  • The claim that "if I choose to do X, then my identical counterpart will also do X" seems to imply that there is no free will (though not necessarily; see the example for details). But if we indeed assume determinism, then no decision theory is practically meaningful.

Then I shall elaborate with an example:

  • Two AIs with identical source code, Alice and Bob, are engaged in a prisoner's dilemma.
  • Let's first assume they have no "free will", i.e. their programs are completely deterministic.
    • Suppose Alice defects. Then Bob also defects, because their source code is identical.
    • Now, we can vaguely imagine a world in which Alice had cooperated, and then Bob would also cooperate, resulting in a better outcome.
    • But that vaguely imagined world is not coherent: given the way her source code was written, it is simply impossible for Alice to have cooperated.
    • Therefore, it's practically meaningless to say "It would be better for Alice to cooperate".
  • What if we assume they have free will, i.e. they each have a source of randomness, feeding random numbers into their programs as input?
    • If the two sources of randomness are completely independent, then the decisions of Alice and Bob are also independent. Therefore, to Alice, an input that leads her to defect is always better than an input that leads her to cooperate, under both CDT and EDT.
    • If, on the other hand, the two sources are somehow correlated, then it might indeed be better for Alice to receive an input that leads her to cooperate. This is the only case in which superrationality is practically meaningful, but here the assumption of correlation is quite a strong claim and IMO dubious:
      • Our initial assumption about Alice and Bob is only that they have identical source code. Conditional on that, it seems rather unlikely that their inputs would also be somehow correlated.
      • In the human case: conditional on my counterpart and me having highly similar brain circuits (and therefore similar ways of thinking), it seems unreasonable to assert that our "free will" (the parts of our thinking that aren't deterministically explainable by brain circuits) will also be highly correlated.
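
To make the independent-versus-correlated cases concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration, not part of the argument above: the payoff numbers are the usual textbook prisoner's dilemma values, `rho` is a hypothetical probability that Bob receives the same random input as Alice (and so, having identical code, takes the same action), and `q` is a hypothetical probability that Bob cooperates when his input is independent of hers. The function computes Alice's expected payoff conditional on the action her input leads to.

```python
# Alice's payoffs, keyed by (alice_action, bob_action).
# Assumed textbook values with the usual ordering T=5 > R=3 > P=1 > S=0.
PAYOFF = {
    ("C", "C"): 3.0,  # both cooperate
    ("C", "D"): 0.0,  # Alice cooperates, Bob defects
    ("D", "C"): 5.0,  # Alice defects, Bob cooperates
    ("D", "D"): 1.0,  # both defect
}


def expected_payoff(alice_action, rho, q=0.5):
    """Alice's expected payoff, conditional on her action.

    rho: assumed probability that Bob's random input equals Alice's,
         so that Bob (same source code) takes the same action.
    q:   assumed probability that Bob cooperates when his input is
         independent of Alice's.
    """
    # Case 1: inputs coincide, so Bob mirrors Alice.
    mirrored = PAYOFF[(alice_action, alice_action)]
    # Case 2: inputs are independent, so Bob's action ignores Alice's.
    independent = (q * PAYOFF[(alice_action, "C")]
                   + (1 - q) * PAYOFF[(alice_action, "D")])
    return rho * mirrored + (1 - rho) * independent


def cooperation_favoured(rho, q=0.5):
    """True if a cooperate-inducing input is better for Alice."""
    return expected_payoff("C", rho, q) > expected_payoff("D", rho, q)


if __name__ == "__main__":
    for rho in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(rho, expected_payoff("C", rho), expected_payoff("D", rho))
```

With `rho = 0` (fully independent inputs) defection comes out ahead for any `q`, matching the first case above. Under these assumed numbers and `q = 0.5`, cooperation only comes out ahead once `rho` exceeds 3/7, which is one way to see how much work the correlation assumption is doing.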

After writing this down, I see a possible response to the argument above:

  • If we observe that Alice and Bob have, in the past, made similar decisions under equivalent circumstances, then we can infer that:
    • There's an above-baseline likelihood that Alice and Bob have similar source codes, and
    • There's an above-baseline likelihood that Alice and Bob have correlated sources of randomness.
    • (where the "baseline" refers to our prior)


That response has its own limits, though:

  • It still rests on the non-trivial metaphysical claim that different "free wills" (i.e. different sources of randomness) could be correlated.
  • The extent to which we update our prior (on the likelihood of correlated inputs) might be small, especially if we consider it unlikely that inputs could be correlated. This may give superrational considerations a much smaller weight in our decision-making.
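
The point about a small update can be sketched with a toy Bayesian calculation. This is only an illustration under assumptions I am making up: a single hypothesis "the inputs are correlated", a hypothetical per-observation probability of matching decisions under that hypothesis (0.9), and a hypothetical probability of matching decisions when inputs are independent but the source code is still similar (0.7).

```python
def posterior_correlated(prior, n_matches,
                         p_match_if_correlated=0.9,
                         p_match_if_independent=0.7):
    """Posterior probability that the inputs are correlated.

    prior:     prior probability of correlated inputs.
    n_matches: number of past occasions on which Alice and Bob were
               observed to decide alike (treated as independent
               observations, which is itself an assumption).
    The two likelihood parameters are illustrative guesses.
    """
    # Likelihood of seeing n matching decisions under each hypothesis.
    like_corr = p_match_if_correlated ** n_matches
    like_indep = p_match_if_independent ** n_matches
    # Bayes' rule for a binary hypothesis.
    numerator = prior * like_corr
    return numerator / (numerator + (1 - prior) * like_indep)


if __name__ == "__main__":
    for prior in (0.01, 0.1, 0.5):
        print(prior, posterior_correlated(prior, 5))
```

With a prior of 0.01 and these assumed likelihoods, five matching decisions raise the posterior only to a few percent, because similar source code alone already explains most of the matches.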