Senior Researcher / Lead, FutureLab on Game Theory and Networks of Interacting Agents @ Potsdam Institute for Climate Impact Research
I'm a mathematician working on collective decision making, game theory, formal ethics, international coalition formation, and a lot of stuff related to climate change. Here's my professional profile.

My definition of value:

  • I have a wide moral circle (including aliens as long as they can enjoy or suffer life)
  • I have a zero time discount rate, i.e., value the future as much as the present
  • I am (utility-) risk-averse: I prefer a sure 1 util to a coin toss between 0 and 2 utils
  • I am (ex post) inequality-averse: I prefer 2 people to each get 1 util for sure to one getting 0 and one getting 2 for sure
  • I am (ex ante) fairness-seeking: I prefer 2 people getting an expected 1 util to one getting an expected 0 and one getting an expected 2
  • Despite all this, I am morally uncertain
  • Conditional on all of the above, I also value beauty, consistency, simplicity, complexity, and symmetry

How others can help me

I need help with various aspects of my main project, which is to develop an open-source collective decision app:

  • project and product management
  • communication, marketing, social media
  • quality control, testing
  • translations
  • funding

How I can help others

I can help by ...

  • providing feedback on ideas
  • proofreading and commenting on texts


Joe thinks, in contrast with the dominant theory of correct decision-making, that it’s clear you should send a million dollars to your twin.

I'm deeply confused about this. According to the premise, you are a deterministic AI system. That means what you will do is fully determined by your code and your input, both of which are already given. So at this point, there is no longer any freedom to make a choice: you will just do what your given code and input determine. So what does it mean to ask what you should do? Does that actually mean: (i) what code should your programmer have written? Or does it mean: (ii) what would the right choice be in the counterfactual situation in which you are not deterministic after all and do have a choice (while your twin doesn't? or does as well?). In order to answer version (i), we need to know the preferences of the programmer (rather than your own preferences). If the programmer is interested in the joint payoff of both twins, she should have written code that makes you cooperate. In order to answer version (ii), we would need to know how making either choice, in the counterfactual world where you do have a choice, affects the other twin's ability to make a choice. If your choice does not influence whether the other twin can make a choice, the dominant strategy is defection, as in the simple PD. Otherwise, who knows...

I liked the "we don't trade with ants" post, but I think it misses an additional reason why we would be nearly useless to superhuman AGI even if we could communicate with them: we would be so slow compared to them that, in their terms, we would take forever to understand and do what they want from us. It would therefore still be more efficient for them to just move us out of the way than to tell us to move out of the way and then wait until we get it and do it in slo-mo.

Not exactly. A typical SLA only contains a lower bound, which would still allow for maximization. The program for a satisficer in the sense I meant would state that the AI system really aims to do no better than requested. So, for example, quantilizers would not qualify since they might still (by chance) choose the action that maximizes return.

The challenge isn’t figuring out some complicated, nuanced utility function that “represents human values”; the challenge is getting AIs to do what it says on the tin—to reliably do whatever a human operator tells them to do.

IMO, this implies we need to design AI systems so that they satisfice rather than maximize: perform a requested task at a requested performance level but no better than that and with a requested probability but no more likely than that.

Yes, he is, so what? Does that mean he has to do all the work alone? Surely not. Bob pointed out a neglected area that should be investigated. By who? Probably by those best placed to do this within EA. Are you suggesting Bob is this person?

As every month, I let others decide collectively where I will donate. If you want to participate:

Answer by Jobst Heitzig ( 23, 2022)

I donate monthly to charities collectively chosen by my colleagues and friends, for four reasons:

  • I believe in epistemic democracy
  • I want to learn about others' priorities
  • I want to encourage them to think about donating themselves
  • I want to test my voting app (which I use for that collective decision)

So far, most of those donations went to public health charities like Doctors Without Borders or Malaria Consortium. If you want to have a say where my November donation goes:

Answer by Jobst Heitzig ( 13, 2022)

I forgot to add that there are of course also approaches based on regret.

  • Let's call each possible solution of ambiguity a scenario.
  • For each scenario Z and each possible strategy S, one can estimate the expected value of S in scenario Z, let's denote that by v(S|Z).
  • Let's call the difference in expected value between the chosen strategy S and the optimal one in Z the regret in Z, denoted r(S|Z) = max{v(S'|Z): strategies S'} – v(S|Z).
  • Let's denote the minimal and maximal regret when choosing S by minr(S) = min{r(S|Z): all scenarios Z} and maxr(S) = max{r(S|Z): all scenarios Z}.

Then Savage's minimax regret criterion demands one should choose that S which minimizes maxr(S). The advantage over the Hurwicz criterion is that the latter only looks at the two most extreme scenarios, which might not be representative at all of what will actually happen, while Savage's criterion takes into account the available information about all possible scenarios more comprehensively.
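To make the minimax regret criterion concrete, here is a minimal Python sketch. The payoff table, the strategy names A, B, C, and the scenario names Z1, Z2 are made up purely for illustration; only the definitions of v(S|Z), r(S|Z), and maxr(S) come from the text above.

```python
# Hypothetical payoff table (illustration only): v[S][Z] is the expected
# value v(S|Z) of strategy S in scenario Z.
v = {
    "A": {"Z1": 10, "Z2": 0},
    "B": {"Z1": 6, "Z2": 5},
    "C": {"Z1": 4, "Z2": 4},
}


def regrets(v):
    """Return r[S][Z] = max over S' of v(S'|Z), minus v(S|Z)."""
    scenarios = list(next(iter(v.values())).keys())
    # Best achievable value in each scenario.
    best = {z: max(v[s][z] for s in v) for z in scenarios}
    return {s: {z: best[z] - v[s][z] for z in scenarios} for s in v}


def minimax_regret(v):
    """Return the strategy minimizing maxr(S), together with all maxr values."""
    r = regrets(v)
    maxr = {s: max(r[s].values()) for s in r}
    return min(maxr, key=maxr.get), maxr


choice, maxr = minimax_regret(v)
print(choice, maxr)  # B {'A': 5, 'B': 4, 'C': 6}
```

In this made-up table, B is never the best strategy in Z1, but its worst-case regret (4) is smaller than that of A (5) or C (6), so the criterion picks B.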

Obviously, one might combine the Hurwicz and Savage approaches into what one might call the regret-based Hurwicz or Savage–Hurwicz criterion that would demand choosing that S which minimizes h maxr(S) + (1–h) minr(S), where h is again some parameter aiming to represent one's degree of ambiguity aversion. (I haven't found this criterion in the literature but think it must be known since it is such an obvious combination.)
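As a rough sketch of how this combined criterion could be computed, the following Python snippet scores each strategy by h maxr(S) + (1–h) minr(S). The payoff table and the names A, B, C, Z1, Z2 are hypothetical and chosen only so that the choice changes with h; the function name savage_hurwicz is likewise just a label for this illustration.

```python
# Hypothetical payoff table (illustration only): v[S][Z] is the expected
# value v(S|Z) of strategy S in scenario Z.
v = {
    "A": {"Z1": 10, "Z2": 0},
    "B": {"Z1": 7, "Z2": 4},
    "C": {"Z1": 2, "Z2": 6},
}


def savage_hurwicz(v, h):
    """Pick the S minimizing h * maxr(S) + (1 - h) * minr(S), for 0 <= h <= 1."""
    scenarios = list(next(iter(v.values())).keys())
    # Best achievable value in each scenario.
    best = {z: max(v[s][z] for s in v) for z in scenarios}
    scores = {}
    for s in v:
        r = [best[z] - v[s][z] for z in scenarios]  # regrets r(S|Z)
        scores[s] = h * max(r) + (1 - h) * min(r)
    return min(scores, key=scores.get), scores


print(savage_hurwicz(v, 1.0)[0])   # B (pure minimax regret)
print(savage_hurwicz(v, 0.25)[0])  # A (mostly looks at the best-case regret)
```

With h = 1 the score reduces to maxr(S), i.e., Savage's criterion. Note that minr(S) is 0 for every strategy that is optimal in at least one scenario, so for small h the criterion mostly separates such strategies from those that are optimal nowhere.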

This is tremendously helpful!

I personally sometimes have an anger problem. Curiously, it mostly happens when someone I love seems to be obviously wrong in a recurring way.

I believe part of the reason that I then sometimes get angry is that it may then seem that the person I love might be less worthy of my love because of their seemingly silly opinion or behaviour. At the same time, I then notice that such a thought of mine is itself silly, and that makes me angry at myself. But in such a situation, I can't admit that I'm angry at myself, so I end up acting as if I was angry at the other person.

What a mess...

One thing that seems to help me most of the time is the Buddhist "loving kindness" exercise, as explained, for example, here.

I think you are right, and the distinction still makes sense, but only as a theoretical device to disentangle things in thought experiments, maybe less in practice, unless one can argue that the correlations are weak.

