Jobst Heitzig (vodle.it)

Senior Researcher / Lead, FutureLab on Game Theory and Networks of Interacting Agents @ Potsdam Institute for Climate Impact Research
206 karma · Joined Oct 2022 · Working (15+ years)

Bio

I'm a mathematician working on collective decision making, game theory, formal ethics, international coalition formation, and a lot of stuff related to climate change. Here's my professional profile.

My definition of value:

  • I have a wide moral circle (including aliens as long as they can enjoy or suffer life)
  • I have a zero time discount rate, i.e., I value the future as much as the present
  • I am (utility-) risk-averse: I prefer a sure 1 util to a coin toss between 0 and 2 utils
  • I am (ex post) inequality-averse: I prefer that 2 people each get 1 util for sure over one getting 0 and the other getting 2, both for sure
  • I am (ex ante) fairness-seeking: I prefer that 2 people each get an expected 1 util over one getting an expected 0 and the other an expected 2 (a small numeric illustration follows after this list)
  • Despite all this, I am morally uncertain
  • Conditional on all of the above, I also value beauty, consistency, simplicity, complexity, and symmetry
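
As a small numeric illustration of the last three preference bullets (my own toy formalization; the square-root value function is just one arbitrary example of a concave function, not something I am committed to):

```python
import math

# Toy illustration only (my own, deliberately simple formalization): a concave
# value function v stands in for risk aversion, and summing v over people's
# (realized or expected) utils stands in for ex post inequality aversion and
# ex ante fairness-seeking. The square root is an arbitrary example of concavity.
def v(utils):
    return math.sqrt(utils)

# risk aversion: a sure 1 util beats a fair coin toss between 0 and 2 utils
assert v(1) > 0.5 * v(0) + 0.5 * v(2)     # 1.0 > ~0.71

# ex post inequality aversion: realized utils (1, 1) beat realized utils (0, 2)
assert v(1) + v(1) > v(0) + v(2)          # 2.0 > ~1.41

# ex ante fairness: expected utils (1, 1) beat expected utils (0, 2)
assert v(1) + v(1) > v(0) + v(2)          # same inequality, applied to expectations
```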

How others can help me

I need help with various aspects of my main project, which is to develop an open-source collective decision app (http://www.vodle.it):

  • project and product management
  • communication, marketing, social media
  • quality control, testing
  • translations
  • funding

How I can help others

I can help by ...

  • providing feedback on ideas
  • proofreading and commenting on texts

Comments (50)

Still, it might add more effort for the non-native speaker, because a native speaker can identify something as jargon more easily. This is only a hypothesis, of course, so to make progress in this discussion it might be helpful to review the literature on this.

What is OAA? And, more importantly: where would you now put it in your taxonomy?

"targeting NNs" sounds like work that takes a certain architecture (NNs) as a given rather than work that aims at actively designing a system.

To be more specific: under the proposed taxonomy, where would you sort a project that designs agents composed of a Bayesian network as a world model and an aspiration-based probabilistic programming algorithm for planning?

Where in your taxonomy does the design of AI systems go – what high-level architecture to use (non-modular? modular with a perception model, world model, evaluation model, planning model, etc.?), what type of function approximators to use for the modules (ANNs? Bayesian networks? something else?), what decision theory to base it on, what algorithms to use to learn the different models occurring in these modules (RL? something else?), how to curate training data, etc.?

A small remark regarding your metric "100% minus the probability that the given technological restraint would have occurred without protests" (let's call the latter probability x): this seems to suggest that, given the protests, the probability of occurrence became 100%, while before it had been x, and that hence the protests raised the probability from x to 100%. But the fact that the event eventually did occur does not mean at all that after the protests it had a probability of 100% of occurring. It could even have had the very same probability of occurring as before the protests, namely x, or an even smaller probability, as long as that probability was still positive.

What you would actually want to compare here is the probability of occurring given no protests (x) and the probability of occurring given protests (which would have to be estimated separately).

In short: your numbers overestimate the influence of protests by an unknown amount.
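
To make the point concrete, here is a minimal numeric sketch; the probabilities are invented purely for illustration:

```python
# Invented numbers, purely to illustrate the argument above.
p_without_protests = 0.30   # x: probability the restraint occurs with no protests
p_with_protests    = 0.45   # must be estimated separately; it is not 1.0 just
                            # because the restraint did in fact occur

quoted_metric = 1.0 - p_without_protests                      # 0.70 (implicitly assumes 1.0 after protests)
counterfactual_effect = p_with_protests - p_without_protests  # ~0.15

print(quoted_metric, counterfactual_effect)  # the metric overstates the effect (0.70 vs. ~0.15)
```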

So we're converging...

One final comment on your argument about odds: In our algorithms, specifying an allowable aspiration includes specifying a desired probability of success that is sufficiently below 100%. This is exactly to avoid the problem of fulfilling the aspiration becoming an optimization problem through the backdoor.
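
To illustrate the idea (a minimal sketch with hypothetical names only, not the actual algorithm from our work):

```python
import random

def choose_policy(policies, success_prob, target=0.9, tolerance=0.05):
    """Pick a policy whose estimated success probability is close to the target
    (which is deliberately below 1.0), instead of maximizing that probability."""
    candidates = [p for p in policies if abs(success_prob(p) - target) <= tolerance]
    if not candidates:
        # fall back to the policy closest to the target; still no maximization of success
        candidates = [min(policies, key=lambda p: abs(success_prob(p) - target))]
    return random.choice(candidates)
```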

Dear Seth, thank you again for your opinion. I agree that many instrumental goals such as power would also be helpful for final goals that are not of the type "maximize this or that". But I have yet to see a formal argument showing that they would actually emerge in a non-maximizing agent just as likely as in a maximizer.

Regarding your other claim, I cannot agree that "mismatched goals is the problem". First of all, why do you think there is just a single problem, "the" problem? And then, is it helpful to consider something a "problem" that is an unchangeable fact of life? As long as there is more than one human who is potentially affected by an AI system's actions, and these humans' goals are not matched with each other (which they usually aren't), no AI system can have goals matched to all humans affected by it – unless you want to claim that "having matched goals" is not a transitive relation, since matching both humans' goals would imply that the humans' goals match each other. So I am quite convinced that the fact that AI systems will have mismatched goals is not a problem we can solve but a fact we have to deal with.

Dear Seth,

If Yonatan meant it the way you interpret it, I would still respond: Where is the evidence that such a reward function exists and guides humans' behavior? I spoke to several high-ranking scientists from psychology and social psychology who very much doubt this. I suspect that the theory of humans aiming to maximize reward functions might be a non-testable one, and in that sense "non-scientific" – you might believe in it or not. It helps explain some things, but it is also misleading in other respects. I choose not to believe it until I see evidence.

I also don't agree that optimization is a red herring. It is a real issue, just not the only one, and maybe not the most severe one (if one believes one can separate out the relative severity of several interlinked issues, which I don't). I do agree that powerful agents are another big issue, whether competent or not. But powerful, competent, optimizing agents are certainly the scariest kind :-)

Hi Seth, thank you for your thoughts!

I totally agree that it's just a start, and I hope to have made clear that it is just a start. If it was not sufficiently clear before, I have now added more text making explicit that of course I don't think dropping the optimization paradigm is sufficient to make AI safe, just that it is necessary. And because it appears necessary and under-explored, I chose to study it for some time.

I don't agree with your 2nd point, however: if an agent turns 10% of the world into paperclips, we might still have a chance to survive; if it turns everything into paperclips, we don't.

Regarding the last point:

  • Quantilizers are optimizing (namely, a certain "regularized" reward function) – see the small sketch after this list.
  • By "surprising amount" you probably mean "surprisingly large amount"? Why does that surprise you, then, if you agree that they are a "start on taking the points off of the tiger's teeth"? Given the obvious risks of optimization, I am also surprised by the amount of support non-maximization approaches get: namely, I am surprised how small that support is. To me this just shows how strong the grip of the optimization paradigm on people's thinking still is :-)
  • I believe any concentration of power is too risky, regardless of whether it is in the hands of a superintelligent AI or a dumb human. I have now added some text on this as well.
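
For readers unfamiliar with quantilizers, here is a minimal sketch of the standard construction (my own illustration, not code from the post), which shows the "regularized optimization" flavour: instead of taking the highest-utility action, the agent samples from the top-q fraction of actions drawn from a base distribution, ranked by utility.

```python
import random

def quantilize(base_samples, utility, q=0.1):
    """Minimal quantilizer sketch: rank actions sampled from a base distribution
    by estimated utility and pick uniformly from the top-q fraction (no argmax)."""
    ranked = sorted(base_samples, key=utility, reverse=True)
    top = ranked[:max(1, int(q * len(ranked)))]
    return random.choice(top)
```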

The "impossible to correlate perfectly" piece is like in AI alignment, where one could also argue that perfect alignment of a reward function to the "true" utility function is impossible.

Indeed, one might even argue that the joint cognition implemented by the EA/rationality/x-risk community as a whole is a form of "artificial" intelligence (let's call it "EI"), and thus we face an "EI alignment" problem. As EA becomes more powerful in the world, we get "ESI" (effective altruism superhuman intelligence) and related risks from misaligned ESI.

The obvious solution in my opinion is the same for AI and EI: don't maximize, since the metric you might aim to maximize is most likely imperfectly aligned with true utility. Rather, satisfice: be ambitious, but not infinitely so. After reaching an ambitious goal, check whether your reward function still makes sense before setting the next, more ambitious goal. And have some human users constantly verify your reward function :-)
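
As a toy sketch of that loop (my own illustration; the function names are hypothetical placeholders, not an existing system):

```python
def satisfice(metric, act, humans_approve_metric, target, step=1.0, rounds=5):
    """Toy satisficing loop: pursue a finite target, then re-verify the metric
    with humans before setting the next, more ambitious target."""
    for _ in range(rounds):
        while metric() < target:          # ambitious, but only up to the finite target
            act()
        if not humans_approve_metric():   # check the reward function still makes sense
            return target                 # stop raising ambitions if it does not
        target += step                    # only then set a more ambitious goal
    return target
```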
