Robi Rahman🔸

Data Scientist @ MIRI Technical Governance Team
1579 karma · Joined · Working (6-15 years) · New York, NY, USA
www.robirahman.com

Bio

Data scientist working on AI governance at MIRI, previously forecasting at Epoch and the Stanford AI Index. GWWC pledge member since 2017. Formerly social chair at Harvard Effective Altruism, facilitator for Arete Fellowship, and founder of the DC Slate Star Codex meetup.

Comments
273

consider this story. You’re a fairly thoughtful longtermist who one day comes across the idea of cluelessness. You’re skeptical, but you find yourself grudgingly agreeing that predictions about the far future are mostly made-up. Claims like “this intervention will increase expected total welfare across the cosmos” start to seem fake. Still, surely some actions have better or worse consequences than others? Soon enough you wonder if that thought is fake too. You realize that even, say, kicking a puppy could prevent human extinction. Maybe this act of pointless cruelty sparks moral outrage in some onlooker, galvanizing them to work a bit harder at their AI safety job. Yes, this possibility is clearly far-fetched. And you can easily imagine stories pushing in the opposite direction. But you can’t come up with a good argument that the expected long-run effects point one way or the other, or that they precisely cancel out. (That is, your cluelessness is “complex”, not “simple”.[1]) The sign of the EV of kicking the puppy seems to be: shrug. You walk away feeling like this reasoning is too clever by half, even if you can’t say where exactly it went wrong. So you decide to file cluelessness under “yeah I get the arguments, but this is a bridge too far”. And you go back to thinking about how to solve AI alignment.

More generally, here’s what this tale illustrates. As argued in Mogensen (2020) and this sequence, impartial consequentialists[2] can’t say whether any intervention has higher or lower expected value than inaction.

I hope I'm not strawmanning cluelessness advocates, but they seem to constantly make this basic error. I haven't seen this point accounted for anywhere but if it is, perhaps you can point me to it.

The sign of the EV of kicking the puppy seems to be: shrug.

...no, it's not? The EV of kicking the puppy is still a small negative amount, after incorporating all of the cluelessness-possibilities described above.

If the expected long-run effects precisely cancel out, then you shouldn't kick the puppy, same as you already wouldn't before accounting for highly uncertain long-term effects.

If the expected long-run effects don't precisely cancel out, then you incorporate them into the EV calculation and act based on that.

Cluelessness doesn't change the EV of kicking the puppy from -5 to 0. It changes the estimate from -5 to -5 ± 1000000: the expected value is the same, and only the uncertainty around it grows. So it doesn't move puppy-kicking up or down in the rank-ordering of possible actions you could take. (You could argue for risk aversion and say this favors neartermist work, but that's a very different claim from the radical cluelessness stuff.)
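To make the arithmetic concrete, here is a minimal sketch in Python. It assumes the long-run uncertainty is symmetric around the near-term estimate, which is exactly the premise a cluelessness advocate would dispute, so it illustrates the claim rather than proving it. The action names, the values other than -5, and the two-point noise model are all hypothetical.

```python
from fractions import Fraction

def expected_value(outcomes):
    """Probability-weighted mean of (probability, value) pairs."""
    return sum(p * v for p, v in outcomes)

def with_symmetric_noise(point_estimate, spread):
    """Assumption: long-run effects add +spread or -spread with equal probability."""
    half = Fraction(1, 2)
    return [(half, point_estimate - spread), (half, point_estimate + spread)]

# Hypothetical near-term estimates; only the -5 comes from the comment above.
near_term = {"kick_puppy": -5, "do_nothing": 0, "pet_puppy": 5}
spread = 1_000_000

# EV of each action after adding the huge but symmetric long-run uncertainty.
ev = {a: expected_value(with_symmetric_noise(v, spread)) for a, v in near_term.items()}

ranking_before = sorted(near_term, key=near_term.get, reverse=True)
ranking_after = sorted(ev, key=ev.get, reverse=True)
```

Under that symmetry assumption each EV equals its near-term estimate, so `ranking_after` matches `ranking_before`. If the noise were not mean-zero, the EVs would shift, which is the "incorporate them into the EV calculation" case.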

Do cluelessness advocates have a response to this?

"managers' time is also scarce and high opportunity cost" doesn't seem like enough in my opinion to warrant "millions will just sit in an account for another year"

Yeah, if the problem is scarcity of manager time then @lukeprog should be hiring more managers so the managers can manage more grantmakers.

comparing cluelessness world preferences with non-cluelessness world preferences

To decide whether you should act based on your cluelessness world preferences or your non-cluelessness world preferences, you'd have to use evidence (either empirical or theoretical) for which world you're in.

But an ordinary consequentialist decision-making process would already take this evidence into account. (I think cluelessness-believers sometimes strawman cluelessness-skeptics and assume that they haven't thought of this or wouldn't update on such evidence.)

I think a more interesting crux is that cluelessness advocates believe that cluelessness undermines consequentialism while leaving deontology or virtue ethics relatively unscathed, but it seems to me that all the cluelessness arguments against consequentialism undermine other decision-making systems as well.

I disagree that you're justified in doing that

Even if it's true that it's unjustified, that's not relevant. If the decision-making process can't justify any actions, you can do whatever you want. Might as well act as if it can and do whatever I would have done anyway. (It would be less justified, but... shrug)

I'd be interested in reading the case for it

Conveniently, it's not necessary to justify it or to raise anyone's credence that it's true: the alternative doesn't affect what decision they should make, so they should behave as if it's true whatever probability they assign it, anywhere from 0 to 100%.

Yes, I understand that you're saying the problem is that you can't assign a probability-weighted expectation. I'm pointing out that if you can't do that, then cluelessness doesn't support any claim that some other action is higher-EV than the highest-EV action chosen on the premise that you can assign a non-arbitrary probability-weighted expectation. So you should do the previously-chosen highest-EV action anyway. Therefore cluelessness is irrelevant.

But it's not answered there. If you're pointing at the thing about how UEV isn't a point estimate but a range, that's still irrelevant: even if true, the preferred action is still decided by the scalar value that comes from taking your probability-weighted expectation of that range.

Are you assuming that the previously preferred action would still have some normative force behind it?

Yes.

If cluelessness is false, then you have some preferred action.

If cluelessness is true, then your preferred action is unjustified.

There's some level of credence in cluelessness. The worst case for your otherwise-preferred action is when p(cluelessness)=100%. In that case it doesn't matter what you do.

So you might as well do whatever you were going to do anyway.

There's no case where cluelessness is relevant to your decision process.
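The steps above can be sketched as a mixture over the two worlds. This is a hedged illustration, not a settled model: it assumes that in the clueless world every action gets the same value (here the hypothetical `value_if_clueless`), which is one way to read "it doesn't matter what you do". The action names and EVs are made up.

```python
from fractions import Fraction

def mixed_value(ev_if_not_clueless, p_clueless, value_if_clueless=0):
    """Credence-weighted value across the clueless and non-clueless worlds.

    Assumption: in the clueless world all actions share one constant value.
    """
    return (1 - p_clueless) * ev_if_not_clueless + p_clueless * value_if_clueless

def best_action(evs, p_clueless):
    """Action with the highest mixed value at a given credence in cluelessness."""
    return max(evs, key=lambda a: mixed_value(evs[a], p_clueless))

# Hypothetical EVs in the world where cluelessness is false.
evs = {"preferred": 10, "alternative": 3, "kick_puppy": -5}

# Credences in cluelessness from 0 up to 0.9 in steps of 0.1.
credences = [Fraction(k, 10) for k in range(10)]
choices = [best_action(evs, p) for p in credences]

# At p = 1 every action has the same mixed value, so nothing is ranked above anything else.
values_at_certainty = {mixed_value(v, 1) for v in evs.values()}
```

For every credence below 1, the constant term is added to all actions equally, so the preferred action stays on top. Only at exactly 100% do all actions tie.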

Cluelessness makes the belief that the preferred action has good consequences weaker, but it doesn't raise any other action above the otherwise-preferred action in terms of expected consequences. So it's irrelevant to decision-making, and you should still do whatever you were going to do anyway.

Let me know if you have any arguments indicating otherwise. I was hoping this post would provide some.

It doesn’t follow from “we don’t know the net direction of the consequences we’re unaware of” that we should regard the positives and negatives as precisely symmetric. One reason symmetry is implausible: If we become aware of a new possible consequence, this should update our beliefs about the others we’re unaware of, breaking the symmetry.

That's great news. You're saying that we have information about which direction our estimates are biased? Then cluelessness is false! We should update our estimates in the direction indicated by the information we become aware of, making a new, more accurate estimate, and act upon that.

But that's false. If cluelessness is true, then you can't tell what you should do in that case. But you still know what you should do in the case where cluelessness is false (at least, as well as you did before hearing the cluelessness argument) so you should do whatever you were going to do anyway. So the justified action is still the same as before, but with lower confidence.

This wouldn't be the case if you're 100% certain that cluelessness is true, but if cluelessness is true then it's not justified to be 100% confident that cluelessness is true.
