Owen Cotton-Barratt



Reflection as a strategic goal
On Wholesomeness
Everyday Longtermism



Thanks, that helped me sharpen my intuitions about what triggers the "appalled" reaction.

I think I'm still left with: People may very reasonably say that fraud in the service of effective altruism is appalling. Then it's pretty normal and understandable (even if by my lights unreasonable) to label as "appalling" things which you think will predictably lead others to appalling action.

I can see where you're coming from, but I'm not sure I agree. People would be appalled by restaurant staff not washing their hands after going to the toilet, and I think this is because it's instrumentally bad (in an uncooperative way + may make people ill) rather than because it's extreme vice.

Mmm, while I can understand "appalling" or "deeply appalling", I don't think "inherently appalling" makes sense to me (at least coming from philosophers, who should be careful about their language use). I guess you didn't use that phrase in the original post and now I'm wondering if it's a precise quote.

(I'd also missed the fact that these were philosophical judgements, which makes me think it's reasonable to hold them to higher standards than otherwise.)

I really like the idea of "a beneficentric virtue ethicist who takes scope-sensitive impartial benevolence to be the central virtue", and feel that something approximating this would be a plausible recommendation of utilitarianism for the heuristics people should actually use to act. (For this purpose, it obviously wouldn't work to include the parenthetical "(or even only)".)

However, I'm confused by your confusion at people being appalled by utilitarianism. It seems to me that the heart of it is that utilitarianism, combined with poor choices in instrumental rationality, can lead to people doing really appalling things. Philosophically, you may reasonably object that this is a failure of instrumental rationality, not of utilitarianism. But humans are notoriously bad at instrumental rationality! From a consequentialist perspective it's a pretty big negative to recommend something which, when combined with normal human levels of failure at instrumental rationality, can lead to catastrophic failures. It could be that it's still, overall, a good thing to recommend, but I'd certainly feel happier if people doing so (unless they're explicitly engaged just in a philosophical truth-seeking exercise, and not concerned with consequences) would recognise and address this issue.

I can't speak for Elizabeth, but I also find that that paragraph feels off, for reasons something like:

  1. Conflation of "counterfactual money to high-impact charities" with "your impact"
    • Even if the money is counterfactually moved, you maybe don't get to count all of its impact as your impact. To avoid double-counting and impact Ponzi schemes, it's maybe important to take a "share-of-the-pie" approach to thinking about your impact (here's my take on that general question), and presumably the donors get a lot of the credit for their own giving
    • Plus, maybe you do things which are importantly valuable that aren't about your pledge! It's at least a plausible reading (though it's ambiguous) that "double your impact" would be taken as "double your lifetime impact"
  2. As well as sharing credit for their donations with them, you maybe need to share credit for having nudged them to make the pledge with other folks (including but not limited to GWWC)
  3. As you say, their donations may not be counterfactual even in the short-term
    • Even if a good fraction of them are maybe from outside the community, that's still a fraction by which it reduces expected impact
    • Although on average I think it's likely very good, I'm sure in some cases the EA push towards a few charities that have been verified as highly effective actually does harm by pulling people to give to those over some other charities which were in fact even more effective (but illegibly so)
  4. Man, long-term counterfactuals are hard
    • Maybe GWWC/EA ends up growing a lot further, so that it reaches effective saturation among ~all relevant audiences
    • In that world, if someone was open to taking the GWWC pledge, they'd likely do it eventually, even if they are currently not at all connected to the community

Now, none of these points are blatant errors, or make me want to say "what were you thinking?!?". But taken together, I feel the picture is that there's in fact a lot of complexity to the question of how impact should be counted in that case. The text doesn't help the reader to understand that complexity or how to navigate it, but instead cheerfully presents the most favourable possible interpretation. It just has a bit of a vibe of slightly-underhand sales tactics, or something?

While not providing anything like a solution to the central issue here, I want to note that it looks likely to be the middle classes that get hollowed out first -- human labour to do all kinds of physical tasks is likely to be valued for longer than various kinds of desk-based tasks, because scaling up and deploying robotics to replace them would take significant time, whereas scaling up the automation of desk-based tasks can be relatively quick.

Thanks for exploring this, I found it quite interesting. 

I'm worried that casual readers might come away with the impression "these dynamics of compensation for safety work being a big deal obviously apply to AI risk". But I think this is unclear, because we may not have the key property (that you call assumption (b)).

Intuitively I'd describe this property as "meaningful restraint", i.e. people are holding back a lot from what they might achieve if they weren't worried about safety. I don't think this is happening in the world at the moment. It seems plausible that it will never happen -- i.e. the world will be approximately full steam ahead until it gets death or glory. In this case there is no compensation effect, and safety work is purely good in the straightforward way.

To spell out the scenario in which safety work now could be bad because of risk compensation: perhaps in the future everyone is meaningfully restrained, but if there's been more work on how to build things safely done ahead of time, they're less worried so less restrained. I think this is a realistic possibility. But I think that this world is made much safer by less variance in the models of different actors about how much risk there is, so that the actor who presses ahead is not simply the outlier who expects the least risk. Relatedly, I think we're much more likely to reach such a scenario if many people have got on a similar page about the levels of risk. But I think that a lot of "technical safety" work at the moment (and certainly not just "evals") is importantly valuable for helping people to build common pictures of the character of risk, and how high risk levels are with various degrees of safety measure. So a lot of what people think of as safety work actually looks good even in exactly the scenario where we might get >100% risk compensation.

All of this isn't to say "risk compensation shouldn't be a concern", but more like "I think we're going to have to model this in more granularity to get a sense of when it might or might not be a concern for the particular case of technical AI safety work".

A small point of confusion: taking U(C) = C (+ a constant) by appropriate parametrization of C is an interesting move. I'm not totally sure what to think of it; I can see that it helps here, but it makes it seem quite hard work to develop good intuitions about the shape of P. But the one clear intuition I have about the shape of P is that there should be some C>0 where P is 0, regardless of S, because there are clearly some useful applications of AI which pose no threat of existential catastrophe. But your baseline functional form for P excludes this possibility. I'm not sure how much this matters, because as you say the conclusions extend to a much broader class of possible functions (not all of which exclude this kind of shape), but the tension makes me want to check I'm not missing something?

Maybe? It seems a bit extreme for that; I think 5/6 of the "disagree" votes came in over a period of an hour or two mid-evening UK time. But it could certainly just be coincidence, or a group of people happening to discuss it and all disagree, or something.

OK actually there's been a funny voting pattern on my top-level comment here, where I mostly got a bunch of upvotes and agree-votes, and then a whole lot of downvotes and disagree-votes in one cluster, and then mostly upvotes and agree-votes since then. Given the context, I feel like I should be more open than usual to a "shenanigans" hypothesis, which feels like it would be modest supporting evidence for the original conclusion.

Anyone with genuine disagreement -- sorry if I'm rounding you into that group unfairly, and I'm still interested in hearing about it.
