Nice post. One thought on this - you wrote:

"I’d be especially excited for people to spread messages that help others understand - at a mechanistic level - how and why AI systems could end up with dangerous goals of their own, deceptive behavior, etc. I worry that by default, the concern sounds like lazy anthropomorphism (thinking of AIs just like humans)."

I agree that this seems good for avoiding the anthropomorphism (in perception and in one's own thought!) but I think it'll be important to emphasise when doing this that these are conceivable ways and ultimately possible examples rather than the whole risk-case. Why? People might otherwise think that they have solved the problem when they've ruled out or fixed that particular problematic mechanism, when really they haven't. Or when the more specific mechanistic descriptions probably end up wrong in some way, the whole case might be dismissed - when the argument for risk didn't ultimately depend on those particulars.

(this only applies if you are pretty unconfident in the particular mechanisms that will be risky vs. safe)

[written in my personal capacity]

[writing in my personal capacity, but asked an 80k colleague if it seemed fine for me to post this]

Thanks a lot for writing this - I agree with a lot of (most of?) what's here.

One thing I'm a bit unsure of is the extent to which these worries have implications for the beliefs of those of us who are hovering more around 5% x-risk this century from AI, and who are one step removed from the bay area epistemic and social environment you write about. My guess is that they don't have much implication for most of us, because (though what you say is way better articulated) some of this is already naturally getting into people's estimates.

e.g. in my case, basically I think a lot of what you're writing about is sort of why for my all-things-considered beliefs I partly "defer at a discount" to people who know a ton about AI and have high x-risk estimates. Like I take their arguments, find them pretty persuasive, end up at some lower but still middlingly high probability, and then just sort of downgrade everything because of worries like the ones you cite, which I think is part of why I end up near 5%.

This kind of thing does have the problematic effect probably of incentivising the bay area folks to have more and more extreme probabilities - so that, to the extent that they care, quasi-normies like me will end up with a higher probability - closer to the truth in their view - after deferring at a discount.

Hi vmasarik,

Arden from 80k here. Yes, that is the general and up-to-date write-up covering founding a charity. I'm not sure exactly which previous page you're referring to, but it sounds to me like it will be represented by that (though it might be helpful as well if you're thinking specifically of global health and development charities -- though it hasn't been updated in a while).

Answer by Ardenlk · Nov 29, 2022

I fulfil my GWWC pledge by donating each month to the EA Funds Animal Welfare Fund for the fund managers to distribute as they see fit. I trust them to make better decisions than I would about the individual charities' effectiveness, since I don't have that much time/expertise to look into it.

I think the long-run future is incredibly important, and I spend my labour mostly on that. But my guess (though I'm pretty unsure) is that my donations do more good in animal welfare than in longtermism-focused things. Perhaps the new landscape should change that but I haven't made any updates yet.

I also admit to donating a bit extra to The Humane League because they are close to my heart and also seem really effective for animals.

I don't think about this that often, and there was part of me that didn't want to post this because it's not very rigorous! But also maybe others also feel that way and it feels honest to post. (If other people have takes on where I should donate instead though I'm open to hearing them!)

Thank you for writing this - strong +1. At 80k we are going to be thinking carefully about what this means for our career advice and our ways of communicating - how this should change things and what we should do going forward. But there’s a decent amount we still don’t know and it will also just take time to figure that all out.

It feels like we've just gotten a load of new information, and there’s probably more coming, and I am in favour of updating on things carefully.

Hey, Arden from 80k here -

It'd take more looking into stuff/thinking to talk about the other points, but I wanted to comment on something quickly: thank you for pointing out that the philosophy PhD career profile and the competitiveness of the field weren't sufficiently highlighted on the GPR problem profile. We've now added a note about it in the "How to enter" section.

I wrote the career review when I'd first started at 80k, and for me it was just an oversight not to link to it and its points more prominently on the GPR problem profile.

One reason might be that this framework seems to bake totalist utilitarianism into longtermism (by considering expansion/contraction and average wellbeing increase/decrease as the two types of longtermist progress/regress), whereas longtermism is compatible with many ethical theories?

Again there doesn’t seem to be a strong reason to think there’s an upper bound to the amount of people that could be killed in a war featuring widespread deployment of AI commanders or lethal autonomous weapons systems.[17]

So on technological grounds, at least, there seem to be no strong reasons to think that the distribution of war outcomes continues all the way to the level of human extinction.

Sounds right!

This made me realise that my post is confusing/misleading in a particular way -- because of the context of the 80,000 Hours problem profiles page, I was thinking of the question like "what's the leftover x-risk from conflict once you aren't considering AI, bio, or nukes (since those have their own problem profiles)?" But that context is much stronger in my head than in the readers', and should be made explicit.

I guess also AI-as-a-weapon should perhaps fall into the great power conflict bucket, as it's not discussed that much in the AI profile.

Thanks for this post!

I strongly agree with this:

This seems odd to consider an ‘existential’ risk - there are many ways in which we can imagine positive or negative changes to expected future quality of life (see for example Beckstead’s idea of trajectory change). Classing low-value-but-interstellar outcomes as existential catastrophes seems unhelpful both since it introduces definitional ambiguity over how much net welfare must be lost for them to qualify, and since questions of expected future quality of life are very distinct from questions of future quantity of life, and so seem like they should be asked separately.

But I feel like it'd be more confusing at this point to start using "existential risk" to mean "extinction risk" given the body of literature that's gone in for the former?
