On “first critical tries” in AI alignment

Joe_Carlsmith

Hide table of contents

Comment Permalink

SophiaAug 9 20223

tl;dr:

when effective altruism is communicated in a nuanced way, it doesn't sound weird.
rushing to the bottom line and leaving it unjustified means the person has neither a good understanding of the conclusion nor of the reasoning
I want newcomers to have a nuanced view of effective altruism
I think newcomers only understanding a rushed version of the bottom line without the reasoning is worse than them only understanding the very first step of the reasoning.
I think it's fine for people to go away with 10% of the reasoning.
I don't think it's fine for people to go away with the conclusion and 0% of the reasoning.
I want to incentivize people to communicate in a nuanced way rather than just quickly rush to a bottom line they can't justify in the time they have.
Therefore, I think "make EA ideas sound less weird" is much better than no advice.

Imagine a person at MIT comes to an EA society event and has some conversations about AI and then never comes back. Eventually, they end up working at Google and making decisions about Deepmind's strategy.

Which soundbite do I want to have given them? What is the best 1 dimensional (it's a quick conversation, we don't have time for the 2 dimensional explanation) message I could possibly leave this person with?

Option 1: "an AI might kill us all" (they think we think a Skynet scenario is really likely and we have poor reasoning skills because a war with walking robots is not that reasonable)
Option 2: "an AI system might be hard to control and because of that, some experts think it could be really dangerous" (this statement accurately applies to the "accidentally breaks the child's finger" case and also world-ending scenarios, in my mind at least, so they've fully understood my meaning even if I haven't yet managed to explain my personal bottom line)

I think they will be better primed to make good decisions about safe AI if I focus on trying to convey my reasoning before I try and communicate my conclusion. Why? My conclusion is actually not that helpful to a smart person who wants to think for themselves without all the context that makes that conclusion reasonable. If I start with my reasoning, even if I don't take this person to my bottom line, someone else down the road who believes the same thing as me can take them through the next layer up. Each layer of truth matters.

If it sounds weird, it's probably because I've not given enough context for them to understand the truth and therefore I haven't really done any good by sharing that with them (all I've done is made them think I'm unreasonable).

My guess is this person who came to one EA event and ended up being a key decisionmaker at Deepmind is going to be a lot less resistant when they hear about AI alignment in their job if they heard option 2 and not option 1. Partly because the groundwork to the ideas was better laid out. Partly because they trust the "effective altruism" brand more because they have the impression the effective altruism people, associated with "AI alignment" (an association that could stick if we keep going the way we've been going), to be full of reasonable people who think reasonable things.

What matters is whether we've conveyed useful truth, not just technically true statements. I don't want us to shy away from communicating about AI, but I do want us to shy away from communicating about AI in a confronting way that is counterproductive to giving someone a very nuanced understanding later on.

I think the advice "make AI sound less weird" is better than no advice because I think that communicating my reasoning well (which won't sound weird because I'll build it up layer by layer) is more important than communicating my current bottom line (leaving an impression of my bottom line that has none of the context attached to make it meaningful, let alone nuanced and high-fidelity) as quickly as possible.

PS: I still don't think I've actually done a good job of laying the reasoning for my views clearly here so I'm going to write a post at some point (I don't have time to fix the gaps I see now). It is helpful for you to point out gaps you see out explicitly so I can fill them in future writing if they actually can be filled (or change my mind if not).

In the meantime, I wanted to say that I've really valued this exchange. It has been very helpful for forcing me to see if I can make my intuitions/gut feeling more explicit and legible.

See in context

EA can sound less weird, if we want it to

by [anonymous]

May 25 20224 min read 35

144

CommunityBuilding effective altruismEffective altruism messaging

Frontpage

EA can sound less weird, if we want it to

“Trying to make EA seem less weird is unimportant”

“EA is inherently weird, and making our ideas seem less weird is not tractable”

AI risk

Longtermism

Non-human Welfare

Other existential risks

Conclusion

36 comments

I have previously encountered EAs who have beliefs about EA communication that seem jaded to me. These are either, “Trying to make EA seem less weird is an unimportant distraction, and we shouldn’t concern ourselves with it” or “Sounding weird is an inherent property of EA/EA cause areas, and making it seem less weird is not tractable, or at least not without compromising important aspects of the movement.” I would like to challenge both of these views.

“Trying to make EA seem less weird is unimportant”

As Peter Wildeford explains in this LessWrong post:

People take weird opinions less seriously. The absurdity heuristic is a real bias that people -- even you -- have. If an idea sounds weird to you, you're less likely to try and believe it, even if there's overwhelming evidence.

Being taken seriously is key to many EA objectives, such as growing the movement, getting mainstream researchers to care about the EA risks of their work, and having policymakers give weight to EA considerations. Sci-fi-sounding ideas make it harder to achieve these, while making it easier for critics to mischaracterize the movement and (probably) contributing to the perception that EA is cult-like.

On a more personal note, and perhaps more relevant to some of the examples I am going to mention, it is also nice for friends and family to be able to understand why we do what we do, which I don’t think is a trivial desire to have.

All things considered, I think it would be better if EA ideas did not sound weird to outsiders, and instead sounded intuitive and immediately persuasive.

“EA is inherently weird, and making our ideas seem less weird is not tractable”

I don’t think this is true, and I think this view generally comes from people who haven’t spent much time trying to think creatively about this. Some ways of framing things are more compelling than others, and this is an area where we can iterate, innovate and improve. Here are a few examples of possible ways we talk about weird EA ideas:

AI risk

Talking about other bad but less severe outcomes of AI misalignment besides paperclip maximizers and then saying “and it could even get as bad as paperclip maximizers,” requires less of a leap in imagination than opening with paperclip maximizers. It may be the case that we don’t even need to make general audiences consider paperclip maximizers at all, since the mechanisms needed to prevent them are the same as those needed to prevent the less severe and more plausible-sounding scenarios of the form "you ask an AI to do X, and the AI accomplishes X by doing Y, but Y is bad and not what you intended".

Longtermism

Due to scope insensitivity, referencing visuals to show just how much larger the future can be than the present is particularly emotionally powerful here, and makes the whole idea of working to improve the far future feel far less abstract. My favorite longtermist visualization is this one by Our World in Data, which I have saved on my phone to be able to reference it in conversations. (I think visualizations also work well to combat scope insensitivity for wild animal welfare and farmed animal welfare).

Non-human Welfare

If it is the first time someone is contemplating the idea that insects or wild animals deserve moral consideration, it makes sense to want to give them the spiel with the least probability of being mocked and dismissed. If you start to explain it by saying we should spend money to enhance the lives of insects in the wild, the idea will probably get laughed out of the room.

I think for insect welfare, the most palatable approach would be talking about the role of inhumane pesticides and other ways humans harm insects actively, making insect welfare more comparable to farmed animal welfare than wild animal welfare. Similarly, talking about helping wild animals during pandemics, famines, wildfires, etc. (problems humans also have) probably incites more compassion than talking about helping them from being chased by lions. How sensible something sounds to a layperson seems correlated with how tractable it is, so tractability can be used as a proxy for how likely an idea is to be dismissed by the person you are talking to.

The point is not to commit the motte-and-bailey fallacy (and one must be careful not to do this), but that people will be more open to contemplating your idea if you go in motte first instead of bailey first.

Other existential risks

I think the point about paperclip maximizers generalizes—it is sometimes not necessary to frame existential risks as existential risks. Most still have unusually high expected value even if they fall short of extinction, and this can be a preferable framing in some cases. Extinction-level events can be difficult to imagine and emotionally process, leading to overwhelm and inaction (see climate paralysis). We can say that serious pandemics are one of the “highest priority risks” for the international community due to their potential to kill hundreds of millions of people, and in many cases this would resonate more than the harder-to-conceive “existential risk” that could “lead to the extinction of humanity”. (Whether a problem poses extinction risk is, of course, still a relevant factor for cause prioritization.)

Also, as has been discussed many times already, longtermism is not a necessary prerequisite to care about existential risk. The expected value in the short term is enough to make people care about it, so trying to pitch existential risk through longtermism requires convincing people of an extra weird step and unnecessarily makes it less compelling to most.

Conclusion

My point is not necessarily that we should implement these specific examples, but that there are ways we can make our ideas more palatable to people. Also, there is the obvious caveat that the best way to talk about a topic will depend on your audience, but that doesn’t mean there aren’t some ways of communicating that work better most of the time, or work better for the general public.

Edit: the "AI accomplishes X by doing Y" thing I was talking about is called specification gaming, and here are some examples of it

144 Reactions

Mentioned in

172We all teach: here's how to do it better

164How EA is perceived is crucial to its future trajectory

79EA may look like a cult (and it's not just optics)

79EA movement course corrections and where you might disagree

62Monthly Overload of EA - June 2022

Load more (5/7)

More posts like this

Comments36

Sorted by

New & upvoted

Click to highlight new comments since: Today at 3:58 AM

Rohin ShahMay 26 202270

I agree with the main point that we could sound less weird if we wanted to, but it seems unlikely to me that we want that.

since the mechanisms needed to prevent them are the same as those needed to prevent the less severe and more plausible-sounding scenarios of the form "you ask an AI to do X, and the AI accomplishes X by doing Y, but Y is bad and not what you intended".

This is just not true.

If you convince someone of a different, non-weird version of AI risk, that does not then mean that they should take the actions that we take. There are lots of other things you can do to mitigate the less severe versions of AI risk:

You could create better "off-switch" policies, where you get tech companies to have less-useful but safe baseline policies that they can quickly switch to if one of their AI systems starts to behave badly (e.g. switching out a recommender system for a system that provides content chronologically).
You could campaign to have tech companies not use the kinds of AI systems subject to these risks (e.g. by getting them to ban lethal autonomous weapons).
You could switch to simpler "list of rules" based AI systems, where you can check that the algorithm the AI is using in fact seems good to you (e.g. Figure 3 here).

Most of these things are slightly helpful but overall don't have much effect on the versions of AI risk that lead to extinction.

(I expect this to generalize beyond AI risk as well and this dynamic is my main reason for continuing to give the weird version of EA ideas.)

Benjamin HiltonMay 26 202214

This does seem to be an important dynamic.

Here are a few reasons this might be wrong (both sound vaguely plausible to me):

If someone being convinced of a different non-weird version of an argument makes it easier to convince them of the actual argument, you end up with more people working on the important stuff overall.
If you can make things sound less weird without actually changing the content of what you're saying, you don't get this downside (This might be pretty hard to do though.)

(1) is particularly important if you think this "non-weird to weird" approach will appeal to a set of people who wouldn't otherwise end up agreeing with your arguments. That would mean it has a high counterfactual impact - even if some of the people do work that whilst still being good is ultimately far less relevant to x-risk reduction. This is even more true if you think there's a low rate of people who would have just listened to your weirder sounding arguments in the first place who will get "stuck" at the non-weird stuff and as a result never do useful things.

Rohin ShahMay 27 202211

I agree with both of those reasons in the abstract, and I definitely do (2) myself. I'd guess there are around 50 people total in the world who could do (2) in a way where I'd look at it and say that they succeeded (for AI risk in particular), of which I could name maybe 20 in advance. I would certainly not be telling a random EA to make our arguments sound less weird.

(EDIT: My estimate is now more like 30; I actually asked some people to do (2) for AI alignment and they did worse than I expected.)

I'd be happy about the version of (1) where the non-weird version was just an argument that people talked about, without any particular connection to EA / AI x-risk. I would not say "make EA sound less weird", I'd say "one instrumental strategy for EA is to talk about this other related stuff".

SophiaJul 22 20227

If few people actually have their own views on why AI is an important cause area to be able to translate them into plain English, then few people should be trying to convince others that AI is a big deal in a local group.

I think it is counterproductive for people who don't understand the argument they are making well enough to put the arguments into plain English to instead parrot off some jargon.

If you can't put the point you are trying to express in language the person you are talking to can understand, then there is no point talking to that person about the topic.

I agree that we shouldn't make up arguments that sound less weird that we don't think are true but I think EA could sound a lot less weird and say things we believe. Using jargon in places where there are not many people who are new to the community where everyone there understands 90% of the jargon seems fine. But technical language used in groups of people who are unlikely to understand that jargon will obviously obscure meaning, making everyone less accountable to what they are saying because less people can call them out on it. This is not good for creating a community of people with good epistemics!

It is much harder to notice that I'm parroting arguments I don't fully understand if I am using jargon because I know, at least subconsciously, that less people here can call me out on my logic not adding up.
If I say "one instrumental strategy for EA is to talk about this other related stuff" but couldn't figure out that another way to say it is, that tells me I don't actually know what point I was trying to make. For example, I could instead make the point by saying "it is useful to make EA cause areas relatable because people need something they find relatable before they start engaging deeply with it (deeply enough to end up understanding the more core reasons for thinking these causes are a big deal)."

If I can't explain what I mean by mesa-optimisation in a way that makes sense to 90% of my audience (which in local groups, is people who are relatively new to effective altruism and the whole AI thing) in the context of the point I'm trying to make, I probably don't really understand the point I'm trying to make well enough for it to be better to use the word "mesa-optimisation" over talking about something I know well enough to say it clearly and plainly so everyone listening can poke holes in it and we can have the sorts of conversations that are actually good to have in a local group (relatable, a tiny bit on the edge of the Overton Window but still understandable and well-reasoned rather than weird and obscure which can end up making conversations more didactic, alienating and preachy).

There is a difference between "can't" and "inconvenient". It is much lower effort to sound weird even if you understand something. However, I suspect that if you understand the point you are trying to make, with some effort and thought, you can find the less weird way to present it. If you can't, even after thinking about it, that makes me think that you don't know what you're trying to say (and therefore shouldn't be saying it in the first place).

Rohin ShahJul 22 20226

I strongly agree that you want to avoid EA jargon when doing outreach. Ideally you would use the jargon of your audience, though if you're talking to a broad enough audience that just means "plain English".

I disagree that "sounding weird" is the same thing (or even all that correlated with) "using jargon". For example,

When a lion kills and eats a deer in nature, the deer suffers. This is bad, and we should take action to prevent it.

This has no jargon, but still sounds weird to a ton of people. Similarly I think with AI risk the major weird part is the thing where the AI kills all the humans, which doesn't seem to me to depend much on jargon.

(If anything, I've found that with more jargon the ideas actually sound less weird. I think this is probably because the jargon obscures the meaning and so people can replace it with some different less weird meaning and assume you meant that. If you say "a goal-directed AI system may pursue some goal that we don't want leading to a catastrophic outcome" they can interpret you as saying "I'm worried about AIs mimicking human biases"; that doesn't happen when you say "an AI system may deliberately kill all the humans".)

SophiaJul 23 20223

That was a really clarifying reply!

tl;dr

I see language and framing as closely related (which is why I conflated them in my previous comment)
Using more familiar language (e.g. less unfamiliar jargon) often makes things sound less weird
I agree that weird ideas often sound weirder when you make them clearer (e.g. when you are clearer by using language the person you are talking to understands more easily)
I agree that it is better to be weird than to be misleading
However, weird ideas in plain English often sound weird because there is missing context, sounding weird does not help with giving a more accurate impression in this case
Breaking ideas into smaller pieces helps with sounding less weird while laying the groundwork to give an overall more accurate impression
Do you think a caveated version of this advice like "make EA ideas sound less weird without being misleading" is better than no advice?

I intuitively see vocabulary (jargon v. plain English) and framing as pretty closely related (there seems to almost be a continuum between "an entirely foreign language" to "extremely relatable and familiar language and framing of an idea"), so I conflated them a lot in my previous comment. They aren't the same though and it's definitely less of a red flag to me if someone can't find a relatable framing than if they can't put their point into plain English.

I think there is a clear mechanism for less jargon to make ideas sound less weird. When you find a way to phrase things in a more familiar way (with more familiar language; i.e. more familiar vocabulary, phrasing, framings, metaphors etc), the idea should sound more familiar and, therefore, less weird (so finding a more normal, jargon-free way of saying something should make it sound less weird).

However, I can also see there is also a mechanism for weird ideas to sound more weird with less jargon. If the idea is weird enough, then being more clear by using more familiar language will often make the idea sound weirder.

If there is no perfect "translation" of an idea into the familiar, then I agree that it's better to actually communicate the idea and sound weird than to sound less weird due to creating a false impression.

"an AI system may deliberately kill all the humans" sounds really weird to most people partly because most people don't have the context to make sense of the phrase (likewise with your other plain English example). That missing context, that inferential gap, is a lot like a language barrier.

The heuristic "make yourself sound less weird" seems helpful here.

I think it sounds a lot less weird to say "an AI system might be hard to control and because of that, some experts think it could be really dangerous". This doesn't mean the same thing as "an AI might kill us all", but I think it lays the groundwork to build the idea in a way that most people can then better contextualize and understand more accurately from there.

Often, the easiest way to make EA ideas sound less weird is just to break up ideas into smaller pieces. Often EA ideas sound weird because we go way too fast and try to give way more information than a person can actually process at once. This is not actually that helpful for leaving accurate impressions. They might technically know the conclusion, but without enough steps to motivate it, an isolated, uncontextualized conclusion is pretty meaningless.

I think the advice "try to sound less weird" is often a helpful heuristic. You've convinced me that a caveat to the advice is maybe needed: I'd much rather we were weird than misleading!

What do you think of a caveated version of the advice like "make EA ideas sound less weird without being misleading"? Do you think this caveated version, or something like it, is better than no advice?

Rohin ShahJul 28 20223

I mostly agree with this.

I think it sounds a lot less weird to say "an AI system might be hard to control and because of that, some experts think it could be really dangerous". This doesn't mean the same thing as "an AI might kill us all"

I think it sounds a lot less weird in large part because you aren't saying that the AI system might kill us all. "Really dangerous" could mean all sorts of things, including "the chess-playing robot mistakes a child's finger for a chess piece and accidentally breaks the child's finger". Once you pin it down to "kills all humans" it sounds a lot weirder.

I still do agree with the general point that as you explain more of your reasoning and cover more of the inferential gap, it sounds less weird.

What do you think of a caveated version of the advice like "make EA ideas sound less weird without being misleading"?

I still worry that people will not realize the ways they're being misleading -- I think they'll end up saying true but vague statements that get misinterpreted. (And I worry enough that I feel like I'd still prefer "no advice".)

SophiaAug 9 20223

tl;dr:

when effective altruism is communicated in a nuanced way, it doesn't sound weird.
rushing to the bottom line and leaving it unjustified means the person has neither a good understanding of the conclusion nor of the reasoning
I want newcomers to have a nuanced view of effective altruism
I think newcomers only understanding a rushed version of the bottom line without the reasoning is worse than them only understanding the very first step of the reasoning.
I think it's fine for people to go away with 10% of the reasoning.
I don't think it's fine for people to go away with the conclusion and 0% of the reasoning.
I want to incentivize people to communicate in a nuanced way rather than just quickly rush to a bottom line they can't justify in the time they have.
Therefore, I think "make EA ideas sound less weird" is much better than no advice.

Rohin ShahAug 11 20223

I agree that if the listener interprets "make EA sound less weird" as "communicate all of your reasoning accurately such that it leads the listener to have correct beliefs, which will also sound less weird", then that's better than no advice.

I don't think that's how the typical listener will interpret "make EA sound less weird"; I think they would instead come up with surface analogies that sound less weird but don't reflect the underlying mechanisms, which listeners might notice then leading to all the problems you describe.

I definitely don't think we should just say all of our conclusions without giving our reasoning.

(I think we mostly agree on what things are good to do and we're now hung up on this not-that-relevant question of "should we say 'make EA sound less weird'" and we probably should just drop it. I think both of us would be happier with the advice "communicate a nuanced, accurate view of EA beliefs" and that's what we should go with.)

SophiaAug 11 20221

Note: edited significantly for clarity the next day

Tl;dr: Weirdness is still a useful sign of sub-optimal community building. Legibility is the appropriate fix to weirdness.

I know I used the terms "nuanced" and "high-fidelity" first but after thinking about it a few more days, maybe "legibility" more precisely captures what we're pointing to here?

Me having the hunch that the advice "don't be weird" would lead community builders to be more legible now seems like the underlying reason I liked the advice in the first place. However, you've very much convinced me you can avoid sounding weird by just not communicating any substance. Legibility seems to capture what community builders should do when they sense they are being weird and alienating.

EA community builders probably should stop and reassess when they notice they are being weird, "weirdness" is a useful smoke alarm for a lack of legibility. They should then aim to be more legible. To be legible, they're probably strategically picking their battles on what claims they prioritize justifying to newcomers. They are legibly communicating something, but they're probably not making alienating uncontextualized claims they can't back-up in a single conversation.

They are also probably using clear language the people they're talking to can understand.

I now think the advice "make EA more legible" captures the upside without the downsides of the advice "make EA sound less weird". Does that seem right to you?

I still agree with the title of the post. I think EA could and should sound less weird by prioritizing legibility at events where newcomers are encouraged to attend.

Noticing and preventing weirdness by being more legible seems important as we get more media attention and brand lock-in over the coming years.

Rohin ShahAug 12 20222

Yeah I'm generally pretty happy with "make EA more legible".

SophiaAug 14 20221

Cool . I'm curious, how does this feeling change for you if you found out today that AI timelines are almost certainly less than a decade?

I'm curious because my intuitions change momentarily whenever a consideration pops into my head that makes me update towards AI timelines being shorter.

I think my intuitions change when I update towards shorter AI timelines because legibility/the above outlined community building strategy has a longer timeline before the payoffs. Managing reputation and goodwill seem like good strategies if we have a couple of decades or more before AGI.

If we have time, investing in goodwill and legibility to a broader range of people than the ones who end up becoming immediately highly dedicated seems way better to me.

Legible high-fidelity messages are much more spreadable than less legible messages but they still some take more time to disseminate. Why? The simple bits of it sound like platitudes. And the interesting takeaways require too many steps in logic from the platitudes to go viral.

However, word of mouth spread of legible messages that require multiple steps in logic still seem like they might spread exponentially (just with a lower growth rate than simpler viral messages).

If AI timelines are short enough, legibility wouldn't matter in those possible worlds. Therefore, if you believe timelines are extremely short then you probably don't care about legibility or reputation (and you also don't advise people to do ML PhDs because by the time they are done, it's too late).

Does that seem right to you?

Rohin ShahAug 15 20223

Idk, what are you trying to do with your illegible message?

If you're trying to get people to do technical research, then you probably just got them to work on a different version of the problem that isn't the one that actually mattered. You'd probably be better off targeting a smaller number of people with a legible message.

If you're trying to get public support for some specific regulation, then yes by all means go ahead with the illegible message (though I'd probably say the same thing even given longer timelines; you just don't get enough attention to convey the legible message).

TL;DR: Seems to depend on the action / theory of change more than timelines.

SophiaAug 9 20221

Goal of this comment:

This comment fills in more of the gaps I see that I didn't get time to fill out above. It fleshes out more of the connection between the advice "be less weird" and "communicate reasoning over conclusions".

Doing my best to be legible to the person I am talking to is, in practice, what I do to avoid coming across as weird/alienating.
there is a trade-off between contextualizing and getting to the final point
we could be in danger of never risking saying anything controversial so we do need to encourage people to still get to the bottom line after giving the context that makes it meaningful
right now, we seem to often state an insufficiently contextualized conclusion in a way that seems net negative to me
we cause bad impressions
we cause bad impressions while communicating points I see as less fundamentally important to communicate
communicating reasoning/our way of thinking seems more important than the bottom line without the reasoning
AI risk can often take more than a single conversation to contextualize well enough for it to move from a meaningless topic to an objectionable claim that can be discussed with scepticism but still some curiosity
I think we're better off trying to get community builders to be more patient and jump the gun less on the alienating bottom line
The soundbite "be less weird" probably does move us in a direction I think is net positive

I suspect that this is what most community builders will lay the groundwork to more legibly support conclusions when given advice like "get to the point if you can, don't beat around the bush, but don't be weird and jump the gun and say something without the needed context for the person you are talking to to make sense of what you are saying"

I feel like making arguments about stuff that is true is a bit like sketching out a maths proof for a maths student. Each link in the chain is obvious if you do it well, at the level of the person with whom you are taking through the proof, but if you start with the final conclusion, they are completely lost.

You have to make sure they're with you every step of the way because everyone gets stuck at a different step.

You get away with stating your conclusion without the proof in maths because there is a lot of trust that you can back up your claim (the worst thing that happens is the person you are talking to loses confidence in their ability to understand maths if you start with the conclusion before walking them through it at a pace they can follow).

We don't have that trust with newcomers until we build it. They won't suspect we're right unless we can show we're right in the conversation we made the claim in.

They'll lose trust and therefore interest very fast if we make a claim that requires at least 3 months of careful thought to come to a nuanced view on it. AI risk takes a tonne of time to develop inside views on. There is a lot of deference because it's hard to think the whole sequence through yourself for yourself and explore various objections until you feel like it's your view and not just something dictated to you. Deference is weird too (and gets a whole lot less weird when you just admit that you're deferring a bit and what exactly made you trust the person you are deferring to to come to reasonable views in the first place).

I feel like "don't sound weird" ends up translating to "don't say things you can't backup to the person you are talking to". In my mind, "don't sound weird" sounds a lot like "don't make the person you are talking to feel alienated", which in practice means "be legible to the person you are talking to".

People might say much less when they have to make the person they are talking to understand all the steps along the way, but I think that's fine. We don't need everyone to get to the bottom line. It's also often worse than neutral to communicate the bottom line without everything above it that makes it reasonable.

Ideally, community builders don't go so glacially slowly that they are at a standstill, never getting to any bottom lines that sound vaguely controversial, but while we've still got a decent mass of people who know the bottom line and enough of the reasoning paths that can take people there, it seems fine to increase the number of vague messages in order to decrease the number of negative impressions.

I still want lots of people who understand the reasoning and the current conclusions, I just don't think starting with an unmotivated conclusion is the best strategy for achieving this and I think "don't be weird" plus some other advice to stop community builders from literally stagnating and never getting to the point seems much better than the current status quo.

SophiaJul 29 20220

Fair enough.

tl;dr: I now think that EA community-builders should present ideas in a less weird way when it doesn't come at the expense of clarity, but maybe the advice "be less weird" is not good advice because it might make community-builders avoid communicating weird ideas that are worth communicating.

You probably leave some false impressions either way

In some sense (in the sense I actually care about), both statements are misleading.

I think that community builders are going to convey more information, on average, if they start with the less weird statement.

Often inferential gaps can't be crossed in a conversation.

That missing understanding will always get filled in with something inaccurate (if they had an accurate impression, then there is no inferential gap here). The question is, which misconceptions are better to leave someone with?

You've outlined how "an AI system might be hard to control and because of that, some experts think it could be really dangerous" could be misunderstood. I agree that people are unlikely to think you mean "the AI system will kill us all" without further elaboration. They will attach more accessible examples to the vaguer statement in the meantime. It is unlikely they will attach one really specific wrong example though, there is ambiguity there and the uncertainty left from that ambiguity is much better than a strongly held false impression (if those are the two choices, ideally, the inferential gap gets closed and you get a strongly held true impression).

People who are hearing the statement "the AI system will kill us all" without further context will still try and attach the most accessible examples they have to make the phrase make as much sense as possible to them. This tends to mean Skynet style walking robots. They'll also probably hypothesize that you don't have very good epistemics (even if this is not the language they'd use to describe it). They won't trust you to have good reasons to believe what you do because you've made an extraordinary claim without having laid out the case for it yet. These are false impressions too. They are also likely to stick more because the extra weirdness makes these first impressions much more memorable.

Which impression do I prefer community builders leave newcomers with?

I value community builders conveying the reasoning processes much more than the bottom line. I want newcomers to have the tools to come to reasonable conclusions for themselves (and I think giving newcomers the reasons why the EA community has its current conclusions is a good start).

Giving newcomers a more accurate impression of a conclusion without giving them much context on that conclusion seems often worse than nothing. Especially since you often lose the trust of reasonable people when you make very surprising claims and can't back them up in the conversation (because that's too much of an inferential distance to cross in the time you have).

Giving them an accurate impression of one of the reasons for a conclusion seems neutral (unaligned AI seems like it could be an x-risk because AI systems are hard to control). That isolated reason without further elaboration doesn't actually say that much, but I think it does lay the groundwork for a deeper understanding of the final conclusion "AI might kill us all" if future conversations happen down the road.

My takeaways

After this discussion, I've changed my mind on "be less weird" being the right advice to get what I want. I can see how trying to avoid being weird might make community builders avoid getting the point across.

Something like "aim to be as accurate as possible using language and examples the person you are talking to can understand" still probably will result in less weird. I'd be surprised if it resulted in community builders obscuring their point.

[This comment is no longer endorsed by its author]Reply

Lukas_GloorAug 15 20223

I liked this comment!

In particular, I think the people who are good at "not making EA seem weird" (while still communicating all the things that matter – I agree with the points Rohin is making in the thread) are also (often) the ones who have a deeper (or more "authentic") understanding of the content.

There are counterexamples, but consider, for illustration, that Yudkowsky's argument style and the topics he focuses on would seem a whole lot weirder if he wasn't skilled at explaining complex issues. So, understanding what you talk about doesn't always make your points "not weird," but it (at least) reduces weirdness significantly.

I think that's mostly beneficial and I think fewer people coming into contact with EA ideas but where they do come into contact with them, they hear them from exponents with a particularly deep, "authentic" understanding, seems like a good thing!

I think it is counterproductive for people who don't understand the argument they are making well enough to put the arguments into plain English to instead parrot off some jargon.

Instead of (just) "jargon" you could also say "talking points."

SophiaAug 18 20222

tl;dr:

I am not sure that the pressure on community builders to communicate all the things that matter is having good consequences.
This pressure makes people try to say too much, too fast.
Making too many points too fast makes reasoning less clear.
We want a community full of people who have good reasoning skills.
We therefore want to make sure community builders are demonstrating good reasoning skills to newcomers
We therefore want community builders to take the time they need to communicate the key points
This sometimes realistically means not getting to all the points that matter

I completely agree that you could replace "jargon" with "talking points".

I also agree with Rohan that it's important to not shy away from getting to the point if it is possible you can make the point in a well-reasoned way.

However, I actually think it's possibly quite important for improving the epistemics of people new to the community for there to be less pressure to communicate "all the things that matter". At least, I think there needs to be less pressure to communicate all the things that matter all at once.

The sequences are long for a reason. Legible, clear reasoning is slow. I think too much pressure to get to every bottom line in a very short time makes people skip steps. This means that not only are we not showing newcomers what good reasoning processes look like, we are going to be off-putting to people who want to think for themselves and aren't willing to make huge jumps that are missing important parts of the logic.

Pushing community builders to get to all the important key points, many bottom lines, will maybe make it hard for newcomers to feel like they have permission to think for themselves and make their own minds up. To feel rushed to a conclusion, to feel like you must come to the same conclusion as everyone else, no matter how important it is, will always make clear thinking harder.

If we want a community full of people who have good reasoning processes, we need to create environments where good reasoning processes can thrive. I think this, like most things, is a hard trade-off and requires community builders to be pretty skilled or to have much less asked of them.

If it's a choice between effective altruism societies creating environments where good reasoning processes can occur or communicating all the bottom lines that matter, I think it might be better to focus on the former. I think it makes a lot of sense to have effective altruism societies to be about exploration.

We still need people to execute. I think having AI risk specific societies, bio-risk societies, broad longtermism societies, poverty societies (and many other more conclusion focused mini-communities) might help make this less of a hard trade-off (especially as the community grows and there becomes more room for more than one effective altruism related society on any given campus). It is much less confusing to be rushed to a conclusion when that conclusion is well-labelled from the get-go (and effective altruism societies then can point interested people in the right direction to find out why certain people think certain bottom lines are sound).

Whatever the solution, I do worry rushing people to too many bottom lines too quickly does not create the community we want. I suspect we need to ask community builders to communicate less (we maybe need to triage our key points more), in order for them to communicate those key points in well-reasoned way.

Does that make sense?

Also, I'm glad you liked my comment (sorry for writing an essay objecting to a point made in passing, especially since your reply was so complementary; clearly succinctness is not my strength so perhaps other people face this trade-off much less than me :p).

SophiaJun 6 20221

I think there are trade-offs here. Ideally, I think we want community builders to prioritise high-fidelity, nuanced and precise communication over appealing to more people, but we also want community builders to prioritise broader appeal over signalling they are in the in-group or that they are intelligent (we’re all human and I think it takes conscious effort to push back against these instincts to 1) make it clear you belong in the tribe by signalling in-groupness and 2) that you possess traits that are valued by the tribe like intelligence)

SophiaJul 22 20221

Relevant example 😅🤣🤮😿

[comment deleted]May 27 20222

Deleted by Jaime Sevilla, 05/27/2022

niplavMay 25 202229

I have a slightly negative reaction to this kind of thinking.

At the limit, there is a trade-off between reporting my beliefs without having bias in the sampling (i.e. lies by omission) and trying to convince people. If I mainly talk about how recommender systems are having bad effects on the discourse landscape because they are aligned, I am filtering evidence (and therefore imposing very high epistemic costs on my discussion partner in the process!)

In the process of doing so, I would not only potentially be making the outside epistemic environment worse, but also might be damaging my own epistemics (or that of the EA community) in the process (via Elephant-in-the-brain-like dynamics or by the conjecture that if you say something long enough, you become more likely to believe it as well).

A good idea that came out of the discussion (point 3, "Bayesian Honesty") around Meta-Honesty was the heuristic that, when talking to another person, one shouldn't give information that would, in expectation, cause the other person to update in the wrong direction. I think the above proposals would sometimes skirt this line (and cross it when considering beliefs about the EA community, such as "EA mainly worries about recommender systems increasing political polarization").

Perhaps this is just a good reason for me not to be a spokesperson about AI risk (probably inappropriately married to the idea that truth is to be valued above everything else), but I wish that people will be very thoughtful around reporting misleading reasons why large parts of the EA community are extremely freaked out about AI (and not, as the examples would suggest, just a bit worried).

[anonymous]May 25 20226

This is a good point, and I thought about it when writing the post—trying to be persuasive does carry the risk of ending up flatteringly mischaracterizing things or worsening epistemics, and we must be careful not to do this. But I don't think it is doomed to happen with any attempts at being persuasive, such that we shouldn't even try! I'm sure someone smarter than me could come up with better examples than the ones I presented. (For instance, the example about using visualizations seems pretty harmless—maybe attempts to be persuasive should look more like this than the rest of the examples?)

niplavMay 25 202216

Maybe we don't just want to optimize the messaging, but the messengers: Having charismatic & likeable people talk about this stuff might be good (to what extent is this already happening? Are MacAskill & Ord as good as spokespeople as they are as researchers?).

Furthermore, taking the WaitButWhy approach, with easily understandable visualizations, sounds like a good approach, I agree.

[anonymous]May 25 20222

Oh, I like this idea! And love WaitButWhy.

PearlMay 25 202220

I agree strongly with the central thesis of this post and the suggestions are both helpful and practical. The following excerpt resonated especially strongly with me:

"Being taken seriously is key to many EA objectives, such as growing the movement, getting mainstream researchers to care about the EA risks of their work, and having policymakers give weight to EA considerations."

Indeed, EA doesn't need to become less weird in order to further its objectives, but it should be possible to develop more layperson-friendly framings to that end.

GilMay 25 202214

It's kind of funny for me to hear about people arguing that weirdness is a necessary part of EA. To me, EA concepts are so blindingly straightforward ("we should try to do as much good with donations as possible", "long-term impacts are more important than short-term impacts", "even things that have a small probability of happening are worth tackling if they are impactful enough") that you have to actively modify your rhetoric to make them seem weird.

Strongly agree with all of the points you brought up - especially on AI Safety. I was quite skeptical for a while until someone gave me an example of AI risk that didn't sound like it was exaggerated for effect, to which my immediate reaction was "Yeah, that seems... really scarily plausible".

[anonymous]May 25 202231

It seems like there are certain principles that have a 'soft' and a 'hard' version - you list a few here. The soft ones are slightly fuzzy concepts that aren't objectionable, and the hard ones are some of the tricky outcomes you come to if you push them. Taking a couple of your examples:

Soft: We should try to do as much good with donations as possible

Hard: We will sometimes guide time and money away from things that are really quite important, because they're not the most important

Soft: Long-term impacts are more important than short-term impacts

Hard: We may pass up interventions with known and high visible short-term benefits in favour of those with long-term impacts that may not be immediately obvious

This may seem obvious, but to people who aren't familiar, leading with the soft ones on the basis that the hard ones will come up soon enough if someone is interested or does their research will be more effective in giving a positive impression than jumping straight to the hard stuff. But I see a lot more jumping than would be justified. I can see why, but if you were trying to persuade someone to join or have a good opinion of your political party, would you lead with 'we should invest in public services' or 'you should pay more taxes'?

Amber DawnMay 25 202212

Strong agree.

I've seen some discourse on Twitter along the lines of "EA's critics never seem to actually understand what we actually believe!" In some ways, this is better than critics understanding EA well and strongly opposing the movement anyway! But it does suggest to me that EA has a problem with messaging, and part of this might be that some EAs are more concerned with making technically-defensible and reasonable statements - which, to be clear, is important! - than with meeting non-EAs (or not-yet-EAs) where they're at and empathizing with how weird some EA ideas seem at first glance.

Rachel ShuMay 26 20225

I feel like a lot of the ideas aren't really perceived as that weird, when I've discussed EA in intellectual circles unfamiliar with the concept? "Charity should first go to the most needy" is something most people espouse, even if they don't actually put it into action. A lot of my liberal friends are vegetarian or vegan for one reason or another and have strong opinions on animal abuse. The single most common complaint about politics is that it focuses too much on short-term incentives instead of long-term issues. That covers the top three; AI takeover? The only socially weird thing is how seriously the EA takes it, but everyone has an idea of what AI takeover might look like. Many people disagree with EAs, but not more than people disagree with, say, climate change activists.

Marcel DMay 25 20223

It may be the case that we don’t even need to make general audiences consider paperclip maximizers at all, since the mechanisms needed to prevent them are the same as those needed to prevent the less severe and more plausible-sounding scenarios

I’m somewhat unsure what exactly you meant by this, but if your point is “solutions to near-term AI concerns like bias and unexpected failures will also provide solutions to long-term concerns about AI alignment,” that viewpoint is commonly disputed by AI safety experts.

[anonymous]May 25 20227

No, that's not what I mean. I mean we should use other examples of the form "you ask an AI to do X, and the AI accomplishes X by doing Y, but Y is bad and not what you intended" where Y is not as bad as an extinction event.

Marcel DMay 25 20221

I understand—and agree with—the overall point being made about “don’t just talk about the extreme things like paperclip maximizers”, but I’m still thrown off by the statement that “the mechanisms needed to prevent [paperclip maximizers] are the same as those needed to prevent the less severe and more plausible-sounding scenarios”

[anonymous]May 25 20221

Hm, yeah, I see where you're coming from. Changed the phrasing.

ludwigbaldJun 1 20223

I don't think EA should be weird. All we're doing is filling the gaps to make sure everyone is taken care of. And of course we do it cost-effectively! Most people I talk to find that reasonable.

EA's top cause area are neglected by others. By definition, they are unpopular and unusual. When discovering a new opportunity, we should therefore expect it to be weird. However, as more and more people interact with the new area, it becomes less and less weird to them. Ideally, it ends up no longer neglected, totally main stream and EA can incubate the next weird thing.

Weirdness is not an inherent property of the thing, it's a property of the relationship between the thing and its observer.

ChanaMessingerMay 25 20222

I appreciate you thinking through ideas of presentation to new people! I've also spent some time thinking about how to make things not seem as weird, and when that's useful.

One thought is had is that, while it's true that pandemics are really bad and don't need to be described as an existential risk for that to be true, it feels like it relies strongly on other people thinking "what's an actual existential risk" and then back generating reasons why those things are also bad separate from that. I think there are costs and benefits to that dual step process, but one cost is that we lose the focus on the actual discerning principle that generates good ideas, which seems more important to me than communicating those good ideas well (though I'd be really sad if we failed at getting a bunch of CS students thinking about AI Safety only because of framings).

[anonymous]May 25 20221

Yes, this is true and very important. We should by no means lose sight of existential risks as a discerning principle! I think the best framing to use will vary a lot case-by-case, and often the one you outline will be the better option. Thanks for the feedback!