Sam Clarke

Research assistant at AI:FAR

Topic Contributions


SamClarke's Shortform

Agreed, thanks for the pushback!

SamClarke's Shortform

Ways of framing EA that (extremely anecdotally*) make it seem less ick to newcomers. These are all obvious/boring; I'm mostly recording them here for my own consolidation.

  • EA as a bet on a general way of approaching how to do good, that is almost certainly wrong in at least some ways—rather than a claim that we've "figured out" how to do the most good (like, probably no one claims the latter, but sometimes newcomers tend to get this vibe). Different people in the community have different degrees of belief in the bet, and (like all bets) it can make sense to take it even if you still have a lot of uncertainty.
  • EA as about doing good on the current margin. That is, we're not trying to work out the optimal allocation of altruistic resources in general, but rather: given how the rest of the world is spending its money and time to do good, which approaches could do with more attention? Corollary: you should expect to see EA behaviour changing over time (for this and other reasons). This is a feature not a bug.
  • EA as diverse in its ways of approaching how to do good. Some people work on global health and wellbeing. Others on animal welfare. Others on risks from climate change and advanced technology.

These frames can also apply to any specific cause area.

*like, I remember talking to a few people who became more sympathetic when I used these frames.

How I Formed My Own Views About AI Safety

I'm still confused about the distinction you have in mind between inside view and independent impression (which also have the property that they feel true to me)?

Or do you have no distinction in mind, but just think that the phrase "inside view" captures the sentiment better?

How I Formed My Own Views About AI Safety

Thanks - good points, I'm not very confident either way now

On presenting the case for AI risk

Thanks, I appreciate this post a lot!

Playing the devil's advocate for a minute, I think one main challenge to this way of presenting the case is something like "yeah, and this is exactly what you'd expect to see for a field in its early stages. Can you tell a story for how these kinds of failures end up killing literally everyone, rather than getting fixed along the way, well before they're deployed widely enough to do so?"

And there, it seems you do need to start talking about agents with misaligned goals, and the reasons to expect misalignment that we don't manage to fix?

AI Risk is like Terminator; Stop Saying it's Not

Thanks for writing this!

There are yet other views about what exactly AI catastrophe will look like, but I think it is fair to say that the combined views of Yudkowsky and Christiano provide a fairly good representation of the field as a whole.

I disagree with this.

We ran a survey of prominent AI safety and governance researchers, where we asked them to estimate the probability of five different AI x-risk scenarios.

Arguably, the "Terminator-like" scenarios are the "Superintelligence" scenario, and part 2 of "What failure looks like" (as you suggest in your post).[1]

Conditional on an x-catastrophe due to AI occurring, the median respondent gave those scenarios 10% and 12% probability (mean 16% each). The other three scenarios[2] got median 12.5%, 10% and 10% (means 18%, 17% and 15%).

So I don't think that the "field as a whole" considers Terminator-like x-risk scenarios the most likely. Accordingly, I'd prefer it if the central claim of this post were "AI risk could actually be like Terminator; stop saying it's not".

  1. Part 1 of "What failure looks like" probably doesn't look that much like Terminator (disaster unfolds more slowly and is caused by AI systems just doing their jobs really well) ↩︎

  2. That is, the following three scenarios: Part 1 of "What failure looks like", existentially catastrophic AI misuse, and existentially catastrophic war between humans exacerbated by AI. See the post for full scenario descriptions. ↩︎

You are probably underestimating how good self-love can be

After practising some self-love I am now noticeably less stressed about work in general. I sleep better, have more consistent energy, and enjoy having conversations about work-related stuff more (so I just talk about EA and AI risk more than I used to, which was a big win on my previous margin). I think I maybe work fewer hours than I used to, because before it felt like there was a bear chasing me and if I wasn't always working then it was going to eat me, whereas now that isn't the case. But my working patterns feel healthy and sustainable now; before, I was going through cycles of half-burning out every 3 months or so (which was bad enough for my near-term productivity, not to mention long-term productivity and health). I also spend relatively less time just turning the handle on my mainline tasks (vs zooming out, having random conversations that feel useful but won't pay off immediately, reading more widely), which again I think was a win on my previous margin (maybe reduced it from ~90% to ~80% of my research hours).

I'm confused about how this happened. My model is that before there were two parts of me that strongly disagreed about whether work is good, and that these parts have now basically resolved (they agree that doing sensible amounts of work is good), because both feel understood and loved. Basically the part that didn't think work was good just needed its needs to be understood and taken into account.

I think this model is quite different from Charlie's main model of what happens (which is to do with memory consolidation), so I'm especially confused.

I haven't attained persistent self-love of the sort described here.

You are probably underestimating how good self-love can be

I found this helpful and am excited to try it - thanks for sharing!

How I Formed My Own Views About AI Safety

Also, nitpick, but I find "inside view" a more confusing and jargony way of just saying "independent impressions" (okay, also jargon to some extent, but closer to plain English). "Independent impressions" also avoids the problem you point out: inside view is not the opposite of the Tetlockian sense of outside view (along with the other ambiguities with outside view that another commenter pointed out).

How I Formed My Own Views About AI Safety

Nice post! I agree with ~everything here. Parts that felt particularly helpful:

  • There are even more reasons why paraphrasing is great than I thought - a good reminder to be doing this more often
  • The way you put this point was very crisp and helpful: "Empirically, there’s a lot of smart people who believe different and contradictory things! It’s impossible for all of them to be right, so you must disagree with some of them. Internalising that you can do this is really important for being able to think clearly"
  • The importance of "how much feedback do they get from the world" in deferring intelligently

One thing I disagree with: how important forming inside views is for community epistemic health. I think it's pretty important. E.g. I think that ~2 years ago, the arguments for the long-term importance of AGI safety were pretty underdeveloped; that since then lots more people have come out with their inside views about it; and that now the arguments are in much better shape.
