All of Sam Clarke's Comments + Replies

(Post 6/N with some rough notes on AI governance field-building strategy. Posting here for ease of future reference, and in case anyone else thinking about similar stuff finds this helpful.)

Some heuristics for prioritising between talent pipeline interventions

Explicit backchaining is one way to do prioritisation. I sometimes forget that there are other useful heuristics, like:

  • Cheap to pilot
    • E.g. doesn't require new infrastructure or making a new hire
  • Cost is easier to estimate than benefit, so lower cost things tend to be more likely to actually happen
  • Visual
... (read more)

(Post 5/N with some rough notes on AI governance field-building strategy. Posting here for ease of future reference, and in case anyone else thinking about similar stuff finds this helpful.)

Laundry list of talent pipeline interventions

  • More AI governance groups/programs at universities
  • Run workshops on the most marginally valuable aptitudes
    • E.g. a macrostrategy workshop could look like: people tell stories about how, concretely, things could go badly, then backchain to what we should do
  • Run bootcamps on particularly important topics, e.g. compute
    • Help bring more
... (read more)

(Post 4/N with some rough notes on AI governance field-building strategy. Posting here for ease of future reference, and in case anyone else thinking about similar stuff finds this helpful.)

Some exercises for developing good judgement

I’ve spent a bit of time over the last year trying to form better judgement. Dumping some notes here on things I tried or considered trying, for future reference.

  • Jump into the mindset of “the buck stops at me” for working out whether some project takes place, as if you were the grantmaker having to make the decision. Ask yours
... (read more)
2
michel
10mo
This is a good tip! Hadn't thought of this.

(Post 3/N with some rough notes on AI governance field-building strategy. Posting here for ease of future reference, and in case anyone else thinking about similar stuff finds this helpful.)

Some hot takes on AI governance field-building strategy

  • More people should consciously upskill as ‘founders’, i.e. people who form and lead new teams/centres/etc. focused on making AI go well
    • A case for more founders: plausibly in crunch time there will be many more people/teams within labs/govs/think-tanks/etc. that will matter for how AI goes. Would be good if those tea
... (read more)

(Post 2/N with some rough notes on AI governance field-building strategy. Posting here for ease of future reference, and in case anyone else thinking about similar stuff finds this helpful.)

Misc things it seems useful to do/find out

  • To inform talent development activities: talk with relevant people who have skilled up. How did they do it? What could be replicated via talent pipeline infrastructure? Generally talk through their experience.
    • Kinds of people to prioritise: those who are doing exceptionally well; those who have grown quite recently (might have be
... (read more)

(Post 1/N with some rough notes on AI governance field-building strategy. Posting here for ease of future reference, and in case anyone else thinking about similar stuff finds this helpful.)

Some key uncertainties in AI governance field-building

According to me, these are some of the key uncertainties in AI governance field-building—questions which, if we had better answers to them, might significantly influence decisions about how field-building should be done.

How best to find/upskill more people to do policy development work?

  • I think there are three main sk
... (read more)

Thanks for this; I won't use "bet" in this context in the future.

Things that surprised me about the results

  • There’s more variety than I expected in the group of people who are deferred to
    • I suspect that some of the people in the “everyone else” cluster defer to people in one of the other clusters—in which case there is more deference happening than these results suggest.
  • There were more “inside view” responses than I expected (maybe partly because people who have inside views were incentivised to respond, because it’s cool to say you have inside views or something). Might be interesting to think about whether it’s good (on
... (read more)
6
jtm
1y
Thanks for doing this survey and sharing the results, super interesting! Regarding the point about "inside view" responses being incentivised: yes, I definitely think that there's a lot of potential for social desirability bias here! And I think this can happen even if the responses are anonymous, as people might avoid the cognitive dissonance that comes with admitting to "not having an inside view." One might even go as far as framing the results as "Who do people claim to defer to?"

Thanks for your comment!

I doubt that it's reasonable to draw these kinds of implications from the survey results, for a few reasons:

  • respondents were very uncertain
  • there's overlap between the scenarios
  • there's no 1-1 mapping between "fields" and risk scenarios (e.g. I'd strongly bet that improved cooperation of certain kinds would make both catastrophic misalignment and war less likely) (though maybe your model tries to account for this, I didn't look at it)

A broader point: I think making importance comparisons (between interventions) on the level of abstrac... (read more)

1
NunoSempere
1y
I dislike the usage of "strongly bet" here, given that a literal bet here seems hard to arrive at. See here: <https://nunosempere.com/blog/2023/03/02/metaphorical-bets> for some background.
7
Maxime_Riche
1y
Thanks for your response! Yet I am still not convinced that my reading doesn't make sense. Here are some comments:

  • "Respondents were very uncertain": this seems to be, at the same time, the reason why you could want to diversify your portfolio of interventions for reducing X-risks, and the reason why someone could want to improve such estimates (of P(Nth scenario | X-risk)). But it doesn't seem to be a strong reason to discard the conclusion of the survey (it would be, if we had more reliable information elsewhere).
  • "There's overlap between the scenarios": I am unsure, but it seems that the overlaps are not that big overall. In particular, the overlap between {1, 2, 3} and {4, 5} doesn't seem huge. (I also wonder whether these overlaps illustrate that you could reduce X-risks using a broader range of interventions than just "AI alignment" and "AI governance".)
    1. The "Superintelligence" scenario (Bostrom, 2014)
    2. Part 2 of "What failure looks like" (Christiano, 2019)
    3. Part 1 of "What failure looks like" (Christiano, 2019)
    4. War (Dafoe, 2018)
    5. Misuse (Karnofsky, 2016)
    6. Other existential catastrophe scenarios
  • "No 1-1 mapping between 'fields' and risk scenarios": sure, this would benefit from having a more precise model.
  • "Priority comparison of interventions is better than high-level comparisons": right, though high-level comparisons are so much cheaper to do that it seems worth staying at that level for now.

The point I am especially curious about is the following: is this survey pointing to the fact that the importances of working on "Technical AI alignment", "AI governance", "Cooperative AI" and "Misuse limitation" are all within one OOM of each other? By importance here I mean importance as in 80k's ITN framework, not the overall priority, which should also include neglectedness, tractability, and looking at object-level interventions.

In a following post, we will explore:

  1. How you could orient your career toward working on security

 

Did you end up writing this, or have a draft of it you'd be willing to share?

1
Artyom K
4mo
There is a nice post about this at 80k: https://80000hours.org/career-reviews/information-security/

Will get them written up this month—sorry for the delay!

In fact, one reason I am writing this comment is that I think this post itself endorses that framing to too great an extent.

Probably agree with you there

I do not think it is appropriate to describe this [the Uber crash] simply as an accident

Also agree with that. I wasn't trying to claim it is simply an accident—there are also structural causes (i.e. bad incentives). As I wrote:

Note that this could also be well-described as an "accident risk" (there was some incompetence on behalf of the engineers, along with the structural causes). [emphasis added]

If I... (read more)

9
David Krueger
1y
I recently learned that in law, there is a breakdown as:

  • Intent (~= misuse)
  • Oblique Intent (i.e. a known side effect)
  • Recklessness (known chance of side effect)
  • Negligence (should've known chance of side effect)
  • Accident (couldn't have been expected to know)

This seems like a good categorization.

Unfortunately, when someone tells you "AI is N years away because XYZ technical reasons," you may think you're updating on the technical reasons, but your brain was actually just using XYZ as excuses to defer to them.

I really like this point. I'm guilty of having done something like this loads myself.

When someone gives you gears-level evidence, and you update on their opinion because of that, that still constitutes deferring. What you think of as gears-level evidence is nearly always disguised testimonial evidence. At least to some, usually damning, degree

... (read more)
5
Emrik
2y
This was badly written. I just mean that updating on their opinion, as opposed to just taking the patterns and trying to adjust for the fact that you received them through filters, is updating on testimony. I'm saying nothing special here, just that you might be tricking yourself into deferring (instead of impartially evaluating patterns) by letting the gearsy arguments woozle you. I wrote a bit about how testimonial evidence can be "filtered" in the paradox of expert opinion.

Thanks for your comment! I agree that the concept of deference used in this community is somewhat unclear, and a separate comment exchange on this post further convinced me of this. It's interesting to know how the word is used in formal epistemology.

Here is the EA Forum topic entry on epistemic deference. I think it most closely resembles your (c). I agree there's the complicated question of what your priors should be, before you do any deference, which leads to the (b) / (c) distinction.

1
aaron_mai
2y
I wonder if it would be good to create another survey to get some data not only on who people update on but also on how they update on others (regarding AGI timelines or something else). I was thinking of running a survey where I ask EAs about their prior on different claims (perhaps related to AGI development), present them with someone's probability judgements and then ask them about their posterior.  That someone could be a domain expert, non-domain expert (e.g., professor in a different field) or layperson (inside or outside EA).  At least if they have not received any evidence regarding the claim before, then there is a relatively simple and I think convincing model of how they should update: they should set their posterior odds in the claim to the product of their prior odds and someone else's odds (this is the result of this paper, see e.g. p.18). It would then be possible to compare the way people update to this rational ideal. Running such a survey doesn't seem very hard or expensive (although I don't trust my intuition here at all) and we might learn a few interesting biases in how people defer to others in the context of (say) AI forecasts.  I have a few more thoughts on exactly how to do this, but I'd be curious if you have any initial thoughts on this idea! 
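
(A minimal sketch of the updating rule described in the comment above, assuming "odds" means p/(1-p) and that both probabilities are strictly between 0 and 1; the function names are illustrative, not taken from the cited paper.)

```python
def to_odds(p: float) -> float:
    """Convert a probability (0 < p < 1) to odds, p / (1 - p)."""
    return p / (1 - p)


def to_prob(odds: float) -> float:
    """Convert odds back to a probability."""
    return odds / (1 + odds)


def multiplicative_update(prior: float, other: float) -> float:
    """Combine your prior with someone else's probability judgement by
    multiplying the two odds, then converting back to a probability."""
    return to_prob(to_odds(prior) * to_odds(other))


# Example: your prior on a claim is 30%; the other person says 80%.
# The posterior under this rule is roughly 63%.
print(multiplicative_update(0.30, 0.80))
```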

Thanks for your comment!

Asking "who do you defer to?" feels like a simplification

Agreed! I'm not going to make any changes to the survey at this stage, but I like the suggestion and if I had more time I'd try to clarify things along these lines.

I like the distinction between deference to people/groups and deference to processes.

deference to good ideas

[This is a bit of a semantic point, but seems important enough to mention] I think "deference to good ideas" wouldn't count as "deference", in the way that this community has ended up using it. As per the foru... (read more)

Cool, makes sense.

The main way to answer this seems to be getting a non-self-rated measure of research skill change.

Agreed. Asking mentors seems like the easiest thing to do here, in the first instance.

Somewhat related comment: next time, I think it could be better to ask "What percentage of the value of the fellowship came from these different components?"* instead of "What do you think were the most valuable parts of the programme?". This would give a bit more fine-grained data, which could be really important.

E.g. if it's true that most of the value of ERIs comes from networking, this would suggest that people who want to scale ERIs should do pretty different things (e.g. lots of retreats optimised for networking).

*and give them several buckets to select from, e.g. <3%, 3-10%, 10-25%, etc.

2
L Rudolf L
2y
Yes, letting them specifically set a distribution, especially as this was implicitly done anyways in the data analysis, would have been better. We'd want to normalise this somehow, either by trusting and/or checking that it's a plausible distribution (i.e. sums to 1), or by just letting them rate things on a scale of 1-10 and then getting an implied "distribution" from that.

Thanks for putting this together!

I'm surprised by the combination of the following two survey results:

Fellows' estimate of how comfortable they would be pursuing a research project remains effectively constant. Many start out very comfortable with research. A few decline.

and

Networking, learning to do research, and becoming a stronger candidate for academic (but not industry) jobs top the list of what participants found most valuable about the programs. (emphasis mine)

That is: on average, fellows claim they learned to do better research, but became ... (read more)

2
L Rudolf L
2y
I agree that this is confusing. Also note that fellows consistently ranked the programs as providing, on average, slightly higher research skill gain than standard academic internships (average 5.7 on a 1-10 scale where 5 = standard academic internship skill gain; see the "perceived skills and skill changes" section). I can think of many possible theories, including:

  • fellows don't become more comfortable with research despite gaining competence at it, because the competence does not lead to feeling good at research (e.g. maybe they update towards research being hard, or there is some form of Dunning-Kruger type thing here, or they already feel pretty comfortable, as you mention); therefore self-rated research comfort is a bad indicator, and we might instead try e.g. asking their mentors or looking at some other external metric
  • fellows don't actually get better at research, but still rate it as a top source of value because they want to think they did, and the fact that their comfort with research didn't improve is a more reliable indicator than their marking it as a top source of value (and also they either have a low opinion of skill gain from standard academic internships, or haven't experienced those and are just (pessimistically) imagining what it would be like)

The main way to answer this seems to be getting a non-self-rated measure of research skill change.

Re (1) See When Will AI Exceed Human Performance? Evidence from AI Experts (2016) and the 2022 updated version. These surveys don't ask about x-risk scenarios in detail, but do ask about the overall probability of very bad outcomes and other relevant factors.

Re (1) and (3), you might be interested in various bits of research that GovAI has done on the American public and AI researchers.

You also might want to get in touch with Noemi Dreksler, who is working on surveys at GovAI.

A potentially useful subsection for each perspective could be: evidence that should change your mind about how plausible this perspective is (including things you might observe over the coming years/decades). This would be kinda like the future-looking version of the "historical analogies" subsection.

2
MMMaas
2y
That's a great suggestion, I will aim to add that for each!

Another random thought: a bunch of these lessons seem like the kind of things that general writing and research coaching can teach. Maybe summer fellows and similar should be provided with that? (Freeing up time for you/other people in your reference class to play to your comparative advantage.)

(Though some of these lessons are specific to EA research and so seem harder to outsource.)

Love it, thanks for the post!

"Reading 'too much' is possibly the optimal strategy if you're mainly trying to skill up (e.g., through increased domain knowledge), rather than have direct impact now. But also bear in mind that becoming more efficient at direct impact is itself a form of skilling up, and this pushes back toward 'writing early' as the better extreme."

Two thoughts on this section:

  1. Additional (obvious) arguments for writing early: producing stuff builds career capital, and is often a better way to learn than just reading.

  2. I want to disen

... (read more)

The question was:

Assume for the purpose of this question that HLMI* will at some point exist. How positive or negative do you expect the overall impact of this to be on humanity, in the long run?

So it doesn't presuppose some agentic form of AGI—but rather asks about the same type of technology that the median respondent gave a 50% chance of arriving within 45 years.

*HLMI was defined in the survey as:

“High-level machine intelligence” (HLMI) is achieved when unaided machines can accomplish every task better and more cheaply than human workers.

Right, I just wanted to point out that the average AI researcher who dismisses AI x-risk doesn't do so because they think AGI is very unlikely. But I admit to often being confused about why they do dismiss AI x-risk.

The same survey asked AI researchers about the outcome they expect from AGI:

The median probability was 25% for a “good” outcome and 20% for an “extremely good” outcome. By contrast, the probability was 10% for a bad outcome and 5% for an outcome described as “Extremely Bad (e.g., human extinction).”

If I learned that there was some scientifi... (read more)

1
Karthik Tadepalli
2y
I don't think the answers are illuminating if the question is "conditional on AGI happening, would it be good or bad" - that doesn't yield super meaningful answers from people who believe that AGI in the agentic sense is vanishingly unlikely. Or rather it is a meaningful question, but to those people AGI occurs with near zero probability so even if it was very bad it might not be a priority.

Thanks, I agree with most of these suggestions.

"Other (AI-enabled) dangerous tech" feels to me like it clearly falls under "exacerbating other x-risk factors"

I was trying to stipulate that the dangerous tech was a source of x-risk in itself, not just a risk factor (admittedly the boundary is fuzzy). The wording was "AI leads to deployment of technology that causes extinction or unrecoverable collapse" and the examples (which could have been clearer) were intended to be "a pathogen kills everyone" or "full scale nuclear war leads to unrecoverable collapse"

they basically see AGI as very unlikely

Certainly some people you talk to in the fairness/bias crowd think AGI is very unlikely, but that's definitely not a consensus view among AI researchers. E.g. see this survey of AI researchers (at top conferences in 2015, not selecting for AI safety folk), which finds that:

Researchers believe there is a 50% chance of AI outperforming humans in all tasks in 45 years and of automating all human jobs in 120 years

1
Karthik Tadepalli
2y
That's fascinating! But I don't know if that is the same notion of AGI and AI risk that we talk about in EA. It's very possible to believe that AI will automate most jobs and still not believe that AI will become agentic/misaligned. That's the notion of AGI that I was referring to.

Thanks, I'm glad this was helpful to you!

I'm also still a bit confused about what exactly this concept refers to. Is a 'consequentialist' basically just an 'optimiser' in the sense that Yudkowsky uses in the sequences (e.g. here), that has later been refined by posts like this one (where it's called 'selection') and this one?

In other words, roughly speaking, is a system a consequentialist to the extent that it's trying to take actions that push its environment towards a certain goal state?

Found the source. There, he says that an "explicit cognitive model and explicit forecasts" about the future are necessary for true consequentialist cognition (CC). He agrees that CC is already common among optimisers (like chess engines); the dangerous kind is consequentialism over broad domains (i.e. where everything in the world is in play as a possible means, whereas the chess engine only considers the set of legal moves as its domain).

"Goal-seeking" seems like the previous, less-confusing word for it, not sure why people shifted.

Another (unoriginal) way that heavy AI regulation could be counterproductive for safety: AGI alignment research probably increases in productivity as you get close to AGI. So regulation in jurisdictions with the actors who are closest to AGI (currently, US/UK) would give those actors less time to do high-productivity AGI alignment research before the second-place actor catches up.

And within a jurisdiction, you might think that responsible actors are most likely to comply with regulation, differentially slowing them down.

Agreed, thanks for the pushback!

Ways of framing EA that (extremely anecdotally*) make it seem less ick to newcomers. These are all obvious/boring; I'm mostly recording them here for my own consolidation

  • EA as a bet on a general way of approaching how to do good, that is almost certainly wrong in at least some ways—rather than a claim that we've "figured out" how to do the most good (like, probably no one claims the latter, but sometimes newcomers tend to get this vibe). Different people in the community have different degrees of belief in the bet, and (like all bets) it can make sense t
... (read more)
6
Stefan_Schubert
2y
I like the thinking in some ways, but think there are also some risks. For instance, emphasising EA being diverse in its ways of doing good could make people expect it to be more so than it actually is, which could lead to disappointment. In some ways, it could be good to be upfront with some of the less intuitive aspects of EA.

I'm still confused about the distinction you have in mind between inside view and independent impression (which also have the property that they feel true to me)?

Or do you have no distinction in mind, but just think that the phrase "inside view" captures the sentiment better?

3
Neel Nanda
2y
Inside view feels deeply emotional and tied to how I feel the world to be, independent impression feels cold and abstract

Thanks - good points, I'm not very confident either way now

Thanks, I appreciate this post a lot!

Playing the devil's advocate for a minute, I think one main challenge to this way of presenting the case is something like "yeah, and this is exactly what you'd expect to see for a field in its early stages. Can you tell a story for how these kinds of failures end up killing literally everyone, rather than getting fixed along the way, well before they're deployed widely enough to do so?"

And there, it seems you do need to start talking about agents with misaligned goals, and the reasons to expect misalignment that we don't manage to fix?

6
Aryeh Englander
2y
What I do (assuming I get to that point in the conversation) is that I deliberately mention points like this, even before trying to argue otherwise. In my experience (which again is just my experience) a good portion of the time the people I'm talking to debunk those counterarguments themselves. And if they don't, well then I can start discussing it at that point - but at that point it feels to me like I've already established credibility and non-craziness by (a) starting off with noncontroversial topics, (b) starting off the more controversial topics with arguments against taking it seriously, and (c) by drawing mostly obvious lines of reasoning from (a) to (b) to whatever conclusions they do end up reaching. So long as I don't go signaling science-fiction-geekiness too much during the conversation, it feels to me like if I end up having to make some particular arguments in the end then those become a pretty easy sell.

Thanks for writing this!

There are yet other views about what exactly AI catastrophe will look like, but I think it is fair to say that the combined views of Yudkowsky and Christiano provide a fairly good representation of the field as a whole.

I disagree with this.

We ran a survey of prominent AI safety and governance researchers, where we asked them to estimate the probability of five different AI x-risk scenarios.

Arguably, the "terminator-like" scenarios are the "Superintelligence" scenario, and part 2 of "What failure looks like" (as you suggest... (read more)

2
skluug
2y
Thanks for reading—you’re definitely right, my claim about the representativeness of Yudkowsky & Christiano’s views was wrong. I had only a narrow segment of the field in mind when I wrote this post. Thank you for conducting this very informative survey.

After practising some self-love I am now noticeably less stressed about work in general. I sleep better, have more consistent energy, enjoy having conversations about work-related stuff more (so I just talk about EA and AI risk more than I used to, which was a big win on my previous margin). I think I maybe work fewer hours than I used to because before it felt like there was a bear chasing me and if I wasn't always working then it was going to eat me, whereas now that isn't the case. But my working patterns feel healthy and sustainable now; before, I was... (read more)

I found this helpful and am excited to try it - thanks for sharing!

Also, nitpick, but I find the "inside view" a more confusing and jargony way of just saying "independent impressions" (okay, also jargon to some extent, but closer to plain English), which also avoids the problem you point out: inside view is not the opposite of the Tetlockian sense of outside view (and the other ambiguities with outside view that another commenter pointed out).

3
Neel Nanda
2y
The complaint that it's confusing jargon is fair. Though I do think the Tetlock sense + phrase inside view captures something important - my inside view is what feels true to me, according to my personal best guess and internal impressions. Deferring doesn't feel true in the same way; it feels like I'm overriding my beliefs, not like how the world is. This mostly comes under the motivation point - maybe, for motivation, inside views matter but independent impressions don't? And people differ on how they feel about the two?

Nice post! I agree with ~everything here. Parts that felt particularly helpful:

  • There are even more reasons why paraphrasing is great than I thought - good reminder to be doing this more often
  • The way you put this point was v crisp and helpful: "Empirically, there’s a lot of smart people who believe different and contradictory things! It’s impossible for all of them to be right, so you must disagree with some of them. Internalising that you can do this is really important for being able to think clearly"
  • The importance of "how much feedback do they get f
... (read more)
3
Neel Nanda
2y
I want to push back against this. The aggregate benefit may have been high, but when you divide it by all the people trying, I'm not convinced it's all that high. Further, that's an overestimate - the actual question is more like 'if the people who are least enthusiastic about it stop trying to form inside views, how bad is that?'. And I'd both guess that impact is fairly heavy tailed, and that the people most willing to give up are the least likely to have a major positive impact. I'm not confident in the above, but it's definitely not obvious
3
MichaelA
2y
Cool, I'll shoot you an email!

Maybe this process generalises and so longtermist AI governance can learn from other communities?

In some sense, this post explains how the longtermist AI governance community is trying to go from “no one understands this issue well”, to actually improving concrete decisions that affect the issue.

It seems plausible that the process described here is pretty general (i.e. not specific to AI governance). If that’s true, then there could be opportunities for AI governance to learn from how this process has been implemented in other communities/fields and vice-versa.

1
Iyngkarran Kumar
10mo
Intuitively I feel that this process does generalise, and I would personally be really keen to read case studies of an idea/strategy that was moved from left to right in the diagram above, i.e. a thinker initially identifies a problem, and over the following years or decades it moves to tactics research, then policy development, then advocacy, and finally is implemented. I doubt any idea in AI governance has gone through the full strategy-to-implementation lifecycle, but maybe one in climate change, nuclear risk management, or something else has? Would appreciate it if anyone could link case studies of this sort!

Something that would improve this post but I didn’t have time for:

For each kind of work, give a sense of:

  • The amount of effort currently going into it
  • What the biggest gaps/bottlenecks/open questions are
  • What kinds of people might be well-suited to it

Thanks!

I agree with your quibble. Other than the examples you list here, I'm curious about any other favourite reports/topics in the broader space of AI governance - esp. ones that you think are at least as relevant to longtermist AI governance as the average example I give in this post?

Note: "If you want to add one or more co-authors to your post, you’ll need to contact the Forum team..." is no longer the easiest way to add co-authors, so might want to be updated accordingly.

And by the way, thanks for adding this new feature!

2
Aaron Gertler
2y
Thanks for the notice — looks like we just merged this feature from LW! Thrilled to now be removing this from the post.