Sam Clarke

1024 · Joined Aug 2018


Strategy @ GovAI

Views are my own



Thanks for your comment!

I doubt that it's reasonable to draw these kinds of implications from the survey results, for a few reasons:

  • respondents were very uncertain
  • there's overlap between the scenarios
  • there's no 1-1 mapping between "fields" and risk scenarios (e.g. I'd strongly bet that improved cooperation of certain kinds would make both catastrophic misalignment and war less likely), though maybe your model tries to account for this (I didn't look at it)

A broader point: I think making importance comparisons (between interventions) on the level of abstraction of "improving cooperation", "hard AI alignment" and "partial/naive alignment" doesn't make much sense. I expect comparing specific plans/interventions to be much more useful.

In a following post, we will explore:

  1. How you could orient your career toward working on security


Did you end up writing this, or have a draft of it you'd be willing to share?

Will get them written up this month—sorry for the delay!

In fact, one reason I am writing this comment is that I think this post itself endorses that framing to too great an extent.

Probably agree with you there

I do not think it is appropriate to describe this [the Uber crash] simply as an accident

Also agree with that. I wasn't trying to claim it is simply an accident—there are also structural causes (i.e. bad incentives). As I wrote:

Note that this could also be well-described as an "accident risk" (there was some incompetence on behalf of the engineers, along with the structural causes). [emphasis added]

If I were writing this again, I wouldn't use the word "well-described" (unclear what I actually mean; sounds like I'm making a stronger claim than I was). Maybe I'd say "can partly be described as an accident".

But today, I think this mostly just introduces unnecessary/confusing abstraction. The main important point in my head now is: when stuff goes wrong, it can be due to malintent, incompetence, or the incentives. Often it's a complicated mixture of all three. Make sure your thinking about AI risk takes that into account.

And sure, you could carve up risks into categories, where you're like:

  • if it's mostly incompetence, call it an accident
  • if it's mostly malintent, call it misuse
  • if it's mostly incentives, call it structural

But it's pretty unclear what "mostly" means, and moreover it just feels kind of unnecessary/confusing.

Unfortunately, when someone tells you "AI is N years away because XYZ technical reasons," you may think you're updating on the technical reasons, but your brain was actually just using XYZ as excuses to defer to them.

I really like this point. I'm guilty of having done something like this loads myself.

When someone gives you gears-level evidence, and you update on their opinion because of that, that still constitutes deferring. What you think of as gears-level evidence is nearly always disguised testimonial evidence. At least to some, usually damning, degree. And unless you're unusually socioepistemologically astute, you're just lost to the process.

If it's easy, could you try to put this another way? I'm having trouble making sense of what exactly you mean, and it seems like an important point if true.

Thanks for your comment! I agree that the concept of deference used in this community is somewhat unclear, and a separate comment exchange on this post further convinced me of this. It's interesting to know how the word is used in formal epistemology.

Here is the EA Forum topic entry on epistemic deference. I think it most closely resembles your (c). I agree there's the complicated question of what your priors should be, before you do any deference, which leads to the (b) / (c) distinction.

Thanks for your comment!

Asking "who do you defer to?" feels like a simplification

Agreed! I'm not going to make any changes to the survey at this stage, but I like the suggestion and if I had more time I'd try to clarify things along these lines.

I like the distinction between deference to people/groups and deference to processes.

deference to good ideas

[This is a bit of a semantic point, but seems important enough to mention] I think "deference to good ideas" wouldn't count as "deference", in the way that this community has ended up using it. As per the forum topic entry on epistemic deference:

Epistemic deference is the process of updating one's beliefs in response to what others appear to believe, even if one ignores the reasons for those beliefs or does not find those reasons persuasive. (emphasis mine)

If you find an argument persuasive and incorporate it into your views, I think that doesn't qualify as "deference". Your independent impressions don't have to be (and in most cases won't be) the views you formed in isolation. When forming your independent impressions, you can and should take other people's arguments into account, to the extent that you find them convincing. Deference occurs when you take into account knowledge about what other people believe, and how trustworthy you find them, without engaging with their object-level arguments.

non-defensible original ideas

A similar point applies to this one, I think.

(All of the above makes me think that the concept of deference is even less clear in the community than I thought it was -- thanks for making me aware of this!)

Cool, makes sense.

The main way to answer this seems to be getting a non-self-rated measure of research skill change.

Agreed. Asking mentors seems like the easiest thing to do here, in the first instance.

Somewhat related comment: next time, I think it could be better to ask "What percentage of the value of the fellowship came from these different components?"* instead of "What do you think were the most valuable parts of the programme?". This would give a bit more fine-grained data, which could be really important.

E.g. if it's true that most of the value of ERIs comes from networking, this would suggest that people who want to scale ERIs should do pretty different things (e.g. lots of retreats optimised for networking).

*and give them several buckets to select from, e.g. <3%, 3-10%, 10-25%, etc.

Thanks for putting this together!

I'm surprised by the combination of the following two survey results:

Fellows' estimate of how comfortable they would be pursuing a research project remains effectively constant. Many start out very comfortable with research. A few decline.


Networking, learning to do research, and becoming a stronger candidate for academic (but not industry) jobs top the list of what participants found most valuable about the programs. (emphasis mine)

That is: on average, fellows claim they learned to do better research, but became no more comfortable pursuing a research project.

Do you think this is mostly explained by most fellows already being pretty comfortable with research?

A scatter plot of comfort against improvement in research skill could be helpful to examine different hypotheses (though it won't be possible with the current data, given how the "greatest value adds" question was phrased).
