Mauricio

[Mostly typed this before I saw another reply which makes this less relevant.]

Thanks for adding these explanations.

research affiliates: 4

Affiliates aren't staff though, and several of them are already counted anyway under the other orgs or the "Other" section. (Note the overlap between CSER and Leverhulme folks.)

FLI: counted 5 people working on AI policy and governance.

Sure, but that's not the same as 5 non-technical safety researchers. A few of their staff are explicitly listed as e.g. advocates, rather than as researchers.

I think 5 is a good conservative estimate.

I don't think we should be looking at all their researchers. They have a section on their site that specifically lists safety-related people, and that's the section my previous comment was addressing. Counting people who aren't in that section seems like it'll get us counting non-safety-focused researchers.

There are about 45 research profiles on Google Scholar with the 'AI governance' tag. I counted about 8 researchers who weren't at the other organizations listed.

Thanks for adding this, but I'm not sure about this either--as with Leverhulme, just because someone is researching AI governance doesn't mean they're a non-technical safety researcher; there are lots of problems other than safety that AI governance researchers can be interested in.

[Edit: I think the following no longer makes sense because the comment it's responding to was edited to add explanations, or maybe I had just missed those explanations in my first reading. See my other response instead.]

Thanks for this. I don't see how the new estimates incorporate the above information. (The medians for CSER, Leverhulme, and FLI seem to still be at 5 each.)

(Sorry for being a stickler here--I think it's important that readers get accurate info on how many people are working on these problems.)

Thanks for the updates!

I have it on good word that CSET has well under 10 safety-focused researchers, but fair enough if you don't want to take an internet stranger's word for things.

I'd encourage you to also re-estimate the counts for CSER, Leverhulme, and the Future of Life Institute.

  • CSER's list of team members related to AI lists many affiliates, advisors, and co-founders but only ~3 research staff.
  • The Future of Life Institute seems more focused on policy and field-building than on research; they don't even have a research section on their website. Their team page lists ~2 people as researchers.
  • Of the 5 people listed in Leverhulme's relevant page, one of them was already counted for CSER, and another one doesn't seem safety-focused.

I also think the number of "Other" is more like 4.

Thanks for the response! Maybe readers would find it helpful if the summary of your post was edited to incorporate this info, so those who don't scroll to the comments can still get our best estimate.

Thanks for posting, seems good to know these things! I think some of the numbers for non-technical research should be substantially lower--enough that an estimate of ~55 non-technical safety researchers seems more accurate:

  • CSET isn't focused on AI safety; maybe you could count a few of their researchers (rather than 10).
  • I think SERI and BERI have 0 full-time non-technical research staff (rather than 10 and 5).
  • As far as I'm aware, the Leverhulme Centre for the Future of Intelligence + CSER only have at most a few non-technical researchers in total focused on AI safety (rather than 10 & 5). Same for FLI (rather than 5).
  • I hear Epoch has ~3 FTEs (rather than 10).
  • GoodAI's research roadmap makes no mention of public/corporate policy or governance, so I'd guess they have at most a few non-technical safety-focused researchers (rather than 10).

If I didn't mess up my math, all that should shift our estimate from 93 to ~42. Adding in 8 from Rethink (going by Peter's comment) and 5 (?) from OpenPhil, we get ~55.
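To make the arithmetic easier to check, here's a minimal sketch of the adjustment, assuming "a few" means ~2 per org (the ~3 for Epoch and the zeros for SERI and BERI come from the bullets above); the exact totals shift a bit depending on what you plug in:

```python
# Back-of-the-envelope version of the adjustment above. The "revised" values
# are my assumptions for "a few" (~2 each); only Epoch (~3) and SERI/BERI (0)
# were stated more precisely in the bullets.

original = {
    "CSET": 10, "SERI": 10, "BERI": 5,
    "Leverhulme": 10, "CSER": 5, "FLI": 5,
    "Epoch": 10, "GoodAI": 10,
}
revised = {
    "CSET": 2, "SERI": 0, "BERI": 0,
    "Leverhulme": 2, "CSER": 2, "FLI": 2,
    "Epoch": 3, "GoodAI": 2,
}

post_total = 93                                    # the post's overall estimate
reduction = sum(original.values()) - sum(revised.values())
adjusted = post_total - reduction                  # ~42
with_additions = adjusted + 8 + 5                  # + Rethink (~8) + OpenPhil (~5), ~55

print(adjusted, with_additions)
```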

Thanks for posting! I'm sympathetic to the broad intuition that any one person being at the sweet spot where they make a decisive impact seems unlikely, but I'm not sold on most of the specific arguments given here.

Recall that there are decent reasons to think goal alignment is impossible - in other words, it's not a priori obvious that there's any way to declare a goal and have some other agent pursue that goal exactly as you mean it.

I don't see why this is the relevant standard. "Just" avoiding egregiously unintended behavior seems sufficient for avoiding the worst accidents (and is clearly possible, since humans do it often).

Also, I don't think I've heard these decent reasons--what are they?

Recall that engineering ideas very, very rarely work on the first try, and that if we only have one chance at anything, failure is very likely.

It's also unclear that we only have one chance at this. Optimistically (but not that optimistically?), incremental progress and failsafes can allow for effectively multiple chances. (The main argument against seems to involve assumptions of very discontinuous or abrupt AI progress, but I haven't seen very strong arguments for expecting that.)

Recall that getting "humanity" to agree on a good spec for ethical behavior is extremely difficult: some places are against gene drives to reduce mosquito populations, for example, despite this saving many lives in expectation.

Agree, but also unclear why this is the relevant standard. A smaller set of actors agreeing on a more limited goal might be enough to help.

Recall that there is a gigantic economic incentive to keep pushing AI capabilities up, and referenda to reduce animal suffering in exchange for more expensive meat tend to fail.

Yup, though we should make sure not to double-count this, since this point was also included earlier (which isn't to say you're necessarily double-counting).

Recall that we have to implement any solution in a way that appeals to the cultural sensibilities of all major and technically savvy governments on the planet, plus major tech companies, plus, under certain circumstances, idiosyncratic ultra-talented individual hackers.

This also seems like an unnecessarily high standard, since regulations have been passed and enforced before without unanimous support from affected companies.

Also, getting acceptance from all major governments does seem very hard but not quite as hard as the above quote makes it sound. After all, many major governments (developed Western ones) have relatively similar cultural sensibilities, and ambitious efforts to prevent unilateral actions have previously gotten very broad acceptance (e.g. many actors could have made and launched nukes, done large-scale human germline editing, or maybe done large-scale climate engineering, but to my knowledge none of those have happened).

The we-only-get-one-shot idea applies on this stage too.

Yup, though this is also potential double-counting.

+1 on this being a relevant intuition. I'm not sure how limited these scenarios are - aren't information asymmetries and commitment problems really common?

Ah sorry, I had totally misunderstood your previous comment. (I had interpreted "multiply" very differently.) With that context, I retract my last response.

By "satisfaction" I meant high performance on its mesa-objective (insofar as it has one), though I suspect our different intuitions come from elsewhere.

it should robustly include "building copy of itself"

I think I'm still skeptical on two points:

  • Whether this is significantly easier than other complex goals
    • (The "robustly" part seems hard.)
  • Whether this actually leads to a near-best outcome according to total preference utilitarianism
    • If satisfying some goals is cheaper than satisfying others to the same extent, then the details of the goal matter a lot
      • As a kind of silly example, "maximize silicon & build copies of self" might be much easier to satisfy than "maximize paperclips & build copies of self." If so, a (total) preference utilitarian would consider it very important that agents have the former goal rather than the latter.

getting the "multiply" part right is sufficient, AI will take care of the "satisfaction" part on its own

I'm struggling to articulate how confused this seems in the context of machine learning. (I think my first objection is something like: the way in which "multiply" could be specified and the way in which an AI system pursues satisfaction are very different; one could be an aspect of the AI's training process, while the other is an aspect of the AI's behavior. So even if these two concepts each describe aspects of the AI system's objectives/behavior, that doesn't mean its goal is to "multiply satisfaction." That's sort of like arguing that a sink gets built to be sturdy, and it gives people water, therefore it gives people sturdy water--we can't just mash together related concepts and assume our claims about them will be right.)

(If you're not yet familiar with the basics of machine learning and this distinction, I think that could be helpful context.)
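If it helps, here's a purely toy sketch of the distinction I have in mind (my own illustration, with made-up names like `copy_bias`; not a description of any real system): the designer specifies a training signal, while the policy's behavior is whatever the optimization process ends up producing, so a term like "multiply" appearing in the reward doesn't by itself make "multiply satisfaction" the agent's goal:

```python
import random

# Toy sketch only. Point: the designer-specified *training signal* and the
# trained policy's *behavior* live at different levels of the setup, so a
# concept appearing in one doesn't automatically become "the goal" of the other.

def reward(world):
    """Designer-specified objective: reward making copies ('multiply')."""
    return world["copies"]

def policy(params, observation):
    """Learned behavior: whatever the parameters happen to encode.

    Nothing here says the policy 'pursues satisfaction'; it just maps
    observations to actions.
    """
    return "copy" if random.random() < params["copy_bias"] else "other"

def train(steps=1000):
    params = {"copy_bias": 0.5}
    for _ in range(steps):
        world = {"copies": 0}
        if policy(params, world) == "copy":
            world["copies"] += 1
        # Crude hill-climbing on the designer's reward signal.
        params["copy_bias"] = min(1.0, params["copy_bias"] + 0.001 * reward(world))
    return params

if __name__ == "__main__":
    print(train())
```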

[This comment is no longer endorsed by its author]

I can't, but I'm not sure I see your point?
