Owen Cotton-Barratt

Sequences
Reflection as a strategic goal
On Wholesomeness
Everyday Longtermism

I think there's maybe a useful distinction to make between future-out-of-human-hands (what this post was about, where human incompetence no longer matters) and future-out-of-human-control (where humans can no longer in any meaningful sense choose what happens).

I'm confused about how to relate to speaking about these issues. I feel like I can speak to several but not all of the questions you raise (as well as to some things you don't directly ask about). I'm not sure there's anything too surprising there, but I'd feel generically good about the EA community having more information.

But -- this is a topic which invites drama, in a way that I fear is sometimes disproportionate. And while I'm okay (to a fault) sharing information which invites drama for me personally, I'd feel bad about potentially stirring it up for other people.

That makes me hesitant. And I'm not sure how much my speaking would really help (of course I can't speak with anything like the authority of an external investigation). So my default is not to speak, at least yet (maybe in another year or two?).

Open to hearing arguments or opinions that that's the wrong meta-level orientation. (An additional complication is that some-but-not-all of the information I have came via my role on the board of EV, which makes me think it's not properly my information to choose whether to share. But the choice I'm describing here can be regarded as a choice about the information which does feel like mine to decide what to do with.)

There are different reference classes we might use for "reasonable" here. I believe that paying just the salaries of the researchers involved to do the key work would usually come to a good amount less (though maybe not if you're having to compete with AI lab salaries?). But I think that option isn't readily available on the open market (i.e. to funders, who aren't putting in the management time), unless someone good happens to want to research this anyway. In the reference class of academic grants, this looks relatively normal.

It's a bit hard from the outside to be second-guessing the funders' decisions, since I don't know what information they had available. The decisions would look better the more there was a good prototype or other reason to feel confident that they'd produce a strong benchmark. It might be that it would be optimal to investigate getting less thorough work done for less money, but it's not obvious to me.

I guess this is all a roundabout way of saying "naively it seems on the high side to me, but I can totally imagine learning information such that it would seem very reasonable". 

I'm interested if you have views on how this intersects with advancing AI. It felt a little striking that it wasn't mentioned more, given that in other places 80k talks about a reasonable chance of transformative AI within a small number of decades.

(I think that accelerating AI progress could increase the risk of nuclear war, and I weakly guess that AI-linked risks might account for the majority of nuclear war risk over the next 30 years or so; but I'm conscious that I don't have a great basis for trying to think quantitatively about this, and feel very interested in others' takes.)

My thoughts:

  1. It's totally possible to have someone who does a mix of net-positive and net-harmful work, and for it to be good to fund them to do more of the net-positive work
    • In general, one might reasonably be suspicious that grants could subsidize their harmful work, or that working on harmful things could indicate poor taste which might mean they won't do a good job with the helpful stuff
      • This is more an issue if you can't tell whether they did the work you wanted, and if you have no enforcement mechanism to ensure they do
        • Benchmarks seem unusually far in the direction of "you can tell if people did a good job"
    • OTOH in principle funding them to work on positive things might end up pulling attention from the negative things
  2. However ... I think the ontology of "advancing capabilities" is too crude to be useful here
    • I think that some types of capabilities work are among the most in-expectation positive things people could be doing, and others are among the most negative
    • It's hard to tell from these descriptions where their work falls
  3. In any case, this makes me primarily want to evaluate how good it would be to have these benchmarks
    • Mostly benchmarks seem like they could be helpful in orienting the world to what's happening
      • i.e. I buy that the top-level story for impact has something to it
    • Benchmarks could be harmful, via giving something for people to aim towards, and thereby accelerating research
      • I think this may be increasingly a concern as we get close to major impacts
      • But prima facie I'd expect it to be substituting for other comparably-good benchmarks for that purpose, whereas for helping people to orient it is more of a new thing that wouldn't otherwise exist
    • My gut take is mildly sceptical that it's a good grant: if you got great benchmarks from these grants, I'd be happy; but I sort of suspect that most things that look like this turn out kind of underwhelming, and after accounting for that I wonder whether it's worthwhile
      • I can imagine having my mind changed on this point pretty easily
      • I do think there's something healthy about saying "Activity X is one the world should be doing and isn't; we're just going to fund the most serious attempts we can find to do X, and let the chips fall where they may"

Thanks Toby, this is a nice article, and I think more easily approachable than anything I'd previously seen on the topic.

For those interested in the topic, I wanted to add links to a couple of Paul Christiano's classic posts:

(I think this is perhaps mostly relevant for the intellectual history.)

I was just disagreeing with Habryka's first paragraph. I'd definitely want to keep content along the lines of his third paragraph (which is pretty similar to what I initially drafted).

So it may be that we just have some different object-level views here. I don't think I could stand behind the first paragraph of what you've written there. Here's a rewrite that would be palatable to me:

OpenAI is a frontier AI company, aiming to develop artificial general intelligence (AGI). We consider poor navigation of the development of AGI to be among the biggest risks to humanity's future. It is complicated to know how best to respond to this. Many thoughtful people think it would be good to pause AI development; others think that it is good to accelerate progress in the US. We think both of these positions are probably mistaken, although we wouldn't be shocked to be wrong. Overall we think that if we were able to slow down across the board that would probably be good, and that steps to improve our understanding of the technology relative to absolute progress with the technology are probably good. In contrast to most of the jobs on our job board, therefore, it is not obviously good to help OpenAI with its mission. It may be more appropriate to consider working at OpenAI as similar to working at a large tobacco company, hoping to reduce the harm that the tobacco company causes, or to leverage this specific tobacco company's expertise with tobacco to produce more competitive and less harmful variations of tobacco products.

I want to emphasise that this difference is mostly not driven by a desire to be politically acceptable (although the inclusion/wording of the "many thoughtful people ..." clauses is partly a matter of trying to be courteous), but rather by a desire not to give bad advice, nor to be overconfident on things.

I don't regard the norms as being about withholding negative information, but about trying to err towards presenting friendly frames while sharing what's pertinent, or something?

Honestly I'm not sure how much we really disagree here. I guess we'd have to concretely discuss wording for an org. In the case of OpenAI, I imagine it being appropriate to include some disclaimer like:

OpenAI is a frontier AI company. It has repeatedly expressed an interest in safety and has multiple safety teams. However, some people leaving the company have expressed concern that it is not on track to handle AGI safely, and that it wasn't giving its safety teams resources they had been promised. Moreover, it has a track record of putting inappropriate pressure on people leaving the company to sign non-disparagement agreements. [With links]

I largely agree with the rating-agency frame.

Hmm at some level I'm vibing with everything you're saying, but I still don't think I agree with your conclusion. Trying to figure out what's going on there.

Maybe it's something like: I think the norms prevailing in society say that in this kind of situation you should be a bit courteous in public. That doesn't mean being dishonest, but it does mean shading the views you express towards generosity, and sometimes gesturing at rather than flat expressing complaints.

With these norms, if you're blunt, you encourage people to read you as saying something worse than is true, or to read you as having an inability to act courteously. Neither of those is a message I'd be keen to send.

And I sort of think these norms are good, because they're softly de-escalatory in terms of verbal spats or ill feeling. When people feel attacked it's easy for them to be a little irrational and vilify the other side. If everyone is blunt publicly I think this can escalate minor spats into major fights.
