
DanielFilan

1084 karma · Joined Oct 2014

Comments (103)

we don't use superintelligent singletons and probably won't, I hope. We instead create context limited model instances of a larger model and tell it only about our task and the model doesn't retain information.

FYI, current cutting-edge large language models are trained on a massive amount of text on the internet (in the case of GPT-4, likely approximately all the text OpenAI could get their hands on). So they certainly have tons of information about stuff other than the task at hand.

I asked Alex "no chance you can comment on whether you think assistance games are mostly irrelevant to modern deep learning?"

His response was "i think it's mostly irrelevant, yeah, with moderate confidence". He then told me he'd lost his EA forum credentials and said I should feel free to cross-post his message here.

(For what it's worth, as people may have guessed, I disagree with him - I think you can totally do CIRL-type stuff with modern deep learning, to the extent you can do anything with modern deep learning.)

The core argument of Nick Bostrom’s bestselling book Superintelligence has also aged quite poorly: In brief, the book mostly assumed we will manually program a set of values into an AGI, and argued that since human values are complex, our value specification will likely be wrong, and will cause a catastrophe when optimized by a superintelligence. But most researchers now recognize that this argument is not applicable to modern ML systems which learn values, along with everything else, from vast amounts of human-generated data.

For what it's worth, the book does discuss value learning as a way for an AI to acquire values - you can see chapter 13 as being basically about this.

I would describe the core argument of the book as the following (going off of my notes on chapter 8, "Is the default outcome doom?"):

  • It is possible to build AI that's much smarter than humans.
  • This process could loop in on itself, leading to takeoff that could be slow or fast.
  • A superintelligence could gain a decisive strategic advantage and form a singleton.
  • Due to the orthogonality thesis, this superintelligence would not necessarily be aligned with human interests.
  • Due to instrumental convergence, an unaligned superintelligence would likely take over the world.
  • Because of the possibility of a treacherous turn, we cannot reliably check the safety of an AI on a training set.

There are things to complain about in this argument (a lot of "could"s that don't necessarily cash out to high probabilities), but I don't think it (or the book) assumes that we will manually program a set of values into an AGI.

Stuart Russell’s “assistance game” research agenda, started in 2016, is now widely seen as mostly irrelevant to modern deep learning— see former student Rohin Shah’s review here, as well as Alex Turner’s comments here.

The second link just takes me to Alex Turner's shortform page on LW, where ctrl+f-ing "assistance" doesn't get me any results. I do find this comment when searching for "CIRL"; it criticizes the CIRL/assistance games research program, but does not claim that it is irrelevant to modern deep learning. For what it's worth, I think it's plausible that Alex Turner thinks assistance games are mostly irrelevant to modern deep learning (and plausible that he doesn't think that) - I merely object that the link given doesn't provide good evidence for that claim.

The first link is to Rohin Shah's reviews of Human Compatible and some assistance games / CIRL research papers. Ctrl+f-ing "deep" gets me two irrelevant results, plus one description of a paper "which is inspired by [the CIRL] paper and does a similar thing with deep RL". It would be hard to write such a paper if CIRL (aka assistance games) were mostly irrelevant to modern deep learning. The closest thing I can find is in the summary of Human Compatible, which says "You might worry that the proposed solution [of making AI via CIRL / assistance games] is quite challenging: after all, it requires a shift in the entire way we do AI." This doesn't make assistance games irrelevant to modern deep learning: in 2016, it would have been true to say that moving the main thrust of AI research to language modelling so as to produce helpful chatbots required a shift in the entire way we did AI, but research into deeply learned large language models was not irrelevant to deep learning as of 2016 - in fact, it sprang out of 2016-era deep learning.
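
(Side note, in case it helps readers who haven't seen the formalism: by "assistance games" / "CIRL" I mean, roughly, the setup from Hadfield-Menell et al. 2016. The sketch below is my own simplified restatement, not a quote from the paper, so treat the notation as approximate.)

```latex
% A CIRL / assistance game, roughly stated: a two-player Markov game
% between a human H and a robot R with identical payoffs, where only H
% observes the reward parameter \theta.
\[
  M \;=\; \big\langle\, S,\ \{A^{H}, A^{R}\},\ T,\ \{\Theta, R\},\ P_{0},\ \gamma \,\big\rangle
\]
% S                          : set of world states
% A^{H}, A^{R}               : human and robot action sets
% T(s' \mid s, a^{H}, a^{R}) : transition distribution
% \Theta                     : space of reward parameters; \theta is drawn
%                              once and observed by H but not by R
% R(s, a^{H}, a^{R}; \theta) : shared reward, which both players maximise
% P_{0}                      : initial distribution over (s_{0}, \theta)
% \gamma                     : discount factor
```

Nothing in this setup commits you to a particular function class: the robot's policy and its beliefs about \theta could perfectly well be parametrised by large neural networks, which, as far as I can tell, is what the deep RL paper Rohin reviewed does.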

Sorry, maybe this is addressed elsewhere, but what relationship have you had with Nonlinear?

I suppose it's relevant if you want to get a sense of the chances of ending up in a situation reminiscent of the one depicted in this post, should you work for Nonlinear.

FWIW my intuition is that even if it's permissible to illegally transport life-saving medicines, you shouldn't pressure your employee to do so. Anyway, I've set up a Twitter poll, so we'll see what others think.

I believe there is a reasonable risk should EAs... [d]ate coworkers, especially when there is a power differential and especially when there is a direct report relationship

I think you're right that there's some risk in these situations. But also: work is one of the main places where one is able to meet people, including potential romantic partners. Norms against dating co-workers therefore seem quite costly in lost romance, which I think is a big deal! I think it's probably worth having norms against the cases you single out as especially risky, but otherwise, I'd rather our norms be laissez-faire.

For example some countries ban homosexuality, but your typical American would not consider it blameworthy to be gay.

I would object to my employer asking me to be homosexual.

Are you factoring in that CEA pays a few hundred bucks per attendee? I'd have a high-ish bar to pay that much for someone to go to a conference myself. Altho I don't have a good sense of what the marginal attendee/rejectee looks like.
