
From our point of view, we are now in the end-game for AGI, and we (humans) are losing. When we share this with other people, they reliably get surprised. That’s why we believe it is worth writing down our beliefs on this.

1. AGI is happening soon. Significant probability of it happening in less than 5 years.

Five years ago, there were many obstacles along what we considered to be the path to AGI.

But in the last few years, we’ve gotten:

We don’t have any obstacle left in mind that we don’t expect to be overcome within 6 months of efforts being invested to take it down.

Forget about what the social consensus is. If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources? If you do, state so in the comments, but please do not state what those obstacles are.

2. We haven’t solved AI Safety, and we don’t have much time left.

We are very close to AGI. But how good are we at safety right now? Well.

No one knows how to get LLMs to be truthful. LLMs make things up, constantly. It is really hard to get them not to do this, and we don’t know how to do this at scale.

Optimizers quite often break their setup in unexpected ways. There have been quite a few examples of this. But in brief, the lessons we have learned are:

  • Optimizers can yield unexpected results
  • Those results can be very weird (like breaking the simulation environment)
  • Yet very few people extrapolate from this and see these as worrying signs

No one understands how large models make their decisions. Interpretability is extremely nascent, and mostly empirical. In practice, we are still completely in the dark about nearly all decisions taken by large models.

RLHF and Fine-Tuning have not worked well so far. Models are often unhelpful, untruthful, inconsistent, in many ways that had been theorized in the past. We also witness goal misspecification, misalignment, etc. Worse than this, as models become more powerful, we expect more egregious instances of misalignment, as more optimization will push for more and more extreme edge cases and pseudo-adversarial examples.

No one knows how to predict AI capabilities. No one predicted the many capabilities of GPT3. We only discovered them after the fact, while playing with the models. In some ways, we keep discovering capabilities now thanks to better interfaces and more optimization pressure by users, more than two years in. We’re seeing the same phenomenon happen with ChatGPT and the model behind Bing Chat.

We are uncertain about the true extent of the capabilities of the models we’re training, and we’ll be even more clueless about upcoming larger, more complex, more opaque models coming out of training. This has been true for a couple of years by now.

3. Racing towards AGI: Worst game of chicken ever.

The Race for powerful AGIs has already started. There already are general AIs. They are just not powerful enough yet to count as True AGIs.

Actors

Regardless of why people are doing it, they are racing for AGI. Everyone has their theses, their own beliefs about AGIs and their motivations. For instance, consider:

AdeptAI is working on giving AIs access to everything. In their introduction post, one can read “True general intelligence requires models that can not only read and write, but act in a way that is helpful to users. That’s why we’re starting Adept: we’re training a neural network to use every software tool and API in the world”, and furthermore, that they “believe this is actually the most practical and safest path to general intelligence” (emphasis ours).

DeepMind has done a lot of work on RL agents and multimodality. It is literally in their mission statement to “solve intelligence, developing more general and capable problem-solving systems, known as AGI”.

OpenAI has a mission statement more focused on safety: “We will attempt to directly build safe and beneficial AGI, but will also consider our mission fulfilled if our work aids others to achieve this outcome”. Unfortunately, they have also been a major kickstarter of the race with GPT3 and then ChatGPT.

(Since we started writing this post, Microsoft deployed what could be OpenAI’s GPT4 on Bing, plugged directly into the internet.)

Slowing Down the Race

There has been literally no regulation whatsoever to slow down AGI development. As far as we know, the efforts of key actors don’t go in this direction.

We don’t know of any major AI lab that has participated in slowing down AGI development, or publicly expressed interest in it.

Here are a few arguments that we have personally encountered, multiple times, for why slowing down AGI development is actually bad:

  • “AGI safety is not a big problem, we should improve technology as fast as possible for the people”
  • “Once we have stronger AIs, we can use them to work on safety. So it is better to race for stronger AIs and do safety later.”
  • “It is better for us to deploy AGI first than [authoritarian country], which would be bad.”
  • “It is better for us to have AGI first than [other organization], that is less safety minded than us.”
  • “We can’t predict the future. Possibly, it is better to not slow down AGI development, so that at some point there is naturally a big accident, and then the public and policymakers will understand that AGI safety is a big deal.”
  • “It is better to have AGI ASAP, so that we can study it longer for safety purposes, before others get it.”
  • “It is better to have AGI ASAP, so that at least it has access to less compute for RSI / world-takeover than in the world where it comes 10 years later.”
  • “Policymakers are clueless about this technology, so it’s impossible to slow down, they will just fail in their attempts to intervene. Engineers should remain the only ones deciding where the technology goes.”

Remember that arguments are soldiers: there is a whole lot more interest in pushing for the “Racing is good” thesis than for slowing down AGI development.

Question people

We could say more. But:

  • We are not high status, “core” members of the community.
  • We work at Conjecture, so what we write should be read as biased.
  • There are expectations of privacy when people talk to us. Not complete secrecy about everything. But still, they expect that we would not directly attribute quotes to them for instance, and we will not do so without each individual’s consent.
  • We expect we could say more things that would not violate expectations of privacy (public things even!). But we expect that niceness norms (which we often find detrimental and naive) and legal constraints (because we work at what can be seen as a competitor) would heavily punish us.

So our message is: things are worse than what is described in the post!
Don’t trust blindly, don’t assume: ask questions and reward openness.

Recommendations:

  • Question people, report their answers in your whisper networks, in your Twitter sphere or whichever other places you communicate on.
    • An example of “questioning” is asking all of the following questions:
      • Do you think we should race toward AGI? If so, why? If not, do you think we should slow down AGI? What does your organization think? What is it doing to push for capabilities and race for AGI compared to slowing down capabilities?
      • What is your alignment plan? What is your organization’s alignment plan? If you don’t know if you have one, did you ask your manager/boss/CEO what their alignment plan is?
  • Don’t substitute social fluff for information: someone being nice, friendly, or being liked by people, does not mean they have good plans, or any plans at all. The reverse also holds!
  • Gossiping and questioning people about their positions on AGI are prosocial activities!
  • Silence benefits people who lie or mislead in private, telling others what they want to hear.
  • Open Communication Norms benefit people who are consistent (not necessarily correct, or even honest, but at least consistent).

4. Conclusion

Let’s summarize our point of view:

  • AGI by default very soon: brace for impact
  • No safety solutions in sight: we have no airbag
  • Race ongoing: people are actually accelerating towards the wall

Should we just give up and die?

Nope! And not just for dignity points: there is a lot we can actually do. We are currently working on it quite directly at Conjecture.

We’re not hopeful that full alignment can be solved anytime soon, but we think that narrower sub-problems with tighter feedback loops, such as ensuring the boundedness of AI systems, are promising directions to pursue.

If you are interested in working together on this (not necessarily by becoming an employee or funding us), send an email with your bio and skills, or just a private message here.

We personally also recommend engaging with the writings of Eliezer Yudkowsky, Paul Christiano, Nate Soares, and John Wentworth. We do not endorse all of their research, but they all have tackled the problem, and made a fair share of their reasoning public. If we want to get better together, they seem like a good start.

5. Disclaimer

We acknowledge that the points above don’t go deeply into our models of why these situations are the case. Regardless, we wanted our point of view to at least be written in public.

For many readers, these problems will be obvious and require no further explanation. For others, these claims will be controversial: we’ll address some of these cruxes in detail in the future if there’s interest. 

Some of these potential cruxes include:

  • Adversarial examples are not only extreme cases, but rather they are representative of what you should expect conditioned on sufficient optimization.
  • Monitoring of increasingly advanced systems does not trivially work, since much of the cognition of advanced systems, and many of their dangerous properties, will be externalized the more they interact with the world.
  • Even perfect interpretability will not solve the problem alone: not everything is in the feedforward layer, and the more models interact with the world the truer this becomes.
  • Even with more data, RLHF and fine-tuning can’t solve alignment. These techniques don’t address deception and inner alignment, and what is natural in the RLHF ontology is not natural for humans and vice-versa.
  1. ^

    Edited to include DayDreamer, VideoDex and RT-1, h/t Alexander Kruel for these additional, better examples.

Comments

I just want to register a meta-level disagreement with this post, which is that your recommendations seem like really bad epistemics. I don't think we should just use heuristics and information-cascade ourselves to death as a community; we should instead build good gears-level understandings of forecasting AI progress.

  1. You claim that AI accelerationist arguments act as soldiers, but you are literally deploying arguments as soldiers in this post!
  2. You recommend terrible weird gossiping anti-agency mechanisms instead of pro-agency actions like work on safety, upskill, and field build.
  3. You make a lot of arguments in negation that feel like weird sleights of hand. For instance, you say "We don’t know of any major AI lab that has participated in slowing down AGI development, or publicly expressed interest in it" but OpenAI's charter literally has the assist clause (regardless of whether you believe it's a promise they will keep, it exists).

To be clear, I think there are good arguments for short timelines (median 5-10 years), but you don't actually make them here[1]. What you do instead is:

  1. Say that people can express technical disagreement but must not name any empirical examples/obstacles because that's infohazardous.
  2. Make a lot of heuristics-based arguments that can't even be verified or prodded because they come from "private conversations", which I guess is fine, but then what do you want people to do with that?

I think people should think for themselves and engage with the arguments and models people provide for timelines and threat models but this post doesn't do that. It just directionally vibes a high p(doom) with a short timelines and tells people to panic and gossip. 

  1. ^

If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources? If you do, state so in the comments

I've just completed a master's degree in ML, though not in deep learning. I'm very sure there are still major obstacles to AGI, that will not be overcome in the next 5 years nor in the next 20. Primary among them is robust handling of OOD situations.

Look at self-driving cars as an example. They were a test case for AI companies, requiring much less than AGI to succeed, and the companies have so far failed despite billions in investment. We went from hearing about a fleet of self-driving cars that would be on the market in 2021 or 2022 to estimates that now lean more towards decades from now.

I will publicly predict now that there will be no AGI in the next 20 years. I expect significant achievements will be made, but only in areas where large amounts of relevant training data exist or can be easily generated. It will also struggle to catch on in areas like healthcare where misfiring results cause large damage and lawsuits.  

I will also predict that there might be a "stall" of AI progress in a few years, once all the low-hanging fruit problems are picked off, and the remaining problems like self-driving cars aren't well suited for the current advantages of AI. 

From hearing about a fleet of self-driving cars that would be on the market in 2021 or 2022, estimates are now leaning more towards decades from now.

Aren't there self-driving cars on the road in a few cities now? (Cruise and maybe Zoox, if I recall correctly). 

just so we're clear - self driving cars are, in fact, one of the key factors pushing timelines down, and they've also done some pretty impressive work on non-killeveryone-proof safety which may be useful as hunch seeds for ainotkilleveryoneism.

they're not the only source of interesting research, though.

also, I don't think most of us who expect agi soon expect reliable agi soon. I certainly don't expect reliability to come early at all by default.

slg

This post reads like it wants to convince its readers that AGI is near/will spell doom, picking and spelling out arguments in a biased way. 

Even though many people on the Forum and LW (including myself) believe that AI Safety is very important and isn't given enough attention by important actors, I don't want us to lower our standards for good arguments in favor of more AI Safety.

Some parts of the post that I find lacking:

 "We don’t have any obstacle left in mind that we don’t expect to get overcome in more than 6 months after efforts are invested to take it down."

I don't think more than 1/3 of ML researchers or engineers at DeepMind, OpenAI, or Anthropic would sign this statement.

"No one knows how to predict AI capabilities."

Many people are trying though (Ajeya Cotra, EpochAI), and I think these efforts aren't worthless. Maybe a different statement could be: "New AI capabilities appear discontinuously, and we have a hard time predicting such jumps. Given this larger uncertainty, we should worry more about unexpected and potentially dangerous capability increases".

"RLHF and Fine-Tuning have not worked well so far."

Setting aside whether RLHF scales (as linked, Jan Leike of OpenAI doesn't think so) and whether RLHF leads to deception, from my cursory reading and experience, ChatGPT shows substantially better behavior than Bing, which might be due to the latter not using RLHF.


Overall, I do agree with the article and think that recent developments have been worrying. Still, if the goal of the article is to get independently-thinking individuals to think about working on AI Safety, I'd prefer less extremized arguments.

We personally also recommend engaging with the writings of Eliezer, Paul, Nate, and John. We do not endorse all of their research, but they all have tackled the problem, and made a fair share of their reasoning public. If we want to get better together, they seem like a good start.

 

I realize this is a cross post and your original audience might know where to find all these recommendations even without further info, but if you want new people to look into their writings, it would be better to at least use full names of the authors you recommend.

Thanks a lot, and good point; edited to include full names and links!

Eliezer Yudkowsky, Paul Christiano, Nate Soares (so8res), John Wentworth (johnswentworth).

There has been literally no regulation whatsoever to slow down AGI development

Thanks for your post; I'm sure it will be appreciated by many on this forum.

The claim that there has been literally no regulation whatsoever sounds a bit strong?

E.g. the US putting export bans on advanced chips to China? (BIS press release here, more commentary: 1, 2, 3, 4)

It looks to me like this was intended to slow down (China's) AI development, and indeed has a reasonable chance that it may slow down (overall) AI development.

(To be clear, I see this as a point of detail on one specific claim, and it doesn't meaningfully detract from the overall thrust of your post.)

I agree the export controls on chips to China have the effect of slowing down AGI development, but that probably wasn't the intent behind the US government's decision to do this. The putative reason is to prevent China from using them in military technology.

Thanks, great to hear you found it useful!

As you mention, the export controls are aimed at, and have the primary effect of, differentially slowing down a specific country's AI development, rather than AGI development overall.

This has a few relevant side effects, such as reduced proliferation and competition, but doesn't slow down the frontier of overall AGI development (nor does it aim to do so).

 

Hm, I still feel as though Sanjay’s example cuts against your point somewhat. For instance, you mentioned encountering the following response: 

“It is better for us to have AGI first than [other organization], that is less safety minded than us.”

To the extent that regulations slow down potential AGI competitors in China, I’d expect stronger incentives towards safety, and a correspondingly lower chance of encountering potentially dangerous capabilities races. So, even if export bans don’t directly slow down the frontier of AI development, it seems plausible that such bans could indirectly do so (by weakening the incentives to sacrifice safety for capabilities development).

Your post + comment suggests that you nevertheless expect such regulation to have ~0 effect on AGI development races, although I’m unsure which parts of your model are driving that conclusion. I can imagine a couple of alternative pictures, with potentially different policy implications.

  • Your model could involve potential participants in AGI development races viewing themselves primarily in competition with other (e.g.) US firms. This, combined with short timelines, could lead you to expect the export ban to have ~0 effect on capabilities development.
    • On this view, you would be skeptical about the usefulness of the export ban on the basis of skepticism about China developing AGI (given your timelines), while potentially being optimistic about the counterfactual value of domestic regulation relating to chip production. 
    • If this is your model, I might start to wonder “Could the chip export ban affect the regulatory Overton Window, and increase the chance of domestic chip controls?”, in a way that makes the Chinese export ban potentially indirectly helpful for slowing down AGI. 
    • To be clear, I'm not saying the answer to my question above is "yes", only that this is one example of a question that I'd have on one reading of your model, which I wouldn't have on other readings.
  • Alternatively, your model might instead be skeptical about the importance of compute, and consequently skeptical about the value of governance regimes surrounding a wide variety of even-somewhat-quixotic-suggestions relating to domestic chip regulation.
    • I sensed that you might have a less compute-centric view based on your questions to leading AI researchers, asking if they “truly believe there are any major obstacles left” which major AI companies were unable to “tear down with their [current?] resources”. 
    • Based on that question – alongside your assigning a significant probability to <5 year timelines – I sensed that you might have a (potentially not-publicly-disclosable) impression about the current rate of algorithmic progress.[1]

I don’t want to raise overly pernickety questions, and I’m glad you’re sharing your concerns. I’m asking for more details about your underlying model because the audience here will consist of people who (despite being far more concerned about AGI than the general population) are on average far less concerned – and on average know less about the technical/governance space – than you are. If you’re skeptical about the value of extant regulation affecting AGI development, it would be helpful at least for me (and I’m guessing others?) to have a bit more detail on what’s driving that conclusion.
 

  1. ^

     I don’t mean to suggest that you couldn’t have more ‘compute-centric’ reasons for believing in short timelines, only that some of your claims (+tone) updated me a bit in this direction.

... there is a lot we can actually do. We are currently working on it quite directly at Conjecture

I was hoping this post would explain how Conjecture sees its work as contributing to the overall AI alignment project, and was surprised to see that that topic isn't addressed at all. Could you speak to it?

Comment by Paul Christiano on LessWrong:

 

""RLHF and Fine-Tuning have not worked well so far. Models are often unhelpful, untruthful, inconsistent, in many ways that had been theorized in the past. We also witness goal misspecification, misalignment, etc. Worse than this, as models become more powerful, we expect more egregious instances of misalignment, as more optimization will push for more and more extreme edge cases and pseudo-adversarial examples.""

These three links are:

  • The first is Mysteries of mode collapse, which claims that RLHF (as well as OpenAI's supervised fine-tuning on highly-rated responses) decreases entropy. This doesn't seem particularly related to any of the claims in this paragraph, and I haven't seen it explained why this is a bad thing. I asked on the post but did not get a response.
  • The second is Discovering language model behaviors with model-written evaluations and shows that Anthropic's models trained with RLHF have systematically different personalities than the pre-trained model.  I'm not exactly sure what claims you are citing, but I think you are making some really wild leaps.
  • The third is Compendium of problems with RLHF, which primarily links to the previous 2 failures and then discusses theoretical limitations.

I think these are bad citations for the claim that methods are "not working well" or that current evidence points towards trouble.

The current problems you list---"unhelpful, untruthful, and inconsistent"---don't seem like good examples to illustrate your point. These are mostly caused by models failing to correctly predict which responses a human would rate highly. That happens because models have limited capabilities and is rapidly improving as models get smarter. These are not the problems that most people in the community are worried about, and I think it's misleading to say this is what was "theorized" in the past.

I think RLHF is obviously inadequate for aligning really powerful models, both because you cannot effectively constrain a deceptively aligned model and because human evaluators will eventually not be able to understand the consequences of proposed actions. And I think it is very plausible that large language models will pose serious catastrophic risks from misalignment before they are transformative (it seems very hard to tell). But I feel like this post isn't engaging with the substance of those concerns or sensitive to the actual state of evidence about how severe the problem looks like it will be or how well existing mitigations might work.

I'm definitely not knowledgeable about AI, but my two cents is that there is a thing called the frame problem that makes AGI very hard to attain or even think about. I'm not gonna even try to exposit what that is, and that article is a bit dated, but I'd guess the problem still remains beyond anyone's comprehension.

The kind of examples people used to use to motivate frame problem stories in the days of GOFAI in the 20th century are routinely solved by AI systems today.

Interesting, well maybe I'm off base then.