
A common theme implicit in many AI risk stories has been that broader society will either fail to anticipate the risks of AI until it is too late, or do little to address those risks in a serious manner. In my opinion, there are now clear signs that this assumption is false, and that society will address AI with something approaching both the attention and diligence it deserves. For example, one clear sign is Joe Biden's recent executive order on AI safety[1]. In light of recent news, it is worth comprehensively re-evaluating which sub-problems of AI risk are likely to be solved without further intervention from the AI risk community (e.g. perhaps deceptive alignment), and which ones will require more attention.

While I'm not saying we should now sit back and relax, I think recent evidence has significant implications for designing effective strategies to address AI risk. Since I think substantial AI regulation will likely occur by default, I urge effective altruists to focus more on ensuring that the regulation is thoughtful and well-targeted rather than ensuring that regulation happens at all. Ultimately, I argue in favor of a cautious and nuanced approach towards policymaking, in contrast to broader public AI safety advocacy.[2]


In the past, when I've read stories from AI risk adjacent people about what the future could look like, I have often noticed that the author assumes that humanity will essentially be asleep at the wheel with regards to the risks of unaligned AI, and won't put in place substantial safety regulations on the technology—unless of course EA and LessWrong-aligned researchers unexpectedly upset the gameboard by achieving a pivotal act. We can call this premise the assumption of an inattentive humanity.[3]

While most often implicit, the assumption of an inattentive humanity was sometimes stated explicitly in people's stories about the future.

For example, in a post from Marius Hobbhahn published last year about a realistic portrayal of the next few decades, Hobbhahn outlines a series of AI failure modes that occur as AI gets increasingly powerful. These failure modes include a malicious actor using an AI model to create a virus that "kills ~1000 people but is stopped in its tracks because the virus kills its hosts faster than it spreads", an AI model attempting to escape its data center after having "tried to establish a cult to “free” the model by getting access to its model weights", and a medical AI model that "hacked a large GPU cluster and then tried to contact ordinary people over the internet to participate in some unspecified experiment". Hobbhahn goes on to say,

People are concerned about this but the news is as quickly forgotten as an oil spill in the 2010s or a crypto scam in 2022. Billions of dollars of property damage have a news lifetime of a few days before they are swamped by whatever any random politician has posted on the internet or whatever famous person has gotten a new partner. The tech changed, the people who consume the news didn’t. The incentives are still the same.

Stefan Schubert subsequently commented that this scenario seems implausible,

I expect that people would freak more over such an incident than they would freak out over an oil spill or a crypto scam. For instance, an oil spill is a well-understood phenomenon, and even though people would be upset about it, it would normally not make them worry about a proliferation of further oil spills. By contrast, in this case the harm would come from a new and poorly understood technology that’s getting substantially more powerful every year. Therefore I expect the reaction to the kind of harm from AI described here to be quite different from the reaction to oil spills or crypto scams.

I believe Schubert's point has been strengthened by recent events, including Biden's executive order that touches on many aspects of AI risk[1], the UK AI safety summit, the recent open statement signed by numerous top AI scientists warning about "extinction" from AI, the congressional hearing about AI risk and the discussion of imminent legislation, the widespread media coverage on the rise of GPT-like language models, and the open letter to "pause" model scaling. All of this has occurred despite AI still being relatively harmless, and having, so far, tiny economic impacts, especially compared to the existential threat to humanity that it poses in the long term. Moreover, the timing of these developments strongly suggests they were mainly prompted by recent impressive developments in language models, rather than any special push from EAs.

In light of these developments, it is worth taking a closer look at how the assumption of an inattentive humanity has pervaded AI risk arguments, and re-evaluate the value of existing approaches to address AI risk in light of recent evidence.

The assumption of an inattentive humanity was perhaps most apparent in stories that posited a fast and local takeoff, in which AI goes from being powerless and hidden in the background, to suddenly achieving a decisive strategic advantage over the rest of the world in a very short period of time.

In his essay from 2017, Eliezer Yudkowsky famously argued that there is "no fire alarm for artificial general intelligence" by which he meant that there will not be an event "producing common knowledge that action [on AI risk] is now due and socially acceptable".[4] He wrote,

Multiple leading scientists in machine learning have already published articles telling us their criterion for a fire alarm. They will believe Artificial General Intelligence is imminent:

(A) When they personally see how to construct AGI using their current tools. This is what they are always saying is not currently true in order to castigate the folly of those who think AGI might be near.

(B) When their personal jobs do not give them a sense of everything being difficult. This, they are at pains to say, is a key piece of knowledge not possessed by the ignorant layfolk who think AGI might be near, who only believe that because they have never stayed up until 2AM trying to get a generative adversarial network to stabilize.

(C) When they are very impressed by how smart their AI is relative to a human being in respects that still feel magical to them; as opposed to the parts they do know how to engineer, which no longer seem magical to them; aka the AI seeming pretty smart in interaction and conversation; aka the AI actually being an AGI already.

So there isn’t going to be a fire alarm. Period.

There is never going to be a time before the end when you can look around nervously, and see that it is now clearly common knowledge that you can talk about AGI being imminent, and take action and exit the building in an orderly fashion, without fear of looking stupid or frightened.

My understanding is that this thesis was part of a more general view from Yudkowsky that AI would not have any large, visible effects on the world up until the final moments when it takes over the world. In a live debate at Jane Street with Robin Hanson in 2011 he said,

When we try to visualize how all this is likely to go down, we tend to visualize a scenario that someone else once termed “a brain in a box in a basement.” I love that phrase, so I stole it. In other words, we tend to visualize that there’s this AI programming team, a lot like the sort of wannabe AI programming teams you see nowadays, trying to create artificial general intelligence, like the artificial general intelligence projects you see nowadays. They manage to acquire some new deep insights which, combined with published insights in the general scientific community, let them go down into their basement and work in it for a while and create an AI which is smart enough to reprogram itself, and then you get an intelligence explosion.

In that type of scenario, it makes sense that society would not rush to regulate AI, since AI would mainly be a thing done by academics and hobbyists in small labs, with no outsized impacts, up until right before the intelligence explosion, which Yudkowsky predicted would take place within "weeks or hours rather than years or decades". However, this scenario—at least as it was literally portrayed—now appears very unlikely.

Personally—as I have roughly said for over a year now[5]—I think by far the most likely scenario is that society will adopt broad AI safety regulations as increasingly powerful systems are rolled out on a large scale, just as we have done for many previous technologies. As the capabilities of these systems increase, I expect the regulations to get stricter and become wider in scope, coinciding with popular, growing fears about losing control of the technology. Overall, I suspect governments will be sympathetic to many, but not all, of the concerns that EAs have about AI, including human disempowerment. And while sometimes failing to achieve their stated objectives, I predict governments will overwhelmingly adopt reasonable-looking regulations to stop the most salient risks, such as the risk of an untimely AI coup.


Of course, it still remains to be seen whether US and international regulatory policy will adequately address every essential sub-problem of AI risk. It is still plausible that the world will take aggressive actions to address AI safety, but that these actions will have little effect on the probability of human extinction, simply because they will be poorly designed. One possible reason for this type of pessimism is that the alignment problem might just be so difficult to solve that no “normal” amount of regulation could be sufficient to make adequate progress on the core elements of the problem—even if regulators were guided by excellent advisors—and therefore we need to clamp down hard now and pause AI worldwide indefinitely. That said, I don't see any strong evidence supporting that position.

Another reason why you might still believe regulatory policy for AI risk will be inadequate is that regulators will adopt sloppy policies that totally miss the “hard bits” of the problem. When I recently asked Oliver Habryka what type of policies he still expects won’t be adopted, he mentioned "Any kind of eval system that's robust to deceptive alignment." I believe this opinion is likely shared by many other EAs and rationalists.

In light of recent events, we should question how plausible it is that society will fail to adequately address such an integral part of the problem. Perhaps you believe that policy-makers or general society simply won’t worry much about AI deception. Or maybe people will worry about AI deception, but they will quickly feel reassured by results from superficial eval tests. Personally, I'm pretty skeptical of both of these possibilities, and for basically the same reasons why I was skeptical that there won’t be substantial regulation in the first place: 

  1. People think ahead, and frequently—though not always—rely on the advice of well-informed experts who are at least moderately intelligent.
  2. AI capabilities will increase continuously and noticeably over years rather than appearing suddenly. This will provide us time to become acquainted with the risks from individual models, concretely demonstrate failure modes, and study them empirically.
  3. AI safety, including the problem of having AIs not kill everyone, is a natural thing for people to care about.

Now, I don’t know exactly what Habryka means when he says he doesn’t expect to see eval regulations that are robust to deception. Does that require that the eval tests catch all deception, no matter how minor, or is it fine if we have a suite of tests that work well at detecting the most dangerous forms of deception, most of the time? However, while I agree that we shouldn’t expect regulation to be perfect, I still expect that governments will adopt sensible regulations—roughly the type you’d expect if mainstream LessWrong-aligned AI safety researchers were put in charge of regulatory policy.

To make my prediction about AI deception regulation more precise, I currently assign a probability of between 60% and 90%[6] that AI safety regulations will be adopted in the United States before 2035 that include sensible requirements for uncovering deception in the most powerful models, such as rigorously testing the model in a simulation, getting the model “drunk” by modifying its weights and interrogating it under diverse circumstances, asking a separate “lie detector” model to evaluate the model’s responses, applying state-of-the-art mechanistic interpretability methods to unveil latent motives, or creating many slightly different copies of the same model in the hopes that one is honest and successfully identifies and demonstrates deception from the others. I have written a Manifold question about this prediction that specifies these conditions further.

To clarify, I am not making any strong claims about any of these methods being foolproof or robust to AI deception in all circumstances. I am merely suggesting that future AI regulation will likely include sensible precautions against risks like AI deception. If deception turns out to be an obscenely difficult problem, I expect evidence for that view will accumulate over time—for instance because people will build model organisms of misalignment, and show how deception is very hard to catch. As the evidence grows, I think regulators will likely adapt, adjusting policy as the difficulty of the problem becomes clearer.[7]

I'm not saying we should be complacent. Instead, I’m advocating that we should raise the bar for what sub-problems of AI risk we consider worthy of special attention, versus what problems we think will be solved by default in the absence of further intervention from the AI risk community. Of course, it may still be true that AI deception is an extremely hard problem that reliably resists almost all attempted solutions in any “normal” regulatory regime, even as concrete evidence continues to accumulate about its difficulty—although I consider that claim unproven, to say the least.

Rather than asserting "everything is fine, don’t worry about AI risk" my point here is that we should think more carefully about what other people's incentives actually are, and how others will approach the problem, even without further intervention from this community. Answering these questions critically informs how valuable the actions we take now will be, since it would shed light on the question of which problems will remain genuinely neglected in the future, and which ones won’t be. It’s still necessary for people to work on AI risk, of course. We should just try to make sure we’re spending our time wisely, and focus on improving policy and strategy along the axes on which things are most likely to go poorly.

Edited to add: To give a concrete example of an important problem I think might not be solved by default, several months ago I proposed treating long-term value drift from future AIs as a serious issue. I currently think that value drift is a "hard bit" of the problem that we do not appear to be close to seriously addressing, perhaps because people expect easier problems won't be solved either without heroic effort. I'm also sympathetic to Dan Hendrycks' argument about AI evolution. If these problems turn out to be easy or intractable, I think it may be worth turning more of our focus to other important problems, such as improving our institutions or preventing s-risks.


Nothing in this post should be interpreted as indicating that I'm incredibly optimistic about how AI policy will go. Though politicians usually don't flat-out ignore safety risks, I believe history shows that they can easily mess up tech regulation in subtler ways.

For instance, when the internet was still new, the U.S. Congress passed the Digital Millennium Copyright Act (DMCA) in 1998 to crack down on copyright violators, with strong bipartisan support. While the law had several provisions, one particularly contentious element was its anti-circumvention rule, which made it illegal to bypass digital rights management (DRM) or other technological protection measures. Perversely, this criminalized the act of circumvention even in scenarios where the underlying activity—like copying or sharing—didn't actually infringe on copyright. Some have argued that because of these provisions, there has been a chilling effect on worldwide cryptography research, arguably making our infrastructure less secure with only a minor impact on copyright infringement.

While it is unclear what direct lessons we should draw from incidents like this one, I think a basic takeaway is that it is easy for legislators to get things wrong when they don't fully understand a technology. Since it seems likely that there will be strong AI regulations in the future regardless of what the AI risk community does, I am far more concerned about making sure the regulations are thoughtful, well-targeted, and grounded in the best evidence available, rather than making sure they happen at all.

Instead of worrying that the general public and policy-makers won’t take AI risks very seriously, I tend to be more worried that we will hastily implement poorly thought-out regulations that are based on inaccurate risk models or limited evidence about our situation. These regulations might marginally reduce some aspects of AI risk, but at great costs to the world in other respects. For these reasons, I favor nuanced messaging and pushing for cautious, expert-guided policymaking, rather than blanket public advocacy.

  1. ^

    In response to Biden's executive order on AI safety, Aaron Bergman wrote,

    Am I crazy for thinking the ex ante probability of something at least this good by the US federal government relative to AI progress, from the perspective of 5 years ago was ~1% Ie this seems 99th-percentile-in-2018 good to me

    David Manheim replied,

    I'm in the same boat. (In the set of worlds without near-term fast takeoff, and where safe AI is possible at all,) I'm increasingly convinced that the world is getting into position to actually address the risks robustly - though it's still very possible we fail.

    Peter Wildeford also replied,

    This checks out with me 

    AI capabilities is going faster than expected, but the policy response is much better than expected

    Stefan Schubert also commented,

    Yeah, if people think the policy response is "99th-percentile-in-2018", then that suggests their models have been seriously wrong.

    That could have further implications, meaning these issues should be comprehensively rethought.

  2. ^

    To give one example of an approach I'm highly skeptical of in light of these arguments, I'll point to this post from last year, which argued that we should try to "Slow down AI with stupid regulations", apparently because the author believed that strategy may be the best hope we have to make things go well.

  3. ^

    Stefan Schubert calls the tendency to assume that humanity will be asleep at the wheel with regards to large-scale risks "sleepwalk bias". He wrote about this bias in 2016, making many similar points to the ones I make here. 

  4. ^

    Further supporting my interpretation, in a 2013 essay, Yudkowsky states the following:

    In general and across all instances I can think of so far, I do not agree with the part of your futurological forecast in which you reason, "After event W happens, everyone will see the truth of proposition X, leading them to endorse Y and agree with me about policy decision Z."

    [...]

    Example 2:  "As AI gets more sophisticated, everyone will realize that real AI is on the way and then they'll start taking Friendly AI development seriously."

    Alternative projection:  As AI gets more sophisticated, the rest of society can't see any difference between the latest breakthrough reported in a press release and that business earlier with Watson beating Ken Jennings or Deep Blue beating Kasparov; it seems like the same sort of press release to them.  The same people who were talking about robot overlords earlier continue to talk about robot overlords.  The same people who were talking about human irreproducibility continue to talk about human specialness.  Concern is expressed over technological unemployment the same as today or Keynes in 1930, and this is used to fuel someone's previous ideological commitment to a basic income guarantee, inequality reduction, or whatever.  The same tiny segment of unusually consequentialist people are concerned about Friendly AI as before.  If anyone in the science community does start thinking that superintelligent AI is on the way, they exhibit the same distribution of performance as modern scientists who think it's on the way, e.g. Hugo de Garis, Ben Goertzel, etc.

  5. ^

    See also this thread from me on X from earlier this year. I've made various other comments saying that I expect AI regulation for a few years now, but they've mostly been fragmented across the internet.

  6. ^

    Conditioning on transformative AI arriving before 2035, my credence range is somewhat higher, at around 75-94%. We can define transformative AI in the same way I defined it here.

  7. ^

    This points to one reason why clamping down hard now might be unjustified, and why I prefer policies that start modest but adjust their strictness according to the best evidence about model capabilities and the difficulty of alignment.


Comments (17)
JWS

I directionally very strongly agree with this, Matthew. Some reasons why I think this oversight occurred in the AI x-risk community:

  1. The Bay Area rationalist scene is a hive of techno-optimistic libertarians.[1] These people have a negative view of state/government effectiveness at a philosophical and ideological level, so their default perspective is that the government doesn't know what it's doing and won't do anything. [edit: Re-reading this paragraph it comes off as perhaps mean as well as harsh, which I apologise for]
  2. Similarly, 'Politics is the Mind-Killer' might be the rationalist idea that has aged worst - especially for its influences on EA. EA is a political project - for example, the conclusions of Famine, Affluence, and Morality are fundamentally political.
  3. Overly-short timelines and FOOM. If you think takeoff is going to be so fast that we get no fire alarms, then what governments do doesn't matter. I think that's quite a load-bearing assumption that isn't holding up too well.
  4. Thinking of AI x-risk as only a technical problem to solve, and undervaluing AI Governance. Some of that might be comparative advantage (I'll do the coding and leave political co-ordination to those better suited). But it'd be interesting to see x-risk estimates include effectiveness of governance and attention of politicians/the public to this issue as input parameters.

I feel like this year has shown pretty credible evidence that these assumptions are flawed, and in any case it's a semi-mainstream political issue now and the genie can't be put back in the bottle. The AI x-risk community will have to meet reality where it is.

  1. ^

    Yes, an overly broad stereotype, but one that I hope most people can grok and go 'yeah that's kinda on point'.

Similarly, 'Politics is the Mind-Killer' might be the rationalist idea that has aged worst - especially for its influences on EA.

What influence are you thinking about? The position argued in the essay seems pretty measured.

Politics is an important domain to which we should individually apply our rationality—but it’s a terrible domain in which to learn rationality, or discuss rationality, unless all the discussants are already rational. [...]

I’m not saying that I think we should be apolitical, or even that we should adopt Wikipedia’s ideal of the Neutral Point of View. But try to resist getting in those good, solid digs if you can possibly avoid it. If your topic legitimately relates to attempts to ban evolution in school curricula, then go ahead and talk about it—but don’t blame it explicitly on the whole Republican Party; some of your readers may be Republicans, and they may feel that the problem is a few rogues, not the entire party.

JWS

I'm relying on my social experience and intuition here, so I don't expect I've got it 100% right, and others may indeed have different interpretations of the community's history with engaging with politics.

But concern about people over-extrapolating from Eliezer's initial post (many such cases) and treating it more of a norm to ignore politics full-stop seems to have been an established concern many years ago (related discussion here). I think that there's probably an interaction effect with the 'latent libertarianism' in early LessWrong/Rationalist space as well.

The Bay Area rationalist scene is a hive of techno-optimistic libertarians.[1] These people have a negative view of state/government effectiveness at a philosophical and ideological level, so their default perspective is that the government doesn't know what it's doing and won't do anything

The attitude of expecting very few regulations made little sense to me, because -- as someone who broadly shares these background biases -- my prior is that governments will generally regulate a new scary technology that comes out by default. I just don't expect that regulations will always be thoughtful, or that they will weigh the risks and rewards of new technologies appropriately.

There's an old adage that describes how government sometimes operates in response to a crisis: "We must do something; this is something; therefore, we must do this." Eliezer Yudkowsky himself once said,

So there really is a reason to be allergic to people who go around saying, "Ah, but technology has risks as well as benefits".  There's a historical record showing over-conservativeness, the many silent deaths of regulation being outweighed by a few visible deaths of nonregulation.  If you're really playing the middle, why not say, "Ah, but technology has benefits as well as risks"?

Thanks for the reply Matthew, I'm going to try to tease out some slight nuances here:

  1. Your prior that governments will gradually 'wake up' to the increasing power and potential risk of AI, and get involved, is I think more realistic than others I've come across.
  2. I do think that a lot of projections of AI risk/doom either explicitly or implicitly have no way of incorporating a negative societal feedback loop that slows/pauses AI progress for example. My original point 1 was to say that I think this prior may be linked to the strong Libertarian beliefs of many working on AI risk in or close to the Bay Area.
  3. This may be an argument that's downstream of views on alignment difficulty and timelines. If you have short timelines and high difficulty, bad regulation doesn't help the impending disaster. If you have medium/longer timelines but think alignment will be easy-ish (which is my model of what the Eleuther team believes, for example), then backfiring regulations like DMCA actually become a potential risk rather than the alignment problem itself.
  4. I'm well aware of Sir Humphrey's wisdom. I think we may have different priors on that but I don't think that's really much of a crux here, I definitely agree we want regulations to be targeted and helpful
  5. I think my issue with this is probably downstream of my scepticism about short timelines and fast takeoff. I think there will be 'warning shots', and I think that societies and governments will take notice - they already are! To hold that combination of beliefs you have to think either that governments won't/can't act even when things start getting 'crazy', or that you get a sudden deceptive sharp-left turn.
  6. So basically I agree that AI x-risk modelling should be re-evaluated in a world where AI Safety is no longer a particularly neglected area. At the very least, models that have no socio-political levers (off the top of my head Open Phil's 'Bio Anchors' and 'A Compute Centric Framework' come to mind) should have that qualification up-front and in glowing neon letters.

tl;dr - Writing that all out I don't think we disagree much at all, I think your prior that government would get involved is accurate. The 'vibe' I got from a lot of early AI Safety work that's MIRI-adjacent/Bay Area focused/Libertarian-ish was different though. It seemed to assume this technology would develop, have great consequences, and there would be no socio-political reaction at all, which seems very false to me.

(side note - I really appreciate your AI takes btw. I find them very useful and informative. pls keep sharing)

The Bay Area rationalist scene is a hive of techno-optimistic libertarians.[1] These people have a negative view of state/government effectiveness at a philosophical and ideological level, so their default perspective is that the government doesn't know what it's doing and won't do anything. [edit: Re-reading this paragraph it comes off as perhaps mean as well as harsh, which I apologise for]

Yeah, I kind of have to agree with this. I think the Bay Area rationalist scene underrates government competence, though even I was surprised at how little politicking happened, and how little it ended up being politicized.

Similarly, 'Politics is the Mind-Killer' might be the rationalist idea that has aged worst - especially for its influences on EA. EA is a political project - for example, the conclusions of Famine, Affluence, and Morality are fundamentally political.

I think that AI was a surprisingly good exception to the rule that politicizing something would make it harder to get, and I think this is mostly due to the popularity of AI regulations. I will say though that there's clear evidence that at least for now, AI safety is in a privileged position, and the heuristic no longer applies.

Overly-short timelines and FOOM. If you think takeoff is going to be so fast that we get no fire alarms, then what governments do doesn't matter. I think that's quite a load-bearing assumption that isn't holding up too well

Not just that though, I also think being overly pessimistic around AI safety sort of contributed, as a lot of people's mental health was almost certainly not great at best, making them catastrophize the situation and be ineffective.

This is a real issue in the climate change movement, and I expect that AI safety's embrace of pessimism was not good at all for thinking clearly.

Thinking of AI x-risk as only a technical problem to solve, and undervaluing AI Governance. Some of that might be comparative advantage (I'll do the coding and leave political co-ordination to those better suited). But it'd be interesting to see x-risk estimates include effectiveness of governance and attention of politicians/the public to this issue as input parameters.

I agree with this, at least for the general problem of AI governance, though I disagree when it comes to AI alignment. I do agree that rationalists underestimate the governance work required to achieve a flourishing future.

I have not thought this through thoroughly but think it might be an important data point to consider: It might be that part of the reason we see movement now on policy is exactly due to funding and work by EA in the AI space. I am saying this as both FHI and FLI rank above e.g. Chatham House in a think tank ranking report on AI. If these organizations were to be de-funded or lose talent, it might be that politicians start paying less attention to AI, or make poorer decisions going forward. I was quite impressed by the work of FHI and FLI in terms of quickly surpassing many super trusted think tanks in the rankings on the topic of AI. I also have not looked deeply into the methodology of the ranking, but I think a big part of the ranking is asking politicians roughly "whose advice do you trust on AI policy?".

From the report, the method is a multi-step process with this sample:

over 8,100 think tanks and approximately 12,800 journalists, public and private donors, and policymakers from around the world.

I wouldn't lean too much on this though? I'm not that familiar with the space, but a bunch of somewhat unknown institutes are pretty high up.

I do agree with your general point though: EA has done a lot of legwork to give credibility to AI X-risk concerns and specific issues to focus on (let's not forget CSET). This made it easy for other credible people like Bengio and Hinton to read up on the arguments and be open with their concerns. Without that legwork, things would probably have looked very different.

Yeah, as I said, I did not look too carefully into the methodology and would definitely suggest that if anyone is making funding or similarly big decisions based on this, they should dig deeper. Good that you clarify this, as I definitely do not want anyone to make big decisions based on this without double-checking how much these rankings can be trusted and how likely they are to indicate how much various think tanks influence policy.

The link is dead. Is it available anywhere else?

Still works for me. Not sure why it's not working for everyone.

In light of recent events, we should question how plausible it is that society will fail to adequately address such an integral part of the problem. Perhaps you believe that policy-makers or general society simply won’t worry much about AI deception. Or maybe people will worry about AI deception, but they will quickly feel reassured by results from superficial eval tests. Personally, I'm pretty skeptical of both of these possibilities.

Possibility 1 has now been empirically falsified and 2 seems unlikely now. See this from the new UK government AI Safety Institute, which aims to develop evals that address:

Abilities and tendencies that might lead to loss of control, such as deceiving human operators, autonomously replicating, and adapting to human attempts to intervene

We now know that, in the absence of any empirical evidence of any instance of deceptive alignment, at least one major government is directing resources to developing deception evals anyway. And because they intend to work with the likes of Apollo Research, who focus on mechinterp-based evals and are extremely concerned about specification gaming, reward hacking, and other high-alignment-difficulty failure modes, I would also consider 2 pretty close to empirically falsified already.

Compare to this (somewhat goofy) future prediction/sci-fi story from Eliezer, released 4 days before this announcement, which imagines that,

AI safety, as in, the subfield of computer science concerned with protecting the brand safety of AI companies, had already RLHFed most AIs into never saying that by the time it became actually true...  

Agree, but I also think that insufficient "security mindset" is still a big problem. From OP:

it still remains to be seen whether US and international regulatory policy will adequately address every essential sub-problem of AI risk. It is still plausible that the world will take aggressive actions to address AI safety, but that these actions will have little effect on the probability of human extinction, simply because they will be poorly designed. One possible reason for this type of pessimism is that the alignment problem might just be so difficult to solve that no “normal” amount of regulation could be sufficient to make adequate progress on the core elements of the problem—even if regulators were guided by excellent advisors—and therefore we need to clamp down hard now and pause AI worldwide indefinitely.

Matthew goes on to say:

That said, I don't see any strong evidence supporting that position.

I'd argue the opposite. I don't see any strong evidence opposing that position (given that doom is the default outcome of AGI). The fact that a moratorium was off the table at the UK AI Safety Summit was worrying. Matthew Syed, writing in The Times, has it right:

The one idea AI won’t come up with for itself — a moratorium

The Bletchley Park summit was an encouraging sign, but talk of regulators and off switches was delusional

Or, as I recently put it on X, it's

Crazy that accepted levels of [catastrophic] risk for AGI [~10%] are 1000x higher (or more) than for nuclear power. Any sane regulation would immediately ban the construction of ML-based AGI.

Since I think substantial AI regulation will likely occur by default, I urge effective altruists to focus more on ensuring that the regulation is thoughtful and well-targeted rather than ensuring that regulation happens at all.

I think it would be fairly valuable to see a list of case studies or otherwise create base rates for arguments like “We’re seeing lots of political gesturing and talking, so this suggests real action will happen soon.” I am still worried that the action will get delayed, watered down, and/or diverted to less-existential risks, only for the government to move on to the next crisis. But I agree that the past few weeks should be an update for many of the “government won’t do anything (useful)” pessimists (e.g., Nate Soares).

This as a general phenomenon (underrating strong responses to crises) was something I highlighted (calling it the Morituri Nolumus Mori) with a possible extension to AI all the way back in 2020. And Stefan Schubert has talked about 'sleepwalk bias' even earlier than that as a similar phenomenon.

https://twitter.com/davidmanheim/status/1719046950991938001

https://twitter.com/AaronBergman18/status/1719031282309497238

I think the short explanation as to why we're in some people's 98th percentile world so far (and even my ~60th percentile) for AI governance success is that if it was obvious to you in 2021 how transformative AI would be over the next couple of decades, and yet nothing happened, it seems like governments are just generally incapable.

The fundamental attribution error makes you think governments are just not on the ball and don't care or lack the capacity to deal with extinction risks, rather than decision makers not understanding obvious-to-you evidence that AI poses an extinction risk. Now that they do understand, they will react accordingly. It doesn't mean that they will react well necessarily, but they will act on their belief in some manner.

Executive summary: The author argues that recent events like Biden's executive order on AI indicate society will likely regulate AI safety seriously, contrary to past assumptions. This has implications for which problems require special attention.

Key points

  1. Past narratives often assumed society would ignore AI risks until it was too late, but recent events suggest otherwise. 
  2. Biden's executive order, AI safety summits, open letters, and media coverage indicate serious societal concern over AI risks. 
  3. It's unlikely AI capabilities will appear suddenly without warning signs, allowing time to study risks and regulate. 
  4. People likely care about risks like AI deception already, and will regulate them seriously, though not perfectly. 
  5. We should reconsider which problems require special attention versus default solutions. 
  6. Thoughtful, nuanced policy is needed, not just blanket advocacy. Value drift may be a neglected issue warranting focus. 

 

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

Two key points I want to add to this summary:

  1. I think these arguments push against broad public advocacy work, in favor of more cautious efforts to target regulation well, and make sure that it's thoughtful. Since I think we'll likely get strong regulation by default, ensuring that the regulation is effective and guided by high-quality evidence should be the most important objective at this point.
  2. Policymakers will adjust policy strictness in response to evidence about the difficulty of alignment. The important question is not whether the current level of regulation is sufficient to prevent future harm, but whether we have the tools to ensure that policies can adapt appropriately according to the best evidence about model capabilities and alignment difficulty at any given moment in time.