(COI note: I work at OpenAI. These are my personal views, though.)
My quick take on the "AI pause debate", framed in terms of two scenarios for how
the AI safety community might evolve over the coming years:
1. AI safety becomes the single community that's the most knowledgeable about
cutting-edge ML systems. The smartest up-and-coming ML researchers find
themselves constantly coming to AI safety spaces, because that's the place
to go if you want to nerd out about the models. It feels like the early days
of hacker culture. There's a constant flow of ideas and brainstorming in
those spaces; the core alignment ideas are standard background knowledge for
everyone there. There are hackathons where people build fun demos and figure out ways of using AI to augment their research. Constant interaction with the models lets people gain really good hands-on intuitions about how they work, which they leverage into doing great
research that helps us actually understand them better. When the public ends
up demanding regulation, there's a large pool of competent people who are
broadly reasonable about the risks, and can slot into the relevant
institutions and make them work well.
2. AI safety becomes much more similar to the environmentalist movement. It has
broader reach, but alienates a lot of the most competent people in the
relevant fields. ML researchers who find themselves in AI safety spaces are
told they're "worse than Hitler" (which happened to a friend of mine,
actually). People get deontological about AI progress; some hesitate to pay
for ChatGPT because it feels like they're contributing to the problem
(another true story); others overemphasize the risks of existing models in
order to whip up popular support. People are sucked into psychological doom
spirals similar to how many environmentalists think about climate change: if
you're not depressed, then you obviously don't understand the situation.
Vasili Arkhipov is discussed less on the EA Forum than Petrov is (see also this
thread of less-discussed people). I thought I'd post a quick take describing
the incident he's known for.
Arkhipov & the submarine B-59’s nuclear torpedo
On October 27, 1962 (during the Cuban Missile Crisis), the Soviet diesel-powered submarine B-59 began experiencing[1] depth charges exploding nearby, dropped by US forces above it; the submarine had been detected and US ships seemed to be
attacking. The submarine’s air conditioning was broken,[2] CO2 levels were
rising, and B-59 was out of contact with Moscow. Two of the senior officers on
the submarine, thinking that a global war had started, wanted to launch their
“secret weapon,” a 10-kiloton nuclear torpedo. The captain, Valentin Savitsky,
apparently exclaimed: “We’re gonna blast them now! We will die, but we will sink
them all — we will not become the shame of the fleet.”
The ship was authorized to launch the torpedo without confirmation from Moscow,
but all three senior officers on the ship had to agree.[3] The flotilla’s chief of staff, Vasili Arkhipov, refused. He convinced Captain Savitsky that the depth
charges were signals for the Soviet submarine to surface (which they were) — if
the US ships really wanted to destroy the B-59, they would have done it by now.
(Part of the problem seemed to be that the Soviet officers were used to
different signals than the ones the Americans were using.) Arkhipov calmed the
captain down[4] and got him to surface the submarine to get orders from the
Kremlin, which eventually defused the situation.
(Here's a Vox article on the incident.)
The B-59 submarine.
1. Vadim Orlov described the impact of the depth charges as feeling like being inside an oil drum struck with a sledgehammer.
2. Temperatures were apparently above 45°C (113°F).
3. The B-59 was apparently the only submarine in the flotilla that required three officers’ approval in order to fire the “special weapon”: because the flotilla’s chief of staff (Arkhipov) happened to be aboard, his agreement was needed as well.
I am a researcher in the space community and I recently wrote a post introducing
the links between outer space and existential risk. I'm thinking about
developing this into a sequence of posts on the topic. I plan to cover:
1. Cosmic threats - what are they, how are they currently managed, and what
work is needed in this area. Cosmic threats include asteroid impacts, solar
flares, supernovae, gamma-ray bursts, aliens, rogue planets, pulsar beams,
and the Kessler Syndrome. I think it would be useful to provide a summary of
how cosmic threats are handled, and determine their importance relative to
other existential threats.
2. Lessons learned from the space community. The space community has been very
open with data sharing - the utility of this for tackling climate change,
nuclear threats, ecological collapse, animal welfare, and global health and
development cannot be overstated. I may also cover the perspective shifts provided by views of Earth from above and by the limitless potential that space shows us.
3. How to access the space community's expertise, technology, and resources to
tackle existential threats.
4. The role of the space community in global politics. Space has a big role in
preventing great power conflicts and building international institutions and
connections. With the space community having grown rapidly in recent years, I'd like to provide a briefing on the international role of space to help people working on policy and conflict.
Would a sequence of posts on space and existential risk be something that people
would be interested in? (Please agree- or disagree-vote on this post.) I haven't seen much about space on the Forum (apart from posts on space governance), so it would be
something new.
One thing the AI Pause Debate Week has made salient to me: there appears to be a
mismatch between the kind of slowing that on-the-ground AI policy folks talk
about, versus the type that AI policy researchers and technical alignment people
talk about.
My impression from talking to policy folks who are in or close to
government—admittedly a sample of only five or so—is that the
main[1] coordination problem for reducing AI x-risk is about ensuring the
so-called alignment tax gets paid (i.e., ensuring that all the big labs put some
time/money/effort into safety, and that none “defect” by skimping on safety to
jump ahead on capabilities). This seems to rest on the assumption that the
alignment tax is a coherent notion and that technical alignment people are
somewhat on track to pay this tax.
On the other hand, my impression is that technical alignment people, and AI
policy researchers at EA-oriented orgs,[2] are not at all confident in there
being a viable level of time/money/effort that will produce safe AGI on the
default trajectory. The type of policy action that’s needed, so they seem to
say, is much more drastic. For example, something in the vein of global
coordination to slow, limit, or outright stop development and deployment of AI
capabilities (see, e.g., Larsen’s,[3] Bensinger’s, and Stein-Perlman’s debate
week posts), whilst alignment researchers scramble to figure out how on earth to
align frontier systems.
I’m concerned by this mismatch. It would appear that the game plans of two
adjacent clusters of people working to reduce AI x-risk are at odds. (Clearly,
this is an oversimplification and there are a range of takes from within both
clusters, but my current epistemic status is that this oversimplification
gestures at a true and important pattern.)
Am I simply mistaken about there being a mismatch here? If not, is anyone
working to remedy the situation? Or does anyone have thoughts on how this arose,
how it could be rectified, or how to prevent similar mismatches in the future?
On Twitter and elsewhere, I've seen a bunch of people argue that AI company
execs and academics are only talking about AI existential risk because they want
to manufacture concern to increase investments and/or as a distraction away from
near-term risks and/or regulatory capture. This is obviously false.
However, there is a nearby argument that is likely true: that
incentives drive how people talk about AI risk, as well as which specific
regulations or interventions they ask for. This is likely to happen both
explicitly and unconsciously. It's important (as always) to have extremely solid
epistemics, and understand that even apparent allies may have (large) degrees of
self-interest and motivated reasoning.
Safety-washing is a significant concern: similar things have happened repeatedly in other fields, it has likely already happened in AI, and it will likely
happen again in the months and years to come, especially if/as policymakers
and/or the general public become increasingly uneasy about AI.
People talk about AI resisting correction because successful goal-seekers
"should" resist their goals being changed. I wonder if this also acts as an
incentive for AI to attempt takeover as soon as it's powerful enough to have a
chance of success, instead of (as many people fear) waiting until it's powerful
enough to guarantee it.
Hopefully the first AI powerful enough to potentially figure out that it wants
to seize power and has a chance of succeeding is not powerful enough to
passively resist value change, so acting immediately will be its only chance.
Seems like there's room in the ecosystem for a weekly update on AI that does a
lot of contextualization / here's where we are on ongoing benchmarks. I'm
familiar with:
* a weekly newsletter on AI media (that has a section on important developments
that I like)
* Jack Clark's substack, which I haven't read much of but seems more about going in depth on new developments (though it does have a "Why this matters" section). Also, I love this post in particular for the way it talks about humility and
confusion.
* Doing Westminster Better on UK politics and AI / EA, which seems really good
but again I think goes in depth on new stuff
* I could imagine spending time on aggregating prediction markets for specific topics, which Metaculus and Manifold are doing better and better over time (a rough sketch of what the fetching side could look like is just below this list).
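To gesture at how cheap the fetching side of that aggregation could be, here's a minimal, untested sketch. It assumes Manifold's public v0 search endpoint and its question/probability/url fields; a Metaculus fetcher and any weighting or scoring on top would be separate work.

```python
# Rough, untested sketch: pull AI-related binary markets from Manifold's public
# v0 API and print their current probabilities. The endpoint and the
# "question"/"probability"/"url" field names are assumptions based on the public
# API docs; Metaculus would need its own fetcher and a normalization step.
import json
import urllib.request

MANIFOLD_SEARCH = "https://api.manifold.markets/v0/search-markets?term=AI&limit=20"


def fetch_markets(url: str) -> list:
    """Fetch and decode a JSON list of markets from a public endpoint."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(markets: list) -> None:
    """Print question and current probability for each binary market."""
    for market in markets:
        prob = market.get("probability")
        if prob is None:  # non-binary markets have no single probability
            continue
        print(f"{prob:6.1%}  {market.get('question')}  ({market.get('url')})")


if __name__ == "__main__":
    summarize(fetch_markets(MANIFOLD_SEARCH))
```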
I'm interested in something that says "we're moving faster / less fast than we
thought we would 6 months ago" or "this event is surprising because" and kind of
gives a "you are here" pointer on the map. This Planned Obsolescence post called
"Language models surprised us" I think is the closest I've seen.
This seems hard, and maybe not worth the effort; it may also already be happening without my being aware of it (I'd love to hear if so). But it's what I'd personally find most useful, and I suspect I'm not alone.