This is a linkpost for https://aisafety.dance/

Nicky Case, of "The Evolution of Trust" and "We Become What We Behold" fame (two quite popular online explainers/mini-games), has written an intro explainer to AI Safety! It looks pretty good to me, though only the first part is out so far, so it isn't super in-depth yet. I particularly appreciate that Nicky is clearly thinking about the topic themselves, and I kind of like some of their "logic vs. intuition" frame, even though I think that aspect is less central to my model of how things will go. It's clear that a lot of love has gone into this, and I think having more intro-level explainers for AI-risk stuff is quite valuable.

===

The AI debate is actually 100 debates in a trenchcoat.

Will artificial intelligence (AI) help us cure all disease, and build a post-scarcity world full of flourishing lives? Or will AI help tyrants surveil and manipulate us further? Are the main risks of AI from accidents, abuse by bad actors, or a rogue AI itself becoming a bad actor? Is this all just hype? Why can AI imitate any artist's style in a minute, yet get confused drawing more than 3 objects? Why is it hard to make AI robustly serve humane values, or robustly serve any goal? What if an AI learns to be more humane than us? What if an AI learns humanity's inhumanity, our prejudices and cruelty? Are we headed for utopia, dystopia, extinction, a fate worse than extinction, or — the most shocking outcome of all — nothing changes? Also: will an AI take my job?

...and many more questions.

Alas, to understand AI with nuance, we must understand lots of technical detail... but that detail is scattered across hundreds of articles, buried six-feet-deep in jargon.

So, I present to you:

RCM (Robot Catboy Maid) throwing confetti under a banner that reads: A Whirlwind Tour Guide to AI Safety for Us Warm, Normal, Fleshy Humans.

This 3-part series is your one-stop-shop to understand the core ideas of AI & AI Safety* — explained in a friendly, accessible, and slightly opinionated way!

(* Related phrases: AI Risk, AI X-Risk, AI Alignment, AI Ethics, AI Not-Kill-Everyone-ism. There is no consensus on what these phrases do & don't mean, so I'm just using "AI Safety" as a catch-all.)

This series will also have comics starring a Robot Catboy Maid. Like so:

Comic. Ham the Human tells RCM (Robot Catboy Maid) to "keep this house clean". RCM reasons: What causes the mess? The humans cause the mess! Therefore: GET RID OF THE HUMANS. RCM then yeets Ham out of the house.

[...]

💡 The Core Ideas of AI & AI Safety

In my opinion, the main problems in AI and AI Safety come down to two core conflicts:

Logic "vs" Intuition, and Problems in the AI "vs" in Humans

Note: What "Logic" and "Intuition" are will be explained more rigorously in Part One. For now: Logic is step-by-step cognition, like solving math problems. Intuition is all-at-once recognition, like seeing if a picture is of a cat. "Intuition and Logic" roughly map onto "System 1 and 2" from cognitive science.[1][2] (👈 hover over these footnotes! they expand!)
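To make that distinction concrete, here is a tiny, purely illustrative Python sketch (not from the original article): a hand-written, step-by-step rule stands in for "logic", and a memorize-and-match toy stands in for the learned "intuition" of a deep network. Everything here is invented for illustration.

```python
# "Logic": a hand-written, step-by-step rule. Every step is explicit and checkable.
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

# "Intuition": behavior learned from examples rather than spelled out.
# A nearest-neighbor toy stands in for a deep network recognizing cats.
def train(examples):
    # "training" here just memorizes (features, label) pairs
    return list(examples)

def predict(model, features):
    # all-at-once recognition: return the label of the most similar past example
    closest = min(model, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], features)))
    return closest[1]

model = train([((0.9, 0.8), "cat"), ((0.1, 0.2), "not cat")])
print(is_prime(97))                  # True, and you can trace exactly why
print(predict(model, (0.85, 0.75)))  # "cat" -- but the "why" lives in the data, not in readable rules
```

The rule's answer can be audited step by step; the matcher's answer can only be checked against more examples, which is the crux of several safety concerns below.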

As you can tell by the "scare" "quotes" on "versus", these divisions ain't really so divided after all...

Here's how these conflicts repeat over this 3-part series:

Part 1: The past, present, and possible futures

Skipping over a lot of detail, the history of AI is a tale of Logic vs Intuition:

Before 2000: AI was all logic, no intuition.

This was why, in 1997, AI could beat the world champion at chess... yet no AIs could reliably recognize cats in pictures.[3]

(Safety concern: Without intuition, AI can't understand common sense or humane values. Thus, AI might achieve goals in logically-correct but undesirable ways.)

After 2000: AI could do "intuition", but had very poor logic.

This is why generative AIs (as of this writing, May 2024) can dream up whole landscapes in any artist's style... yet get confused drawing more than 3 objects. (👈 click this text! it also expands!)

(Safety concern: Without logic, we can't verify what's happening in an AI's "intuition". That intuition could be biased, subtly-but-dangerously wrong, or fail bizarrely in new scenarios.)

Current Day: We still don't know how to unify logic & intuition in AI.

But if/when we do, that would give us the biggest risks & rewards of AI: something that can logically out-plan us, and learn general intuition. That'd be an "AI Einstein"... or an "AI Oppenheimer".

Summed in a picture:

Timeline of AI. Before the year 2000, mostly "logic". From 2000 to now, mostly "intuition". In the future, maybe both?

So that's "Logic vs Intuition". As for the other core conflict, "Problems in the AI vs The Humans", that's one of the big controversies in the field of AI Safety: are our main risks from advanced AI itself, or from humans misusing advanced AI?

(Why not both?)

Part 2: The problems

The problem of AI Safety is this:[4]

The Value Alignment Problem:
“How can we make AI robustly serve humane values?”

NOTE: I wrote humane, with an "e", not just "human". A human may or may not be humane. I'm going to harp on this because both advocates & critics of AI Safety keep mixing up the two.[5][6]

We can break this problem down by "Problems in Humans vs AI":

Humane Values:
“What are humane values, anyway?”
(a problem for philosophy & ethics)

The Technical Alignment Problem:
“How can we make AI robustly serve any intended goal at all?”
(a problem for computer scientists - surprisingly, still unsolved!)

The technical alignment problem, in turn, can be broken down by "Logic vs Intuition":

Problems with AI Logic:[7] ("game theory" problems)

  • AIs may accomplish goals in logical but undesirable ways.
  • Most goals logically lead to the same unsafe sub-goals: "don't let anyone stop me from accomplishing my goal", "maximize my ability & resources to optimize for that goal", etc.
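A toy expected-value comparison (not from the article; the numbers are invented) makes that second bullet concrete: for any nonzero chance of being shut down, a plain goal-maximizer scores higher by resisting shutdown.

```python
# Toy illustration of an unsafe sub-goal emerging from plain goal-maximization.
# The numbers are made up; only the comparison matters.
P_SHUTDOWN = 0.2   # assumed chance the operator halts the agent mid-task
GOAL_VALUE = 1.0   # value (to the agent) of completing its assigned goal

def expected_value(resist_shutdown: bool) -> float:
    # if the agent prevents shutdown, it always finishes; otherwise it finishes
    # only when it doesn't get shut down first
    p_finish = 1.0 if resist_shutdown else (1.0 - P_SHUTDOWN)
    return p_finish * GOAL_VALUE

print(expected_value(resist_shutdown=False))  # 0.8
print(expected_value(resist_shutdown=True))   # 1.0 -- higher for ANY P_SHUTDOWN > 0
```

Nothing about the goal itself mentions self-preservation; it falls out of the arithmetic.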

Problems with AI Intuition:[8] ("deep learning" problems)

  • An AI trained on human data could learn our prejudices.
  • AI "intuition" isn't understandable or verifiable.
  • AI "intuition" is fragile, and fails in new scenarios (see the toy sketch after this list).
  • AI "intuition" could partly fail, which may be worse: an AI with intact skills, but broken goals, would be an AI that skillfully acts towards corrupted goals.
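Here is a minimal, synthetic Python sketch (again, not from the article; the data and setup are invented) of that "fragile in new scenarios" failure: a model that leans on a spurious cue looks great on training-like data and collapses to chance once the cue disappears.

```python
# Toy illustration: an "intuition" that rides on a spurious cue breaks under
# distribution shift. All data here is synthetic.
import random
random.seed(0)

def make_data(n, spurious_correlation):
    data = []
    for _ in range(n):
        label = random.choice([0, 1])
        real_signal = label + random.gauss(0, 1.0)   # weak but genuine cue (ignored by the shortcut)
        if random.random() < spurious_correlation:
            background = label                       # spurious cue happens to match the label
        else:
            background = random.choice([0, 1])       # spurious cue is just noise
        data.append(((real_signal, background), label))
    return data

def shortcut_accuracy(data):
    # the learned "shortcut": read off the background cue, ignore the real signal
    return sum(x[1] == y for x, y in data) / len(data)

print(shortcut_accuracy(make_data(10_000, spurious_correlation=0.95)))  # ~0.97 on training-like data
print(shortcut_accuracy(make_data(10_000, spurious_correlation=0.0)))   # ~0.5 once the cue breaks
```

The same mechanism covers the "learned prejudices" bullet: a spurious cue in human-generated data gets baked into the model's shortcut.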

(Again, what "logic" and "intuition" are will be more precisely explained later!)

Summed in a picture:

A diagram breaking down the AI Alignment Problem. "How can we align AI with humane values?" splits into "Technical Alignment" and "Humane Values". Technical Alignment splits into "AI Logic (game theory)" and "AI Intuition (deep learning)"

[Read the rest of the article here]

Comments



Executive summary: This comprehensive guide explains the core ideas and debates in AI and AI safety, covering the history, present state, and possible futures of the field in an accessible way.

Key points:

  1. The history of AI can be divided into two main eras: "Good Old-Fashioned AI" focused on logic without intuition before 2000, and deep learning focused on intuition without robust logic after 2000.
  2. The next major advance in AI may come from merging the logical and intuitive approaches, but this would come with great potential benefits and risks.
  3. The field of AI safety involves awkward alliances between those working on AI capabilities and safety, and those concerned about risks ranging from unintentional accidents to intentional misuse.
  4. Experts disagree on timelines for artificial general intelligence (AGI), the speed of a potential intelligence explosion or "takeoff", and whether advanced AI will have good or catastrophic impacts.
  5. Steering the course of AI development to invest more in safety and beneficial outcomes is crucial, as AI could be enormously destructive if not properly controlled, but enormously beneficial if it is.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
