How to make the best of the most important century?

Holden Karnofsky

Comments 5

Greg_Colbourn ⏸️

Spreading ideas and building communities.

Holden, have you considered hosting seminars on the Most Important Century? (And incentivising important people to attend?) I've outlined this idea here.

[anonymous]

Figuring out how to stop AI systems from making extremely bad judgments on images designed to fool them, and other work focused on helping avoid the "worst case" behaviors of AI systems.

I haven’t seen much about adversarial examples for AI alignment. Besides https://www.alignmentforum.org/tag/adversarial-examples (which only has four articles tagged), https://www.alignmentforum.org/posts/9Dy5YRaoCxH9zuJqa/relaxed-adversarial-training-for-inner-alignment, and https://cset.georgetown.edu/publication/key-concepts-in-ai-safety-robustness-and-adversarial-examples/, are there other good articles on this topic?

Holden Karnofsky

I'm not sure whether you're asking for academic literature on adversarial examples (I believe there is a lot) or for pieces discussing the connection between adversarial examples and alignment (most "link between X and alignment" topics haven't been written about much). The latter is discussed to some extent in the recent paper Unsolved Problems in ML Safety and in An overview of 11 proposals for building safe advanced AI.

steve6320

Helping governments and societies become, well, nicer.

I'm curious what ideas are out there for helping societies, or people in general, become nicer.

Holden Karnofsky

An example would be voting for (or donating to, volunteering for, etc.) the candidate and/or party that you believe is more likely to act based on the best interests of humanity rather than on other considerations.

1. From Forecasting Transformative AI: What's the Burden of Proof?: "I am forecasting more than a 10% chance transformative AI will be developed within 15 years (by 2036); a ~50% chance it will be developed within 40 years (by 2060); and a ~2/3 chance it will be developed this century (by 2100)." Also see Some additional detail on what I mean by "most important century."
2. These include the books Superintelligence, Human Compatible, Life 3.0, and The Alignment Problem. The shortest, most accessible presentation I know of is The case for taking AI seriously as a threat to humanity (Vox article by Kelsey Piper). This report on existential risk from power-seeking AI, by Open Philanthropy's Joe Carlsmith, lays out a detailed set of premises that would collectively imply the problem is a serious one.
3. The order of goodness isn't absolute, of course. There are versions of "Adversarial Technological Maturity" that could be worse than "Misaligned AI" - for example, if the former results in power going to those who deliberately inflict suffering.
4. Part of the reason for this is that faster-moving, less-careful parties could end up quickly outnumbering others and determining the future of the galaxy. There is also a longer-run risk discussed in Nick Bostrom's The Future of Human Evolution; also see this discussion of Bostrom's ideas on Slate Star Codex, though also see this piece by Carl Shulman arguing that this dynamic is unlikely to result in total elimination of nice things.
5. See page 191.
6. E.g., see this section of Digital People Would Be An Even Bigger Deal.
7. One relevant paper: Public Policy and Superintelligent AI: A Vector Field Approach by Bostrom, Dafoe and Flynn.
8. Adversarial Technological Maturity refers to a world in which highly advanced technology has already been developed, likely with the help of AI, and different coalitions are vying for influence over the world. By contrast, "Competition" refers to a strategy for how to behave before the development of advanced AI. One might imagine a world in which some government or coalition takes a "competition" frame, develops advanced AI long before others, and then makes a series of good decisions that prevent Adversarial Technological Maturity. (Or conversely, a world in which failure to do well at "competition" raises the risks of Adversarial Technological Maturity.)
9. See definitions of this problem at Wikipedia and Paul Christiano's Medium.
10. A more detailed, private survey done for this report, asking about the probability of "doom" before 2070 due to the type of problem discussed in the report, got answers ranging from <1% to >50%. In my opinion, there are very thoughtful people who have seriously considered these matters at both ends of that range.
11. Some example technical topics here.
12. Some discussion of this topic here: Distinguishing definitions of takeoff - AI Alignment Forum
13. Some more thought on "when money isn't enough" at this old GiveWell post.

How to make the best of the most important century?

The "caution" frame

Worst: Misaligned AI

Next-worst:[3] Adversarial Technological Maturity

Second-best: Negotiation and governance

Best: Reflection

Other

The role of caution

The "competition" frame

Why I fear "competition" being overrated, relative to "caution"

Key open questions for "caution" vs. "competition"

Open question: how hard is the alignment problem?

Other open questions

Robustly helpful actions
