Peter S. Park

508 · Joined Mar 2022


No, "Women and Effective Altruism" by Keerthana preceded "Brainstorming ways to make EA safer and more inclusive" by Richard.

I agree that in many cases, when bad-faith critics say "EA should not do X" and we stop doing X, the world would actually become worse off.

But making SBF the face of the EA movement was a really bad decision, especially given that he was unilaterally gambling with the whole movement's credibility.

There are robust lessons to be learned in this saga, which will allow us EAs to course-correct and prevent future catastrophic outcomes (to our movement and in general).

I think we EAs need to increasingly prioritize speaking up about concerns like the ones Habryka mentioned.

Even when positive in-group feelings, the fear of ostracism, and uncertainty or risk aversion influence us not to bring up such concerns, we should fight back against this urge, because the concerns, if valid, will likely grow larger and larger until they blow up.

There is very high EV in course correction before the catastrophic failure point.

Strongly agree with all of these points.

On point 2: The EA movement urgently needs more earners-to-give, especially now. One lesson that I think is correct, however, is that we should be wary of making any one billionaire donor the face of the EA movement. The downside risk—a loss of credibility for the whole movement due to unknown information about the billionaire donor—is generally too high.

Mostly it was about Point 3. I think an unconditional norm of only accepting anonymous donations above a certain threshold would be too blunt.

I think a version of Point 3 I would agree with is to have high-contributing donor names not be publicized as a norm (with some possible exceptions). I think this captures most of the benefits of an anonymous donation, and most potential donors who might not be willing to make an anonymous donation would be willing to make a discreet, non-publicized donation.

I strongly agree with the spirit of the reforms being suggested here (although I might have some different opinions on how to implement them). We need large-scale reforms of the EA community's social norms to prevent future risks to movement-wide credibility.

  1. Strongly agree. The fact that net upvotes are the only concrete metric by which EA Forum and LessWrong posts are judged has indeed been suboptimal for one of EA's main goals: to reflect on and adapt our previous beliefs based on new evidence. Reforms designed to increase engagement with controversial posts would be very helpful for our pursuit of this goal. (Disclaimer: Most of my EA Forum posts would rank highly on the "controversial" scale, in that many people upvote and many people downvote them, and the top comment is usually critical and has a lot of net upvotes. I think that we EAs need to increasingly prioritize both posting and engaging with controversial arguments that run contrary to status-quo beliefs, even if it's hard! This is especially true for LessWrong, which arguably doubles as a scientific venue for AI safety research in addition to an EA-adjacent discussion forum.)
  2. Agree, although I think EAs should also be more willing to write and engage with controversial arguments non-anonymously.
  3. Strongly agree in spirit. While a norm of unconditionally refusing non-anonymous donations above a certain threshold might be too blunt, I do think we need better risk management around tying our movement's credibility to a single charismatic billionaire, or a single charismatic individual in general. Given how important our work is, we probably need better risk-management practices in general. (And we EAs already care earnestly about this! I think this is a question not of earnest desire but of optimal implementation.) I also think that many billionaires would actually prefer to donate anonymously or less publicly, because they agree with most, but not all, of EA's principles. Realistically, leaving room for case-by-case decision-making seems helpful.

Thank you so much for your excellent post on the strategy of buying time, Thomas, Akash, and Olivia! I strongly agree that this strategy is necessary, neglected, and tractable.

For practical ideas on how to achieve this (and, in the comments section, a productive debate on the risks of low-quality outreach efforts), please see my related earlier forum post:

Thanks so much for your detailed response, Charles! I really appreciate your honest feedback.

It seems likely that our cruxes are the following. I think that (a) we probably cannot predict the precise moment the AGI becomes agentic and/or dangerous, (b) we probably won't have a strong credence that a specific alignment plan will succeed (say, to the degree of solving ELK or interpretability in the short time we have), and (c) AGI takeoff will be slow enough that secrecy can be a key difference-maker in whether we die or not.

So, I expect we will have alignment plan numbers 1, 2, 3, and so on. We will try alignment plan 1, but it will probably not succeed (and hopefully we can see signs of it not succeeding early enough that we shut it down and try alignment plan 2). If we can safely empirically iterate, we will find an alignment plan N that works.
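The iterate-and-shut-down procedure above can be sketched as a simple loop. This is only an illustrative toy: the predicates `shows_early_failure` and `succeeds` are hypothetical placeholders for hard empirical judgments, not real APIs.

```python
def first_working_plan(plans, shows_early_failure, succeeds):
    """Try alignment plans in order; abandon any plan that shows
    early warning signs of failure, and return the first one that
    works. Both predicates are illustrative stand-ins for difficult
    empirical judgments."""
    for n, plan in enumerate(plans, start=1):
        if shows_early_failure(plan):
            continue  # shut it down and move on to plan n + 1
        if succeeds(plan):
            return n, plan
    return None  # no plan worked: the dangerous case

# Toy usage: plans 1 and 2 show early failure, plan 3 succeeds.
result = first_working_plan(
    plans=["plan-1", "plan-2", "plan-3"],
    shows_early_failure=lambda p: p in ("plan-1", "plan-2"),
    succeeds=lambda p: p == "plan-3",
)
print(result)  # → (3, 'plan-3')
```

The crux, of course, is whether we can iterate *safely*: the loop is only survivable if early warning signs are detectable before a plan fails catastrophically.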

This is risky and we could very well die (although I think the probability is not unconditionally 100%). This is why I think not building AGI is by far the best strategy. (Corollary: I place a lot of comparative optimism on AI governance and coordination efforts.) The above discussion is conditional on trying to build an aligned AGI.

Given these cruxes, I think it is plausible that a Manhattan-Project-esque shift in research norms could preserve the secrecy value of our AI safety plans while keeping ease of research high.

This would definitely be a useful tag for a hostile AI. 

Moreover, the fact that everyone can access tagged posts right now means there are many ways tagged content could still become accessible to an AGI, even if the AGI corporation cooperates with us.

Secrecy is likely to be kept only if the information is known to a small set of people (Conjecture's infohazard document, for example, discusses this well).

Thanks so much, Michael, for your detailed and honest feedback! I really appreciate your time.

I agree that both threat models are real: AI safety research can lose its value when it is not kept secret, and humanity's catastrophic and extinction risk increases if AI capabilities advance faster than valuable AI safety research.

Regarding your point that the Internet has significantly helped accelerate AI safety research, I would say two things. First, what matters is the Internet's effect on valuable AI safety research. If much of the value in AI safety plans requires that they be kept secret from the Internet (e.g., one's plan in a rock-paper-scissors-type interaction), then the current Internet-forum-based research norms may not be increasing the rate of value generation by much. In fact, they may plausibly be decreasing it, in light of the discussion in my post. So, we should vigorously investigate how much of the value in AI safety plans in fact requires secrecy.
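To make the rock-paper-scissors analogy concrete, here is a minimal game-theory sketch (the strategy mixes are invented for illustration): a secret, uniformly random strategy is unexploitable, while the same player with a leaked, biased strategy gets exploited by a best-responding adversary.

```python
# Payoff to the row player in rock-paper-scissors: +1 win, 0 tie, -1 loss.
MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(mine, theirs):
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def expected_payoff(my_mix, their_mix):
    """Expected payoff of my mixed strategy against theirs."""
    return sum(my_mix[m] * their_mix[t] * payoff(m, t)
               for m in MOVES for t in MOVES)

def best_response(known_mix):
    """If my mixed strategy leaks, the adversary plays the pure
    strategy that minimizes my expected payoff."""
    return min(
        ({t: 1.0 if t == move else 0.0 for t in MOVES} for move in MOVES),
        key=lambda their: expected_payoff(known_mix, their),
    )

secret_uniform = {m: 1 / 3 for m in MOVES}           # unexploitable even if leaked
biased = {"rock": 0.5, "paper": 0.3, "scissors": 0.2}  # exploitable once leaked

print(expected_payoff(secret_uniform, best_response(secret_uniform)))  # → 0.0
print(expected_payoff(biased, best_response(biased)))  # negative: exploited
```

The point of the toy model: for this class of plans, the plan's value is not in its cleverness but in the adversary's uncertainty about it, so publishing it destroys the value.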

Second, the fact that AI safety researchers (or, more generally, people in the economy) extensively use the Internet does not falsify the claim that the Internet may be close to net-zero or even net-negative for community innovation. On this view, the Internet is great at enticing people to use it and spend money on it, but not at improving real innovation as measured by productivity gains: it has a redistributive rather than a productive effect on how people spend their time and money. We should therefore expect people (e.g., AI safety researchers) to use the Internet extensively even if it has not increased the rate of real productivity gains.

What matters is the comparison with the counterfactual: if there were an extensive, possibly expensive, and Manhattan-Project-esque change in research norms for the whole community, could ease-of-research be largely kept the same even with secrecy gains? I think the answer may be plausibly “yes.”

How do we estimate the real effect of the Internet on the generation of valuable AI safety research? First, we would need to predict the value of AI safety research, particularly how its value is affected by its secrecy. This effort would be aided by game theory, past empirical evidence of real-world adversarial interactions, and the resolution of scientific debates about what AGI training would look like in the future.

Second, we would need to estimate how much the Internet affects the generation of this value. This effort would be aided by progress studies and other relevant fields of economics and history. 

Edit: Writing to add a link for the case of why the Internet may be overrated for the purposes of real community innovation:
