AXRP Episode 24 - Superalignment with Jan Leike

DanielFilan

Effective Altruism Forum
EA Forum

AXRP Episode 24 - Superalignment with Jan Leike

by DanielFilan

Jul 27 20231 min read 0

23

AI safetyExistential riskAI alignmentAudioOpenAIPodcasts

Frontpage

This is a linkpost for https://axrp.net/episode/2023/07/27/episode-24-superalignment-jan-leike.html

I thought people here might be interested in my interview with Jan Leike about OpenAI's superalignment team, and what their plan is to solve superintelligence alignment within 4 years.

My blurb for the episode:

Recently, OpenAI made a splash by announcing a new “Superalignment” team. Lead by Jan Leike and Ilya Sutskever, the team would consist of top researchers, attempting to solve alignment for superintelligent AIs in four years by figuring out how to build a trustworthy human-level AI alignment researcher, and then using it to solve the rest of the problem. But what does this plan actually involve? In this episode, I talk to Jan Leike about the plan and the challenges it faces.

Link to the transcript is here, and a link to the audio is here

23 Reactions

Comments

No comments on this post yet.

Be the first to respond.

More from DanielFilan

Podcast with Oli Habryka on LessWrong / Lightcone Infrastructure

DanielFilan

· 2y ago

Announcing the Vitalik Buterin Fellowships in AI Existential Safety!

DanielFilan

· 3y ago · 1m read

Podcast interview with Carrick Flynn

DanielFilan

· 2y ago · 1m read

Curated and popular this week

From Comfort Zone to Frontiers of Impact: Pursuing A Late-Career Shift to Existential Risk Reduction

Jim Chapman

· 7d ago · 12m read

By Jim Chapman, Linkedin. TL;DR: In 2023, I was a 57-year-old urban planning consultant and non-profit professional with 30 years of leadership experience. After talking with my son about rationality, effective altruism, and AI risks, I decided to pursue a pivot to existential risk reduction work. The last time I had to apply for a job was in 1994. By the end of 2024, I had spent ~740 hours on courses, conferences, meetings with ~140 people, and 21 job applications. I hope that by sharing my experiences, you can gain practical insights, inspiration, and resources to navigate your career transition, especially for those who are later in their career and interested in making an impact in similar fields. I share my experience in 5 sections - sparks, take stock, start, do, meta-learnings, and next steps. [Note - as of 03/05/2025, I am still pursuing my career shift.] Sparks – 2022 During a Saturday bike ride, I admitted to my son, “No, I haven’t heard of effective altruism.” On another ride, I told him, “I'm glad you’re attending the EAGx Berkely conference." Some other time, I said, "Harry Potter and Methods of Rationality sounds interesting. I'll check it out." While playing table tennis, I asked, "What do you mean ChatGPT can't do math? No calculator? Next token prediction?" Around tax-filing time, I responded, "You really think retirement planning is out the window? That only 1 of 2 artificial intelligence futures occurs – humans flourish in a post-scarcity world or humans lose?" These conversations intrigued and concerned me. After many more conversations about rationality, EA, AI risks, and being ready for something new and more impactful, I decided to pivot my career to address my growing concerns about existential risk, particularly AI-related. I am very grateful for those conversations because without them, I am highly confident I would not have spent the last year+ doing that. Take Stock - 2023 I am very concerned about existential risk cause areas in ge

In a time of rapid change, we should re-examine system-level interventions

jackva

· 2d ago · 3m read

[Edits on March 10th for clarity, two sub-sections added] Watching what is happening in the world -- with lots of renegotiation of institutional norms within Western democracies and a parallel fracturing of the post-WW2 institutional order -- I do think we, as a community, should more seriously question our priors on the relative value of surgical/targeted and broad system-level interventions. Speaking somewhat roughly, with EA as a movement coming of age in an era where democratic institutions and the rule-based international order were not fundamentally questioned, it seems easy to underestimate how much the world is currently changing and how much riskier a world of stronger institutional and democratic backsliding and weakened international norms might be. Of course, working on these issues might be intractable and possibly there's nothing highly effective for EAs to do on the margin given much attention to these issues from society at large. So, I am not here to confidently state we should be working on these issues more. But I do think in a situation of more downside risk with regards to broad system-level changes and significantly more fluidity, it seems at least worth rigorously asking whether we should shift more attention to work that is less surgical (working on specific risks) and more systemic (working on institutional quality, indirect risk factors, etc.). While there have been many posts along those lines over the past months and there are of course some EA organizations working on these issues, it stil appears like a niche focus in the community and none of the major EA and EA-adjacent orgs (including the one I work for, though I am writing this in a personal capacity) seem to have taken it up as a serious focus and I worry it might be due to baked-in assumptions about the relative value of such work that are outdated in a time where the importance of systemic work has changed in the face of greater threat and fluidity. When the world seems to

Open Philanthropy: Our Progress in 2024 and Plans for 2025

Open Philanthropy

· 5d ago · 2m read

2024 marked 10 years since we launched Open Philanthropy. We spent our first decade learning (about grantmaking, cause selection, and the history of philanthropy), and growing our team and expertise to be able to effectively deploy billions of dollars from Good Ventures, our main funder. Our early grants — and some grantees we’ve helped get started — are now old enough that we can see material signs of our impact in the world. The start of our second decade also marked a major change in our direction. With Good Ventures approaching the level of spending consistent with its founders’ ambition to spend down in their lifetimes, we finally began to execute at scale on our long-held ambition to support other funders, and found a surprising degree of early success. I expect that our ambition to serve additional partners will guide much of our second decade. A few highlights from the year: * We launched the Lead Exposure Action Fund (LEAF), a >$100 million collaborative fund to reduce lead exposure globally. LEAF marked our first major foray into partnering with other funders beyond Good Ventures, and we’re planning to do a lot more in this vein going forward — more below. * Our longtime grantee David Baker won the Nobel Prize in Chemistry for his groundbreaking work using AI for protein design. We’re proud to have supported both the basic methods development and the potentially high-impact humanitarian applications of his work for ailments like syphilis, hepatitis C, snakebite, and malaria. * Our grantee Open New York played an important role in the recent passage of New York City’s largest zoning overhaul in over 60 years. The city planning department expects the package to create 80,000 new homes over 15 years, making this the first set of major YIMBY reforms to pass in New York City. * Research mentorship programs that we fund continue to produce some of the top technical talent in AI safety and security. Graduates of programs like MATS, the Astra Fellowship, LA

Recent opportunities in AI safety

Quick nudge to apply to the LTFF grant round (closing on Saturday)

calebp

· 25d ago · 1m read

The GDM AGI Safety+Alignment Team is Hiring for Applied Interpretability Research

2 authors

· 14d ago · 8m read

[Apply] What I Love About AI Safety Fieldbuilding at Cambridge (& We're Hiring for a Leadership Role)

Harrison Gietz

· 25d ago · 3m read