2023 was a massive year in AI. What updates did you make? This includes timelines, the likelihood of various risks, and/or alignment plans/strategies.

Risks/P-doom:
• P(doom) went down as a result of the dramatic shift in the Overton window. I'd say this is a moderate, but not massive, update, because timelines are looking shorter as well.
• No longer worried about the possibility that we (the AI Safety community) are all essentially a bunch of "cranks", given the support Geoffrey Hinton and Yoshua Bengio have voiced for the importance of addressing these concerns.
• Updated towards being more worried about risks such as AI-supported bioterrorism, cyber-attacks and manipulation. There have been results that make me feel we're on the verge of these risks materialising, and "slow" takeoff is looking more likely, so there would be a greater cost to just tanking these issues. I still see alignment as the most important issue to focus on.
• In terms of outer alignment: due to ChatGPT, much more optimistic about training an AI to behave reasonably in normal situations using RLHF; also more optimistic about using such techniques to tell AIs to behave conservatively in weird philosophical thought experiments. My main remaining worry related to outer alignment is figuring out corrigibility, lest we produce an AI that works well in the current context but can't adapt to new circumstances.

Timelines:
• Long-timeline worlds feel less likely. Even though some of the capabilities of ChatGPT/GPT-4 are not that surprising to people who were following capability progress closely, the longer things continue as they are, the less time there is for an unexpected slow-down to occur before we hit AGI.
• More optimistic about evals work delivering value, largely due to the openness of governments and companies to evals work.

Governance:
• Far more optimistic about policy than before due to the opening up of the Overton Window.
• More complicated feelings on a pause as a result of the AI Pause Debate: I now understand that making a pause net-positive would be logistically much more complicated than I first realised. I still think that pushing for a pause to be part of the public conversation/one of the options considered, while not completely risk-free, is a pretty strong bet.
• Became a huge fan of the Tony Blair Institute's proposal for the UK to create an organisation called Sentinel, which would conduct research to help inform AI policy.
• More worried about the threat of open-source AI given how fast it is catching up with GPT-4 and Facebook's decision to champion open-source.

Groups:
• Less confident in Sam Altman's leadership of OpenAI.
• More worried about e/acc; I previously thought they were so unimportant that we should just ignore them rather than risk amplifying their profile.
• More optimistic about allying with people concerned about near-termist risks where our interests align (largely due to the impact of the FLI letter).

Technical alignment:
• More optimistic about the value of empirical alignment research and less optimistic about the value of agent foundations research.
• I now feel the field is mature enough that "workhorse" researchers can make a significant contribution (vs. before when creativity to discover new research directions seemed more vital).
• More optimistic about approaches that take advantage of the linearity in neural networks.
• More optimistic about interpretability progress (due to a number of results, including dictionary learning resolving superposition; see the sketch after this list).
• I spent a lot of time this year trying to read up about as many alignment proposals as possible. I now think it would have been better for me to have spent less time doing this and to have spent more time focusing on doing concrete work.
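
The "dictionary learning resolving superposition" result mentioned above is, as I understand it, the sparse-autoencoder line of work: train an overcomplete dictionary of features on a model's activations with a sparsity penalty, so that each activation is explained by a handful of roughly linear feature directions (which is also why the linearity point above matters). Here is a minimal toy sketch of that idea; the shapes, hyperparameters and use of random data are illustrative assumptions, not details from any particular paper.

```python
# Minimal sketch of dictionary learning on model activations via a sparse
# autoencoder. All shapes and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        # Overcomplete dictionary: many more features than activation dims.
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model, bias=False)

    def forward(self, activations: torch.Tensor):
        # Sparse, non-negative feature coefficients.
        features = torch.relu(self.encoder(activations))
        reconstruction = self.decoder(features)
        return reconstruction, features

def loss_fn(reconstruction, activations, features, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that pushes each activation
    # to be explained by only a few dictionary features.
    mse = (reconstruction - activations).pow(2).mean()
    sparsity = features.abs().mean()
    return mse + l1_coeff * sparsity

# Toy usage: random "activations" standing in for a transformer layer.
sae = SparseAutoencoder(d_model=512, d_dict=4096)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
acts = torch.randn(1024, 512)  # placeholder batch of activations
for _ in range(100):
    recon, feats = sae(acts)
    loss = loss_fn(recon, acts, feats)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

On real activations, the columns of the learned decoder are then inspected as candidate interpretable features.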

Movement-building:
• Movement-building work to grow the pool of applicants to programs like SERI MATS seems less important because these programs are much more competitive these days. It may be better to attempt to increase mentorship opportunities or to focus on people who are more experienced in terms of research or AI.

Looks like outer alignment is actually more difficult than I thought. Sherjil Ozair, a former DeepMind employee, writes:

"From my experience doing early RLHF work for Gemini, larger models exploit the reward model more. You need to constantly keep collecting more preferences and retraining reward models to make it not exploitable. Otherwise you get nonsensical responses which have exploited the idiosyncracy of your preferences data. There is a reason few labs have done RLHF successfully"

In other words, even though we look at things like ChatGPT and go, "Wow, this is surprisingly aligned, I guess alignment is easier than we thought", we don't see all of the hard work that had to go into making it aligned. And perhaps, as AIs become more powerful, the amount of work required to align them will exceed what is humanly possible.
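
To make the quoted workflow concrete, here is a toy numerical sketch of the loop Ozair describes: fit a Bradley-Terry-style reward model on pairwise preferences, then keep re-collecting preferences on the current policy's outputs and retraining, so the policy can't keep exploiting a stale reward model. The features, the hidden "preference rule" and the policy-update step are all made up for illustration; this is not how any lab's actual pipeline works.

```python
# Toy sketch of iterative reward-model retraining: collect fresh pairwise
# preferences on the current policy's outputs each round and refit the
# reward model. Everything here is a made-up stand-in for illustration.
import numpy as np

rng = np.random.default_rng(0)
D = 16  # dimensionality of the toy response features

def true_preference(a, b):
    # Hidden "human" preference: prefers responses aligned with a fixed direction.
    secret = np.ones(D) / np.sqrt(D)
    return 1 if a @ secret > b @ secret else 0

def collect_preferences(policy_mean, n=500):
    # Sample response pairs from the *current* policy's distribution
    # and label them with the hidden preference rule.
    a = rng.normal(policy_mean, 1.0, size=(n, D))
    b = rng.normal(policy_mean, 1.0, size=(n, D))
    labels = np.array([true_preference(x, y) for x, y in zip(a, b)])
    return a, b, labels

def train_reward_model(a, b, labels, steps=2000, lr=0.1):
    # Bradley-Terry logistic loss on reward differences, by gradient descent.
    w = np.zeros(D)
    for _ in range(steps):
        margin = (a - b) @ w
        p = 1.0 / (1.0 + np.exp(-margin))
        grad = (a - b).T @ (p - labels) / len(labels)
        w -= lr * grad
    return w

policy_mean = np.zeros(D)
for round_ in range(3):
    # Re-collect preferences on the current policy's outputs each round,
    # rather than reusing data gathered under an earlier distribution.
    a, b, labels = collect_preferences(policy_mean)
    w = train_reward_model(a, b, labels)
    # Crude stand-in for policy optimisation: drift towards higher reward.
    policy_mean = policy_mean + 0.5 * w / (np.linalg.norm(w) + 1e-8)
    print(f"round {round_}: reward-model weight norm {np.linalg.norm(w):.2f}")
```

The point of the outer loop is the one in the quote: once the policy moves off the distribution the reward model was trained on, the old preference data stops constraining it, so the data has to be refreshed.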

May I ask what your feelings on a pause were beforehand?

I think I was likely 90% in favour of a 6-month pause, mostly as a way to wake people up. I guess my main update from that debate was the difficulty of actually implementing a pause.

If you decide to start pushing for a pause, you don't get nuanced control over when the pause occurs (you likely have to start pushing for it at least a couple of years ahead of when you want it to occur). Further, it's quite likely that you accidentally reduce the amount of crunch time by reducing the gap between the leading players and the rest. If this happens, a pause would likely be net-negative.

For an indefinite pause, it's unclear that you'll be able to unpause when necessary to avoid someone else front-running you, particularly because you might have to make alliances with people who will want to keep it paused.

So while it may still be worth pausing, it's very hard to get the details right so that it is robustly net-positive.

My p(doom) went down slightly (from around 30% to around 25%), mainly as a result of how GPT-4 caused governments to begin taking AI seriously in a way I didn't predict. My timelines haven't changed - the only capability increase of GPT-4 that really surprised me was its multimodal nature. (Thus, governments waking up to this was a double surprise, because it clearly surprised them in a way that it didn't surprise me!)

I'm also less worried about misalignment and more worried about misuse when it comes to the next five years, due to how LLMs appear to behave. It seems that LLMs aren't particularly agentic by default, but can certainly be induced to perform agent-like behaviour - GPT-4's inability to do this well seems to be a capability issue that I expect to be resolved in a generation or two. Thus, I'm less worried about the training of GPT-N but still worried about the deployment of GPT-N. It makes me put more credence in the slow takeoff scenario.

This also makes me much more uncertain about the merits of pausing in the short term, like the next year or two. I expect that if our options were "Pause now" or "Pause after another year or two", the latter would be better. In practice, I know the world doesn't work that way, and slowing down AI now likely slows down the whole timeline, which complicates things. I still think that government efforts like the UK's AISI are net-positive (I'm joining them for a reason, after all), but I think a lot of the benefit to reducing x-risk here is building a mature field around AI policy and evaluations before we need it - if we wait until I think the threat of misaligned AI is imminent, that may be too late.
