
I think there are many questions whose answers would be useful for technical AGI safety research, but which will probably require expertise outside AI to answer. In this post I list 30 of them, divided into four categories. Feel free to get in touch if you’d like to discuss these questions and why I think they’re important in more detail. I personally think that making progress on the ones in the first category is particularly vital, and plausibly tractable for researchers from a wide range of academic backgrounds.

Studying and understanding safety problems

  1. How strong are the economic or technological pressures towards building very general AI systems, as opposed to narrow ones? How plausible is the CAIS model of advanced AI capabilities arising from the combination of many narrow services?
  2. What are the most compelling arguments for and against discontinuous versus continuous takeoffs? In particular, how should we think about the analogy from human evolution, and the scalability of intelligence with compute?
  3. What are the tasks via which narrow AI is most likely to have a destabilising impact on society? What might cyber crime look like when many important jobs have been automated?
  4. How plausible are safety concerns about economic dominance by influence-seeking agents, as well as structural loss of control scenarios? Can these be reformulated in terms of standard economic ideas, such as principal-agent problems and the effects of automation?
  5. How can we make the concepts of agency and goal-directed behaviour more specific and useful in the context of AI (e.g. building on Dennett’s work on the intentional stance)? How do they relate to intelligence and the ability to generalise across widely different domains?
  6. What are the strongest arguments that have been made about why advanced AI might pose an existential threat, stated as clearly as possible? How do the different claims relate to each other, and which inferences or assumptions are weakest?

Solving safety problems

  1. What techniques used in studying animal brains and behaviour will be most helpful for analysing AI systems and their behaviour, particularly with the goal of rendering them interpretable?
  2. What is the most important information about deployed AI that decision-makers will need to track, and how can we create interfaces which communicate this effectively, making it visible and salient?
  3. What are the most effective ways to gather huge numbers of human judgments about potential AI behaviour, and how can we ensure that such data is high-quality?
  4. How can we empirically test the debate and factored cognition hypotheses? How plausible are the assumptions about the decomposability of cognitive work via language which underlie debate and iterated distillation and amplification?
  5. How can we distinguish between AIs helping us better understand what we want and AIs changing what we want (both as individuals and as a civilisation)? How easy is the latter to do; and how easy is it for us to identify?
  6. Various questions in decision theory, logical uncertainty and game theory relevant to agent foundations.
  7. How can we create secure containment and supervision protocols to use on AI, which are also robust to external interference?
  8. What are the best communication channels for conveying goals to AI agents? In particular, which ones are most likely to incentivise optimisation of the goal specified through the channel, rather than modification of the communication channel itself?
  9. How closely linked is the human motivational system to our intellectual capabilities: to what extent does the orthogonality thesis apply to human-like brains? What can we learn from the range of variation in human motivational systems (e.g. induced by brain disorders)?
  10. What were the features of the human ancestral environment and evolutionary “training process” that contributed the most to our empathy and altruism? What are the analogues of these in our current AI training setups, and how can we increase them?
  11. What are the features of our current cultural environments that contribute the most to altruistic and cooperative behaviour, and how can we replicate these while training AI?

Forecasting AI

  1. What are the most likely pathways to AGI and the milestones and timelines involved?
  2. How do our best systems so far compare to animals and humans, both in terms of performance and in terms of brain size? What do we know from animals about how cognitive abilities scale with brain size, learning time, environmental complexity, etc?
  3. What are the economics and logistics of building microchips and datacenters? How will the availability of compute change under different demand scenarios?
  4. In what ways is AI usefully analogous or disanalogous to the industrial revolution; electricity; and nuclear weapons?
  5. How will the progression of narrow AI shape public and government opinions and narratives towards it, and how will that influence the directions of AI research?
  6. Which tasks will there be most economic pressure to automate, and how much money might realistically be involved? What are the biggest social or legal barriers to automation?
  7. What are the most salient features of the history of AI, and how should they affect our understanding of the field today?

Meta

  1. How can we best grow the field of AI safety? See OpenPhil’s notes on the topic.
  2. How can we spread norms in favour of careful, robust testing and other safety measures in machine learning? What can we learn from other engineering disciplines with strict standards, such as aerospace engineering?
  3. How can we create infrastructure to improve our ability to accurately predict the future development of AI? What are the bottlenecks facing tools like Foretold.io and Metaculus, and what is preventing effective prediction markets from existing?
  4. How can we best increase communication and coordination within the AI safety community? What are the major constraints that safety faces on sharing information (in particular ones which other fields don’t face), and how can we overcome them?
  5. What norms and institutions should the field of AI safety import from other disciplines? Are there predictable problems that we will face as a research community, or systemic biases which are making us overlook things?
  6. What are the biggest disagreements between safety researchers? What’s the distribution of opinions, and what are the key cruxes?

Particular thanks to Beth Barnes and a discussion group at the CHAI retreat for helping me compile this list.

Comments



For reference, some other lists of AI safety problems that can be tackled by non-AI people:

Luke Muehlhauser's big (but somewhat old) list: "How to study superintelligence strategy"

AI Impacts has made several lists of research problems

Wei Dai's "Problems in AI Alignment that philosophers could potentially contribute to"

Kaj Sotala's case for the relevance of psychology/cog sci to AI safety (I would add that Ought is currently testing the feasibility of IDA/Debate by doing psychological research)

Thanks for this. I think posts of this type (which list options for people who want to work in a cause area) are valuable.

This post was awarded an EA Forum Prize; see the prize announcement for more details.

My notes on what I liked about the post, from the announcement:

To quote one commenter, “I think posts of this type (which list options for people who want to work in a cause area) are valuable”. I have a sense that fields of research are more likely to thrive when they can present scholars with interesting open problems, and Ngo takes the extra step of identifying problems that might appeal to people who might not otherwise consider working on AGI safety. This post is a good idea, executed well, and I don’t have much else to say — but I will note the abundant hyperlinks to sources inside and outside of EA.

Very nice list!

I added this to my list of lists of open problems in AI safety: https://bit.ly/AISafetyOpenProblems
