Help us find pain points in AI safety

Esben Kran

TL;DR: We want to hear your personal pain points. Click here for the survey (3+ min).

Examples of pain points are: "I feel like AI safety is too pessimistic", "AI safety is not focusing nearly enough on transparency and interpretability research" or any other professional or personal experience you think could be better in AI safety.

Additionally, we (Apart Research) hope you would like to book a problem interview meeting with me (U.S.) or with Jonathan (EU, jonathan@apartresearch.com). You can also write to me on Telegram or email me at esben@apartresearch.com!

🩹 Pain points in EA & AI safety

As far as we know, there has not been any major analysis of the pain points present in EA and AI safety, and we think such an analysis would be very valuable (see below). Therefore, our goal with this project is to have 40+ interviews and 100+ survey responses about pain points of community members.

From these interviews and responses, we will maintain a list of pain points that is updated as we receive more information. Additionally, after the goal metrics have been reached, we will publish a post on the EA Forum and LessWrong to summarize these pain points and how many people experience each. Lastly, we will try to summarize positive thoughts about the community as well so we might be able to enhance things that already work.

In addition to summarizing the points, we will propose potential solutions and enhancements where we find it possible! We have already worked on defining several AI safety-aligned for-profit ideas and are currently working on a technical AI safety ideas platform that will be published soon.

🤔 Theory of impact

By compiling this list, we can identify points of impact in the community that enable us to do mesa-cause-prioritization, i.e. figure out which projects might provide the largest beneficial impact to AI safety work for organizations working on these issues. This becomes especially important given the urgent timelines.

The unique value of our project is that we target the EA / AI safety community and that we attempt to identify the pain points of the community before proposing solutions.

🦜 What will the interview look like?

By default, the interview will be very informal and open, focused on getting to the bottom of what your pain points look like. The time plan for a ~30 minute call will look roughly like this:

Introductions (3-5 minutes)
Demographics (1 minute)
- Age, occupation, etc.
- EA and AI safety experience
- Where are you coming from? (country, viewpoint, earlier career)
Identifying pain points (5-10 minutes)
- You describe the pain point you’re experiencing
- Together, we think about whether you might have more pain points
Diving deeper (10-15 minutes)
- Debugging the actual pain point, i.e. “Why, why, why”
- Ranking these pain points
- Thinking about solutions
Identifying positive points (3-5 minutes)
- We talk about positive points in EA and AI safety communities
- Talk about how these plus points can be enhanced
Wrapping up (3-5 minutes)
- Connecting to have our communication channel open
- Asking you if we can contact you again in relation to this project
- Asking for referrals to people that you think might have pain points they would like to share – this might even be outside the community (pain points related to their non-participation in EA)

The list (stays updated)

The list is summarized in spreadsheet format here, and you’re welcome to directly add any pain or plus points you do not already see. As mentioned above, we will also publish these results in a more comprehensive fashion as another post.

Pain points

Culture
- Culture of inaction for validation: There is a centralization of decision-making within EA, where the community defers decisions to the thought leaders even though most projects should just be started instead of being delayed for validation.
- Missing a culture of celebration (culture of criticism): There is often a culture of criticizing something before being excited for its execution or development. This part is similar to "Culture of inaction". Additionally, when projects go well, there is rarely any unsolicited positive reaction from the community.
- LessWrong is significantly more negative than the EA forum: This is another issue of culture. There's a lot more judgement on LessWrong and a vibe of "you're not saying anything new" compared to the excitement and encouragement of EAF.
- Top EA is too focused on AI safety: People outside AI safety in EA feel left out that AI safety is such a massive focus while only accepting a small subset of skilled talent capital.
- AI safety is too far removed from AI capabilities research: Having a centralized community for AI safety research through the Alignment Forum and LessWrong is great, but it separates the community from active capabilities research. As a result, we 1) miss work that might assist in improving AGI safety and 2) miss potential influence on the capabilities field.
- AI safety is generally pessimistic to work in: A bit like "Missing a culture of celebration", most people in AI safety have a pessimistic attitude about how much potential we have to deal with AI safety. This can be seen as a net negative in the attempt to solve the problem since it excludes people.
Unclear definitions and constrained research thinking
- Definitions are unclear and the field lacks clarity as a result: Core researchers disagree on what the best ways to solve the alignment problems are, and the differences in definitions do not help this problem.
- The words for slow and fast takeoff are misleading: Slow takeoff will lead to the fastest onset of AGI while fast takeoff will probably come later.
- Formal definitions that are wrong are quite harmful: These mislead both future research and constrain our understanding of where we might need to target our efforts.
- AI safety research often jumps over crucial reasoning steps: There is a tendency to imagine a series of steps that lead to a failure case and then go deep into that failure case while ignoring possible assumption limitations in the previous steps. Also related to "Researchers seem too focused on single failure modes".
- Missing consensus about solution space in AI safety research: It is hard to navigate AI safety as an early career ML researcher because of the differing opinions on how impactful different strategies might be in AI safety research.
- No good arguments against alignment being a problem and nobody incentivized to have them: Most arguments against alignment being a problem have generally been dealt with by Yudkowsky and/or are just not sophisticated enough. Nobody interested in the question is actually incentivized to red-team the AI safety community's alignment focus. The best example we have is Paul Christiano.
- Lack of consensus among AGI researchers: The field of AI safety works a lot on the problems of alignment and has short timelines while AGI capabilities researchers generally have much longer timelines.
- Field is dominated by MIRI theory: As the original forefront of AI safety research, MIRI's theoretical frameworks seem to dominate many AI safety researchers' perspectives on the field. This might be harmful for new ideas entering the field.
- Researchers seem too focused on single failure modes: We do not know how probable different failure modes are, and current researchers seem to be very focused on quite specific failure modes. This plays together with "No good arguments against alignment being a problem".
- No clear visualizations of how a slow takeoff will look from an X-risk perspective: We are currently missing clear perspectives on how a slow takeoff will look and put humanity at risk. CAIS is one attempt towards this.
- No clear connections between ELK and the rest of the field: We should work on showcasing how ELK can assist or inform our work on other concepts in AI safety.
Clarity and keeping up with AI safety research
- Missing a view of how far the field currently is: There is a general issue of keeping up with how far we are towards solving the alignment problem. Newer projects have been better at showcasing their value towards the solution but it is still an issue.
- Unclear what the future path looks like: Is it an insights problem? Can we see incremental improvement? It would be nice to have more clarity on these and similar questions.
- Keeping up to date is hard: This is a general problem in research but would be ideal to work on in AI safety. Good examples of people who make it easier to keep up are Rob Miles for AI safety, Yannic Kilcher for AI capabilities, and Károly Zsolnai-Fehér for physics-based deep learning (PBDL).
- We don't have many good decompositions of problems: ELK is a good example, but most problems in AI safety require people to understand the framings in a holistic way that necessitates a lot of interdisciplinary research understanding. If we can come up with better decompositions of problems, this issue might be alleviated.
- Missing clear rewards for solving AI safety problems: There are many relatively clear problems in AI safety that are emphasized neither in the community nor in the incentive structures of AI safety research.
- Missing a Theory of Change in AI safety: The big organizations mostly do not have specific plans for how we can properly do work on AI safety, why it is important and in which ways we can think about it.
- Most EA forecasts are very fuzzy: It is hard to weigh predictions, and the predictions are quite disparate. Researchers don't agree, and there are also no specific prediction markets about the decomposition of probabilities.
- Not many independent hopes for how to do AGI well: The field of AI safety has very few perspectives on how AGI can end up working out well for the world. Examples might be truthful LLMs, ELK and CAIS while most scopes seem to be quite narrow.
Career
- Instability of academic career: The academic career is generally unstable and does not allow for planning out your work-life balance nor long-term life decisions.
- Entry into AI safety is hard: The field of AI safety is on the surface relatively closed-off and restricted to a few specific institutions and groups.
- Steps between learning AI safety and doing AI safety: There is a gap between taking a basics course in AGI safety and working within AI safety, e.g. should I do projects, work at Google or do something completely different? What is the next step in the onboarding into AI safety?
Mentorship
- Lack of available supervision: Technical AI safety research (MIRI, ARC) requires mentorship to be aligned with the research.
- Missing feedback from the top researchers: There is a large need for good research taste and we might be able to get even more feedback from top researchers.
AI capabilities research and outreach
- It's very hard to not help capabilities research: Many of the contemporary and useful projects we do in AI safety research are predicated on the strength of future models and need to simulate some sort of higher capability. This automatically incentivizes AI safety researchers to work implicitly on AI capabilities research.
- Aligned models need to be just as capable as unaligned models: For future systems to utilize aligned models, our conceptual work needs to end up with models that are inherently better. This relates to the "It's very hard to not help capabilities research" pain point.
- Missing scalable outreach: It is relatively easy to persuade people to join AI safety in 1-on-1s but this is not scalable. We need more ways to reliably get people into AI safety research.
- Relating to AI capabilities researchers: Too little is done to rein in current AI work or to incentivize AI safety within AI capabilities research. This calls for both more openness and more AI governance work.
Evaluation and current state of the field
- It's hard to evaluate how good our proposed solutions are: We present a lot of different models but there is not a clear relationship between them nor with our vision of how it might turn out well. E.g. Mark Xu mentioned at EAG that solving ELK might get us 20-25% towards solving the alignment problem more generally. These sorts of quantifications are few and far between.
- We are missing the tools to be able to evaluate current models: As the title states, model evaluations are generally ad hoc or based on datasets. We are missing these datasets for AI safety and/or even better tools for evaluations of alignment.
- Missing benchmarks and datasets: CV had ImageNet and MNIST, NLP has a hundred benchmarks, but AI safety has only a few. Creating benchmarks like TruthfulQA can be incredibly valuable.

Positive points

Creates data-driven impact on the world: It is rare to see major communities so focused on making decisions based on data and this creates a whole new opportunity for maximizing impact.
Allows for different thinking: The general thought process in EA and AI safety is relatively different from the ones you normally see, and this allows for new perspectives and interpretations of solutions.

💪🏽 You reached the end!

Awesome! If you want to help in studying the resulting data, come up with solutions or conduct interviews, join us in our Discord. And again:

Share your pain points in EA and AI safety with us through this survey (Google Forms) or book a Calendly meeting with me (Google Meet). You can also write to me on Telegram, email me (esben@apartresearch.com), or connect with me for EAG London and San Francisco through Swapcard.

What will Apart Research work on?

As mentioned, we will share this list along with an associated analysis and possible solution proposals for each pain point, with references to other posts where possible. These pain points will also inform our general work in AI safety and hopefully others' work as well.

Disclaimer: This was made with a prioritization on speed so we are very open to feedback in the comments or on the survey.

Jonathan Rystrom · 3y · 5

Super interesting stuff so far! Quite a few of the worries (particularly in "Unclear definitions and constrained research thinking" and "clarity") seem to stem from AI safety currently being a pre-paradigmatic field. This might suggest that it would be particularly impactful to explore more than exploit (though this depends on just how aggressive one's timelines are). It might also suggest that having a more positive "let's try out this funky idea and see where it leads" culture could be worth pursuing (to a higher degree than is being done currently). All in all, very nice to see pain points fleshed out in this way!

(Disclaimer: I do work for Apart Research with Esben, so please adjust for that excitement in your own assessment :))

Jay Bailey · 3y · 3

What level of involvement in AI Safety are you looking for as a minimum for people to:

A) Fill out the survey?
B) Sign up for an interview?

Personally, I'm trying to upskill into entering AI Safety but am not yet particularly involved in the community - it would be good to know whether I or people like me are part of the intended targets of this research, or if it is focused on people with more significant investment in or ties to the cause already.

Esben Kran · 3y · 1

We welcome anyone to answer the survey, and we welcome interviews with people who would describe themselves as "associated to AI safety research" in any capacity.

Effective Altruism Forum