Important, actionable research questions for the most important century

Holden Karnofsky

Intro

To a significant degree, I see progress on the cause I consider highest-priority (essentially, making the best of the most important century) as bottlenecked by high-quality research.

There are a number of questions such that well-argued, best-guess answers could help us get a lot more done (including productively spending a lot more money) to improve humanity’s prospects. Examples would be “How hard should we expect the AI alignment problem to be?” and “What will be the first transformative application of AI?” - more below.

Unfortunately, I consider most of these questions extremely hard to investigate productively. They’re vague, open-ended, and generally intimidating. I don’t think they are for everyone - but I think that people with the ability to make progress on them should seriously consider working on them.

In particular, I hope with this post to dispel what I see as a few misconceptions common in the EA community:

Misconception 1: there are plenty of EAs thinking about cerebral topics like these; what we need is more [money/power/connections at AI labs/connections in government/communications/something else].
- I think there are very few people making focused attempts at progress on the below questions. Many institutions that are widely believed to be interested in these questions have constraints, cultures and/or styles that I think make it impractical to tackle the most important versions of them, and I’d overall estimate that there are fewer than 20 people employed by all multi-person organizations combined¹ who are focused full-time on answering questions like these.
- Until we see more progress, I expect that we will continue to have relatively few tangible, well-targeted ways to deploy money/power/politicking/communications/whatever towards improving the odds of transformative AI going well. (I do think it’s valuable to build up such things anyway, in the hopes that we’ll have more strategic clarity later on. But we really need that strategic clarity!)
Misconception 2: questions about AI alignment are only for people with serious backgrounds in AI and/or AI alignment. I think the field of AI alignment - especially on the theory side - is incredibly nascent right now. I think a highly talented, dedicated generalist could become one of the world’s 25 most broadly knowledgeable people on the subject (in the sense of understanding a number of different agendas and arguments that are out there, rather than focusing on one particular line of research), from a standing start (no background in AI, AI alignment or computer science), within a year.²
Misconception 3: the best thing for a creative, brilliant person to think about is “unknown unknowns,” “Cause X” and/or “crucial considerations” that address topics that aren’t remotely on anyone’s radar today. I think the questions I’ll be listing below are at least as interesting and challenging as hunting for such things, and more important in expectation.

Below, I will:

List a number of (IMO) important, actionable research questions, with notes on how they could be important. In the main body of the piece, I give a high-level summary. Appendix 1 gives more detail on each, including:
- A bit more on how progress on the question could affect key decision-makers’ actions and lead to improved odds of transformative AI going well.
- A concrete example or two of how I imagine starting to make progress on the question. This is not meant to imply that the strategy I outline is the best one - just to help people who have trouble thinking about “what it would even mean to work on this question” to visualize one example. In many cases, I think it could be better to follow a vaguer workflow than what I outline, in which one follows one’s intuitions to important considerations, and writes up the occasional “argument about a major consideration” piece rather than doing a comprehensive analysis. But I still think it’s helpful to be able to visualize one concrete approach to working on each question, so I provide that.
- What I know about who’s working in the relevant area(s) today. This relates to Misconception 1 above.
After the high-level outline of research questions, I’ll address the topic: “How do I know whether I have potential to make progress on these questions?” I think it’s pretty straightforward to get a sense for this, and that most such questions don’t require an extensive background in a particular subject. This relates to Misconception 2 above.
Give some thoughts on working on these questions vs. thinking about “unknown unknowns,” “Cause X” and/or “crucial considerations.” This relates to Misconception 3 above.
Appendix 1 has more detailed presentations of each research question; Appendix 2 outlines what I think it would look like to get up to speed on AI alignment and become one of the world’s most knowledgeable people on the topic.

In the future, I hope to:

Explore the possibility of Open Philanthropy support and/or prizes for working on these questions.
Put out some general thoughts and tips on the process of working on questions like these. (Learning by Writing is the first piece in this category.)

A high-level list of important, actionable questions for the most important century

This is a high-level list of the questions that seem most important and actionable to me. There’s more detail on how these could be relevant, what examples of working on them might look like, and who works on them today in Appendix 1.

Questions about AI alignment (more)

I would characterize most AI alignment research as being something like: “Pushing forward a particular line of research and/or set of questions; following one’s intuitions about what’s worth working on.” I think this is enormously valuable work, but for purposes of this post, I’m talking about something distinct: understanding the motivations, pros and cons of a variety of approaches to AI alignment, with the aim of gaining strategic clarity and/or changing how talent and resources are allocated.

To work on any of the below questions, I think the first step is gaining that background knowledge. I give thoughts on how to do so (and how much of an investment it would be) in Appendix 2.

How difficult should we expect AI alignment to be? In this post from the Most Important Century series, I argue that this broad sort of question is of central strategic importance.

If we had good arguments that alignment will be very hard and require “heroic coordination,” the EA funders and the EA community could focus on spreading these arguments and pushing for coordination/cooperation measures. I think a huge amount of talent and money could be well-used on persuasion alone, if we had a message here that we were confident ought to be spread far and wide.
If we had good arguments that it won’t be, we could focus more on speeding/boosting the countries, labs and/or people that seem likely to make wise decisions about deploying transformative AI. I think a huge amount of talent and money could be directed toward speeding AI development in particular places.

More detail on what it would look like to work on this sort of question, how it could matter, and who’s working on it today

What experimental results could give us important updates about the likely difficulty of AI alignment? If we could articulate particular experiments whose results would be informative, we could:

Try to get actual experiments run along these lines. I’d expect this would take quite a bit of iteration and potentially a lot of money, but it would be well worth it.
To the extent that these experiments couldn’t be run yet (e.g., because AI models aren’t generally capable enough yet), we could pour effort into obtaining high-quality forecasts of the results, via tools such as Metaculus and Hypermind and Good Judgment. I am excited in the abstract about forecasting as a tool for predicting the future, but right now it’s hard to apply it to any of the questions about the future I most care about; if we could make headway on having tangible, clairvoyant questions for forecasters, I think it could unlock a lot of exciting projects.
Either way, use the results to get major updates on the difficulty of alignment, and pour money and talent into disseminating these as in the previous section.

More detail on what it would look like to work on this sort of question, how it could matter, and who’s working on it today

What relatively well-scoped research activities are particularly likely to be useful for longtermism-oriented AI alignment?

I think there is very little AI alignment research today that has both of the following properties: (a) it’s likely to be relevant for the hardest and most important parts of the problem; (b) it’s also the sort of thing that researchers can get up to speed on and contribute to relatively straightforwardly (without having to take on an unusual worldview, match other researchers’ unarticulated intuitions, etc.)

Working on this question could mean arguing that a particular AI alignment agenda has both properties, or coming up with a new way of thinking about AI alignment that offers research with both properties. Anything we can clearly identify as having these properties unlocks the potential to pour money and talent toward a relatively straightforward (but valuable) research goal - via prizes, grant programs, fellowships, conditional investments in AI companies (though I think today’s leading AI labs would be excited to do more of this work without needing any special incentive), etc.

More detail on what it would look like to work on this sort of question, how it could matter, and who’s working on it today

What’s an AI alignment result or product that would make sense to offer a $1 billion prize for?

I think longtermist funders would be excited to fund and launch such a prize if it were well-designed. I’d expect scoping the prize (to prevent spurious wins but also give maximum clarity as to the goals), promoting the prize, giving guidance to entrants, and judging entries to be a lot of work, so I’d be most excited to do it with a backdrop of having done the hard intellectual work to figure out what’s worth rewarding.

More detail on what it would look like to work on this sort of question, how it could matter, and who’s working on it today

Questions about AI strategy (more)

These could require an interesting mix of philosophical reasoning and more “worldly” reasoning about institutions, geopolitics, etc. I think the ideal researcher would also be highly informed on, and comfortable with, the general state of AI research and AI alignment research, though they need not be as informed on these as for the previous section.

How should we value various possible long-run outcomes relative to each other? E.g., how should we value “utopia” (a nearly optimal outcome) vs. “dystopia” (an outcome nearly as bad as possible) vs. “paperclipping” (a world run by misaligned AI) vs. more middling outcomes?

Most of the thinking I’ve seen on this topic to date has a general flavor: “I personally am all-in on some ethical system that says the odds of utopia [or, in some cases, dystopia] are approximately all that matters; others may disagree, which simply means we have different goals.” I think we can do better than that, as elaborated in the relevant section of Appendix 1.

This is a sprawling topic with a lot of potential applications. Some examples:

A compelling “win-win” set of valuations - such that people with different values could all benefit from acting on a single set of valuations (à la moral trade) - could increase coordination and trust among the people who are both (a) interested in philosophical rigor and (b) focused on helping the most important century go as well as possible. I believe this could make a difference comparable to the “unlocking huge amounts of money and talent” ideas pointed at in previous sections.
This question could be an important factor in many of the same tough calls listed under “How difficult should we expect AI alignment to be?” (such as whether to prioritize communicating about the importance of the alignment problem vs. boosting AI development in particular places). Insights about how we should value “paperclipping” vs. other outcomes could be as useful as insights about how likely we should consider paperclipping to be.
Insights on this topic could also have more granular impacts on what sort of government, lab, etc. we should be hoping will lead the way in developing transformative AI, which could in turn unlock money and talent for making that sort of outcome more likely.

I think that reasoning about moral uncertainty and acausal trade could be important here.

More detail on what it would look like to work on this sort of question, how it could matter, and who’s working on it today

How should we value various possible medium-run outcomes relative to each other? E.g., how much should one value “transformative AI is first developed in country A” vs. “transformative AI is first developed in country B”, or “transformative AI is first developed by company A” vs. “company B”, or “transformative AI is developed 5 years sooner/later than it would have been otherwise”?

If we were ready to make a bet on any particular intermediate outcome in this category being significantly net positive for the expected value of the long-run future, this could unlock a major push toward making that outcome more likely. I’d guess that many of these sorts of “intermediate outcomes” are such that one could spend billions of dollars productively toward increasing the odds of achieving them, but first one would want to feel that doing so was at least a somewhat robustly good bet.³

More detail on what it would look like to work on this sort of question, how it could matter, and who’s working on it today

What does a “realistic best case transition to transformative AI” look like?

I think that major AI labs with aspirations toward transformative AI want to “do the right thing” if they develop it and are able to align it, but currently have very little to say about what this would mean. They also seem to make pessimistic assumptions about what others would do if they developed transformative AI (even assuming it was aligned).

I think there’s a big vacuum when it comes to well-thought-through visions of what a good outcome could look like, and such a vision could quickly receive wide endorsement from AI labs (and, potentially, from key people in government). I think such an outcome would be easily worth billions of dollars of longtermist capital.

More detail on what it would look like to work on this sort of question, how it could matter, and who’s working on it today

How do we hope an AI lab - or government - would handle various hypothetical situations in which they are nearing the development of transformative AI, and what does that mean for what they should be doing today?

Luke Muehlhauser and I sometimes refer to this general sort of question as the “AI deployment problem”: the question of how and when to build and deploy powerful AI systems, under conditions of uncertainty about how safe they are and how close others are to deploying powerful AI of their own.

My guess is that thinking through questions in this category can shed light on important, non-obvious actions that both AI labs and governments should be taking to make these sorts of future scenarios less daunting. This could, in turn, unlock interventions to encourage these actions.

More detail on what it would look like to work on this sort of question, how it could matter, and who’s working on it today

Questions about AI “takeoff dynamics” (more)

I think these are especially well-suited to people with economics-ish interests.

What are the most likely early super-significant applications of AI? It seems possible that AI applied in some narrow domain - such as chemical modeling, persuasion or law enforcement and the military - will be super-significant, and will massively change the world and the strategic picture before highly general AI is developed. I don’t think longtermists have done much to imagine how such developments could change key strategic considerations around transformative AI, and what we could be doing today to get ahead of such possibilities.

If we could identify a few areas that seem particularly likely to see huge impact from AI advances, this could significantly affect a number of other strategic considerations, as well as highlighting some additional ways for longtermists to have impact (e.g., by working in key industries, understanding them well and getting ahead of key potential AI-driven challenges).

More detail on what it would look like to work on this sort of question, how it could matter, and who’s working on it today

To what extent should we expect a “fast” vs. “slow” takeoff? There are a few ways to think about what this means and why it matters; one abbreviation might be “We want to know whether we should expect the massive importance and key challenges of AI to be clear common knowledge while there is still significant time for people to work toward solutions, or whether we should expect developments that largely ‘take the world by surprise’ and are conducive to extreme power imbalances.”

I think this question feeds importantly into a number of questions about strategy, particularly about (from the previous section) what medium-run outcomes we should value and what sorts of things labs and governments should be prepared to do. Meaningful updates on likely takeoff dynamics could end up steering a lot of money and talent away from some interventions and towards others.

More detail on what it would look like to work on this sort of question, how it could matter, and who’s working on it today

How should longtermist funders change their investment portfolios? There are a number of ways in which a longtermist funder should arguably diverge from standard best practices in investing. These could include having a different attitude towards risk (which depends crucially on the question of how fast money becomes less valuable as there is more of it), doing “mission hedging” (e.g., investing in companies that are likely to do particularly well if transformative AI is developed sooner than expected), and betting on key views about the future.

A well-argued case for making particular kinds of investments would likely be influential for major longtermist funders collectively accounting for tens of billions of dollars in capital. On a 10-year time frame, an investment change that causes an extra percentage point of returns per year could easily be worth over $1 billion.
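
As a rough check on the arithmetic above, here is a minimal sketch in Python. The $10 billion starting capital (the low end of “tens of billions”) and the 5% baseline annual return are illustrative assumptions, not figures from this post, and the function name `extra_value` is hypothetical.

```python
def extra_value(capital: float, base_return: float, boost: float, years: int) -> float:
    """Difference in final portfolio value when the annual return rises by `boost`."""
    baseline = capital * (1 + base_return) ** years
    improved = capital * (1 + base_return + boost) ** years
    return improved - baseline


# Assumed inputs: $10B of capital, 5% baseline annual return,
# one extra percentage point of return per year, 10-year horizon.
gain = extra_value(capital=10e9, base_return=0.05, boost=0.01, years=10)
print(f"Extra value after 10 years: ${gain / 1e9:.2f} billion")
```

Under these assumptions the gain comes out above $1 billion, and it stays slightly above $1 billion even if the baseline return is set to 0%. A larger capital base scales the figure proportionally.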

To the extent that other questions covered in this piece feed into investment decisions, this increases those questions’ action-relevance and potential impact.

More detail on what it would look like to work on this sort of question, how it could matter, and who’s working on it today

How to know whether you can do this work

I think the vast majority of people aren’t a fit for the sort of work outlined in this post. But I don’t think the main blocker is experience, and I fear that some of the few people who could be a fit are “ruling themselves out” based on takes like “I don’t have outlier mathematical ability” or “I’ve never thought about AI or AI alignment before, and it would take me too long to catch up.”

While I think it’s extraordinarily hard to make progress on the sort of questions listed above, I think it’s pretty straightforward (and not inordinately time-consuming) to explore whether one might be able to make progress. Here’s roughly what I have in mind, for assessing your fit for working on a particular question along the lines above (this assumes you are able to get enough time freed up; as I noted before, Open Philanthropy may offer support for this in the future):

1. Read a question (I suggest the more detailed versions in Appendix 1) and try to picture yourself working on it. If you had read enough to have basic background on existing approaches to the question, and you had a full day off tomorrow, can you imagine yourself getting up in the morning, sitting down to work on the question, and working on the question for most of the day? Or do you find yourself thinking “I just have no idea what steps I would even take, the whole thing feels bizarre and impossible to picture?” If the latter, it seems reasonable to stop there.
2. Next, “get up to speed” on relevant topics (e.g., AI alignment) such that you feel you have enough basic background that you can get yourself to start thinking/writing about the question directly. I don’t think this means getting “fully up to speed” or anything like it, and it definitely doesn’t mean reading everything (or even most things) relevant to the key topics. The goal here is to “get started” with a highly imperfect approach, not to be fully informed. If you find “getting up to speed” aversive, feel you’ve made no progress in several tries on understanding key readings, or find yourself pouring more and more time into readings without seeming to get closer to feeling ready to start on #3, I think it’s reasonable to stop there.
3. Free up a day to work on the question. Do you put in at least two hours of solid, focused work during that day, such that you feel like you ended the day a little further along than you started? If not, and another couple of tries yield the same results, it seems reasonable to stop there.
4. Find a way to free up four weeks to mostly devote to the question (this part is more challenging and risky from a logistical and funding standpoint, and we’re interested in hearing from people who need help going from #3 to #4). During that time, do you put in at least, say, 60⁴ hours of solid, focused, forward-moving work on the question? Are you starting to think, “This is work it feels like I can do, and I think I’m making progress?” If not, it seems reasonable to stop there.
5. At this point I think it’s worth trying to find a way to spend several months on the question. I think it’s reasonable to aim for “a rough draft of something that could go on the EA Forum” - it doesn’t need to be a full answer, but should be some nontrivial point you feel you can argue for that seems useful - within a few months.
From there, I think feedback from others in the community can give you some evidence about your potential fit.

I expect most people to tap out somewhere in the #1-#3 zone, and I don’t think any of those steps require a massive investment of time or leaving one’s job. (#2 is the most time-consuming of the steps, but doesn’t require dedicated blocks of time and is probably useful whether or not things end up working out.) I think it makes more sense to assess one’s fit via steps like these than via making assumptions about needed credentials and experience.

(One might want to explore their fit for several questions before tapping out on the whole endeavor.)

I say more about the general process of working on questions like these in Learning by Writing, and I plan to write more on this topic soon.

Comparison with seeking “crucial considerations” / “Cause X” / “unknown unknowns”

I think working on the questions above loads heavily on things like creativity, independent thinking, mental flexibility, ability to rescope and reimagine a question as one goes, etc. I think one needs a great deal of these qualities to make any progress, and it seems like the sky’s the limit in terms of how much these qualities could help one come up with crucial, game-changing angles on the questions above.

I want to address an impression I think many in the EA community have, which is that the people who stand out most on these properties (creativity, independent thinking, etc.) should not be working on questions like the above, but instead should be trying to identify questions and topics that currently aren’t on anyone’s radar at all. For example:

Finding Cause X, a cause “more important than all the causes currently prioritized” by the community (presumably including the cause of preventing existential catastrophe via transformative AI).
Identifying all-new crucial considerations that cause us to completely lose confidence in all previous reasoning, including reasoning about which questions were most important to answer.

It’s of course possible that looking for such things is the highest-value activity (if done well); there’s basically no way to rule this out. However, I want to note that:

The effective altruism community’s top causes and “crucial considerations” seem to have (exclusively?) been identified very early.

I think a lot of effective altruists (including myself) have an experience something like this: “Wow, the idea of doing as much good as possible is so underrated! … Wow, instead of trying to do this in my home country, I can do so much more good by taking into account the much worse level of poverty overseas! … Wow, but maybe animal welfare presents even bigger opportunities than this! … Wow, but maybe existential risk presents bigger opportunities still!” The experience is one of finding one crucial consideration after another, each of which feels extraordinarily valuable and revolutionary, such that it feels logical that there are many more to come.
But in fact, I believe that all of these points were identified in the very early “proto-effective-altruism” days or before. For example, Astronomical Waste is from 2003 - 3 years before Overcoming Bias started (which I tend to identify with the start of the “rationalist” community), 4 years before GiveWell was founded, 6 years before Giving What We Can and The Life You Can Save, and 8 years before effective altruism had a name.

There are some reasons to think that future “revolutionary crucial considerations” will be much harder to find, if they exist.

As far as I can tell, there haven’t been any revolutionary “crucial considerations” - or new longtermist causes that look highly competitive with the most popular ones in the EA community - in over a decade (even though all of the people who came up with the original ones are still around to come up with more!). I think there’s reason to think the low-hanging fruit has been picked.
Furthermore, if you accept the premise that this century is more likely than not to see transformative AI, there’s some serious reason to doubt that we’re going to come up with something more important to work on. For nearly any problem one can come up with - perhaps most of the good to be done is reducing the “suffering of electrons,” or in understanding a simulation we’re in - “let’s make sure we don’t end up with a world run by misaligned AI, and are instead able to use transformative AI to build a more powerful, wiser civilization that is able to tackle this problem more effectively” seems like a pretty good route.
- Analogy: at most moments in your life, there’s a very broad, open-ended space of actions you might take and goals you might have, and it’s very unlikely that you’re taking the best possible action or orienting toward the best possible goal. But if you’re in a life-threatening situation, escaping with your life is probably just the right goal, no matter what new insights lie ahead.

Working on the above questions might be a promising route to identifying new crucial considerations, anyway. For example, the question of how to value various possible long-run outcomes presents an opportunity to reason about anthropics and acausal trade with a tangible purpose in mind; questions about AI “takeoff dynamics” could change our picture of the next few decades in a way somewhat analogous to (though less dramatic than) thinking of transformative AI in the first place.

I certainly don’t intend to say that the hunt for “crucial considerations” or “Cause X” isn’t valuable. But my overall guess is that at least some of the above questions have higher value (in expectation), while being comparably challenging and neglected.

Comparison with more incremental work

I think there are a number of people today who aspire to clarify the strategic situation for the most important century (in a similar spirit to this post), but prefer a strategy of working on “bite-sized chunks” of research, rather than trying to directly tackle crucial but overwhelming questions like the ones above.

They might write a report entirely on some particular historical case study, or clarification of terms, or relatively narrow subquestion; the report isn’t intended or designed to cause a significant update of the kind that would unlock a lot of money and talent toward some goal, but rather to serve as one potential building block for others toward such an update.

I think this approach is completely reasonable, especially for purposes of getting practice with doing investigations and reports. But I think there is also something to be said for directly tackling the question you most want the all-things-considered answer to (or at least a significant update on). I think the latter is more conducive to skipping rabbit holes that aren’t necessary for the big-picture goal, and the skill of skipping such rabbit holes and focusing on what might update us is (IMO) one of the most crucial ones to get practice with.

Furthermore, a big-picture report that skips a lot of steps and has a lot of imperfections can help create more opportunities to do narrower work that fits in in a clear way. For example, the biological anchors report goes right at the question of transformative AI timelines rather than simply addressing a piece of the picture (e.g., trends in compute costs or thoughts on animal cognitive capabilities) that might be relevant. A lot of the things it handles imperfectly, or badly, have since become the subject of other reports and debates. There’s plenty of room for debate on whether that report - broad, sweeping, directly hypothesizing an answer to the question we care about most, while giving only short treatment to many important subquestions - is a better use of time than something narrower and more focused; I personally feel it is, at least with the present strategic landscape as it stands.

The unusual sort of people we need

A long time ago, I was struck by this post by Nate Soares. At that time, I had a pretty low opinion of the rationality community and of MIRI ... and yet, it struck me as at least really *unusual* that someone would turn on a dime from a ten-year quest to reform the fundamentals of how governments are structured to (at least as far as I could tell at the time) donating to, and then becoming a researcher for, MIRI. In my experience, the kind of person who hatches and pursues their own vision like this is not the kind of person who then jumps on board with someone else's. In this respect at least, Nate seemed unusual.

I think that kind of unusualness is what we need more of right now. I think the kind of person who can make serious headway on the above questions is the kind of person who is naturally motivated by creating their own frameworks and narratives, and will naturally be drawn to hatching and promoting some worldview that is recognizably "their own" (such that it might e.g. be named after them) - rather than working on problems that neatly fit within a pretty significant and well-funded community's existing conventional wisdom.

And maybe there's more value in the former than in the latter - but that's not my best guess. My guess is that we need people who break that mold: people who very much have the potential to build a plausible worldview all their own, but choose not to because they instead want to do as much good as possible.

Notes

1. It’s harder to know what the situation looks like for people doing this work on their own.
2. To be a bit more precise, I think that a year’s study could result in someone having a level of knowledge that would put them in the 25 most knowledgeable people today. Hopefully, by the time that year is up, the bar for being in the top 25 will be quite a bit higher, though.
3. By this I mean not “has no downside” but rather “looks like a good bet based on the best reasoning, analysis and discussion that can reasonably be done.” I wouldn’t want to start down a path of spending billions of dollars on X while I still felt there were good extant arguments against doing so that hadn’t been well considered.
4. For someone new to this work, I think of “four hours of focused work” as a pretty good day, and “two hours of focused work” as a lower-end day (though there are certainly days that go outside of this range in both directions). So here I’m assuming 4 weeks, 5 days per week, 3 hours per day on average. Some people might do something more like “4 hours a week for the first 3 weeks, then 7 hours a day for 7 straight days in the fourth week” and I’d consider that a success as well.

Mau · Feb 24 2022 · 17

Thanks for this! For the more governance-oriented questions (specifically, the 2nd-4th questions under AI strategy, and the 1st question about takeoff dynamics), how useful do you (or others) think deep experience with relevant governance organizations is? I wonder what explains the apparent difference between the approach suggested by this post (which I read as not emphasizing gaining relevant experience, and instead suggesting "just start trying to figure this stuff out") and the approach suggested by this other post:

If you want to try this kind of work [contributing to research that may provide greater strategic clarity in the future, in the context of AI governance], in most cases I recommend that you [among other things] gain experience working in relevant parts of key governments and/or a top AI lab (ideally both) so that you acquire a detailed picture of the opportunities and constraints those actors operate with.

(Maybe it's that people can test their fit without much experience, but would get lots of value out of that experience for actually doing this work?)

Holden Karnofsky · Mar 31 2022 · 6

I think "people can test their fit without much experience, but would get lots of value out of that experience for actually doing this work" is pretty valid, though I'll also comment that I think there are diminishing returns to direct experience - I think getting some experience (or at least exposure, e.g. via conversation with insiders) is important, but I don't think one necessarily needs several years inside key institutions in order to be helpful on problems like these.

Mau · Mar 3 2022 · 13

Edit: not sure how much I still like the following frame; it might lump together a handful of questions that are better thought of as distinct.

I'd tentatively suggest an additional question for the post's list of research questions (in the context of the idea that we may only get narrow/minimalist versions of alignment):

Assuming that transformative AI will be aligned, how good will the future be?

My (not very confident) sense is that opinions on this question are highly varied, and that it's another important strategic question. After all,

Some people seem to think that, if transformative AI will be aligned, then the future will be amazing.
- A common justification for this view seems to be: AI will be aligned to people/groups who on reflection would have good values (because most people/institutions have such values, or because people/groups with good values are on track to influence), and AI-assisted deliberation & coordination will be enough to bootstrap them from that starting point to an amazing future.
- If we had good arguments for this, the community could focus on alignment.
Some people seem to think that, even if transformative AI will be aligned, the future won't be all that amazing.
- Common justifications for this view seem to be: AI will be aligned to individuals or (coordinated) groups with lame or bad values, either because they are already on track to influence or because inadequate cooperation will erode value during or after the development of transformative AI.
- If we had good arguments for this, the community could dedicate a large fraction of its resources to addressing whatever may cause a future with aligned AI to not be great (e.g., by boosting certain organizational or individual actors, improving institutions, forming "cooperation-compatible" plans for using aligned AI, or otherwise improving cooperation).

Mau · Mar 3 2022 · 16

Some existing work on these topics, as potential starting points for people interested in looking into this (updated March 11, 2022):

On (AI-assisted) reflection on values (a potential contributor to the future being good, given alignment):
- Decoupling deliberation from competition (Christiano, 2021)
- Ambitious vs. narrow value learning (Christiano, 2015)
- Work in (meta)ethics, moral psychology, and cultural/moral history
On the claim that agents with good values will, for theoretical reasons, exert disproportionate influence (a potential contributor to the future being good, given alignment):
- Why might the future be good? (Christiano, 2013)
- Work on moral trade also seems relevant here (since moral trade lets everyone have more influence on what they care more about).
On the claim that currently influential groups have good/lame/bad values (a potential contributor to the future being good or bad/lame, given alignment):
- This comment (Drexler, 2021)
- We're already in AI takeoff (Valentine, 2022)
- Work on the values, processes, and histories of relevant governments, companies, and (social, ideological, and political) movements
- One could have informal conversations to learn more about how much leverage various people/groups do or don't have in relevant groups/organizations
On value erosion through competition (a potential contributor to the future being bad/lame, even with alignment):
- "Value erosion through competition" section of a post (Dafoe, 2020)
- The four readings cited/linked in the above Dafoe post section
- What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs) (Critch, 2021) (see the comments for further discussion)
- Spreading happiness to the stars seems little harder than just spreading (Shulman, 2012) (see the comments for further discussion)
- Game theoretic work on cooperation and competition (?)
- Keeping an eye out for more work on this topic might be useful.
Additional material that seems relevant:
- Public choice theory and social choice theory (?)
- Technical alignment work also seems like important context for thinking about what AI aligned to a group/organization may be like.
Additional sources referenced in section 1.2 of the Global Priorities Institute's research agenda may also be relevant.
Several parts of the original post here and its appendices also seem relevant.

calebp · Aug 18 2022 · 10

If anyone ended up working on these questions as a result of this post, I would be interested in asking you a few questions about your experience. So far, I haven't encountered many people who actually decided to put substantial effort into tackling these questions, but I have seen a lot of people who are supportive of others trying.

I am thinking about grantmaking programs that might support people trying out this kind of research, or encourage people to try it out.

You can message me on the forum or at caleb.parikh [at] centreforeffectivealtruism.org.

Iyngkarran Kumar · Feb 28 2023 · 4

I’m trying to get a better picture in my head of what good work looks like in this space - i.e., existing work that has given us improved strategic clarity. This could be with regard to TAI itself, or a technology such as nuclear weapons.

Imo, some examples of valuable strategic work/insights are:

Ajeya Cotra’s work on forecasting TAI via bioanchors
Niels Bohr (and other scientists) realising during the development of the nuclear bomb that an oligopoly on nuclear weapons could lead to relative peace between global superpowers. (https://www.governance.ai/research-paper/lessons-atomic-bomb-ord)

I’m curious as to any other examples of existing work that you think fit into the category of valuable strategic work, of the type that you talk about in this post.

Evan R. Murphy · Mar 15 2022 · 4

What relatively well-scoped research activities are particularly likely to be useful for longtermism-oriented AI alignment?
[...]
(3) Activity that is likely to be relevant for the hardest and most important parts of the problem, while also being the sort of thing that researchers can get up to speed on and contribute to relatively straightforwardly (without having to take on an unusual worldview, match other researchers’ unarticulated intuitions to too great a degree, etc.)

I'm planning to spend some time working on this question, or rather part of it. In particular I'm going to explore the argument that interpretability research falls into this category, with some attention to which specific aspects or angles of interpretability research seem most useful.

Since I don't plan to spend much time thoroughly examining other research directions besides interpretability, I don't expect to have a complete comparative answer to the question. But by answering the question for interpretability, I hope to at least put together a fairly comprehensive argument for (or perhaps against, we'll see after I look at the evidence!) interpretability research that could be used by those considering it as a target for their funding or their time. I also hope that then someone trying to answer the larger question could use my work on interpretability as part of a comparative analysis across different research activities.

If someone is already working on this particular question and I'm duplicating effort, please let me know and perhaps we can sync up. Otherwise, I hope to have something to show on this question in a few/several weeks!

Evan R. Murphy · May 12 2022 · 3

My first 2 posts for this project went live on the Alignment Forum today:

1. Introduction to the sequence: Interpretability Research for the Most Important Century
2. (main post) Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios

matthewp · Feb 26 2022 · 3

> How difficult should we expect AI alignment to be?

With many of the AI questions, one needs to reason backwards rather than pose the general question.

Suppose we all die because of unaligned AI. What form did the unaligned AI take? How did it work? Which things that exist now were progenitors of it, and what changed to make it dangerous? How could those problems have been avoided, technically? Organisationally?

I don't see how useful alignment research can be done quite separately from capabilities research. Otherwise, what we'll get will be people coming in at the wrong time with a bunch of ideas that lack technical purchase.

Similarly, the questions about what applications we'll see first are already hinted at in capabilities research.

That being the case, it will take way more energy than 1 year for someone to upskill because they actually need to understand something about capabilities work.

jacobpfau · Feb 27 2022 · 2

Re: feasibility of AI alignment research, Metaculus already has "Control Problem solved before AGI invented". Do you have a sense of what further questions would be valuable?

Holden Karnofsky · Mar 31 2022 · 3

I don't have anything available for this offhand - I'd have to put serious thought into what questions are at the most productive intersection of "resolvable", "a good fit for Metaculus" and "capturing something important." Something about warning signs ("will an AI system steal at least $10 million?") could be good.

brb243 · May 7 2022 · 1

Would you consider breaking down your questions into sub-(sub-sub-...) questions that readers can answer and coordinating the discourse? I made a (fun) synthesis sheet for 216 participants (1 central EA-related question broken down into 3 layers of 6 questions). For each end-question, I included a resource which I vetted for EA-related thinking stimulation and (some) idea non-repetition. Feel free to also review a rough draft of breaking down the questions that you introduce in this piece.

I would argue that any intro to EA fellowship participant who was not discouraged from involvement can answer these questions. First, the broader ones should be skimmed and then the end-one selected.

This would result in efficient thought development, defining exclusivity by participation.
