I believe AI safety is a big problem for the future, and more people working on the problem would likely increase the chance that it gets solved, but I think the third component of ITN (neglectedness) might need to be reevaluated.

I mainly formed my base ideas around 2015, when the AI revolution was portrayed as a fight against killer robots. Nowadays, more details are communicated, like bias problems, optimizing for different-than-human values (ad-clicks), and killer drones.

It is possible that it only went from very neglected to somewhat neglected, or that the news I received from my echo chamber was itself biased. In any case, I would like to know more.

It depends on what you mean by "neglected", since neglect is a spectrum. It's a lot less neglected than it was in the past, but it's still neglected compared to, say, cancer research or climate change. In terms of public opinion, the average person probably has little understanding of AI safety. I've encountered plenty of people saying things like "AI will never be a threat because AI can only do what it's programmed to do" and variants thereof.

What is neglected within AI safety is suffering-focused AI safety for preventing S-risks. Most AI safety research and existential risk research in general seems to be focused on reducing extinction risks and on colonizing space, rather than on reducing the risk of worse than death scenarios. There is also a risk that some AI alignment research could be actively harmful. One scenario where AI alignment could be actively harmful is the possibility of a "near miss" in AI alignment. In other words, risk from AI alignment roughly follows a Laffer curve, with AI that is slightly misaligned being more risky than both a perfectly aligned AI and a paperclip maximizer. For example, suppose there is an AI aligned to reflect human values. Yet "human values" could include religious hells. There are plenty of religious people who believe that an omnibenevolent God subjects certain people to eternal damnation, which makes one wonder if these sorts of individuals would implement a Hell if they had the power. Thus, an AI designed to reflect human values in this way could potentially involve subjecting certain individuals to something equivalent to a Biblical Hell.

Regarding specific AI safety organizations, Brian Tomasik wrote an evaluation of various AI/EA/longtermist organizations, in which he estimated that MIRI has a ~38% chance of being actively harmful. Eliezer Yudkowsky has also harshly criticized OpenAI, arguing that open access to their research poses a significant existential risk. Open access to AI research may increase the risk of malevolent actors creating or influencing the first superintelligence to be created, which poses a potential S-risk. 

What is neglected within AI safety is suffering-focused AI safety for preventing S-risks. Most AI safety research and existential risk research in general seems to be focused on reducing extinction risks and on colonizing space, rather than on reducing the risk of worse than death scenarios.

I disagree. I think that if AGI safety researchers cared exclusively about s-risk, their research output would look substantially the same as it does today, e.g. see here and discussion thread.

For example, suppose there is an AI aligned to reflect human values. Yet "human va

Sure, but people are still researching narrow alignment/corrigibility as a prerequisite for ambitious value learning/CEV. If you buy the argument that safety with respect to s-risks is non-monotonic in proximity to "human values" and control, then marginal progress on narrow alignment can still be net-negative w.r.t. s-risks, by increasing the probability that we get to "something close to ambitious alignment occurs but without a Long Reflection, technical measures against s-risks, etc." At least, if we're in the regime of severe misalignment being the most likely outcome conditional on no more narrow alignment work occurring, which I think is a pretty popular longtermist take. (I don't currently think most alignment work clearly increases s-risks, but I'm pretty close to 50/50 due to considerations like this.)
If AGI safety researchers cared exclusively about s-risks, their priorities would be closer to the Center on Long-Term Risk's. CLR is s-risk-focused, and focuses on conflict scenarios in particular, which don't get nearly as much attention elsewhere in AI safety. The Cooperative AI Foundation was also funded primarily by the Center for Emerging Risk Research, which is also s-risk-focused and whose staff have all worked for CLR/FRI/EAF at some point.
Empirical disagreements between groups are often more consequential than normative ones, so just pointing out what groups with a given normative stance focus on may not be a strong argument. I found Steven Byrnes' arguments in the other thread somewhat convincing (in particular, the part where maybe no one wants to do ambitious value learning). That said, next to ambitious value-learning, there are intermediates like value-agnostic expansion where an AI learns the short-term preferences of a human overseer, like "keep the overseer comfortable and in control." And those short-term preferences can themselves backfire, because the humans will stick around in protected bubbles, and they can be attacked. So, at the very least, AI alignment research that takes s-risks extremely seriously would spend >10% of its time thinking through failure modes of various alignment schemes, particularly focused on "Where might hyperexistential separation fail?"
7Steven Byrnes3mo
"Maybe no one" is actually an overstatement, sorry, here are some exceptions: 1 [https://www.lesswrong.com/posts/3L46WGauGpr7nYubu/the-plan], 2 [https://www.lesswrong.com/posts/zuHezdoGr2KtM2n43/new-year-new-research-agenda-post], 3 [https://futureoflife.org/2019/09/17/synthesizing-a-humans-preferences-into-a-utility-function-with-stuart-armstrong/]. (I have corrected my previous comment.) I guess I think of current value learning work as being principally along the lines of “What does value learning even mean? How do we operationalize that?” And if we’re still confused about that question, it makes it a bit hard to productively think about failure modes. It seems pretty clear to me that “unprincipled, making-it-up-as-you-go-along, alignment schemes” would be bad for s-risks, for such reasons as you mentioned. So trying to gain clarity about the lay of the land seems good.
Regarding susceptibility to s-risk:
* If you keep humans around, they can decide on how to respond to threats and gradually improve their policies as they figure out more (or their AIs figure out more).
* If you build incorrigible AIs who will override human preferences (so that a threatened human has no ability to change the behavior of their AI), while themselves being resistant to threats, then you may indeed reduce the likelihood of threats being carried out.
* But in practice all the value is coming from you solving "how do we deal with threats?" at the same time that you solved the alignment problem.
* I don't think there's any real argument that solving CEV or ambitious value learning per se helps with these difficulties, except insofar as your AI was able to answer these questions. But in that case a corrigible AI could also answer those questions.
* Humans may ultimately build incorrigible AI for decision-theoretic reasons, but I think the decision to do so should probably be separated from solving alignment.
* I think the deepest coupling comes from the fact that the construction of incorrigible AI is itself an existential risk, and so it may be extremely harmful to build technology that enables that prior to having norms and culture that are able to use it responsibly.
* Overall, I'm much less sure than you that "making it up as you go along alignment" is bad for s-risk.
Hi, regarding this part: I'm not 100% sure I understand; could you elaborate a little? Is the idea that the human overseer's values could value punishing some out-group or something else?
1Steven Byrnes3mo
Oops, I was thinking more specifically about technical AGI safety. Or do you think "conflict scenarios" impact that too?
CLR's agenda is aimed at reducing conflict among TAI-enabled actors, and includes credibility and peaceful bargaining, which both seem technical: https://longtermrisk.org/research-agenda Their technical research is mostly on multi-agent systems and decision theory: https://longtermrisk.org/publications/ I'd guess they're the only (10+ team member) group with multi-agent systems and multipolar scenarios among their main focuses within their technical research.
5Steven Byrnes3mo
OK, thanks. Here’s a chart I made: Source: my post here [https://www.lesswrong.com/posts/4basF9w9jaPZpoC8R/intro-to-brain-like-agi-safety-1-what-s-the-problem-and-why#1_2_The_AGI_technical_safety_problem]. I think the problem is that when I said “technical AGI safety”, I was thinking the red box, whereas you were thinking “any technical topic in either the red or blue boxes”. I agree that there are technical topics in the top-right blue box in particular, and that’s where “conflict scenarios” would mainly be. My understanding is that working on those topics does not have much of a direct connection to AGI, in the sense that technologies for reducing human-human conflict tend to overlap with technologies for reducing AGI-AGI conflict. (At least, according to this comment thread [https://www.lesswrong.com/posts/LpM3EAakwYdS6aRKf/what-multipolar-failure-looks-like-and-robust-agent-agnostic?commentId=W5CPYxDi4nKiBFdoh], I haven’t personally thought about it much.) Anyway, I guess you would say “in a more s-risk-focused world, we would be working more on the top-right blue box and less on the red box”. But really, in a more s-risk-focused world, we would be working more on all three colored boxes. :-P I’m not an expert on the ITN of particular projects within the blue boxes, and therefore don’t have a strong opinion about how to weigh them against particular projects within the red box. I am concerned / pessimistic about prospects for success in the red box. But maybe if I knew more about the state of the blue boxes, I would be equally concerned / pessimistic about those too!! ¯\_(ツ)_/¯
That's a cool chart! I actually think the most useful things to do to reduce s-risks can be conceptualized as part of the red box. For one thing, solving global coordination seems really hard and the best way to solve it may include aligned AI, anyway. "...and everyone actually follows that manual!" is the hard one, but I'd imagine the EA community will come up with some kind of serious attempt, and people interested in reducing s-risks may not have a comparative advantage at making that happen. So we're back to the red box. I think people interested in reducing s-risks should mostly study alignment schemes and their goal architectures and pick ones that implement hyperexistential separation as much as possible. This produces not-terrible futures even if you fail to address the problem in the top-right blue box. You might reply "AI alignment is too difficult to be picky, and we don't have any promising approaches anyway." In that case, you'd anyway have a large probability of an existential catastrophe, so you can just make sure people don't try some Hail Mary thing that is unusually bad for s-risks. By contrast, if you think AI alignment isn't too difficult, there might be multiple approaches with a shot at working, and those predictably differ with respect to hyperexistential separation.

Based on talking to various researchers, I'd say there are fewer than 50 people doing promising work on existential AI safety, and fewer than 200 thinking about AI safety full-time in any reasonable framing of the problem.

If you think that AI safety is 10x as large a problem as, say, biorisk, and returns are logarithmic, then we should allocate 10x as many resources to AI safety as to biorisk. And biorisk is still larger than most causes. So it's fine for AI safety to not be quite as neglected as the most neglected causes.
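
To make the logarithmic-returns step concrete, here is a rough sketch under assumptions that are mine rather than the answer's: each cause's value is modeled as scale * log(resources), there are only two causes, and the total budget of 110 units is an arbitrary number picked so the split comes out round. Under that model, marginal returns (scale / resources) are equal across causes exactly when resources are proportional to scale, which is where the 10:1 split comes from.

```python
import math

def proportional_allocation(scales, budget):
    """Split budget in proportion to each cause's scale.

    Under the assumed model value_i = scale_i * log(x_i), this is the
    split that equalizes marginal returns scale_i / x_i across causes.
    """
    total_scale = sum(scales.values())
    return {cause: budget * s / total_scale for cause, s in scales.items()}

def total_value(scales, allocation):
    """Total value under the assumed logarithmic-returns model."""
    return sum(s * math.log(allocation[cause]) for cause, s in scales.items())

# Hypothetical inputs: AI safety assumed 10x the scale of biorisk.
scales = {"ai_safety": 10.0, "biorisk": 1.0}
budget = 110.0

best = proportional_allocation(scales, budget)

# Brute-force check over whole-unit splits of the budget.
brute_force_ai = max(
    range(1, int(budget)),
    key=lambda x: total_value(scales, {"ai_safety": x, "biorisk": budget - x}),
)

print(best)
print(brute_force_ai)
```

The brute-force search is only there as a sanity check on the closed-form split. With different assumed returns curves (for example, square-root returns) the optimal ratio would no longer equal the scale ratio.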

Which leads to the question of how we can get more people to produce promising work in AI safety. There are plenty of highly intelligent people out there who are capable of doing work in AI safety, yet almost none of them do. Maybe trying to popularize AI safety would help to indirectly contribute to it, since it might help to convince geniuses with the potential to work in AI safety to start working on it. It could also be an incentive problem. Maybe potential AI safety researchers think they can make more money by working in other fields, or maybe there ...