I believe AI safety is a major problem for the future, and more people working on it would likely increase the chance that it gets solved, but I think the third component of ITN (neglectedness) might need to be reevaluated.
I mainly formed my basic views around 2015, when the AI revolution was portrayed as a fight against killer robots. Nowadays, more details are communicated, like bias problems, optimizing for different-than-human values (ad clicks), and killer drones.
It is possible that it only went from very neglected to somewhat neglected, or that the news I received through my echo chamber was itself biased. In any case, I would like to know more.
It depends on what you mean by "neglected", since neglect is a spectrum. It's a lot less neglected than it was in the past, but it's still neglected compared to, say, cancer research or climate change. In terms of public opinion, the average person probably has little understanding of AI safety. I've encountered plenty of people saying things like "AI will never be a threat because AI can only do what it's programmed to do" and variants thereof.
What is neglected within AI safety is suffering-focused AI safety aimed at preventing S-risks. Most AI safety research, and existential risk research in general, seems to be focused on reducing extinction risks and on colonizing space, rather than on reducing the risk of worse-than-death scenarios.

There is also a risk that some AI alignment research could be actively harmful. One such scenario is a "near miss" in AI alignment: risk from AI alignment roughly follows a Laffer curve, with an AI that is slightly misaligned being riskier than both a perfectly aligned AI and a paperclip maximizer. For example, suppose there is an AI aligned to reflect human values. Yet "human values" could include religious hells. There are plenty of religious people who believe that an omnibenevolent God subjects certain people to eternal damnation, which makes one wonder whether such individuals would implement a Hell if they had the power. Thus, an AI designed to reflect human values could end up subjecting certain individuals to something equivalent to a Biblical Hell.
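To make the "near miss" intuition concrete, here is a deliberately toy formalization (my own illustration, not a model from the s-risk literature): let $a \in [0,1]$ denote how closely an AI's learned values track intended human values. Roughly, expected suffering scales with how much the AI's goals refer to sentient beings at all (increasing in $a$) times how badly it gets the details wrong (decreasing in $a$), e.g.

$$\mathbb{E}[\text{suffering}](a) \;\propto\; a\,(1 - a),$$

which vanishes at $a = 0$ (an indifferent paperclip maximizer) and at $a = 1$ (perfect alignment) and peaks at intermediate $a$, which is the Laffer-curve shape described above. The specific functional form is arbitrary; the point is only that the curve is non-monotonic.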
Regarding specific AI safety organizations, Brian Tomasik wrote an evaluation of various AI/EA/longtermist organizations, in which he estimated that MIRI has a ~38% chance of being actively harmful. Eliezer Yudkowsky has also harshly criticized OpenAI, arguing that open access to its research poses a significant existential risk. Open access to AI research may increase the risk of malevolent actors creating or influencing the first superintelligence, which poses a potential S-risk.
> What is neglected within AI safety is suffering-focused AI safety aimed at preventing S-risks. Most AI safety research, and existential risk research in general, seems to be focused on reducing extinction risks and on colonizing space, rather than on reducing the risk of worse-than-death scenarios.
I disagree, I think if AGI safety researchers cared exclusively about s-risk, their research output would look substantially the same as it does today, e.g. see here and discussion thread.
> For example, suppose there is an AI aligned to reflect human values. Yet "human values" could include religious hells.
Sure, but people are still researching narrow alignment/corrigibility as a prerequisite for ambitious value learning/CEV. If you buy the argument that safety with respect to s-risks is non-monotonic in proximity to "human values" and control, then marginal progress on narrow alignment can still be net-negative w.r.t. s-risks, by increasing the probability that we get to "something close to ambitious alignment occurs but without a Long Reflection, technical measures against s-risks, etc." At least, if we're in the regime of severe misalignment being the most likely outcome conditional on no more narrow alignment work occurring, which I think is a pretty popular longtermist take. (I don't currently think most alignment work clearly increases s-risks, but I'm pretty close to 50/50 due to considerations like this.)
MichaelStJules
If AGI safety researchers cared exclusively about s-risks, their priorities would be closer to the Center on Long-Term Risk's. CLR is s-risk-focused, and focuses on conflict scenarios in particular, which don't get nearly as much attention elsewhere in AI safety. The Cooperative AI Foundation was also funded primarily by the Center for Emerging Risk Research, which is also s-risk-focused and whose staff have all worked for CLR/FRI/EAF at some point.
Lukas_Gloor
Empirical disagreements between groups are often more consequential than normative ones, so just pointing out what groups with a given normative stance focus on may not be a strong argument.

I found Steven Byrnes' arguments in the other thread somewhat convincing (in particular, the part where maybe no one wants to do ambitious value learning). That said, next to ambitious value learning, there are intermediates like value-agnostic expansion, where an AI learns the short-term preferences of a human overseer, like "keep the overseer comfortable and in control." And those short-term preferences can themselves backfire, because the humans will stick around in protected bubbles, and they can be attacked. So, at the very least, AI alignment research that takes s-risks extremely seriously would spend >10% of its time thinking through failure modes of various alignment schemes, particularly focused on "Where might hyperexistential separation fail?"
Steven Byrnes
"Maybe no one" is actually an overstatement, sorry, here are some exceptions: 1
[https://www.lesswrong.com/posts/3L46WGauGpr7nYubu/the-plan],2
[https://www.lesswrong.com/posts/zuHezdoGr2KtM2n43/new-year-new-research-agenda-post],3
[https://futureoflife.org/2019/09/17/synthesizing-a-humans-preferences-into-a-utility-function-with-stuart-armstrong/].
(I have corrected my previous comment.)
I guess I think of current value learning work as being principally along the
lines of “What does value learning even mean? How do we operationalize that?”
And if we’re still confused about that question, it makes it a bit hard to
productively think about failure modes.
It seems pretty clear to me that “unprincipled, making-it-up-as-you-go-along,
alignment schemes” would be bad for s-risks, for such reasons as you mentioned.
So trying to gain clarity about the lay of the land seems good.
Paul_Christiano
Regarding susceptibility to s-risk:
* If you keep humans around, they can decide how to respond to threats and gradually improve their policies as they figure out more (or as their AIs figure out more).
* If you build incorrigible AIs who will override human preferences (so that a threatened human has no ability to change the behavior of their AI), while themselves being resistant to threats, then you may indeed reduce the likelihood of threats being carried out.
* But in practice all the value is coming from you solving "how do we deal with threats?" at the same time that you solved the alignment problem.
* I don't think there's any real argument that solving CEV or ambitious value learning per se helps with these difficulties, except insofar as your AI was able to answer these questions. But in that case a corrigible AI could also answer those questions.
* Humans may ultimately build incorrigible AI for decision-theoretic reasons, but I think the decision to do so should probably be separated from solving alignment.
* I think the deepest coupling comes from the fact that the construction of incorrigible AI is itself an existential risk, and so it may be extremely harmful to build technology that enables that prior to having norms and culture that are able to use it responsibly.
* Overall, I'm much less sure than you that "making it up as you go along" alignment is bad for s-risk.
Anirandis
Hi, regarding this part: I'm not 100% sure I understand; could you elaborate a little? Is the idea that the human overseer's values could include punishing some out-group, or something else?
Steven Byrnes
Oops, I was thinking more specifically about technical AGI safety. Or do you think "conflict scenarios" impact that too?
MichaelStJules
CLR's agenda is aimed at reducing conflict among TAI-enabled actors, and includes credibility and peaceful bargaining, which both seem technical: https://longtermrisk.org/research-agenda

Their technical research is mostly on multi-agent systems and decision theory: https://longtermrisk.org/publications/

I'd guess they're the only (10+ team member) group with multi-agent systems and multipolar scenarios among their main focuses within their technical research.
Steven Byrnes
OK, thanks. Here’s a chart I made; source: my post here: https://www.lesswrong.com/posts/4basF9w9jaPZpoC8R/intro-to-brain-like-agi-safety-1-what-s-the-problem-and-why#1_2_The_AGI_technical_safety_problem

I think the problem is that when I said “technical AGI safety”, I was thinking of the red box, whereas you were thinking of “any technical topic in either the red or blue boxes”. I agree that there are technical topics in the top-right blue box in particular, and that’s where “conflict scenarios” would mainly be. My understanding is that working on those topics does not have much of a direct connection to AGI, in the sense that technologies for reducing human-human conflict tend to overlap with technologies for reducing AGI-AGI conflict. (At least, according to this comment thread: https://www.lesswrong.com/posts/LpM3EAakwYdS6aRKf/what-multipolar-failure-looks-like-and-robust-agent-agnostic?commentId=W5CPYxDi4nKiBFdoh; I haven’t personally thought about it much.)

Anyway, I guess you would say “in a more s-risk-focused world, we would be working more on the top-right blue box and less on the red box”. But really, in a more s-risk-focused world, we would be working more on all three colored boxes. :-P I’m not an expert on the ITN of particular projects within the blue boxes, and therefore don’t have a strong opinion about how to weigh them against particular projects within the red box. I am concerned / pessimistic about prospects for success in the red box. But maybe if I knew more about the state of the blue boxes, I would be equally concerned / pessimistic about those too!! ¯\_(ツ)_/¯
Lukas_Gloor
That's a cool chart!

I actually think the most useful things to do to reduce s-risks can be conceptualized as part of the red box.

For one thing, solving global coordination seems really hard, and the best way to solve it may include aligned AI anyway. "...and everyone actually follows that manual!" is the hard part, but I'd imagine the EA community will come up with some kind of serious attempt, and people interested in reducing s-risks may not have a comparative advantage at making that happen. So we're back to the red box.

I think people interested in reducing s-risks should mostly study alignment schemes and their goal architectures and pick ones that implement hyperexistential separation as much as possible. This produces not-terrible futures even if you fail to address the problem in the top-right blue box.

You might reply, "AI alignment is too difficult to be picky, and we don't have any promising approaches anyway." In that case, you'd have a large probability of an existential catastrophe either way, so you can just make sure people don't try some Hail Mary thing that is unusually bad for s-risks. By contrast, if you think AI alignment isn't too difficult, there might be multiple approaches with a shot at working, and those predictably differ with respect to hyperexistential separation.
Based on talking to various researchers, I'd say there are fewer than 50 people doing promising work on existential AI safety, and fewer than 200 thinking about AI safety full-time in any reasonable framing of the problem.
If you think that AI safety is 10x as large as, say, biorisk, and returns to work are logarithmic, we should allocate 10x as many resources to AI safety as to biorisk. And biorisk is still larger than most causes. So it's fine for AI safety not to be quite as neglected as the most neglected causes.
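To spell out the logarithmic-returns step (a standard toy calculation; the weights $w_i$, resource levels $r_i$, and budget $R$ are just my notation): if the value of work on cause $i$ is $w_i \ln r_i$ and a fixed budget $R$ is split between two causes, then

$$\max_{r_1 + r_2 = R} \; w_1 \ln r_1 + w_2 \ln r_2 \quad\Longrightarrow\quad \frac{w_1}{r_1} = \frac{w_2}{r_2} \quad\Longrightarrow\quad \frac{r_1}{r_2} = \frac{w_1}{w_2}.$$

Equalizing marginal returns makes the optimal allocation proportional to scale, so a cause judged 10x as large gets 10x the resources.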
This leads to the question of how we can get more people to produce promising work in AI safety. There are plenty of highly intelligent people out there who are capable of doing work in AI safety, yet almost none of them do. Maybe trying to popularize AI safety would help indirectly, since it might convince geniuses with the potential to work on AI safety to start doing so. It could also be an incentive problem. Maybe potential AI safety researchers think they can make more money working in other fields, or maybe there...