NinaR

36 · Joined Apr 2022

Posts
1


Comments
3

Criticism of altruism

Thanks for these comments, will think about them! I particularly liked “if your view of what you want the world to look like includes other people's preferences, and you make non-trivial sacrifices (e.g., donating 10%) to meet those preferences, that should certainly count as altruism, even if you're doing it because you want to”. This seems like a more useful and practical framing of the concept, based on behaviours rather than internal motivations.

Are "Bad People" Really Unwelcome in EA?

Thought-provoking content :) I started writing a comment on this post and then it got long enough to be its own post... 

How I failed to form views on AI safety

Meta-level comment: this post was interesting and very well written, and I could empathize with a lot of it; in fact, it inspired me to make an account here in order to comment : )

Object-level comment (ended up long, apologies!): My personal take is that a lot of EA literature on AI Safety (e.g., forum articles) uses terminology that overly anthropomorphizes AI and skips a lot of steps in arguments, assuming a fair amount of prerequisite knowledge/familiarity with jargon. When reading such literature, I try to convert the "EA AI Safety language" into "normal language" in my head in order to understand the claims better. Overall, my answer to “why is AI safety important” is (currently) the following:

  • Humans are likely to develop increasingly powerful AI systems in order to solve important problems / provide in-demand services.
  • As AI systems become more powerful, they can do more stuff / effect more change in the world, partly because we’ll want them to. Why develop a powerful AI if not to do important/hard things?
  • As AI systems become more powerful, it will be harder for us to ensure they do what we want them to do when deployed. In my opinion, this claim needs more justification, and I will elaborate below.
  • Therefore, there is some probability that a powerful AI does things we don’t want. Hopefully, this follows from the three premises above.

AI Safety literature commonly refers to AIs as optimizers; terms like “mesa-optimizer” are used a lot as well. There is also a fair amount of anthropomorphizing language, such as “the AI will want to”, “the AI will secretly plan to”, and “the AI will like/dislike”. By now, I am OK with parsing such statements, but they can be confusing/distracting. “This thing is an optimizer” is a useful phrase when trying to predict the thing’s behavior, but looking through the lens of “what kind of optimizer is this” isn’t always the clearest way to look at the problem. The same can be said for the anthropomorphizing language. When someone says something like “so how do we check whether an AI secretly wants to kill you”, they are trying to summarize a phenomenon succinctly without giving the details/context; however, they incur the cost of being less precise and clear (and sounding weird).

Here’s my attempt to explain point 3 (As AI systems become more powerful it will be harder for us to ensure they do what we want them to do when deployed) without using EA-isms. 

  • The simplest possible description of an AI is an instance of a program that takes some inputs and produces some outputs.
  • Training an AI is the process of searching over different possible programs to find ones that “are good” by looking at which ones produce “good outputs for their inputs”. A more general phrasing: searching over different possible programs to find ones that “are good” by using some kind of filter that can classify programs as good or bad.
  • “Is this a good output” or “is this program instance good” is easy to check when the problem is simple but harder to check when the problem is hard (some people call this the outer alignment problem). For example, “is the outputted animal image labeled correctly” is easy to check, but “does the outputted drug treat the disease” is harder to check. It is plausible to me that we will want “is good” to encapsulate more and more human values as the problems become harder, which makes it *even* harder to define. For example, what if the criterion is “does the outputted intervention reduce CO2 emissions without causing anything that humans find morally wrong”?
  • “Good outputs for their inputs” / “the AI has all the properties we designed into the filter” is harder to check the more complex the domain of possible inputs and outputs we are dealing with (some people call this the inner alignment problem). A maths analogy is function approximation. Say you have a really complicated function with a lot of local minima/maxima/wobbliness. If we test an approximation by checking whether it matches the true function at various points, then the less smooth the function, the more points we need to test. With infinite training data we could check every point and be certain that outputs are good for all inputs; since that is impossible, we cannot be certain, and there will always be some probability of a bad output for a particular input. If an AI is more powerful, the effect of one bad output could be really bad.
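The function-approximation analogy above can be made concrete with a toy sketch (all functions and numbers here are invented for illustration): a candidate "program" passes a sparse test "filter" while still producing bad outputs on inputs the filter never checks.

```python
import math

# Invented "true" task: a smooth trend plus a high-frequency wobble,
# i.e. a function with a lot of local structure.
def target(x):
    return x + math.sin(20 * x)

# A candidate "program" found by training: it captures the smooth trend
# but misses the wobble entirely.
def candidate(x):
    return x

# The "filter": compare candidate and target at a few test inputs,
# within some tolerance.
def passes_filter(f, test_points, tol=0.2):
    return all(abs(f(x) - target(x)) <= tol for x in test_points)

# The sparse test set happens to sample inputs where the wobble is ~zero
# (x = k*pi/20, where sin(20x) = 0), so the candidate looks good...
sparse = [k * math.pi / 20 for k in range(5)]
print(passes_filter(candidate, sparse))  # True

# ...but a denser sweep reveals inputs where the error is large
# (worst-case error is ~1.0, far outside the tolerance).
dense = [i / 100 for i in range(300)]
worst = max(abs(candidate(x) - target(x)) for x in dense)
print(worst > 0.2)  # True
```

The fewer points the filter checks relative to the wobbliness of the true function, the easier it is for a "bad" candidate to slip through — which is the point of the bullet above.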

All this being said, I maintain a significant amount of skepticism around AI safety being the most important problem. I think nuclear risk and biorisk are very important and also more tractable. Based on my current model, I place a ~10% probability on AI-related existential risk in the next century, conditional on no other existential risk occurring beforehand. I think there is a much higher probability of non-existential risk, and I care about that too.

Currently, the most tractable framing I can think of for AI safety technical research is “let’s make ascertaining whether an AI output is good X% easier” and “let’s make it Y% easier to infer whether an AI is actually as good as it seems in training, given limited training data and limited compute”. Research that increases X and Y would likely decrease AI risk. This is a hard problem, and many solutions may not scale to more powerful AIs. This is likely why a lot of AI safety literature uses metaphors and language that seem far removed from current systems: I think it is an attempt to imagine a good map for the territory of future AIs, in order to come up with solutions that could work in the future.

I also have meta-level uncertainty around my takes, and I update in the direction of thinking that AI Safety is more important than I otherwise would, because many intelligent people I respect think this. As a result, I make decisions as if AI risk were higher than my internal model currently estimates. I also spend more time thinking about AI safety because I find it interesting and I have a somewhat suitable background (I like programming and ML). I do think a major factor in deciding whether to work on AI safety should be personal fit.