All of Stuart Armstrong's Comments + Replies

I argue that it's entirely the truth, the way that the term is used and understood.

Precisely. And supporting subsidized contraception is a long way away from both the formal definition of eugenics and its common understanding.

8
Larks
1y
The first definition I get from Google for 'eugenics' seems to include a lot of the things described in this post that normal people support, like incest avoidance and your example of embryo screening. In contrast, the first definition I get for 'communism' is very narrow and makes it clear that its necessary components include things most people don't believe (class war, public ownership).

I feel that saying "subsidized contraception is not eugenics" is rhetorically better and more accurate than this approach.

3
Sentientist
1y
Saying "subsidized contraception is not eugenics" is a lie. 

Ah, you made the same point I did, but better :-)

>Most people endorse some form of 'eugenics'

No, they don't. It is akin to saying "most people endorse some form of 'communism'." We can point to a lot of overlap between theoretical communism and values that most people endorse; this doesn't mean that people endorse communism. That's because communism covers a lot more stuff, including a lot of historical examples and some related atrocities. Eugenics similarly covers a lot of historical examples, including some atrocities (not only in fascist countries), and this is what the term means to most people -... (read more)

7
Larks
1y
That seems pretty disanalogous to me: while exact definitions vary, communism does have a fairly precise thing it's referring to (state or collective worker ownership of the means of production, abolition of free markets, etc.). Supporting a minimum wage doesn't make someone a communist because 1) they probably don't support the nationalisation of all industry and 2) many communist countries didn't actually have a minimum wage. I think it's fair to say that, all else equal, a minimum-wage supporter is probably closer to the communist side of the spectrum than someone who opposes one, but they're still a long way away.
3
Sentientist
1y
In the current zeitgeist, and even for the past couple of decades, "that's eugenics" has shut down conversations among intelligent people about important topics like behavioral genetics, reproductive technology, and subsidized contraception. "Eugenicist" has been leveled against everyone from Darwin to E. O. Wilson to Margaret Sanger to Bill Gates to Nick Bostrom as a way of signaling that we should ignore everything that person has to say and see everything they do or think as evil and illegitimate. "Communist" or "communism" isn't used in this way, and communism doesn't have this sting. Maybe during the McCarthy era an essay like this might also have been necessary. And of course people respond angrily if called a eugenicist: it's a term, as you said, that means "evil" in the current Western zeitgeist (but not in much of the rest of the world, as one commenter noted). This essay isn't meant to be dispassionate; it's meant to provoke the reader into rethinking how this term shuts down conversations about ideas and people.

Thanks! Should be corrected now.

Thanks, that makes sense.

I've been aware of those kinds of issues; what I'm hoping is that we can get a framework that includes these subtleties automatically (eg by having the AI learn them from observations or from papers published by humans) without having to put it all in by hand ourselves.

Hey there! It is a risk, but the reward is great :-)

  1. Value extrapolation makes most other AI safety approaches easier (eg interpretability, distillation and amplification, low impact...). Many of these methods also make value extrapolation easier (eg interpretability, logical uncertainty,...). So I'd say the contribution is superlinear: solving 10% of AI safety our way will give us more than 10% progress.
  2. I think it already has reframed AI safety from "align AI to the actual (but idealised) human values" to "have an AI construct values that are reasonable e
... (read more)
2
MikeJohnson
2y
Great, thank you for the response. On (3): I feel AI safety as it's pursued today is a bit disconnected from other fields such as neuroscience, embodiment, and phenomenology. I.e. the terms used in AI safety don't try to connect to the semantic webs of affective neuroscience, embodied existence, or qualia. I tend to take this as a warning sign: all disciplines ultimately refer to different aspects of the same reality, and all conversations about reality should ultimately connect. If they aren't connecting, we should look for a synthesis such that they do. That's a little abstract; a concrete example would be the paper "Dissecting components of reward: 'liking', 'wanting', and learning" (Berridge et al. 2009), which describes experimental methods and results showing that 'liking', 'wanting', and 'learning' can be partially isolated from each other and triggered separately. I.e. a set of fairly rigorous studies on mice demonstrating they can like without wanting, want without liking, etc. This and related results from affective neuroscience would seem to challenge some preference-based frames within AI alignment, but it feels there's no 'place to put' this knowledge within the field. Affective neuroscience can discover things, but there's no mechanism by which these discoveries will update AI alignment ontologies. It's a little hard to find the words to describe why this is a problem; perhaps that not being richly connected to other fields runs the risk of 'ghettoizing' results, as many social sciences have 'ghettoized' themselves. One of the reasons I've been excited to see your trajectory is that I've gotten the feeling that your work would connect more easily to other fields than the median approach in AI safety.

An AI that is aware that value is fragile will behave in a much more cautious way. This gives a different dynamic to the extrapolation process.
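(A toy illustration, not the actual value-extrapolation proposal: one way "much more cautious" can cash out is to score actions against several candidate value functions, including guessed-at ones the AI is unsure about, and prefer the action whose worst case across candidates is acceptable. All names, value functions, and numbers below are made-up placeholders.)

```python
# Toy sketch only: all value functions, actions, and numbers below are
# illustrative placeholders, not anything from the original post.

from typing import Callable, Dict, List

Action = str
ValueFn = Callable[[Action], float]

def cautious_choice(actions: List[Action],
                    candidates: Dict[str, ValueFn],
                    weights: Dict[str, float]) -> Action:
    """Pick the action with the best worst-case score across candidate
    value functions; break ties by the probability-weighted average."""
    def score(action: Action):
        scores = {name: fn(action) for name, fn in candidates.items()}
        worst = min(scores.values())
        expected = sum(weights[name] * s for name, s in scores.items())
        return (worst, expected)
    return max(actions, key=score)

# One value the AI is confident about, one it merely suspects might matter.
candidates = {
    "known_value_welfare":    lambda a: {"aggressive_optimize": 1.0,
                                         "act_conservatively": 0.6}[a],
    "guessed_value_autonomy": lambda a: {"aggressive_optimize": 0.0,
                                         "act_conservatively": 0.7}[a],
}
weights = {"known_value_welfare": 0.7, "guessed_value_autonomy": 0.3}

# Picks "act_conservatively": its worst case (0.6) beats the aggressive
# action's worst case (0.0), even though the aggressive action scores
# higher on the known value alone.
print(cautious_choice(["aggressive_optimize", "act_conservatively"],
                      candidates, weights))
```

The maximin criterion is what makes the behaviour cautious here: an action that might trash a value the AI merely suspects exists gets discarded, even if it looks great on the values the AI is confident about.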

1
Yonatan Cale
2y
Thanks! So ok, the AI knows that some human values are unknown to the AI. What does the AI do about this? It can take some action that maximizes the known human values, and risk hurting the others. Or it can do nothing and wait until it knows more (wait how long? There could always be missing values).

Something I'm not sure I understood from the article: does the AI assume that it is able to list all the possible values that humans might care about? Is that how the AI is supposed to guard against any of the possible human values going down too much?
  1. Nothing much to add to the other post.
  2. Imagine that you try to explain to a potential superintelligence that we want it to preserve a world with happy people in it by showing it videos of happy people. It might conclude that it should make people happy. Or it might conclude that we want more videos of happy people. The latter is more compatible with the training that we have given it. The AI will be safer if it hypothesizes that we may have meant the former, despite our having given it evidence more compatible with the latter, and pursues both goals rather than
... (read more)

Most of the alignment research pursued by other EA groups (eg Anthropic, Redwood, ARC, MIRI, the FHI,...) would be useful to us if successful (and vice versa: our research would be useful for them). Progress in inner alignment, logical uncertainty, and interpretability is always good.

Fast increase in AI capabilities might result in a superintelligence before our work is ready. If the top algorithms become less interpretable than they are today, this might make our work harder.

Whole brain emulations would change things in ways that are hard to predict, and could make our approach either less or more successful.

A problem here is that values that are instrumentally useful can become terminal values that humans value for their own sake.

For example, equality under the law is very useful in many societies, especially modern capitalistic ones; but a lot of people (me included) feel it has strong intrinsic value. In more traditional and low-trust societies, the tradition of hospitality is necessary for trade and other exchanges; yet people come to really value it for its own sake. Family love is evolutionarily adaptive, yet also something we value.

So just because some value has developed from a suboptimal system does not mean that it isn't worth keeping.

2
brb243
2y
Ok, that makes sense. Rhetorically, how would one differentiate the terminal values worth keeping from those worth updating? For example, a hospitality 'requirement', versus the free ability to choose to be hospitable, versus the ability to choose environments with various attitudes toward hospitality. I would really offer the emotional understanding of all options and let individuals freely decide. This should resolve the issue of persons favoring their environments due to limited awareness of alternatives or the fear of consequences of choosing an alternative. Then, you could get to more fundamental terminal values, such as the perception of living in a truly fair system (instead of equality under the law, which can still perpetuate some unfairness), the ability to interact only with those with whom one wishes to (instead of hospitality), and understanding others' preferences for interactions related to oxytocin, dopamine, and serotonin release and choosing to interact with those where preferences are mutual (instead of family love), for example. Anyway, thank you.

Nick Bostrom's "Superintelligence" is an older book, but still a good overview. Stuart Russell's "Human Compatible" is a more modern take. I touch upon some of the main issues in my talk here. Paul Christiano's excellent "What Failure Looks Like" tackles the argument from another angle.

Comment copied to new "Stuart Armstrong" account:

Different approaches. ARC, Anthropic, and Redwood seem to be more in the "prosaic alignment" field (see eg Paul Christiano's post on that). ARC seems to be focusing on eliciting latent knowledge (getting human-relevant information out of the AI that the AI knows but has no reason to tell us about). Redwood is aligning text-based systems and hoping to scale up. Anthropic is looking at a lot of interlocking smaller problems that will (hopefully) be of general use for alignment. MIRI seems to focus on some key f... (read more)

Comment copied to new "Stuart Armstrong" account:

Interesting! And nice to see ADT make an appearance ^_^

I want to point to where ADT+total utilitarianism diverges from SIA. Basically, SIA has no problem with extreme "Goldilocks" theories: theories that imply that only worlds almost exactly like the Earth have inhabitants. These theories are a priori unlikely (complexity penalty) but SIA is fine with them (if $T_1$ is "only the Earth has life, but has it with certainty", while $T_2$ is "every planet has life with probability $p$", t... (read more)
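(For readers who want the formula being gestured at: a minimal sketch of the standard SIA weighting, in its usual formulation; the specific theories and probability in the truncated comparison above are not reproduced here.)

```latex
% Sketch of the standard SIA weighting (standard formulation, not a quote
% from the comment): a theory's posterior is its prior re-weighted by the
% expected number of observers in one's epistemic situation.
\[
  P_{\mathrm{SIA}}(T_i \mid \text{an observer like us exists})
    \;\propto\; P(T_i)\,\mathbb{E}[N_i],
\]
% where N_i is the number of observers in our epistemic situation if T_i
% is true. A theory is thus penalised only through its prior P(T_i) and
% through how many observers like us it predicts, not through how
% "Goldilocks"-like (fine-tuned) it looks.
```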