Cognitive Science/Psychology As a Neglected Approach to AI Safety

Kaj_Sotala

All of the advice on getting into AI safety research that I've seen recommends studying computer science and mathematics: for example, the 80,000 Hours AI safety syllabus provides a computer science-focused reading list, and mentions that "Ideally your undergraduate degree would be mathematics and computer science".

There are obvious good reasons for recommending these two fields, and I agree that anyone wishing to make an impact in AI safety should have at least a basic proficiency in them. However, I find it a little concerning that cognitive science/psychology are rarely even mentioned in these guides. I believe that it would be valuable to have more people working in AI safety whose primary background is in cogsci or psych, or who have at least done a minor in one of them.

Here are examples of four lines of research into AI safety which I think could benefit from such a background:

The psychology of developing an AI safety culture. Besides the technical problem of "how can we create safe AI", there is the social problem of "how can we ensure that the AI research community develops a culture where safety concerns are taken seriously". At least two existing papers draw on psychology to consider this problem: Eliezer Yudkowsky's "Cognitive Biases Potentially Affecting Judgment of Global Risks" uses cognitive psychology to discuss why people might misjudge the probability of risks in general, and Seth Baum's "On the promotion of safe and socially beneficial artificial intelligence" uses social psychology to discuss the specific challenge of motivating AI researchers to choose beneficial AI designs.
Developing better analyses of "AI takeoff" scenarios. Currently humans are the only general intelligence we know of, so any analyses of what "expertise" consists of and how it can be acquired would benefit from the study of humans. Eliezer Yudkowsky's "Intelligence Explosion Microeconomics" draws on a number of fields to analyze the possibility of a hard takeoff, including some knowledge of human intelligence differences as well as the history of human evolution, whereas my "How Feasible is the Rapid Development of Artificial Superintelligence?" draws extensively on the work of a number of psychologists to make the case that based on what we know of human expertise, scenarios with AI systems becoming major actors within timescales on the order of mere days or weeks seem to remain within the range of plausibility.
Defining just what it is that human values are. The project of AI safety can roughly be defined as "the challenge of ensuring that AIs remain aligned with human values", but it's also widely acknowledged that nobody really knows what exactly human values are - or at least, not to a sufficient extent that they could be given a formal definition and programmed into an AI. This seems like one of the core problems of AI safety, and one which can only be understood with a psychology-focused research program. Luke Muehlhauser's article "A Crash Course in the Neuroscience of Human Motivation" took one look at human values from the perspective of neuroscience, and my "Defining Human Values for Value Learners" sought to provide a preliminary definition of human values in a computational language, drawing from the intersection of artificial intelligence, moral psychology, and emotion research. Both of these are very preliminary papers, and it would take a full research program to pursue this question in more detail.
Better understanding multi-level world-models. MIRI defines the technical problem of "multi-level world-models" as "How can multi-level world-models be constructed from sense data in a manner amenable to ontology identification?". In other words, suppose that we had built an AI to make diamonds (or anything else we care about) for us. How should that AI be programmed so that it could still accurately estimate the number of diamonds in the world after it had learned more about physics, and after it had learned that the things it calls "diamonds" are actually composed of protons, neutrons, and electrons? While I haven't seen any papers that would explicitly tackle this question yet, a reasonable starting point would seem to be the question of "well, how do humans do it?". There, psych/cogsci may offer some clues. For instance, in the book Cognitive Pluralism, the philosopher Steven Horst offers an argument for believing that humans have multiple different, mutually incompatible mental models / reasoning systems - ranging from core knowledge systems to scientific theories - that they flexibly switch between depending on the situation. (Unfortunately, Horst approaches this as a philosopher, so he's mostly content with making the argument for this being the case in general, leaving it up to actual cognitive scientists to work out how exactly this works.) I previously also offered a general argument along these lines in my article World-models as tools, suggesting that at least part of the choice of a mental model may be driven by reinforcement learning in the basal ganglia. But this isn't saying much, given that all human thought and behavior seems to be at least in part driven by reinforcement learning in the basal ganglia. Again, this would take a dedicated research program.
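
As a toy illustration of the diamond example in the last item above, here is a minimal Python sketch. Everything in it is hypothetical: the class names, the is_diamond bridge rule, and the sample worlds are invented for illustration and are not taken from MIRI's agenda. It shows only what it would mean for a goal ("count diamonds") to give the same answer at two levels of description. The open research problem is how an AI could find such a bridge on its own, which this sketch sidesteps by hard-coding it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoarseObject:
    """An object in the everyday-level model, known only by its label."""
    label: str


@dataclass(frozen=True)
class AtomCluster:
    """An object in the physics-level model (hypothetical, heavily simplified)."""
    element: str   # chemical symbol, e.g. "C"
    lattice: str   # crystal structure name, e.g. "diamond_cubic"


def count_diamonds_coarse(world):
    """Count diamonds when the model contains labelled everyday objects."""
    return sum(1 for obj in world if obj.label == "diamond")


def is_diamond(cluster):
    """Hand-written bridge rule from the physics level back to the old concept.

    This is the part that ontology identification would have to discover.
    """
    return cluster.element == "C" and cluster.lattice == "diamond_cubic"


def count_diamonds_fine(world):
    """Count diamonds when the model only contains atom clusters."""
    return sum(1 for cluster in world if is_diamond(cluster))


# The same three objects, described at two levels of detail.
coarse_world = [
    CoarseObject("diamond"),
    CoarseObject("pencil lead"),
    CoarseObject("diamond"),
]
fine_world = [
    AtomCluster("C", "diamond_cubic"),
    AtomCluster("C", "hexagonal"),      # graphite: carbon, but not a diamond
    AtomCluster("C", "diamond_cubic"),
]

# The goal is preserved if both descriptions yield the same count.
assert count_diamonds_coarse(coarse_world) == count_diamonds_fine(fine_world)
```

The graphite entry is there to show why the bridge cannot simply be "made of carbon": the mapping has to respect the structure that the original concept was tracking.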

From these four special cases, you could derive more general use cases for psychology and cognitive science within AI safety:

Psychology, as the study and understanding of human thought and behavior, helps guide actions that are aimed at understanding and influencing people's behavior in a more safety-aligned direction (related example: the psychology of developing an AI safety culture)
The study of the only general intelligence we know about may provide information about the properties of other general intelligences (related example: developing better analyses of "AI takeoff" scenarios)
A better understanding of how human minds work may help figure out how we want the cognitive processes of AIs to work so that they end up aligned with our values (related examples: defining human values, better understanding multi-level world-models)

Here I would ideally offer reading recommendations, but the fields are so broad that any given book can only give a rough idea of the basics; and for instance, the topic of world-models that human brains use is just one of many, many subquestions that the fields cover. Thus my suggestion to have some safety-interested people who'd actually study these fields as a major or at least a minor.

Still, if I had to suggest a couple of books, with the main idea of getting a basic grounding in the mindsets and theories of the fields so that it would be easier to read more specialized research... on the cognitive psychology/cognitive science side I'd suggest Cognitive Science by Jose Luis Bermudez (haven't read it, but Luke Muehlhauser recommends it and it looked good to me based on the table of contents; see also Luke's follow-up recommendations behind that link); Cognitive Psychology: A Student's Handbook by Michael W. Eysenck & Mark T. Keane; and maybe Sensation and Perception by E. Bruce Goldstein. I'm afraid that I don't know of any good introductory textbooks on the social psychology side.

Comments (37)

LanceSBush · Jun 6 2017 · 11

I am a psychology PhD student with a background in philosophy/evolutionary psychology. My current research focuses on two main areas: effective altruism, and the nature of morality, in particular the psychology of metaethics. My motivation for pursuing the former should be obvious, but my rationale for pursuing the latter is in part self-consciously about the third bullet point, "Defining just what it is that human values are." More basic than even defining what those values are, I am interested in what people take values themselves to be. For instance, we do not actually have good data on the degree to which people regard their own moral beliefs as objective/relative, how common noncognitivist or error theoretic beliefs are in lay populations, etc.

Related to the first point, about developing an AI safety culture, there is also the matter of what we can glean psychologically about how the public is likely to receive AI developments. Understanding how people generally perceive AI and technological change more broadly could provide insight that can help us anticipate emerging social issues that result from advances in AI and improve our ability to raise awareness about and increase receptivity to concerns about AI risk among nonexperts, policymakers, the media, and the public. Cognitive science has more direct value than areas like mine (social psychology/philosophy) but my areas of study could serve a valuable auxiliary function to AI safety.

Gram_Stone · Jun 5 2017 · 9

I think these are all points that many people have considered privately or publicly in isolation, but that thus far no one has explicitly written them down and drawn a connection between them. In particular, lots of people have independently made the observation that ontological crises in AIs are apparently similar to existential angst in humans, ontology identification seems philosophically difficult, and so plausibly studying ontology identification in humans is a promising route to understanding ontology identification for arbitrary minds. So, thank you for writing this up; it seems like something that needed to be written quite badly.

Some other problems that might be easier to tackle from this perspective include mind crime, nonperson predicates, and suffering risk, especially subproblems like suffering in physics.

JesseClifton · Jun 5 2017 · 7

Strong agreement. Considerations from cognitive science might also help us to get a handle on how difficult the problem of general intelligence is, and the limits of certain techniques (e.g. reinforcement learning). This could help clarify our thinking on AI timelines as well as the constraints which any AGI must satisfy. Misc. topics that jump to mind are the mental modularity debate, the frame problem, and insight problem solving.

This is a good article on AI from a cog sci perspective: https://arxiv.org/pdf/1604.00289.pdf

Kaj_Sotala · Jun 5 2017 · 0

This is a good article on AI from a cog sci perspective: https://arxiv.org/pdf/1604.00289.pdf

Yay, correctly guessed which article that was before clicking on the link. :-)

Gram_Stone · Jun 7 2017 · 0

Also, have you seen this AI Impacts post and the interview it links to? I would expect so, but it seems worth asking. Tom Griffiths makes similar points to the ones you've made here.

Kaj_Sotala · Jun 9 2017 · 0

I'd seen that, but re-reading it was useful. :)

WillPearson · Jun 17 2017 · 5

There has recently been an effort started to make the pipeline better for getting people up to speed with AGI safety. I'm trying to champion a broad view of AGI safety including psychology.

Would anyone be interested in providing digested content? It would also be good to have an exit for the pipeline for psychology people interested in AGI. Would that be FHI? Who else would be good to talk to about what is required?

Geoffrey Miller · Aug 16 2017 · 4

Excellent post; as a psych professor I agree that psych and cognitive science are relevant to AI safety, and it's surprising that our insights from studying animal and human minds for the last 150 years haven't been integrated into mainstream AI safety work.

The key problem, I think, is that AI safety seems to assume that there will be some super-powerful deep learning system attached to some general-purpose utility function connected to a general-purpose reward system, and we have to get the utility/reward system exactly aligned with our moral interests.

That's not the way any animal mind has ever emerged in evolutionary history. Instead, minds emerge as large numbers of domain-specific mental adaptations to solve certain problems, and they're coordinated by superordinate 'modes of operation' called emotions and motivations. These can be described as implementing utility functions, but that's not their function -- promoting reproductive success is. Some animals also evolve some 'moral machinery' for nepotism, reciprocity, in-group cohesion, norm-policing, and virtue-signaling, but those mechanisms are also distinct and often at odds.

Maybe we'll be able to design AGIs that deviate markedly from this standard 'massively modular' animal-brain architecture, but we have no proof-of-concept for thinking that will work. Until then, it seems useful to consider what psychology has learned about preferences, motivations, emotions, moral intuitions, and domain-specific forms of reinforcement learning.

[anonymous] · Jun 8 2017 · 3

I got linked here while browsing a pretty random blog on deep learning, you're getting attention! (https://medium.com/intuitionmachine/seven-deadly-sins-and-ai-safety-5601ae6932c3)

Kaj_Sotala · Jun 9 2017 · 1

Neat, thanks for the find. :)

D_M_x · Jun 6 2017 · 3

What is your model of why other people in the AI safety field disagree with you/don't consider this as important as you?

Kaj_Sotala · Jun 6 2017 · 9

My main guess is "they mostly come from a math/CS background so haven't looked at this through a psych/cogsci perspective and seen how it could be useful".

That said, some of my stuff linked to above has been mostly met with silence, and while I presume it's a question of inferential silence - a sufficiently long inferential distance that a claim doesn't provoke even objections, just uncomprehending or indifferent silence - there is also the possibility of me just being so wrong about the usefulness of my ideas that nobody's even bothering to tell me.

Diego_Caleiro · Jun 8 2017 · 2

Kaj, I tend to promote your stuff a fair amount to end the inferential silence, and it goes without saying that I agree with all else you said.

Don't give up on your ideas or approach. I am dispirited that there are so few people thinking like you do out there.

Peter McIntyre · Jun 20 2017 · 1

Hi Kaj,

Thanks for writing this. Since you mention some 80,000 Hours content, I thought I’d respond briefly with our perspective.

We had intended the career review and AI safety syllabus to be about what you’d need to do from a technical AI research perspective. I’ve added a note to clarify this.

We agree that there are a lot of approaches you could take to tackle AI risk, but currently expect that technical AI research will be where a large amount of the effort is required. However, we’ve also advised many people on non-technical routes to impacting AI safety, so don’t think it’s the only valid path by any means.

We’re planning on releasing other guides and paths for non-technical approaches, such as the AI safety policy career guide, which also recommends studying political science and public policy, law, and ethics, among others.

Kaj_Sotala · Jun 20 2017 · 5

Hi Peter, thanks for the response!

Your comment seems to suggest that you don't think the arguments in my post are relevant for technical AI safety research. Do you feel that I didn't make a persuasive case for psych/cogsci being relevant for value learning/multi-level world-models research, or do you not count these as technical AI safety research? Or am I misunderstanding you somehow?

I agree that the "understanding psychology may help persuade more people to work on/care about AI safety" and "analyzing human intelligences may suggest things about takeoff scenarios" points aren't related to technical safety research, but value learning and multi-level world-models are very much technical problems to me.

Peter McIntyre · Jun 22 2017 · 4

We agree these are technical problems, but for most people, all else being equal, it seems more useful to learn ML rather than cog sci/psych. Caveats:

Personal fit could dominate this equation though, so I'd be excited about people tackling AI safety from a variety of fields.
It's an equilibrium. The more people already attacking a problem using one toolkit, the more we should be sending people to learn other toolkits to attack it.

Kaj_Sotala · Jun 22 2017 · 3

it seems more useful to learn ML rather than cog sci/psych.

Got it. To clarify: if the question is framed as "should AI safety researchers learn ML, or should they learn cogsci/psych", then I agree that it seems better to learn ML.

[anonymous] · Jul 25 2017 · 0

I see quite a bunch of relevant cognitive science work these days, e.g. this: http://saxelab.mit.edu/resources/papers/Kleiman-Weiner.etal.2017.pdf

Kaj_Sotala · Jul 25 2017 · 0

That's super-neat! Thanks.

kbog · Jun 5 2017 · 0

Defining just what it is that human values are. The project of AI safety can roughly be defined as "the challenge of ensuring that AIs remain aligned with human values", but it's also widely acknowledged that nobody really knows what exactly human values are - or at least, not to a sufficient extent that they could be given a formal definition and programmed into an AI. This seems like one of the core problems of AI safety, and one which can only be understood with a psychology-focused research program.

Defining human values, at least in the prescriptive sense, is not a psychological issue at all. It's a philosophical issue. Certain philosophers have believed that psychology can inform moral philosophy, but it's a stretch to say that even someone like Joshua Greene's work in experimental philosophy is a psychology-focused research program, and the whole approach is dubious - see, e.g., The Normative Insignificance of Neuroscience (http://www.pgrim.org/philosophersannual/29articles/berkerthenormative.pdf). Of course a new wave of pop-philosophers and internet bloggers have made silly claims that moral philosophy can be completely solved by psychology and neuroscience but this extreme view is ridiculous on its face.

What people believe doesn't tell us much about what actually is good. The challenge of AI safety is the challenge of making AI that actually does what is right, not AI that does whatever it's told to do by a corrupt government, a racist constituency, and so on.

Kaj_Sotala · Jun 11 2017 · 4

It took me a while to respond to this because I wanted to take the time to read "The Normative Insignificance of Neuroscience" first. Having now read it, I'd say that I agree with its claims with regard to criticism of Greene's approach. I don't think it disproves the notion of psychology being useful for defining human values, though, for I think there's an argument for psychology's usefulness that's entirely distinct from the specific approach that Greene is taking.

I start from the premise that the goal of moral philosophy is to develop a set of explicit principles that would tell us what is good. Now this is particularly relevant for designing AI, because we also want our AIs to follow those principles. But it's noteworthy that in their current state, none of the existing ethical theories are up to the task of giving us such a set of principles that, when programmed into an AI, would actually give results that could be considered "good". E.g. Muehlhauser & Helm 2012:

Let us consider the implications of programming a machine superoptimizer to implement particular moral theories.

We begin with hedonistic utilitarianism, a theory still defended today (Tännsjö 1998). If a machine superoptimizer’s goal system is programmed to maximize pleasure, then it might, for example, tile the local universe with tiny digital minds running continuous loops of a single, maximally pleasurable experience. We can’t predict exactly what a hedonistic utilitarian machine superoptimizer would do, but we think it seems likely to produce unintended consequences, for reasons we hope will become clear. [...]

Suppose “pleasure” was specified (in the machine superoptimizer’s goal system) in terms of our current understanding of the human neurobiology of pleasure. Aldridge and Berridge (2009) report that according to “an emerging consensus,” pleasure is “not a sensation” but instead a “pleasure gloss” added to sensations by “hedonic hotspots” in the ventral pallidum and other regions of the brain. A sensation is encoded by a particular pattern of neural activity, but it is not pleasurable in itself. To be pleasurable, the sensation must be “painted” with a pleasure gloss represented by additional neural activity activated by a hedonic hotspot (Smith et al. 2009).

A machine superoptimizer with a goal system programmed to maximize human pleasure (in this sense) could use nanotechnology or advanced pharmaceuticals or neurosurgery to apply maximum pleasure gloss to all human sensations—a scenario not unlike that of plugging us all into Nozick’s experience machines (Nozick 1974, 45). Or, it could use these tools to restructure our brains to apply maximum pleasure gloss to one consistent experience it could easily create for us, such as lying immobile on the ground.

Or suppose “pleasure” was specified more broadly, in terms of anything that functioned as a reward signal—whether in the human brain’s dopaminergic reward system (Dreher and Tremblay 2009), or in a digital mind’s reward signal circuitry (Sutton and Barto 1998). A machine superoptimizer with the goal of maximizing reward signal scores could tile its environs with trillions of tiny minds, each one running its reward signal up to the highest number it could. [...]

What if a machine superoptimizer was programmed to maximize desire satisfaction in humans? Human desire is implemented by the dopaminergic reward system (Schroeder 2004; Berridge, Robinson, and Aldridge 2009), and a machine superoptimizer could likely get more utility by (1) rewiring human neurology so that we attain maximal desire satisfaction while lying quietly on the ground than by (2) building and maintaining a planet-wide utopia that caters perfectly to current human preferences. [...]

Consequentialist designs for machine goal systems face a host of other concerns (Shulman, Jonsson, and Tarleton 2009b), for example the difficulty of interpersonal comparisons of utility (Binmore 2009), and the counterintuitive implications of some methods of value aggregation (Parfit 1986; Arrhenius 2011). [...]

We cannot show that every moral theory yet conceived would produce substantially unwanted consequences if used in the goal system of a machine superoptimizer. Philosophers have been prolific in producing new moral theories, and we do not have the space here to consider the prospects (for use in the goal system of a machine superoptimizer) for a great many modern moral theories. These include rule utilitarianism (Harsanyi 1977), motive utilitarianism (Adams 1976), two-level utilitarianism (Hare 1982), prioritarianism (Arneson 1999), perfectionism (Hurka 1993), welfarist utilitarianism (Sen 1979), virtue consequentialism (Bradley 2005), Kantian consequentialism (Cummiskey 1996), global consequentialism (Pettit and Smith 2000), virtue theories (Hursthouse 2012), contractarian theories (Cudd 2008), Kantian deontology (R. Johnson 2010), and Ross’ prima facie duties (Anderson, Anderson, and Armen 2006).

Yet the problem remains: the AI has to be programmed with some definition of what is good.

Now this alone isn't yet sufficient to show that philosophy wouldn't be up to the task. But philosophy has been trying to solve ethics for at least the last 2500 years, and it doesn't look like there has been any major progress towards solving it. The PhilPapers survey didn't show any of the three major ethical schools (consequentialism, deontology, virtue ethics) being significantly more favored by professional philosophers than the others, nor does anyone - to my knowledge - even know what a decisive theoretical argument in favor of one of them could be.

And at this point, we have pretty good theoretical reasons for believing that the traditional goal of moral philosophy - "developing a set of explicit principles for telling us what is good" - is in fact impossible. Or at least, it's impossible to develop a set of principles that would be simple and clear enough to write down in human-understandable form and which would give us clear answers to every situation, because morality is too complicated for that.

We've already seen this in trying to define concepts: as philosophy noted a long time ago, you can't come up with a set of explicit rules that would define any concept, even one as simple as "man", in such a way that nobody could develop a counterexample. "The Normative Insignificance of Neuroscience" also notes that the situation in ethics looks similar to the situation with trying to define many other concepts:

... what makes the trolley problem so hard—indeed, what has led some to despair of our ever finding a solution to it—is that for nearly every principle that has been proposed to explain our intuitions about trolley cases, some ingenious person has devised a variant of the classic trolley scenario for which that principle yields counterintuitive results. Thus as with the Gettier literature in epistemology and the causation and personal identity literatures in metaphysics, increasingly baroque proposals have given way to increasingly complex counterexamples, and though some have continued to struggle with the trolley problem, many others have simply given up and moved on to other topics.

Yet human brains do manage to successfully reason with concepts, despite it being impossible to develop a set of explicit necessary and sufficient criteria. The evidence from both psychology and artificial intelligence (where we've managed to train neural nets capable of reasonably good object recognition) is that a big part of how they do it is by building up complicated statistical models of what counts as a "man" or "philosopher" or whatever.
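
A rough sketch of the contrast drawn above, in Python. It assumes a made-up set of binary features and hand-picked example animals, and it uses a nearest-centroid (prototype) classifier as a stand-in for the far more complicated statistical models that brains and neural nets build; none of this comes from the sources cited here. The point is only that a model learned from examples can handle cases where a single explicit rule runs into counterexamples.

```python
from math import dist

# Hypothetical features: (has_feathers, lays_eggs, flies, has_fur)
EXAMPLES = [
    ((1, 1, 1, 0), "bird"),    # robin
    ((1, 1, 1, 0), "bird"),    # sparrow
    ((1, 1, 0, 0), "bird"),    # ostrich
    ((0, 0, 0, 1), "mammal"),  # dog
    ((0, 0, 1, 1), "mammal"),  # bat
    ((0, 0, 0, 1), "mammal"),  # cat
]


def rule_based_is_bird(features):
    """An explicit one-line definition: 'a bird is something that flies'."""
    return features[2] == 1


def learn_prototypes(examples):
    """Average the feature vectors of each label to get one prototype per concept."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(total / counts[label] for total in acc)
        for label, acc in sums.items()
    }


def classify(prototypes, features):
    """Pick the concept whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda label: dist(prototypes[label], features))


prototypes = learn_prototypes(EXAMPLES)

penguin = (1, 1, 0, 0)   # feathers, eggs, does not fly
bat = (0, 0, 1, 1)       # fur, flies

# The explicit rule gets both counterexamples wrong...
assert rule_based_is_bird(penguin) is False
assert rule_based_is_bird(bat) is True

# ...while the learned prototypes classify them by overall similarity.
assert classify(prototypes, penguin) == "bird"
assert classify(prototypes, bat) == "mammal"
```

A more elaborate rule would fix the penguin and the bat, but, as with the trolley cases quoted above, each patched rule tends to invite a new counterexample, whereas the similarity-based model degrades more gracefully.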

So given that

we can't build explicit verbal models of what a concept is
but we can build machine-learning algorithms that use complicated statistical analysis to identify instances of a concept

and

defining morality looks similar to defining concepts, in that we can't build explicit verbal models of what morality is

it would seem reasonable to assume that

we can build machine-learning algorithms that can learn to define morality, in that they can give such answers to moral dilemmas that a vast majority of people would consider them acceptable

But here it looks likely that we need information from psychology to narrow down what those models should be. What humans consider to be good has likely been influenced by a number of evolutionary idiosyncrasies, so if we want to come up with a model of morality that most humans would agree with, then our AI's reasoning process should take into account those considerations. And we've already established that defining those considerations on a verbal level looks insufficient - they have to be established on a deeper level, of "what are the actual computational processes that are involved when the brain computes morality".

Yes, I am here assuming "what is good" to equate to "what do human brains consider good", in a way that may be seen as reducing to "what would human brains accept as a persuasive argument for what is good". You could argue that this is flawed, because it's getting dangerously close to defining "good" by social consensus. But then again, the way the field of ethics itself proceeds is basically the same: a philosopher presents an argument for what is good, another attacks it, if the argument survives attacks and is compelling then it is eventually accepted. For empirical facts we can come up with objective tests, but for moral truths it looks to me unavoidable - due to the is-ought gap - that some degree of "truth by social consensus" is the only way of figuring out what the truth is, even in principle.

LanceSBush · Jun 11 2017 · 3

Hi Kaj,

Even if we found the most agreeable available set of moral principles, that amount may turn out not to constitute the vast majority of people. It may not even reach a majority at all. It is possible that there simply is no moral theory that is acceptable to most people. People may just have irreconcilable values. You state that:

“For empirical facts we can come up with objective tests, but for moral truths it looks to me unavoidable - due to the is-ought gap - that some degree of "truth by social consensus" is the only way of figuring out what the truth is, even in principle.”

Suppose this is the best we can do. It doesn’t follow that the outputs of this exercise are “true.” I am not sure in what sense this would constitute a true set of moral principles.

More importantly, it is unclear whether or not I have any rational or moral obligation to care about the outputs of this exercise. I do not want to implement the moral system that most people find agreeable. On the contrary, I want everyone to share my moral views, because this is what, fundamentally, I care about. The notion that we should care about what others care about, and implement whatever the consensus is, seems to presume a very strong and highly contestable metaethical position that I do not accept and do not think others should accept.

Kaj_Sotala · Jun 11 2017 · 1

Even if we found the most agreeable available set of moral principles, that amount may turn out not to constitute the vast majority of people. It may not even reach a majority at all. It is possible that there simply is no moral theory that is acceptable to most people.

It's certainly possible that this is the case, but looking for the kind of solution that would satisfy as many people as possible certainly seems like the thing we should try first and only give it up if it seems impossible, no?

More importantly, it is unclear whether or not I have any rational or moral obligation to care about the outputs of this exercise. I do not want to implement the moral system that most people find agreeable.

Well, the ideal case would be that the AI would show you a solution which it had found, and upon inspecting it and considering it through you'd be convinced that this solution really does satisfy all the things you care about - and all the things that most other people care about, too.

From a more pragmatic perspective, you could try to insist on an AI which implemented your values specifically - but then everyone else would also have a reason to fight to get an AI which fulfilled their values specifically, and if it was you versus everyone else in the world, it seems like a pretty high probability that somebody else would win. Which means that your values would have a much higher chance of getting shafted than if everyone had agreed to go for a solution which tried to take everyone's preferences into account.

And of course, in the context of AI, everyone insisting on their own values and their values only means that we'll get arms races, meaning a higher probability of a worse outcome for everyone.

LanceSBush · Jun 12 2017 · 5

It's certainly possible that this is the case, but looking for the kind of solution that would satisfy as many people as possible certainly seems like the thing we should try first and only give it up if it seems impossible, no?

Sure. That isn't my primary objection though. My main objection is that even if we pursue this project, it does not achieve the heavy metaethical lifting you were alluding to earlier. It doesn’t demonstrate nor provide any particularly good reason to regard the outputs of this process as moral truth.

Well, the ideal case would be that the AI would show you a solution which it had found, and upon inspecting it and considering it through you'd be convinced that this solution really does satisfy all the things you care about - and all the things that most other people care about, too.

I want to convert all matter in the universe to utilitronium. Do you think it is likely that an AI that factored in the values of all humans would yield this as its solution? I do not. Since I think the expected utility of most other likely solutions, given what I suspect about other people's values, is far less than this, I would view almost any scenario other than imposing my values on everyone else to be a cosmic disaster.

Kaj_Sotala · Jun 12 2017 · 0

My main objection is that even if we pursue this project, it does not achieve the heavy metaethical lifting you were alluding to earlier. It doesn’t demonstrate nor provide any particularly good reason to regard the outputs of this process as moral truth.

Well, what alternative would you propose? I don't see how it would even be possible to get any stronger evidence for the moral truth of a theory, than the failure of everyone to come up with convincing objections to it even after extended investigation. Nor a strategy for testing the truth which wouldn't at some point reduce to "test X gives us reason to disagree with the theory".

I would understand your disagreement if you were a moral antirealist, but your comments seem to imply that you do believe that a moral truth exists and that it's possible to get information about it, and that it's possible to do "heavy metaethical lifting". But how?

I want to convert all matter in the universe to utilitronium.

I think anything as specific as this sounds worryingly close to wanting an AI to implement favoritepoliticalsystem.

What the first communist revolutionaries thought would happen, as the empirical consequence of their revolution, was that people’s lives would improve: laborers would no longer work long hours at backbreaking labor and make little money from it. This turned out not to be the case, to put it mildly. But what the first communists thought would happen, was not so very different from what advocates of other political systems thought would be the empirical consequence of their favorite political systems. They thought people would be happy. They were wrong.

Now imagine that someone should attempt to program a “Friendly” AI to implement communism, or libertarianism, or anarcho-feudalism, or favoritepoliticalsystem, believing that this shall bring about utopia. People’s favorite political systems inspire blazing suns of positive affect, so the proposal will sound like a really good idea to the proposer.

We could view the programmer’s failure on a moral or ethical level—say that it is the result of someone trusting themselves too highly, failing to take into account their own fallibility, refusing to consider the possibility that communism might be mistaken after all. But in the language of Bayesian decision theory, there’s a complementary technical view of the problem. From the perspective of decision theory, the choice for communism stems from combining an empirical belief with a value judgment. The empirical belief is that communism, when implemented, results in a specific outcome or class of outcomes: people will be happier, work fewer hours, and possess greater material wealth. This is ultimately an empirical prediction; even the part about happiness is a real property of brain states, though hard to measure. If you implement communism, either this outcome eventuates or it does not. The value judgment is that this outcome satisfices or is preferable to current conditions. Given a different empirical belief about the actual real-world consequences of a communist system, the decision may undergo a corresponding change.

We would expect a true AI, an Artificial General Intelligence, to be capable of changing its empirical beliefs (or its probabilistic world-model, et cetera). If somehow Charles Babbage had lived before Nicolaus Copernicus, and somehow computers had been invented before telescopes, and somehow the programmers of that day and age successfully created an Artificial General Intelligence, it would not follow that the AI would believe forever after that the Sun orbited the Earth. The AI might transcend the factual error of its programmers, provided that the programmers understood inference rather better than they understood astronomy. To build an AI that discovers the orbits of the planets, the programmers need not know the math of Newtonian mechanics, only the math of Bayesian probability theory.

The folly of programming an AI to implement communism, or any other political system, is that you’re programming means instead of ends. You’re programming in a fixed decision, without that decision being re-evaluable after acquiring improved empirical knowledge about the results of communism. You are giving the AI a fixed decision without telling the AI how to re-evaluate, at a higher level of intelligence, the fallible process which produced that decision.

LanceSBush · Jun 12 2017 · 1

Whoops. I can see how my responses didn't make my own position clear.

I am an anti-realist, and I think the prospects for identifying anything like moral truth are very low. I favor abandoning attempts to frame discussions of AI or pretty much anything else in terms of converging on or identifying moral truth.

I consider it a likely futile effort to integrate important and substantive discussions into contemporary moral philosophy. If engaging with moral philosophy introduces unproductive digressions/confusions/misplaced priorities into the discussion it may do more harm than good.

I'm puzzled by this remark:

I think anything as specific as this sounds worryingly close to wanting an AI to implement favoritepoliticalsystem.

I view utilitronium as an end, not a means. It is a logical consequence of wanting to maximize aggregate utility and is more or less a logical entailment of my moral views. I favor the production of whatever physical state of affairs yields the highest aggregate utility. This is, by definition, "utilitronium." If I'm using the term in an unusual way I'm happy to propose a new label that conveys what I have in mind.

Lukas_Gloor · Aug 31 2017 · 4

I totally sympathize with your sentiment and feel the same way about incorporating other people's values in a superintelligent AI. If I just went with my own wish list for what the future should look like, I would not care about most other people's wishes. I feel as though many other people are not even trying to be altruistic in the relevant sense that I want to be altruistic, and I don't experience a lot of moral motivation to help accomplish people's weird notions of altruistic goals, let alone any goals that are clearly non-altruistically motivated. In the same way I'd feel no strong (even lower, admittedly) motivation to help make the dreams of baby eating aliens come true.

Having said that, I am confident that it would screw things up for everyone if I followed a decision policy that does not give weight to other people's strongly held moral beliefs. It is already hard enough to not mess up AI alignment in a way that makes things worse for everyone, and it would become much harder still if we had half a dozen or more competing teams who each wanted to get their idiosyncratic view of the future installed.

BTW note that value differences are not the only thing that can get you into trouble. If you hold an important empirical belief that others do not share, and you cannot convince them of it, then it may appear to you as though you're justified to do something radical about it, but that's even more likely to be a bad idea because the reasons for taking peer disagreement seriously are stronger in empirical domains of dispute than in normative ones.

There is a sea of considerations from Kantianism, contractualism, norms for stable/civil societies and advanced decision theory that, while each line of argument seems tentative on its own and open to skepticism, all taken together point very strongly into the same direction, namely that things will be horrible if we fail to cooperate with each other and that cooperating is often the truly rational thing to do. You're probably already familiar with a lot of this, but for general reference, see also this recent paper that makes a particularly interesting case for particularly strong cooperation, as well as other work on the topic, e.g. here and here.

This is why I believe that people interested in any particular version of utilitronium should not override AI alignment procedures last minute just to get an extra large share of cosmic stakes for their own value system, and why I believe that people like me, who care primarily about reducing suffering, should not increase existential risk. Of course, all of this means that people who want to benefit human values in general should take particular caution to make sure that idiosyncratic value systems that may diverge from them also receive consideration and gains from trade.

This piece I wrote recently is relevant to cooperation and the question of whether values are subjective or not, and how much convergence we should expect and to what extent value extrapolation procedures bake in certain (potentially unilateral) assumptions.

Kaj_Sotala · Jun 16 2017 · 2

I am an anti-realist, and I think the prospects for identifying anything like moral truth are very low. I favor abandoning attempts to frame discussions of AI or pretty much anything else in terms of converging on or identifying moral truth.

Ah, okay. Well, in that case you can just read my original comment as an argument for why one would want to use psychology to design an AI that was capable of correctly figuring out just a single person's values and implementing them, as that's obviously a prerequisite for figuring out everybody's values. The stuff that I had about social consensus was just an argument aimed at moral realists; if you're not one then it's probably not relevant for you.

(my values would still say that we should try to take everyone's values into account, but that disagreement is distinct from the whole "is psychology useful for value learning" question)

I'm puzzled by this remark:

Sorry, my mistake - I confused utilitronium with hedonium.

kbog · Jun 21 2017 · 0

It took me a while to respond to this because I wanted to take the time to read "The Normative Insignificance of Neuroscience" first.

Great!

But philosophy has been trying to solve ethics for at least the last 2500 years, and it doesn't look like there would have been any major progress towards solving it. The PhilPapers survey didn't show any of the three major ethical schools (consequentialism, deontology, virtue ethics) being significantly more favored by professional philosophers than the others, nor does anyone - to my knowledge - even know what a decisive theoretical argument in favor of one of them could be.

Restricting analysis to the Western tradition, 2500 years ago we barely had any conception of virtue ethics. Our contemporary conceptions of virtue ethics are much better than the ones the Greeks had. Meanwhile, deontological and consequentialist ethics did not even exist back then. Even over recent decades there has been progress in these positions. And plenty of philosophers know what a decisive theoretical argument could be: either they purport to have identified such arguments, or they think it would be an argument that showed the theory to be well supported by intuitions, reason, or some other evidence, not generally different from what an argument for a non-moral philosophical theory would look like.

it's noteworthy that at their current state, none of the existing ethical theories are up to the task of giving us such a set of principles that, when programmed into an AI, would actually give results that could be considered "good".

It would (arguably) give results that people wouldn't like, but assuming that the moral theory is correct and the machine understands it, almost necessarily it would do morally correct things. If you object to its actions then you are already begging the question by asserting that we ought to be focused on building a machine that will do things that we like regardless of whether they are moral. Moreover, you could tell a similar story for any values that people have. Whether you source them from real philosophy or from layman ethics wouldn't change the problems of optimization and systematization.

And at this point, we have pretty good theoretical reasons for believing that the traditional goal of moral philosophy - "developing a set of explicit principles for telling us what is good" - is in fact impossible.

But that's an even stronger claim than the one that moral philosophy hasn't progressed towards such a goal. What reasons are there?

Or at least, it's impossible to develop a set of principles that would be simple and clear enough to write down in human-understandable form and which would give us clear answers to every situation, because morality is too complicated for that.

That's contentious, but some philosophers believe that, and there are philosophies which adhere to that. The problem of figuring out how to make a machine behave morally according to those premises is still a philosophical one, just one based on other ideas in moral philosophy besides explicit rule-based ones.

Yes, I am here assuming "what is good" to equate to "what do human brains consider good", in a way that may be seen as reducing to "what would human brains accept as a persuasive argument for what is good". You could argue that this is flawed, because it's getting dangerously close to defining "good" by social consensus. But then again, the way the field of ethics itself proceeds is basically the same: a philosopher presents an argument for what is good, another attacks it, if the argument survives attacks and is compelling then it is eventually accepted.

Except the field of ethics does it with actual arguments among experts in the field. You could make the same story for any field: truths about physics can be determined by social consensus, since that's just what the field of physics is, a physicist presents an experiment or hypothesis, another attacks it, if the hypothesis survives the attacks and is compelling then it is eventually accepted! And so on for all non-moral fields of inquiry as well. I don't see why you think ethics would be special; basically everything can be modeled like this. But that's ridiculous. We don't look at social consensus for all forms of inquiry, because there is a difference between what ordinary people believe and what people believe when they are trained professionals in the subject.

for moral truths it looks to me unavoidable - due to the is-ought gap - that some degree of "truth by social consensus" is the only way of figuring out what the truth is, even in principle.

Then why don't you believe in morality by social consensus? (Or do you? It seems like you probably don't, given that you're an effective altruist. What do you think about animal rights, or Sharia law?)

Kaj_Sotala · Jul 9 2017 · 0

(We seem to be talking past each other in some weird way; I'm not even sure what exactly it is that we're disagreeing over.)

It would (arguably) give results that people wouldn't like, but assuming that the moral theory is correct and the machine understands it, almost necessarily it would do morally correct things.

Well sure, if we proceed from the assumption that the moral theory really was correct, but the point was that none of those proposed theories has been generally accepted by moral philosophers.

But that's an even stronger claim than the one that moral philosophy hasn't progressed towards such a goal. What reasons are there?

I gave one in the comment? That philosophy has accepted that you can't give a human-comprehensible set of necessary and sufficient criteria for concepts, and if you want a system for classifying concepts you have to use psychology and machine learning; and it looks like morality is similar.

Except the field of ethics does it with actual arguments among experts in the field. You could make the same story for any field: truths about physics can be determined by social consensus, since that's just what the field of physics is, a physicist presents an experiment or hypothesis, another attacks it, if the hypothesis survives the attacks and is compelling then it is eventually accepted! And so on for all non-moral fields of inquiry as well. I don't see why you think ethics would be special; basically everything can be modeled like this. But that's ridiculous. We don't look at social consensus for all forms of inquiry, because there is a difference between what ordinary people believe and what people believe when they are trained professionals in the subject.

I'm not sure what exactly you're disagreeing with? It seems obvious to me that physics does indeed proceed by social consensus in the manner you describe. Someone does an experiment, then others replicate the experiment until there is consensus that this experiment really does produce these results; somebody proposes a hypothesis to explain the experimental results, others point out holes in that hypothesis, there's an extended back-and-forth conversation and further experiments until there is a consensus that the modified hypothesis really does explain the results and that it can be accepted as an established scientific law. And the same for all other scientific and philosophical disciplines. I don't think that ethics is special in that sense.

Sure, there is a difference between what ordinary people believe and what people believe when they're trained professionals: that's why you look for a social consensus among the people who are trained professionals and have considered the topic in detail, not among the general public.

Then why don't you believe in morality by social consensus? (Or do you? It seems like you probably don't, given that you're an effective altruist.

I do believe in morality by social consensus, in the same manner as I believe in physics by social consensus: if I'm told that the physics community has accepted it as an established fact that E = mc^2 and that there's no dispute or uncertainty about this, then I'll accept it as something that's probably true. If I thought that it was particularly important for me to make sure that this was correct, then I might look up the exact reasoning and experiments used to determine this and try to replicate some of them, until I found myself to also be in consensus with the physics community.

Similarly, if someone came to me with a theory of what was moral and it turned out that the entire community of moral philosophers had considered this theory and accepted it after extended examination, and I could also not find any objections to that and found the justifications compelling, then I would probably also accept the moral theory.

But to my knowledge, nobody has presented a conclusive moral theory that would satisfy both me and nearly all moral philosophers and which would say that it was wrong to be an effective altruist - quite the opposite. So I don't see a problem in being an EA.

kbog · Jul 14 2017 · 0

Well sure, if we proceed from the assumption that the moral theory really was correct, but the point was that none of those proposed theories has been generally accepted by moral philosophers.

Your point was that "none of the existing ethical theories are up to the task of giving us such a set of principles that, when programmed into an AI, would actually give results that could be considered "good"." But this claim is simply begging the question by assuming that all the existing theories are false. And to claim that a theory would have bad moral results is different from claiming that it's not generally accepted by moral philosophers. It's plausible that a theory would have good moral results, in virtue of it being correct, while not being accepted by many moral philosophers. Since there is no dominant moral theory, this is necessarily the case as long as some moral theory is correct.

I gave one in the comment? That philosophy has accepted that you can't give a human-comprehensible set of necessary and sufficient criteria for concepts

If you're referring to ethics, no, philosophy has not accepted that you cannot give such an account. You believe this, on the basis of your observation that philosophers give different accounts of ethics. But that doesn't mean that moral philosophers believe it. They just don't think that the fact of disagreement implies that no such account can be given.

It seems obvious to me that physics does indeed proceed by social consensus in the manner you describe. Someone does an experiment, then others replicate the experiment until there is consensus that this experiment really does produce these results; somebody proposes a hypothesis to explain the experimental results, others point out holes in that hypothesis, there's an extended back-and-forth conversation and further experiments until there is a consensus that the modified hypothesis really does explain the results and that it can be accepted as an established scientific law. And the same for all other scientific and philosophical disciplines. I don't think that ethics is special in that sense.

So you haven't pointed out any particular features of ethics, you've merely described a feature of inquiry in general. This shows that your claim proves too much - it would be ridiculous to conduct physics by studying psychology.

Sure, there is a difference between what ordinary people believe and what people believe when they're trained professionals: that's why you look for a social consensus among the people who are trained professionals and have considered the topic in detail, not among the general public.

But that's not a matter of psychological inquiry, that's a matter of looking at what is being published in philosophy, becoming familiar with how philosophical arguments are formed, and staying in touch with current developments in the field. So you are basically describing studying philosophy. Studying or researching psychology will not tell you anything about this.

Kaj_Sotala · Jun 11 2017 · 0

Also, I find pretty compelling the argument that the classical definition of moral philosophy in trying to define "the good" is both impossible and not even a particularly good target to aim at, and that trying to find generally-agreeable moral solutions is something much more useful; and if we accept this argument, then moral psychology is relevant, because it can help us figure out generally-agreeable solutions.

As Martela (2017) writes:

...there is a deeper point in Williams's book that is even harder to rebut. Williams asks: What can an ethical theory do, if we are able to build a convincing case for one? He is skeptical about the force of ethical considerations and reminds us that even if we were to have a justified ethical theory, the person in question might not be concerned about it. Even if we could prove to some amoralists that what they are about to do is (a) against some universal ethical standard, (b) is detrimental to their own well-being, and/or (c) is against the demands of rationality or internal coherence, they still have the choice of whether to care about this or not. They can choose to act even if they know that what they are about to do is against some standard that they believe in. Robert Nozick—whom Williams quotes—describes this as follows: “Suppose that we show that some X he [the immoral man] holds or accepts or does commits him to behaving morally. He now must give up at least one of the following: (a) behaving immorally, (b) maintaining X, (c) being consistent about this matter in this respect. The immoral man tells us, ‘To tell you the truth, if I had to make the choice, I would give up being consistent’” (Nozick 1981, 408).

What Williams in effect says is that the noble task of finding ultimate justification for some ethical standards could not—even if it was successful—deliver any final argument in practical debates about how to behave. “Objective truth” would have only the motivational weight that the parties involved choose to give to it. It no longer is obvious what a philosophical justification of an ethical standard is supposed to do or even “why we should need such a thing” (Williams 1985, 23).

Yet when we look at many contemporary ethical debates, we can see that they proceed as if the solutions to the questions they pose would matter. In most scientific disciplines the journal articles have a standard section called “practical bearings,” where the practical relevance of the accumulated results are discussed. Not so for metaethical articles, even though they otherwise simulate the academic and peer-reviewed writing style of scientific articles. When we read someone presenting a number of technical counterarguments against quasi-realist solutions to the Frege-Geach problem, there usually is no debate about what practical bearings the discussion would have, whether these arguments would be successful or not. Suppose that in some idealized future the questions posed by the Frege-Geach problem would be conclusively solved. A new argument would emerge that all parties would see as so valid and sound that they would agree that the problem has now been finally settled. What then? How would ordinary people behave differently, after the solution has been delivered to them? I would guess it is fair to say—at least until it is proven otherwise—that the outcome of these debates is only marginally relevant for any ordinary person's ethical life. [...]

This understanding of morality means that we have to think anew what moral inquiry should aim at. [...] Whatever justification can be given for one moral doctrine over the other, it has to be found in practice—simply because there are no other options available. Accordingly, for pragmatists, moral inquiry is in the end directed toward practice, its successfulness is ultimately judged by the practical bearings it has on people's experiences: “Unless a philosophy is to remain symbolic—or verbal—or a sentimental indulgence for a few, or else mere arbitrary dogma, its auditing of past experience and its program of values must take effect in conduct” (Dewey 1916, 315). Moral inquiry should thus aim at practice; its successfulness is ultimately measured by how it is able to influence people's moral outlook and behavior. [...]

Moral principles, ideals, rules, theories, or conclusions should thus be seen “neither as a cookbook, nor a remote calculus” (Pappas 1997, 546) but as instruments that we can use to understand our behavior and change it for the better. Instead of trying to discover the correct ethical theories, the task becomes one of designing the most functional ethical theories. Ethics serves certain functions in human lives and in societies, and the task is to improve its ability to serve these functions (Kitcher 2011b). In other words, the aim of ethical theorizing is to provide people with tools (see Hickman 1990, 113–14) that help them in living their lives in a good and ethically sound way. [...]

It is true that the lack of foundational principles in ethics denies the pragmatist moral philosopher the luxury of being objectively right in some moral question. In moral disagreements, a pragmatist cannot “solve” the disagreement by relying on some objective standards that deliver the “right” and final answer. But going back to Williams's argument raised at the beginning of this article, we can ask what would it help if we were to “solve” the problem. The other party still has the option to ignore our solution. Furthermore, despite the long history of ethics we still haven't found many objective standards or “final solutions” that everyone would agree on, and thus it seems that waiting for such standards to emerge is futile.

In practice, there seem to be two ways in which moral disagreements are resolved. First is brute force. In some moral disputes I am in a position in which I can force the other party to comply with my standards whether that other party agrees with me or not. The state with its monopoly on the legitimate use of violence can force its citizens to comply with certain laws even when the personal moral code of these citizens would disagree with the law. The second way to resolve a moral disagreement is to find some common ground, some standards that the other believes in, and start building from there a case for one's own position.

In the end, it might be beneficial that pragmatism annihilates the possibility of believing that I am absolutely right and the other party is absolutely wrong. As Margolis notes: “The most monstrous crimes the race has ever (been judged to have) perpetrated are the work of the partisans of ‘right principles’ and privileged revelation” (1996, 213). Instead of dismissing the other's perspective as wrong, one must try to understand it in order to find common ground and shared principles that might help in progressing the dialogue around the problem. If one really wants to change the opinion of the other party, instead of invoking some objective standards one should invoke some standards that the other already believes in. This means that one has to listen to the other person, try to see the world from his or her point of view. Only through understanding the other's perspective one can have a chance to find a way to change it—or to change one's own opinion, if this learning process should lead to that. One can aim to clarify the other's points of view, unveil their hidden assumptions and values, or challenge their arguments, but one must do this by drawing on principles and values that the other is already committed to if one wants to have a chance to have a real impact on the other's way of seeing the world, or actually to resolve the disagreement. I believe that this kind of approach, rather than a claim for a more objective position, has a much better chance of actually building common understanding around the moral issue at hand.

Gram_Stone · Jun 7 2017 · 2

Your comment reads strangely to me because your thoughts seem to fall into a completely different groove from mine. The problem statement is perhaps: write a program that does what-I-want, indefinitely. Of course, this could involve a great deal of extrapolation.

The fact that I am even aspiring to write such a program means that I am assuming that what-I-want can be computed. Presumably, at least some portion of the relevant computation, the one that I am currently denoting 'what-I-want', takes place in my brain. If I want to perform this computation in an AI, then it would probably help to at least be able to reproduce whatever portion of it takes place in my brain. People who study the mind and brain happen to call themselves psychologists and cognitive scientists. It's weird to me that you're arguing about how to classify Joshua Greene's research; I don't see why it matters whether we call it philosophy or psychology. I generally find it suspicious when anyone makes a claim of the form: "Only the academic discipline that I hold in high esteem has tools that will work in this domain." But I won't squabble over words if you think you're drawing important boundaries; what do you mean when you write 'philosophical'? Maybe you're saying that Greene, despite his efforts to inquire with psychological tools, elides into 'philosophy' anyway, so like, what's the point of pretending it's 'moral philosophy' via psychology? If that's your objection, that he 'just ends up doing philosophy anyway', then what exactly is he eliding into, without using the words 'philosophy' or 'philosophical'?

More generally, why is it that we should discard the approach because it hasn't made itself obsolete yet? Should the philosophers give up because they haven't made their approach obsolete yet either? If there's any reason that we should have more confidence in the ability of philosophers than cognitive scientists to contribute towards a formal specification of what-I-want, that reason is certainly not track record.

What people believe doesn't tell us much about what actually is good.

I don't think anyone who has read or who likely will read your comment equates testimony or social consensus with what-is-good.

The challenge of AI safety is the challenge of making AI that actually does what is right, not AI that does whatever it's told to do by a corrupt government, a racist constituency, and so on.

It's my impression that AI safety researchers are far more concerned about unaligned AGIs killing everyone than they are about AGIs that are successfully designed by bad actors to do a specific, unimaginative thing without killing themselves and everyone else in the process.

Of course a new wave of pop-philosophers and internet bloggers have made silly claims that moral philosophy can be completely solved by psychology and neuroscience but this extreme view is ridiculous on its face.

Bleck, please don't ever give me a justification to link a Wikipedia article literally named pooh-pooh.

kbog, Jun 10 2017, 0

The problem statement is perhaps: write a program that does what-I-want, indefinitely

No, the problem statement is write a program that does what is right.

It's weird to me that you're arguing about how to classify Joshua Greene's research; I don't see why it matters whether we call it philosophy or psychology

Then you missed the point of what I said, since I wasn't talking about what to call it, I was talking about the tools and methods it uses. The question is what people ought to be studying and learning.

I generally find it suspicious when anyone makes a claim of the form: "Only the academic discipline that I hold in high esteem has tools that will work in this domain."

If you want to solve a philosophical problem then you're going to have to do philosophy. Psychology is for solving psychological problems. It's pretty straightforward.

what do you mean when you write 'philosophical'?

I mean the kind of work that is done in philosophy departments, and which would be studied by someone who was told "go learn about moral philosophy".

Maybe you're saying that Greene, despite his efforts to inquire with psychological tools, elides into 'philosophy' anyway

Yes, that's true by his own admission (he affirms in his reply to Berker that the specific cognitive model he uses is peripheral to the main normative argument) and is apparent if you look at his work.

If that's your objection, that he 'just ends up doing philosophy anyway', then what exactly is he eliding into, without using the words 'philosophy' or 'philosophical'?

He's eliding into normative arguments about morality, rather than merely describing psychological or cognitive processes.

More generally, why is it that we should discard the approach because it hasn't made itself obsolete yet?

I don't know what you are talking about, since I said nothing about obsolescence.

I don't think anyone who has read or who likely will read your comment equates testimony or social consensus with what-is-good.

Great! Then they'll acknowledge that studying testimony and social consensus is not studying what is good.

It's my impression that AI safety researchers are far more concerned about unaligned AGIs killing everyone than they are about AGIs that are successfully designed by bad actors to do a specific, unimaginative thing without killing themselves and everyone else in the process.

Rather than bad actors needing to be restrained by good actors, which is neither a psychological nor a philosophical problem, the problem is that the very best actors are flawed and will produce flawed machines if they don't do things correctly.

please don't ever give me a justification to link a Wikipedia article literally named pooh-pooh.

Would you like me to explicitly explain why the new wave of pop-philosophers and internet bloggers who think that moral philosophy can be completely solved by psychology and neuroscience don't know what they're talking about? It's not taken seriously; I didn't go into detail because I was unsure if anyone around here took it seriously.

LanceSBush, Jun 6 2017, 2

I agree that defining human values is a philosophical issue, but I would not describe it as "not a psychological issue at all." It is in part a psychological issue insofar as understanding how people conceive of values is itself an empirical question. Questions about individual and intergroup differences in how people conceive of values, distinguish moral from nonmoral norms, etc. cannot be resolved by philosophy alone.

I am sympathetic to some of the criticisms of Greene's work, but I do not think Berker's critique is completely correct. Explaining in detail why I think Greene and others are right that psychology can inform moral philosophy would call for a rather titanic post.

The tl;dr point I'd make is that yes, you can draw philosophical conclusions from empirical premises, provided your argument is presented as a conditional one in which you propose that certain philosophical positions are dependent on certain factual claims. If anyone else accepts those premises, then empirical findings that confirm or disconfirm those factual claims can compel specific philosophical conclusions. A toy version of this would be the following:

P1: If the sky is blue, then utilitarianism is true.
P2: The sky is blue.
C: Therefore, utilitarianism is true.

If someone accepts P1, and if P2 is an empirical claim, then empirical evidence for/against P2 bears on the conclusion.

This is the kind of move Greene wants to make.

The slightly longer version of what I'd say to a lot of Greene's critics is that they misconstrue Greene's arguments if they think he is attempting to move straight from descriptive claims to normative claims. In arguing for the primacy of utilitarian over deontological moral norms, Greene appeals to the presumptive shared premise between himself and his interlocutors that, on reflection, they will reject beliefs that are the result of epistemically dubious processes but retain those that are the result of epistemically justified processes.

If they share his views about what processes would in principle be justified/not justified, and if he can demonstrate that utilitarian judgments are reliably the result of justified processes but deontological judgments are not, then he has successfully appealed to empirical findings to draw a philosophical conclusion: that utilitarian judgments are justified and deontological ones are not. One could simply reject his premises about what constitutes justified/unjustified grounds for belief, and in that case his argument would not be convincing. I don't endorse his conclusions because I think his empirical findings are not compelling, not because I think he's made any illicit philosophical moves.

kbog, Jun 6 2017, 1

The tl;dr point I'd make is that yes, you can draw philosophical conclusions from empirical premises, provided your argument is presented as a conditional one in which you propose that certain philosophical positions are dependent on certain factual claims.

You can do that if you want, but (1) it's still a narrow case within a much larger philosophical framework and (2) such cases are usually pretty simple and don't require sophisticated knowledge of psychology.

The slightly longer version of what I'd say to a lot of Greene's critics is that they misconstrue Greene's arguments if they think he is attempting to move straight from descriptive claims to normative claims.

To the contrary, Berker criticizes Greene precisely because his neuroscientific work is hardly relevant to the moral argument he's making. You don't need a complex account of neuroscience or psychology to know that people's intuitions in the trolley problem are changing merely because of an apparently non-significant change in the situation. Philosophers knew that a century ago.

If they share his views about what processes would in principle be justified/not justified, and if he can demonstrate that utilitarian judgments are reliably the result of justified processes but deontological judgments are not, then he has successfully appealed to empirical findings to draw a philosophical conclusion: that utilitarian judgments are justified and deontological ones are not.

But nobody believes that judgements are correct or wrong merely because of the process that produces them. That just produces grounds for skepticism that the judgements are reliable - and it is skepticism of a sort that was already known without any reference to psychology, for instance through Plantinga's evolutionary argument against naturalism or evolutionary debunking arguments.

Also it's worth clarifying that Greene only deals with a particular instance of a deontological judgement rather than deontological judgements in general.

One could simply reject his premises about what constitutes justified/unjustified grounds for belief, and in that case his argument would not be convincing.

It's only a question of moral epistemology, so you could simply disagree on how he talks about intuitions or abandon the idea altogether (https://global.oup.com/academic/product/philosophy-without-intuitions-9780199644865?cc=us&lang=en&).

Again, it's worth stressing that this is a fairly narrow and methodologically controversial area of moral philosophy. There is a difference between giving an opinion on a novel approach to a subject, and telling a group of people what subject they need to study in order to be well-informed. Even if you do take the work of x-philers for granted, it's not the sort of thing that can be done merely with education in psychology and neuroscience, because people who understand that side of the story but not the actual philosophy are going to be unable to evaluate or make the substantive moral arguments which are necessary for empirically informed work.

LanceSBush, Jun 8 2017, 1

Thanks for the excellent reply.

Greene would probably not dispute that philosophers have generally agreed that the difference between the lever and footbridge cases is due to “apparently non-significant changes in the situation.”

However, what philosophers have typically done is either bitten the bullet and said one ought to push, or denied that one ought to push in the footbridge case but then felt the need to defend commonsense intuitions by offering a principled justification for the distinction between the two. The trolley literature is rife with attempts to vindicate an unwillingness to push, because these philosophers are starting from the assumption that commonsense moral intuitions track deep moral truths and that we must explicate the underlying, implicit justification our moral competence is picking up on.

What Greene is doing by appealing to neuroscientific/psychological evidence is offering a selective debunking explanation of some of those intuitions but not the others. If the evidence demonstrates that one set of outputs (deontological judgments) is the result of an unreliable cognitive process, and another set of outputs (utilitarian judgments) is the result of a reliable cognitive process, then he can show that we have reason to doubt one set of intuitions but not the other, provided we agree with his criteria about what constitutes a reliable vs. an unreliable process. A selective debunking argument of this kind, relying as it does on the reliability of distinct psychological systems or processes, does in fact turn on the empirical evidence (in this case, on his dual process model of moral cognition).

[But nobody believes that judgements are correct or wrong merely because of the process that produces them.]

Sure, but Greene does not need to argue that deontological/utilitarian conclusions are correct or incorrect, only that we have reason to doubt one but not the other. If we can offer reasons to doubt the very psychological processes that give rise to deontological intuitions, this may be sufficient to warrant skepticism about the larger project of assuming that these intuitions are underwritten by implicit, non-obvious justifications that the philosopher’s job is to extract and explicate.

You mention evolutionary debunking arguments as an alternative that is known “without any reference to psychology.” I think this is mistaken. Evolutionary debunking arguments are entirely predicated on specific empirical claims about the evolution of human psychology, and are thus a perfect example of the relevance of empirical findings to moral philosophy.

[Also it's worth clarifying that Greene only deals with a particular instance of a deontological judgement rather than deontological judgements in general.]

Yes, I completely agree and I think this is a major weakness with Greene’s account.

I think there are two other major problems: the fMRI evidence he has is not very convincing, and trolley problems offer a distorted psychological picture of the distinction between utilitarian and non-utilitarian moral judgment. Recent work by Kahane shows that people who push in footbridge scenarios tend not to be utilitarians, just people with low empathy. The same people that push tend to also be more egoistic, less charitable, less impartial, less concerned about maximizing welfare, etc.

Regarding your last two points: I agree that one move is to simply reject how he talks about intuitions (or one could raise other epistemic challenges, presumably). I also agree that training in psychology/neuroscience but not philosophy impairs one's ability to evaluate arguments that presumably depend on competence in both. I am not sure why you bring this up, though, so if there was an inference I should draw from this, help me out!