Gram_Stone

Comments

Cognitive Science/Psychology As a Neglected Approach to AI Safety

Your comment reads strangely to me because your thoughts seem to fall into a completely different groove from mine. The problem statement is perhaps: write a program that does what-I-want, indefinitely. Of course, this could involve a great deal of extrapolation.

The fact that I am even aspiring to write such a program means that I am assuming that what-I-want can be computed. Presumably, at least some portion of the relevant computation, the one that I am currently denoting 'what-I-want', takes place in my brain. If I want to perform this computation in an AI, then it would probably help to at least be able to reproduce whatever portion of it takes place in my brain. People who study the mind and brain happen to call themselves psychologists and cognitive scientists. It's weird to me that you're arguing about how to classify Joshua Greene's research; I don't see why it matters whether we call it philosophy or psychology. I generally find it suspicious when anyone makes a claim of the form: "Only the academic discipline that I hold in high esteem has tools that will work in this domain." But I won't squabble over words if you think you're drawing important boundaries; what do you mean when you write 'philosophical'? Maybe you're saying that Greene, despite his efforts to inquire with psychological tools, elides into 'philosophy' anyway, so like, what's the point of pretending it's 'moral philosophy' via psychology? If that's your objection, that he 'just ends up doing philosophy anyway', then what exactly is he eliding into, without using the words 'philosophy' or 'philosophical'?

More generally, why is it that we should discard the approach because it hasn't made itself obsolete yet? Should the philosophers give up because they haven't made their approach obsolete yet either? If there's any reason that we should have more confidence in the ability of philosophers than cognitive scientists to contribute towards a formal specification of what-I-want, that reason is certainly not track record.

What people believe doesn't tell us much about what actually is good.

I don't think anyone who has read or who likely will read your comment equates testimony or social consensus with what-is-good.

The challenge of AI safety is the challenge of making AI that actually does what is right, not AI that does whatever it's told to do by a corrupt government, a racist constituency, and so on.

It's my impression that AI safety researchers are far more concerned about unaligned AGIs killing everyone than they are about AGIs that are successfully designed by bad actors to do a specific, unimaginative thing without killing themselves and everyone else in the process.

Of course a new wave of pop-philosophers and internet bloggers have made silly claims that moral philosophy can be completely solved by psychology and neuroscience but this extreme view is ridiculous on its face.

Bleck, please don't ever give me a justification to link a Wikipedia article literally named pooh-pooh.

Cognitive Science/Psychology As a Neglected Approach to AI Safety

Also, have you seen this AI Impacts post and the interview it links to? I would expect so, but it seems worth asking. Tom Griffiths makes similar points to the ones you've made here.

Cognitive Science/Psychology As a Neglected Approach to AI Safety

I think these are all points that many people have considered privately or publicly in isolation, but that thus far no one has explicitly written them down and drawn a connection between them. In particular, lots of people have independently made the observation that ontological crises in AIs are apparently similar to existential angst in humans, ontology identification seems philosophically difficult, and so plausibly studying ontology identification in humans is a promising route to understanding ontology identification for arbitrary minds. So, thank you for writing this up; it seems like something that needed to be written quite badly.

Some other problems that might be easier to tackle from this perspective include mind crime, nonperson predicates, and suffering risk, especially subproblems like suffering in physics.

On 'Why Global Poverty?' and Arguments from Unobservable Impacts

I agree with this. The right way to take this further is to get rid of leaky generalizations like 'Evidence is good, no evidence is bad,' and also to point out what you pointed out: is the evidence still virtuous if it's from the past and you're reasoning from it? Confused questions like that are a sign that things have been oversimplified. I've thought about the more general issues behind this since I wrote this, since I actually posted this on LW over two weeks ago. (I've been waiting for karma.) In the interim, I found an essay on Facebook by Eliezer Yudkowsky that gets to the core of why these are bad heuristics, among other things.

On 'Why Global Poverty?' and Arguments from Unobservable Impacts

I really like this bit.

Thank you.

I found a lot of this post disconcerting because of how often you linked to LessWrong posts, even when doing so didn't add anything. I think it would be better if you didn't rely on LW concepts so much and just say what you want to say without making outside references.

I mulled over this article for quite a while before posting it, and this included the pruning of many hyperlinks deemed unnecessary. Of course, the links that remain are meant to produce a more concise article, not a more opaque one, so what you say is unfortunate to read. I would be interested in some specific examples of links or idiosyncratic language that either don't add value to or subtract value from the article.

It sure isn't good if I'm coming off as a crank though. I consider the points within this article very important.

Let's conduct a survey on the quality of MIRI's implementation

But I think we may be disagreeing over whether "thinks AI risk is an important cause" is too close to "is broadly positive towards AI risk as a cause area." I think so. You think not?

Are there alternatives to a person like this? It doesn't seem to me like there are.

"Is broadly positive towards AI risk as a cause area" could mean "believes that there should exist effective organizations working on mitigating AI risk", or could mean "automatically gives more credence to the effectiveness of organizations that are attempting to mitigate AI risk."

It might be helpful if you elaborated more on what you mean by 'aim for neutrality'. What actions would that entail, if you did that, in the real world, yourself? What does hiring the ideal survey supervisor look like in your mind if you can't use the words "neutral" or "neutrality" or any clever rephrasings thereof?

Let's conduct a survey on the quality of MIRI's implementation

Why should the person overseeing the survey think AI risk is an important cause?

Because the purpose of the survey is to determine MIRI's effectiveness as a charitable organization. If one believes that there is a negligible probability that an artificial intelligence will cause the extinction of the human species within the next several centuries, then it immediately follows that MIRI is an extremely ineffective organization, as it would be designed to mitigate a risk that ostensibly does not need mitigating. The survey is moot if one believes this.

Let's conduct a survey on the quality of MIRI's implementation

I think that it's probably quite important to define in advance what sorts of results would convince us that the quality of MIRI's performance is either sufficient or insufficient. Otherwise I expect those already committed to some belief about MIRI's performance to consider the survey evidence for their existing belief, even if another person with the opposite belief also considers it evidence for their belief.

Relatedly, I also worry about the uniqueness of the problem and how it might change what we consider a cause worth donating to. Although you don't seem to be thinking that you could understand MIRI's arguments and see no flaws and still be inclined to say "I still can't be sure that this is the right way to go," I expect that many people are averse to donating to causes like MIRI because the effectiveness of the proposed interventions does not admit of simple testing. With existential risks, empirical testing is often impossible in the traditional sense, although sometimes possible in a limited sense. Results about sub-existential pandemic risk are probably at least somewhat relevant to the study of existential pandemic risk, for example. But it's not the same as distributing bed nets, looking at the malaria incidence, adjusting, reobserving, and so on. It's not like we can perform an action, look through a time warp, and see whether or not the world ends in the future. And what I'm getting at is that, even if this is not really the nature of these problems, even if interventions upon them are in fact testable, we might imagine the implications if they were genuinely untestable. I think that there are some people who would refuse to donate to existential risk charities merely because other charities have interventions testable for effectiveness. And this concerns me. If it is not by human failing that we don't test the effectiveness of our interventions, but it is the nature of the problem that you cannot test the effectiveness of your interventions, do you choose to do nothing? That is not a rhetorical question. I genuinely believe that we are confused about this and that MIRI is an example of a cause that may be difficult to evaluate without resolving this confusion. This is related to ambiguity aversion in cognitive science and decision theory.
Even though ambiguity aversion appears in choices between betting on known and unknown risks, and not in choices to bet or not to bet on unknown risks in non-comparative contexts, effective altruists consider almost all charitable decisions within the context of cause prioritization, which means that we might expect EAs to encounter more comparative contexts than a random philanthropist, and thus to exhibit more bias against causes with ambiguity, even if the survey itself would technically be focusing on one cause. It's noteworthy that the expected utility formalism and human behavior differ here: the formalism prescribes indifference between bets with known and unknown probabilities in the case that each bet has the same payoffs. (In reality the situation is not even this clear, for the payoffs of successfully intervening upon malaria incidence as opposed to human extinction are hardly equal.) I think we must genuinely ask whether we should be averse to ambiguity in general, attempt to explain why this heuristic was evolutionarily adaptive, and see whether the problem of existential risk is a case where we should, or should not, use ambiguity aversion as a heuristic. After all, a humanity that attempts no interventions on the problem of existential risk merely because it cannot test the effectiveness of its interventions is a humanity that ignores existential risk and goes extinct for it, even if we believed that we were being virtuous philanthropists the entire time.
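
To make the indifference claim concrete, here is a minimal sketch of an Ellsberg-style comparison. Everything in it is an illustrative assumption rather than anything from the survey proposal: the two urns, the payoff of 100, and especially the uniform prior over the unknown urn's composition. The point is only that an expected utility maximizer whose subjective probability for the ambiguous bet averages out to the known probability assigns both bets the same value.

```python
from fractions import Fraction


def expected_utility(p_win, payoff_win, payoff_lose):
    """Expected utility of a two-outcome bet with a linear utility function."""
    return p_win * payoff_win + (1 - p_win) * payoff_lose


# Known urn: exactly 50 of 100 balls are red, so P(red) = 1/2.
known_p = Fraction(50, 100)

# Unknown urn: 100 balls with an unknown number of red ones.
# Assumption: a uniform prior over every composition from 0 to 100 red balls.
compositions = range(0, 101)
prior = Fraction(1, len(compositions))
unknown_p = sum(prior * Fraction(k, 100) for k in compositions)

# Same payoffs for both bets: win 100 if a red ball is drawn, otherwise 0.
eu_known = expected_utility(known_p, 100, 0)
eu_unknown = expected_utility(unknown_p, 100, 0)

print(eu_known, eu_unknown)
```

Under these assumptions both bets come out equal, so the formalism gives no reason to prefer the known urn. People in such experiments typically do prefer it, and that gap is what ambiguity aversion refers to. A different prior over compositions would change the second number, which is part of why the real comparison between causes is less clean than this toy case.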

The Effective Altruism Newsletter & Open Thread – February 2016

Sorry about the confusion. I meant to say that even though the Against Malaria Foundation observes evidence of the effectiveness of its interventions all of the time, and this is good, the founders of the Against Malaria Foundation had to choose an initial action before they had made any observations about the effectiveness of their interventions. Presumably, there was some first village or region of trial subjects that first empirically demonstrated the effectiveness of durable, insecticidal bednets. But before this first experiment, the AMF also presumably had to rely merely on correct reasoning without corroborative observations to support their arguments. Nonetheless, their reasoning was correct. Experiment is a way to increase our confidence in our reasoning, and it is good to use it when it's available, but we can have confidence at times without it. I use these points to argue that people successfully reason without being able to test the effectiveness of their actions all of the time, and that they often have to.

The more general point is that people often use a very simple heuristic to decide whether or not something academic is worthy of interest: Is it based on evidence and empirical testing? 'Evidence-based medicine' is synonymous with 'safe, useful medicine,' depending on who you ask. Things are bad if they are not based on evidence. But in the case of existential risk interventions, it is a property of the situation that we cannot empirically test the effectiveness of our interventions. It is thus necessary to reason without conducting empirical tests. This is a reason to take the problem more seriously, for its difficulty, as opposed to the reaction of some others, which is that the 'lack of evidence-based methods' is some sort of point against trying to solve the problem anyway.

And in the case of some risks, like AI, it is actually dangerous to conduct empirical testing. It's plausible that sufficiently intelligent unsafe AIs would mimic safe AIs until they gain a decisive strategic advantage. See Bostrom's 'treacherous turn' for more on this.

The Effective Altruism Newsletter & Open Thread – February 2016

I'm new to the EA Forum. It was suggested to me that I crosspost this LessWrong post criticizing Jeff Kaufman's speech at EA Global 2015 entitled 'Why Global Poverty?' on the EA Forum, but I need 5 karma to make my first post.

EDIT: Here it is.
