I think it's quite important to define in advance what sorts of results would convince us that MIRI's performance is either sufficient or insufficient. Otherwise, I expect those already committed to some belief about MIRI's performance to treat the survey as evidence for that belief, even when someone with the opposite belief treats the same survey as evidence for theirs.
Relatedly, I also worry about the uniqueness of the problem and how it might change what we consider a cause worth donating to. Although you don't seem to think that one could understand MIRI's arguments, see no flaws, and still be inclined to say "I still can't be sure that this is the right way to go," I expect that many people are averse to donating to causes like MIRI because the effectiveness of the proposed interventions does not admit of simple testing.
With existential risks, empirical testing is often impossible in the traditional sense, although sometimes possible in a limited sense. Results about sub-existential pandemic risk are probably at least somewhat relevant to the study of existential pandemic risk, for example. But it's not the same as distributing bed nets, looking at the malaria incidence, adjusting, reobserving, and so on. It's not like we can perform an action, look through a time warp, and see whether or not the world ends in the future.
What I'm getting at is this: even if interventions on these problems turn out to be testable after all, we can still ask what would follow if they were genuinely untestable. I think that there are some people who would refuse to donate to existential risk charities merely because other charities have interventions that can be tested for effectiveness. And this concerns me. If our failure to test the effectiveness of our interventions is not a human failing, but follows from the nature of the problem, do you choose to do nothing? That is not a rhetorical question. I genuinely believe that we are confused about this and that MIRI is an example of a cause that may be difficult to evaluate without resolving this confusion. This is related to ambiguity aversion in cognitive science and decision theory.
Ambiguity aversion appears in choices between betting on known and unknown risks, and not in choices to bet or not to bet on unknown risks in non-comparative contexts. But effective altruists consider almost all charitable decisions within the context of cause prioritization, so we might expect EAs to encounter more comparative contexts than a random philanthropist, and thus to exhibit more bias against causes with ambiguity, even if the survey itself would technically be focusing on one cause. It's noteworthy that the expected utility formalism and human behavior differ here: the expected utility formalism prescribes indifference between bets with known and unknown probabilities in the case that each bet has the same payoffs. (In reality the situation is not even this clear, for the payoffs of successfully intervening upon malaria incidence as opposed to human extinction are hardly equal.) I think we must genuinely ask whether we should be averse to ambiguity in general, attempt to explain why this heuristic was evolutionarily adaptive, and see whether the problem of existential risk is a case where we should, or where we should not, use ambiguity aversion as a heuristic. After all, a humanity that attempts no interventions on the problem of existential risk merely because it cannot test the effectiveness of its interventions is a humanity that ignores existential risk and goes extinct for it, even if we believed that we were being virtuous philanthropists the entire time.
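To make the indifference claim concrete, here is a minimal sketch of an Ellsberg-style comparison. Everything in it is an illustrative assumption rather than anything from a real survey or charity evaluation: the payoff of 100, the known win probability of 1/2, and a uniform prior over the unknown win probability. The function names are hypothetical. The indifference only holds when the prior's mean matches the known probability, which the uniform prior is chosen to satisfy.

```python
from fractions import Fraction


def expected_utility(p_win, payoff_win, payoff_lose=Fraction(0)):
    """Expected utility of a bet with a known win probability."""
    return p_win * payoff_win + (1 - p_win) * payoff_lose


def ambiguous_expected_utility(prior, payoff_win, payoff_lose=Fraction(0)):
    """Expected utility of a bet whose win probability is unknown.

    `prior` maps each candidate win probability to the credence placed on it.
    The agent averages the known-probability expected utility over the prior.
    """
    assert sum(prior.values()) == 1, "credences must sum to 1"
    return sum(
        credence * expected_utility(p, payoff_win, payoff_lose)
        for p, credence in prior.items()
    )


# Known bet: win 100 with probability exactly 1/2 (assumed numbers).
known = expected_utility(Fraction(1, 2), Fraction(100))

# Ambiguous bet: same payoff, win probability could be 0, 0.1, ..., 1,
# with equal credence on each candidate (assumed uniform prior, mean 1/2).
uniform_prior = {Fraction(k, 10): Fraction(1, 11) for k in range(11)}
ambiguous = ambiguous_expected_utility(uniform_prior, Fraction(100))

print(known, ambiguous)
```

Both bets come out with the same expected utility, so an expected utility maximizer has no reason to prefer the known bet. An ambiguity-averse agent prefers it anyway, and that gap is the difference between formalism and behavior described above.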
Many people here, myself included, are very concerned about the risks from rapidly improving artificial general intelligence (AGI). A significant fraction of people in that camp give to the
Thanks for the write-up, Rob. OpenPhil actually decided to evaluate our technical agenda last summer, and Holden put Daniel Dewey on the job. The report isn't done yet, in part because it has proven very time-intensive to fully communicate the reasoning behind our research priorities, even to someone with as much understanding of the AI landscape as Daniel Dewey. Separately, we have plans to get an independent evaluation of our organizational efficacy started later in 2016, which I expect to be useful for our admin team as well as prospective donors.
FYI, when it comes to evaluating our research progress, I doubt that the methods you propose would get you much Bayesian evidence. Our published output will look like round pegs shoved into square holes regardless of whether we're doing our jobs well or poorly, because we're doing research that doesn't fit neatly into an existing academic niche. Our objective is to make direct progress on what appear to us to be the main neglected technical obstacles to developing reliable AI systems in the long term, with a goal of shifting the direction of AI research in a big way once we hit certain key research targets; and we're specifically targeting research that isn't compatible with industry's economic incentives or academia's publish-or-perish incentives. To get information about how well we're doing our jobs, I think the key questions to investigate are (1) whether we've chosen good research targets; and (2) whether we're making good progress towards them.
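One way to read the "not much Bayesian evidence" point is through the odds form of Bayes' rule. The symbols are illustrative and not from the comment itself: $H$ stands for "the research is going well" and $E$ for what an outside reviewer observes about the published output.

```latex
\[
\frac{P(H \mid E)}{P(\neg H \mid E)}
  = \frac{P(E \mid H)}{P(E \mid \neg H)} \cdot \frac{P(H)}{P(\neg H)}
\]
```

If the output would look like an awkward fit for existing academic niches whether or not the work is going well, then $P(E \mid H) \approx P(E \mid \neg H)$, the likelihood ratio is close to 1, and the posterior odds barely move from the prior odds.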
We've been focusing our communication efforts mainly on helping people evaluate (1): I've been working on explaining our approach and agenda, and OpenPhil is also on the job. To investigate (2), we'd need to spend a sizable chunk of time with mathematically adept evaluators — we still haven't hit any of our key research targets, which means that evaluating our progress requires understanding our smaller results and why we think they're progress towards the big results. In practice, we've found that explaining this usually requires explaining why we think the big targets are vital, as this informs (e.g.) which shortcuts are and are not acceptable. I plan to wait until after the OpenPhil report is finished before taking on another time-intensive eval.
Fortunately, (2) will become much easier to evaluate as we achieve (or persistently fail to achieve) those key targets. This also provides us with an opportunity to test our approach and methodology. People who understand our approach and find it uncompelling often predict that some of the results we're shooting for cannot be achieved. This means we'll get some evidence about (1) as we learn more about (2). For example, last year I mentioned "naturalized AIXI" as an ambitious 5-year research target. If we are not able to make concrete progress towards that goal, then over the next four years, I will lose confidence in our approach and eventually change our course dramatically. Conversely, if we make discoveries that are important pieces of that puzzle, I'll update in favor of us being onto something, especially if we find puzzle pieces that knowledgeable critics predicted we wouldn’t find. This data will hopefully start rolling in soon, now that our research team is getting up to size.
("Concrete progress" / "important puzzle pieces" in this case are satisfactory asymptotic algorithms for any of: (1) reasoning under logical uncertainty; (2) identifying the best available decision with respect to a utility function; (3) performing induction from inside an environment; (4) identifying the referents of goals in realistic world-models; and (5) reasoning about the behavior of smarter reasoners; the last of which is hopefully a subset of 1 and 2. The linked papers give rough descriptions of what counts as 'satisfactory' in each case; I'll work to make the desiderata more explicit as time goes on.)