How much do you worry that MIRI's default non-disclosure policy is going to hinder MIRI's ability to do good research, because it won't be able to get as much external criticism?

Suppose you find out that Buck-in-2040 thinks the work you're currently doing is a big mistake (one that should have been clear to you now). What are your best guesses about his reasons?

What's the biggest misconception people have about current technical AI alignment work? What's the biggest misconception people have about MIRI?

Thanks, Greg. I really enjoyed this post.

I don't think this is what you're saying, but if someone drew the lesson from your post that there's no point researching a question when reality is underpowered, I think that would be a mistake.

When I look at tiny-n samples for important questions (e.g. "How have new ideas made major changes to the focus of academic economics?" or "Why have social movements collapsed in the past?"), I generally don't feel at all like I'm trying to get p < 0.05; it feels more like hypothesis generation. So when I find out that Kahneman and Tversky spent five years honing their 'Prospect Theory' article into a form that could be published in an economics journal, I think, "Wow, OK, maybe that's the sort of time investment we should be thinking of." Or when I see social movements collapse because of in-fighting (e.g. the pre-Copenhagen UK climate movement) or romantic disputes between leaders (e.g. Objectivism), then, insofar as we just want to take all the easy wins in mitigating catastrophic risks to the EA community, I know that this is a risk EA should think about and focus on.

For these sorts of areas, the right approach seems to be granular qualitative research: trying to really understand in depth what happened in some other circumstance, and then thinking through what lessons that holds for the circumstance you're interested in. I think that, as a matter of fact, EA does this quite a lot when relevant (e.g. Grace on Szilard, or existing EA discussion of previous social movements). So I think this gives us extra reason to push against the idea that "EA-style analysis" = "quant-y RCT-esque analysis" rather than "whatever research methods are most appropriate to the field at hand". But even in qualitative research, I think the "EA mindset" can be quite distinctive. For example, I think a Bayesian-heavy approach to historical questions, often addressing counterfactuals and focusing on the issues that are most interesting from an EA perspective (e.g. how modern-day values would be different if Christianity had never taken off), would be really quite different from almost all existing historical research.

Sorry, the 'or otherwise lost' qualifier was meant as a catch-all for any way the investment could lose its value, including (bad) value drift.

I think there's a decent case for (some) EAs doing better at avoiding this than e.g. typical foundations:

  • If you have precise values (e.g. classical utilitarianism), then it's easier to transmit those values across time: you can write your values down clearly as part of the foundation's constitution, and it's easier to find and identify younger people to take over the fund who also endorse those values. In contrast, for other foundations, the ultimate aims are often unclear or too dependent on a particular empirical situation (e.g. Benjamin Franklin's funds were intended 'to provide loans for apprentices to start their businesses' (!!)).
  • If you take a lot of time carefully choosing your successors (and they take a lot of time choosing theirs), your values are more likely to survive each handover.

Then, to reduce the risk of appropriation, one could spread the funds across many different countries and different people who share your values. (Again, this is easier if you endorse a set of values that are legible and non-idiosyncratic.)

It might still be true that the chance of the fund becoming valueless gets large over time (if, e.g., there's a 1% risk per year of it losing its value), but the resources available also grow exponentially over time in those worlds where it doesn't lose its value.

One caveat: there are tricky questions about when 'value drift' is a bad thing, rather than the future fund owners simply having a better understanding of the right thing to do than the founders did, which often seems to be true for long-lasting foundations.

I think you might be misunderstanding what I was referring to. An example of what I mean: Suppose Jane is deciding whether to work for DeepMind on the AI safety team. She’s unsure whether this speeds up or slows down AI development; her credence is imprecise, represented by the interval [0.4, 0.6]. She’s confident, let’s say, that speeding up AI development is bad. Because there’s some precisification of her credences on which taking the job is good, and some on which taking the job is bad, if she uses a Liberal decision rule (= it is permissible for you to perform any action that is permissible according to at least one of the credence functions in your set), it’s permissible for her either to take the job or not to take it.
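To make the Liberal rule concrete, here is a minimal Python sketch. The payoffs (-1 if the job speeds AI up, +1 if it slows it down, 0 for declining) and the five-point grid standing in for the interval [0.4, 0.6] are hypothetical choices for illustration, not part of the original example.

```python
def expected_value(action: str, p_speed_up: float) -> float:
    """Hypothetical payoffs: taking the job is worth -1 if it speeds AI up and
    +1 if it slows AI down; declining is worth 0 either way."""
    if action == "take job":
        return p_speed_up * -1.0 + (1 - p_speed_up) * 1.0
    return 0.0


def liberal_permissible(actions, credences):
    """Liberal rule: an action is permissible if it maximizes expected value
    under at least one credence function in the set."""
    permissible = set()
    for p in credences:
        values = {a: expected_value(a, p) for a in actions}
        best = max(values.values())
        permissible.update(a for a, v in values.items() if v == best)
    return permissible


# A finite grid standing in for the imprecise credence interval [0.4, 0.6].
credences = [0.40 + 0.05 * i for i in range(5)]
print(sorted(liberal_permissible(["take job", "decline"], credences)))
```

With any single precise credence below 0.5, only taking the job would come out permissible; it is the interval straddling 0.5 that makes both options permissible.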

The issue is that, if you have imprecise credences and a Liberal decision rule, and are a longtermist, then almost all serious contenders for actions are permissible.

So the neartermist would need some way of saying that (i) we can carve out the definitely-good part of the action, which is better than not doing the action on all precisifications of the credence function; and (ii) we can ignore the other parts of the action (e.g. the flow-through effects) that are good on some precisifications and bad on others. It seems hard to justify that theoretically, but I think it matches how people actually think, so it at least has some common-sense motivation.

But you could do this if you could argue for a pseudodominance principle that says: "If there's some interval of time t_i over which action x does more expected good than action y on all precisifications of one's credence function, and there's no interval of time t_j over which action y does more expected good than action x on all precisifications of one's credence function, then you should choose x over y".
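One way to state the pseudodominance principle precisely is as a comparison over per-interval, per-precisification expected values. The sketch below assumes, hypothetically, that each action is summarized as a mapping from time-interval labels to a list of expected-good values, one per precisification, listed in the same order for both actions; the numbers in the usage example are made up.

```python
from typing import Dict, List

# Hypothetical representation: time-interval label -> expected good under each
# precisification of the credence function (same order for every action).
Profile = Dict[str, List[float]]


def better_on_all(a: List[float], b: List[float]) -> bool:
    """True if a exceeds b under every precisification."""
    return all(x > y for x, y in zip(a, b))


def pseudodominates(x: Profile, y: Profile) -> bool:
    """x beats y on all precisifications over some interval,
    and y never beats x on all precisifications over any interval."""
    intervals = x.keys() & y.keys()
    x_wins_somewhere = any(better_on_all(x[t], y[t]) for t in intervals)
    y_wins_somewhere = any(better_on_all(y[t], x[t]) for t in intervals)
    return x_wins_somewhere and not y_wins_somewhere


# Made-up numbers: a near-term intervention that is clearly good in the short run,
# with long-run flow-through effects whose sign depends on the precisification.
intervention = {"next 50 years": [10.0, 10.0, 10.0], "long run": [5.0, -5.0, 0.0]}
do_nothing = {"next 50 years": [0.0, 0.0, 0.0], "long run": [0.0, 0.0, 0.0]}

print(pseudodominates(intervention, do_nothing))  # True
print(pseudodominates(do_nothing, intervention))  # False
```

The "long run" entries play the role of the flow-through effects: they are good on some precisifications and bad on others, so they neither secure nor block the verdict, which is settled by the near-term interval.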

(In contrast, it seems you thought I was referring to AI vs some other putative great longtermist intervention. I agree that plausible longtermist rivals to AI and bio are thin on the ground.)

Thanks, William! 

Yeah, I think I messed up this bit. I should have used the harmonic mean rather than the arithmetic mean when averaging over possibilities for how many people there will be in the future. Doing this brings the chance of being the most influential person ever close to the chance of being the most influential person ever in a small-population universe. But then we get the issue that being the most influential person ever in a small-population universe is much less important than being the most influential person in a big-population universe. And it’s only the latter that we care about.
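The difference between the two averages can be checked numerically. Conditional on a population of n, the chance of being the most influential person is 1/n, so the unconditional chance is the expectation of 1/n, i.e. the reciprocal of the harmonic mean of n, not 1 divided by the arithmetic mean. The two-scenario population distribution below is purely hypothetical.

```python
# Hypothetical distribution over the total number of people: (population, probability).
scenarios = [(1e10, 0.5), (1e20, 0.5)]

# Given population n, the chance of being the single most influential person is 1/n.
p_correct = sum(prob / n for n, prob in scenarios)     # E[1/n]
arith_mean_n = sum(prob * n for n, prob in scenarios)  # E[n]
p_naive = 1 / arith_mean_n                             # 1 / E[n], the mistaken version
harmonic_mean_n = 1 / p_correct                        # harmonic mean of n

print(f"E[1/n]        = {p_correct:.3g}")
print(f"1 / E[n]      = {p_naive:.3g}")
print(f"harmonic mean = {harmonic_mean_n:.3g}")
```

Here nearly all of the roughly 5e-11 chance comes from the small-population scenario; the high-population scenario contributes only about 5e-21, which is why the harmonic-mean version mostly tracks the small-population universe.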

So what I really should have said (in my too-glib argument) is: for simplicity, just assume a high-population future, since those are the action-relevant futures if you're a longtermist. Then take a uniform prior over all times (or all people) in that high-population future. So my claim is: “In the action-relevant worlds, the frequency of ‘most important time’ (or ‘most important person’) is extremely low, and so our prior should be extremely low too.”

Thanks for these links. I’m not sure if your comment was meant to be a criticism of the argument, though? If so: I’m saying “prior is low, and there is a healthy false positive rate, so don’t have high posterior.” You’re pointing out that there’s a healthy false negative rate too — but that won’t cause me to have a high posterior?
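The structure of the "low prior, healthy false positive rate" argument is just Bayes' rule. In this minimal sketch the prior of one in a million and the detection rates are hypothetical placeholders; the point is only that a lower true positive rate (i.e. a higher false negative rate) pushes the posterior down, not up.

```python
def posterior(prior: float, true_positive_rate: float, false_positive_rate: float) -> float:
    """P(this is the most influential time | our evidence says it is), by Bayes' rule."""
    evidence = prior * true_positive_rate + (1 - prior) * false_positive_rate
    return prior * true_positive_rate / evidence


prior = 1e-6  # hypothetical: one in a million times is "the most influential"

# Same false positive rate, two different false negative rates.
low_fnr = posterior(prior, true_positive_rate=0.9, false_positive_rate=0.05)
high_fnr = posterior(prior, true_positive_rate=0.5, false_positive_rate=0.05)
print(low_fnr, high_fnr)
```

Both posteriors stay tiny, and the one with more false negatives is smaller still.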

And if you think that each generation is more influential than the last, that’s a good argument for thinking that future generations will be more influential, and that we should therefore save.

There were a couple of recurring questions, so I’ve addressed them here.

What’s the point of this discussion — isn’t passing on resources to the future too hard to be worth considering? Won’t the money be stolen, or used by people with worse values?

In brief: Yes, losing what you’ve invested is a risk, but (at least for relatively small donors) it’s outweighed by investment returns. 

Longer: The concept of ‘influentialness of a time’ is the same as the cost-effectiveness (from a longtermist perspective) of the best opportunities accessible to longtermists at that time. Suppose I think that the best opportunities in, say, 100 years are as good as the best opportunities now. Then, if I have a small amount of money, I can get (say) at least a 2% return per year on those funds. But I shouldn’t think that the chance of my funds being appropriated (or otherwise lost) is as high as 2% per year. So the expected amount of good I do is greater if I save.
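The comparison can be sketched as a simple expected-value calculation, assuming (hypothetically) a constant annual return, a constant and independent annual chance of losing the whole fund, and best opportunities in 100 years that are as good as those now.

```python
def expected_multiplier(annual_return: float, annual_loss_risk: float, years: int) -> float:
    """Expected value of $1 saved for `years` years: it grows at annual_return each
    year it survives, and is lost entirely with probability annual_loss_risk each year."""
    survival = (1 - annual_loss_risk) ** years
    growth = (1 + annual_return) ** years
    return survival * growth


for risk in (0.005, 0.01, 0.02):
    print(risk, round(expected_multiplier(0.02, risk, 100), 2))
```

With a 2% return and a 1% annual loss risk, the fund survives 100 years with probability 0.99^100, roughly 0.37, yet its expected value is still about 2.6 times the original, because the surviving worlds hold about 1.02^100, roughly 7.2 times as much. The expected multiplier falls below 1 only once the annual loss risk exceeds r/(1 + r), just under the rate of return r.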

So if you think that hingeyness (as I’ve defined it) is about the same in 100 years as it is now, or greater, then there’s a strong case for investing for 100 years before spending the money.

(Caveat that once we consider larger amounts of money, diminishing returns to expenditure become an issue, and the chance of appropriation increases.)

What’s your view on anthropics? Isn’t that relevant here?

I’ve been trying to make claims that aren’t sensitive to tricky issues in anthropic reasoning. The claim that, if there are n people ordered by some relation F (like ‘more important than’), the prior probability that you are the most F (‘most important’) person is 1/n doesn’t distinguish between anthropic principles, because I’ve already conditioned on the number of people in the world. So I think anthropic principles aren’t directly relevant to the argument I’ve made, though obviously they are relevant more generally.

