202 karma · Joined Sep 2020 · Pursuing an undergraduate degree


I'm a third-year undergrad in cognitive science; currently testing out research in philosophy (social epistemology) and studying deep learning. 

Feel free to reach out via dm!




I find it remarkable how little the people who most express worries about advanced AI say about the concrete mechanisms by which it could destroy the world. Am I right in thinking that? And if so, is this mostly because they are worried about infohazards and therefore don't share the concrete mechanisms they are worried about?

I personally find it pretty hard to imagine ways that AI would e.g. cause human extinction that feel remotely plausible (although I can well imagine that there are plausible pathways I haven't thought of!).

Relatedly, I wonder if public communication about x-risks from AI should be more concrete about mechanisms? Otherwise it seems much harder for people to take these worries seriously.

I wonder to what extent people take the alignment problem to be (i) the problem of creating an AI system that reliably does, or tries to do, what its operators want it to do, as opposed to (ii) the problem of creating an AI system that does, or tries to do, what is best "aligned with human values" (whatever this precisely means).

I see both definitions being used, and they feel importantly different to me: if we solve the problem of aligning an AI with some operator, this still seems far from safe AI. In fact, when I try to imagine how an AI might cause a catastrophe, the clearest causal path for me is one where the AI is extremely competent at pursuing the operator's stated goal, but that stated goal implies or requires catastrophe (e.g. the superintelligent AI receives some input like "Make me the emperor of the world"). On the other hand, if the AI system is aligned with humanity as a whole (in some sense), this scenario seems less likely.

Does that seem right to you?

I'm pretty late to the party (perhaps even so late that people forgot that there was a party), but just in case someone is still reading this, I'll leave my 2 cents on this post. 

[Context: A few days ago, I released a post that distils a paper by Kenny Easwaran and others, in which they propose a rule for updating on the credences of others. In a (tiny) nutshell, this rule, "Upco", asks you to update on someone's credence in proposition A by multiplying your odds with their odds.]
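In code, the odds-multiplication at the heart of Upco looks like this (a minimal sketch of my own; the function name and interface are not from the paper):

```python
def upco(my_credence, peer_credence):
    """Update on a peer's credence in a proposition A
    by multiplying odds, then convert back to a probability."""
    my_odds = my_credence / (1 - my_credence)
    peer_odds = peer_credence / (1 - peer_credence)
    posterior_odds = my_odds * peer_odds
    return posterior_odds / (1 + posterior_odds)

# Example: two peers each at 2/3 (odds 2:1) combine to odds 4:1.
print(round(upco(2/3, 2/3), 3))  # 0.8
```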

 1. Using Upco suggests some version of strong epistemic modesty: whenever the product of the odds of all the peers you have learned from is larger than your own odds, your credence will be dominated by those of others; and if we grant that this is virtually always the case, then strong epistemic modesty follows.

2. While I agree with some version of strong epistemic modesty, I strongly disagree with what I take to be the method of updating on the credences of peers that is proposed in this post: taking some kind of linear average (from here on referred to as LA). Here's a summary of reasons why I think Upco is a better updating rule, copied from my post:

Unfortunately, the LA has some undesirable properties (see section 4 of the paper):

  • Applied in the way sketched above, LA is non-commutative: it is sensitive to the order in which you update on the credences of others, even though the order seems like it should be completely irrelevant to your final beliefs.
    • This can be avoided by taking the “cumulative average” of the credences of the people you update on, i.e. each time you learn someone's credence in A you average again over all the credences you have ever learned regarding this proposition. However, now the LA has lost its initial appeal; for each proposition you have some credence in, rationality seems to require you to keep track of everyone you have updated on and the weights you assigned to them. This seems clearly intractable once the number of propositions and learned credences grows large. 
    • See Gardiner (2013) for more on this.
  • Relatedly, LA is also sensitive to whether you update on multiple peers at once or sequentially.
  • Also, LA does not commute with Bayesian Updating. There are cases where it matters whether you first update on someone's credence (e.g. regarding the bias of a coin) using the LA and then on “non-psychological” evidence (e.g. the outcome of a coin-flip you observed) using Bayesian Updating or the reverse. 
  • Moreover, LA does not preserve ‘judgments of independence’. That is, if two peers judge two propositions A and B to be independent, i.e. P1(A & B) = P1(A)·P1(B) and P2(A & B) = P2(A)·P2(B), then after updating on each other's credences, independence is not always preserved. This seems intuitively undesirable: if you think that the outcomes of (say) a coin flip and a die roll are independent and I think the same - why should updating on my credences change your mind about that?
  • LA does not exhibit what the authors call “synergy”. That is, suppose P1(A) = x and P2(A) = y with x ≤ y. Then it is necessarily the case that the updated credences are in the interval [x, y] if both peers are applying LA. In other words, using the LA never allows you to update beyond the credence of the most confident person you’ve updated on (or yourself if you are more confident than everybody else).

    • At first sight, this might seem like a feature rather than a bug. However, this means that the credence of someone less confident than you can never be positive evidence regarding the issue at hand. Suppose you are 95% sure that A is true. Now, for any credence smaller than 95% LA would demand that you update downwards. Even if someone is perfectly rational, has a 94.9% credence in A and has evidence completely independent from yours, LA tells you that their credence is disconfirming evidence. 

Perhaps most importantly, since Bayesian Updating does not have these properties, LA does not generally produce the same results. Thus, insofar as we regard Bayesian updating as the normative ideal, we should expect LA to be at best an imperfect heuristic and perhaps not even that. 

In sum, LA has a whole host of undesirable properties. It seems like we therefore would want an alternative rule that avoids these pitfalls while retaining the simplicity of LA.   
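Two of these failures are easy to see with toy numbers. The sketch below applies LA sequentially with equal weights each time (my own illustration, not code from the paper or this post):

```python
def la(c1, c2):
    """Naive linear average of two credences, equal weights."""
    return (c1 + c2) / 2

# Non-commutativity: starting from 0.5, the order in which you
# average in peers at 0.9 and 0.1 changes your final credence.
a = la(la(0.5, 0.9), 0.1)  # peer at 0.9 first, then 0.1
b = la(la(0.5, 0.1), 0.9)  # peer at 0.1 first, then 0.9
print(round(a, 2), round(b, 2))  # 0.4 0.6

# No synergy: the average never leaves [min, max] of the inputs.
print(round(la(0.8, 0.9), 2))  # 0.85 - never above the most confident peer
```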

The EaGlHiVe aims to show that such a rule exists. They call this rule “Upco”, standing for “updating on the credences of others”. Upco is a simple rule that avoids many of the problems of LA: preservation of independence, commutativity, synergy, etc. Moreover, Upco produces the same results as Bayesian Updating under some conditions.
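For contrast, here is the same toy experiment with Upco (again my own sketch; Upco is just odds multiplication, written out here so the snippet is self-contained):

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

def upco(p1, p2):
    """Upco: multiply odds, convert back to a probability."""
    o = odds(p1) * odds(p2)
    return o / (1 + o)

# Commutativity: updating in either order gives the same credence
# (up to floating point).
a = upco(upco(0.5, 0.9), 0.1)
b = upco(upco(0.5, 0.1), 0.9)
print(abs(a - b) < 1e-12)  # True

# Synergy: two peers at 0.8 (odds 4:1 each) combine to odds 16:1,
# i.e. a credence above either individual's 0.8.
print(round(upco(0.8, 0.8), 3))  # 0.941
```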

Thanks, this seems useful! :) One suggestion: if there are similar estimates available for other causes, could you add at least one to the post as a comparison? I think this would make your numbers more easily interpretable.

Hey Daniel, 

thanks for engaging with this! :)

You might be right that the geometric mean of odds performs better than Upco as an updating rule, although I'm still unsure exactly how you would implement it. If you had used the geometric mean of odds to update on a first person's credence and then learned the credence of a second person, would you change the weight (in the exponent) you gave the first peer to 1/2 and update as though you had just learned both credences at once? That seems pretty cumbersome, as you'd have to keep track of the number of credences you have already updated on for each proposition in order to assign the correct weight to a new credence. Even if using the geometric mean of odds (GMO) were a better approximation of the ideal Bayesian response than Upco (which to me is an open question), it thus seems like Upco is practically feasible and GMO is not.

Here's one huge problem with the Upco method as you present it: two people think that it's a 1/6 chance of rolling a six on a (fair) die. This opinion shouldn't change when you update on others' opinions. If you used Upco, that's a 1:5 ratio, giving final odds of 1:25 - clearly incorrect. On the other hand, a geometric mean approach gives sqrt((1*1)/(5*5))=1:5, as it should be.

If one person reports credence 1/6 in rolling a six on a fair die, and this is part of a partition containing one proposition for each possible outcome {"Die lands on 1", "Die lands on 2", ... "Die lands on 6"}, then the version of Upco that can deal with complex partitions like this will tell you not to update on this credence (see the section on "arbitrary partitions"). I think the problem you mention only occurs because you are using a partition that collapses the five other possible outcomes into one proposition - "the die will not land on 6".

This case does highlight that the output of Upco depends on the partition you assume someone to report their credence over. But since GMO as an updating rule differs from Upco only in assigning a weight of 1/n to each person's credence (where n is the number of credences already learned), I'm pretty sure you can find the same partition dependence with GMO.
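To make the numbers concrete: under the binary partition {"lands on 6", "doesn't land on 6"}, binary Upco and GMO give exactly the results described above (a toy calculation of my own, not an implementation from either paper):

```python
import math

def to_prob(o):
    """Convert odds to a probability."""
    return o / (1 + o)

# Each peer's odds of rolling a six: (1/6) : (5/6) = 1:5, i.e. 0.2.
o1 = o2 = (1/6) / (5/6)

# Binary-partition Upco multiplies the odds: 1:25.
print(round(to_prob(o1 * o2), 4))  # 0.0385, i.e. 1/26 - too low

# Geometric mean of odds: sqrt(1/25) = 1/5, leaving credence at 1/6.
print(round(to_prob(math.sqrt(o1 * o2)), 4))  # 0.1667
```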


Thanks a lot for the update!  I feel excited about this project and grateful that it exists!

As someone who stayed at CEEALAR for ~6 months over the last year, I thought I'd share some reflections that might help people decide whether going to the EA Hotel is a good decision for them. I'm sure experiences vary a lot, so, general disclaimer: this is just my personal data point and not some broad impression of the typical experience.

Some of the best things that happened as a result of my stay:

  1. I made at least three close friends I'm still in regular contact with, despite leaving the hotel half a year ago. That is a lot by my standards! I also expanded my "network" by at least 20 people, from all parts of the world and various professional/academic backgrounds, whom I'd be pretty happy to reach out to.
  2. I greatly increased my productivity. During my stay, a friend and I came up with an accountability system that increased my productivity on average by >5 focused hours per week over the last year (compared to previous years) and made me generally more healthy (e.g. I would estimate that I now exercise >1h more per week on average).
    1. Relatedly, I managed to create four episodes of a German podcast about EA in my first two months there.
  3. Harder to quantify: a lot of inspiration! I came to the hotel after a year of being more or less in covid-isolation, and so I went from that to talking for hours every day with researchers and creators full of new ideas. I hope that spending so much time around people who were smarter and more knowledgeable than I am led to a decent amount of intellectual growth for me. Somewhat relatedly, I wouldn't be surprised if a lot of the value of my stay came from many small suggestions other people casually made in conversation (e.g. of some concept, website, or other product).
    1. I think this varied a lot over time as the group of people changed; especially as we became a smaller group over the summer, it became a less stimulating environment.

Some downsides: 

  1. I'm not a fan of Blackpool. I remember it largely as being grimy and dull, especially in winter. The proximity to the sea and the park are quite nice, though.
  2. I also found the hotel rooms less comfortable than what I'm used to from home. I think this was fine for the most part, since I mostly used my room to sleep.

On balance, I think I benefitted a lot from staying at the hotel and I'm very glad I made the decision to go! Thanks to Greg, Lumi, Denisa, Dave and everyone else who spent time with me while I was there <3
Please reach out if you have any questions you'd rather ask me in private.

Yes, that would be awesome!

Why is there so much more talk about the existential risk from AI as opposed to the amount by which individuals (e.g. researchers) should expect to reduce these risks through their work? 

The second number seems much more decision-guiding for individuals than the first. Is the main reason that it's much harder to estimate? If so, why?

(Hastily written, sry)

I would love to see more of the theories of change that researchers in EA have for their own careers! I'm particularly interested to see them in Global Priorities Research as it's done at GPI (because I find that both extremely interesting and I'm very uncertain how useful it is apart from field-building).

Two main reasons: 

  1. It's not easy at all (in my experience) to figure out which claims are actually decision relevant in major ways. Seeing these theories of change might make it much easier for junior researchers to develop a "taste" for which research directions are tractable and important. 
  2. Publishing their theories of change would allow researchers to get more feedback on their projects and perhaps realise earlier why a project is not that important. (This point of course applies not only to researchers.) As a result, it seems less likely that EA researchers get off-track following intellectual curiosities rather than what's most important (which I suspect researchers, in general, are prone to).