202 karma · Joined Sep 2020 · Pursuing an undergraduate degree


I'm a third-year undergrad in cognitive science; currently testing out research in philosophy (social epistemology) and studying deep learning. 

Feel free to reach out via dm!




I find it remarkable how little the people who most express worries about advanced AI say about the concrete mechanisms by which it could destroy the world. Am I right in thinking that? And if so, is this mostly because they are worried about infohazards and therefore don't share the concrete mechanisms they are worried about?

I personally find it pretty hard to imagine ways that AI would e.g. cause human extinction that feel remotely plausible (although I can well imagine that there are plausible pathways I haven't thought of!).

Relatedly, I wonder if public communication about x-risks from AI should be more concrete about mechanisms? Otherwise it seems much harder for people to take these worries seriously.

I wonder to what extent people take the alignment problem to be (i) the problem of creating an AI system that reliably does, or tries to do, what its operators want it to do, as opposed to (ii) the problem of creating an AI system that does, or tries to do, what is best "aligned with human values" (whatever this precisely means).

I see both definitions being used, and they feel importantly different to me: if we solve the problem of aligning an AI with some operator, this still seems far from safe AI. In fact, when I try to imagine how an AI might cause a catastrophe, the clearest causal path for me is one where the AI is extremely competent at pursuing the operator's stated goal, but that stated goal implies or requires catastrophe (e.g. the superintelligent AI receives some input like "Make me the emperor of the world"). On the other hand, if the AI system is aligned with humanity as a whole (in some sense), this scenario seems less likely.

Does that seem right to you?

I'm pretty late to the party (perhaps even so late that people forgot that there was a party), but just in case someone is still reading this, I'll leave my 2 cents on this post. 

[Context: A few days ago, I released a post that distils a paper by Kenny Easwaran and others, in which they propose a rule for updating on the credences of others. In a (tiny) nutshell, this rule, "Upco", asks you to update on someone's credence in proposition A by multiplying your odds with their odds.]
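In code, the odds-multiplication at the heart of Upco looks like this (a minimal sketch of my own; the function name and interface are not from the paper):

```python
def upco(my_credence, peer_credence):
    """Update on a peer's credence in a proposition A
    by multiplying odds, then convert back to a probability."""
    my_odds = my_credence / (1 - my_credence)
    peer_odds = peer_credence / (1 - peer_credence)
    posterior_odds = my_odds * peer_odds
    return posterior_odds / (1 + posterior_odds)

# Example: two peers each at 2/3 (odds 2:1) combine to odds 4:1.
print(round(upco(2/3, 2/3), 3))  # 0.8
```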

 1. Using Upco suggests some version of strong epistemic modesty: whenever the product of the odds of all the peers you have learned from is larger than your own odds, your credence will be dominated by those of others; and if we grant that this is virtually always the case, then strong epistemic modesty follows.

2. While I agree with some version of strong epistemic modesty, I strongly disagree with what I take to be the method of updating on the credences of peers that is proposed in this post: taking some kind of linear average (from here on referred to as LA). Here's a summary of reasons why I think Upco is a better updating rule, copied from my post:

Unfortunately, the LA has some undesirable properties (see section 4 of the paper):

  • Applied in the way sketched above, LA is non-commutative: it is sensitive to the order in which you update on the credences of others, even though the order seems like it should be completely irrelevant to your final beliefs.
    • This can be avoided by taking the “cumulative average” of the credences of the people you update on, i.e. each time you learn someone's credence in A you average again over all the credences you have ever learned regarding this proposition. However, now the LA has lost its initial appeal; for each proposition you have some credence in, rationality seems to require you to keep track of everyone you have updated on and the weights you assigned to them. This seems clearly intractable once the number of propositions and learned credences grows large. 
    • See Gardiner (2013) for more on this.
  • Relatedly, LA is also sensitive to whether you update on multiple peers at once or sequentially.
  • Also, LA does not commute with Bayesian Updating. There are cases where it matters whether you first update on someone's credence (e.g. regarding the bias of a coin) using the LA and then on “non-psychological” evidence (e.g. the outcome of a coin-flip you observed) using Bayesian Updating or the reverse. 
  • Moreover, LA does not preserve ‘judgments of independence’. That is, if two peers judge two propositions A and B to be independent, i.e. P1(A & B) = P1(A)·P1(B) and P2(A & B) = P2(A)·P2(B), then after updating on each other's credences, independence is not always preserved. This seems intuitively undesirable: if you think that the outcomes of (say) a coin flip and a die roll are independent and I think the same - why should updating on my credences change your mind about that?
  • LA does not exhibit what the authors call “synergy”. That is, suppose P1(A) = x and P2(A) = y with x ≤ y. Then it is necessarily the case that the updated credences are in the interval [x, y] if both peers are applying LA. In other words, using the LA never allows you to update beyond the credence of the most confident person you’ve updated on (or yourself if you are more confident than everybody else).

    • At first sight, this might seem like a feature rather than a bug. However, this means that the credence of someone less confident than you can never be positive evidence regarding the issue at hand. Suppose you are 95% sure that A is true. Now, for any credence smaller than 95% LA would demand that you update downwards. Even if someone is perfectly rational, has a 94.9% credence in A and has evidence completely independent from yours, LA tells you that their credence is disconfirming evidence. 

Perhaps most importantly, since Bayesian Updating does not have these properties, LA does not generally produce the same results. Thus, insofar as we regard Bayesian updating as the normative ideal, we should expect LA to be at best an imperfect heuristic and perhaps not even that. 

In sum, LA has a whole host of undesirable properties. It seems like we therefore would want an alternative rule that avoids these pitfalls while retaining the simplicity of LA.   
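Two of these failures are easy to see with toy numbers. The sketch below applies LA sequentially with equal weights each time (my own illustration, not code from the paper or this post):

```python
def la(c1, c2):
    """Naive linear average of two credences, equal weights."""
    return (c1 + c2) / 2

# Non-commutativity: starting from 0.5, the order in which you
# average in peers at 0.9 and 0.1 changes your final credence.
a = la(la(0.5, 0.9), 0.1)  # peer at 0.9 first, then 0.1
b = la(la(0.5, 0.1), 0.9)  # peer at 0.1 first, then 0.9
print(round(a, 2), round(b, 2))  # 0.4 0.6

# No synergy: the average never leaves [min, max] of the inputs.
print(round(la(0.8, 0.9), 2))  # 0.85 - never above the most confident peer
```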

The EaGlHiVe aims to show that such a rule exists. They call this rule “Upco”, standing for “updating on the credences of others”. Upco is a simple rule that avoids many of the problems of LA: preservation of independence, commutativity, synergy, etc. Moreover, Upco produces the same results as Bayesian Updating under some conditions.
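For contrast, here is the same toy experiment with Upco (again my own sketch; Upco is just odds multiplication, written out here so the snippet is self-contained):

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

def upco(p1, p2):
    """Upco: multiply odds, convert back to a probability."""
    o = odds(p1) * odds(p2)
    return o / (1 + o)

# Commutativity: updating in either order gives the same credence
# (up to floating point).
a = upco(upco(0.5, 0.9), 0.1)
b = upco(upco(0.5, 0.1), 0.9)
print(abs(a - b) < 1e-12)  # True

# Synergy: two peers at 0.8 (odds 4:1 each) combine to odds 16:1,
# i.e. a credence above either individual's 0.8.
print(round(upco(0.8, 0.8), 3))  # 0.941
```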

Thanks, this seems useful! :) One suggestion: if there are similar estimates available for other causes, could you add at least one to the post as a comparison? I think this would make your numbers more easily interpretable.

Hey Daniel, 

thanks for engaging with this! :)

You might be right that the geometric mean of odds performs better than Upco as an updating rule, although I'm still unsure exactly how you would implement it. If you had used the geometric mean of odds to update on a first person's credence and then learned the credence of a second person, would you change the weight (in the exponent) you gave the first peer to 1/2 and update as though you had just learned both credences at once? That seems pretty cumbersome, as you'd have to keep track of the number of credences you have already updated on for each proposition in order to assign the correct weight to a new credence. Even if using the geometric mean of odds (GMO) were a better approximation of the ideal Bayesian response than Upco (which to me is an open question), it thus seems like Upco is practically feasible and GMO is not.

Here's one huge problem with the Upco method as you present it: two people think that it's a 1/6 chance of rolling a six on a (fair) die. This opinion shouldn't change when you update on others' opinions. If you used Upco, that's a 1:5 ratio, giving final odds of 1:25 - clearly incorrect. On the other hand, a geometric mean approach gives sqrt((1*1)/(5*5))=1:5, as it should be.

If one person reports credence 1/6 in rolling a six on a fair die, and this is part of a partition containing one proposition for each possible outcome {"Die lands on 1", "Die lands on 2", ... "Die lands on 6"}, then the version of Upco that can deal with complex partitions like this will tell you not to update on this credence (see the section on "arbitrary partitions"). I think the problem you mention only occurs because you are using a partition that collapses the five other possible outcomes into one proposition - "the die will not land on 6".

This case does highlight that the output of Upco depends on the partition you assume someone to report their credence over. But since GMO as an updating rule differs from Upco only in assigning a weight of 1/n to each person's credence (where n is the number of credences already learned), I'm pretty sure you can find the same partition dependence with GMO.
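To make the numbers concrete: under the binary partition {"lands on 6", "doesn't land on 6"}, binary Upco and GMO give exactly the results described above (a toy calculation of my own, not an implementation from either paper):

```python
import math

def to_prob(o):
    """Convert odds to a probability."""
    return o / (1 + o)

# Each peer's odds of rolling a six: (1/6) : (5/6) = 1:5, i.e. 0.2.
o1 = o2 = (1/6) / (5/6)

# Binary-partition Upco multiplies the odds: 1:25.
print(round(to_prob(o1 * o2), 4))  # 0.0385, i.e. 1/26 - too low

# Geometric mean of odds: sqrt(1/25) = 1/5, leaving credence at 1/6.
print(round(to_prob(math.sqrt(o1 * o2)), 4))  # 0.1667
```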


Thanks a lot for the update!  I feel excited about this project and grateful that it exists!

As someone who stayed at CEEALAR for ~6 months over the last year, I thought I'd share some reflections that might help people decide whether going to the EA Hotel is a good decision for them. I'm sure experiences vary a lot, so, general disclaimer: this is just my personal data point and not some broad impression of the typical experience.

Some of the best things that happened as a result of my stay:

  1. I made at least three close friends I'm still in regular contact with, despite leaving the hotel half a year ago. That is a lot by my standards! I also expanded my "network" by at least 20 people, from all parts of the world and various professional/academic backgrounds, whom I'd be pretty happy to reach out to.
  2. I greatly increased my productivity. During my stay, a friend and I came up with an accountability system that increased my productivity on average by >5 focused hours per week over the last year (compared to previous years) and made me generally more healthy (e.g. I would estimate that I now exercise >1h more per week on average).
    1. Relatedly, I managed to create four episodes of a German podcast about EA in my first two months there.
  3. Harder to quantify: a lot of inspiration! I came to the hotel after a year of being more or less in covid-isolation, and so I went from that to talking for hours every day with researchers and creators full of new ideas. I hope that spending so much time around people who were smarter and more knowledgeable than I am led to a decent amount of intellectual growth for me. Somewhat relatedly, I wouldn't be surprised if a lot of the value of my stay came from many small suggestions other people casually made in conversation (e.g. of some concept, website, or other product).
    1. I think this varied a lot over time as the group of people changed; especially as we became a smaller group over the summer, it became a less stimulating environment.

Some downsides: 

  1. I'm not a fan of Blackpool. I remember it largely as being grimy and dull, especially in winter. The proximity to the sea and the park are quite nice, though.
  2. I also found the hotel rooms less comfortable than what I'm used to from home. I think this was fine for the most part, since I mostly used my room to sleep.

On balance, I think I benefitted a lot from staying at the hotel and I'm very glad I made the decision to go! Thanks to Greg, Lumi, Denisa, Dave and everyone else who spent time with me while I was there <3
Please reach out if you have any questions you'd rather ask me in private.

Yes, that would be awesome!

Why is there so much more talk about the existential risk from AI as opposed to the amount by which individuals (e.g. researchers) should expect to reduce these risks through their work? 

The second number seems much more decision-guiding for individuals than the first. Is the main reason that it's much harder to estimate? If so, why?

(Hastily written, sry)

I would love to see more of the theories of change that researchers in EA have for their own careers! I'm particularly interested to see them in Global Priorities Research as it's done at GPI (because I find that both extremely interesting and I'm very uncertain how useful it is apart from field-building).

Two main reasons: 

  1. It's not easy at all (in my experience) to figure out which claims are actually decision relevant in major ways. Seeing these theories of change might make it much easier for junior researchers to develop a "taste" for which research directions are tractable and important. 
  2. Publishing their theories of change would allow researchers to get more feedback on their projects and perhaps realise earlier why a project is not that important. (This point of course applies not only to researchers.) As a result, it seems less likely that EA researchers get off-track following intellectual curiosities rather than what's most important (which I suspect researchers, in general, are prone to).