I'm currently working as a Senior Research Scholar at the Future of Humanity Institute.

Ben_Snodin's Shortform

Takeaways from some reading about economic effects of human-level AI

I spent some time reading things that you might categorise as “EA articles on the impact of human-level AI on economic growth”. Here are some takeaways from reading these (apologies for not always providing a lot of context / for not defining terms; hopefully clicking the links will provide decent context).

If you're interested in more on this topic, I'd highlight Holden Karnofsky's recent blog series and Tom Davidson's recent Open Phil report as good places to start.

In case it’s useful for other people, here’s the main stuff I (at least partially) read / listened to:

Report on Whether AI Could Drive Explosive Economic Growth

Thanks for this, I think it's really brilliant. I really appreciate how clearly the details are laid out in the blog post and the report, and it's really cool to be able to see the external reviewers' comments too.

I was somewhat surprised that civilizational collapse and similar outcomes aren't mentioned when you consider growth outcomes for the 21st century (e.g. in Appendix G, and apparently also in your bottom-line probabilities, such as in Section 4.6, "Conclusion"; or maybe they are and I missed it, or it's just not explicit).

I guess your probabilities for the various growth outcomes in Appendix G are conditional on ~no civilizational collapse (from any cause) and ~no AI-triggered fundamental reshaping of society that unexpectedly prevents growth? Or should I read them more as "conditional on ~no civilizational collapse etc. other than due to AI", with the probability mass for AI-triggered collapse etc. being incorporated into your "AI robots don't have a tendency to drive explosive growth because none of our theories are well-suited for this situation" and/or "an unanticipated bottleneck prevents explosive growth"?
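To spell out the difference between these readings, here is a minimal sketch; the labels are mine, not the report's: O is any one of the growth outcomes and K is civilizational collapse or a similar growth-preventing upheaval. The question is whether the quoted figures are P(O | ¬K) or the unconditional P(O), which by the law of total probability is:

```latex
P(O) = P(O \mid \neg K)\,P(\neg K) + P(O \mid K)\,P(K)
```

If collapse more or less rules out explosive growth, then P(explosive growth | K) is close to 0, and the unconditional probability of explosive growth is roughly the conditional figure scaled down by P(¬K).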

Ben_Snodin's Shortform

Thanks that's interesting, I've heard of it but I haven't looked into it.

Ben_Snodin's Shortform

Causal vs evidential decision theory

I wrote this last Autumn as a private “blog post” shared only with a few colleagues. I’m posting it publicly now (after mild editing) because I have some vague idea that it can be good to make things like this public. Decision theories are a pretty well-worn topic in EA circles and I'm definitely not adding new insights here. These are just some fairly naive thoughts-out-loud about how CDT and EDT handle various scenarios. If you've already thought a lot about decision theory you probably won't learn anything from this.

The last two weeks of the Decision Theory seminars I’ve been attending have focussed on contrasting causal decision theory (CDT) and evidential decision theory (EDT). This seems to be a pretty active area of discussion in the literature: one of the papers we looked at was published this year, and another is yet to be published.

In terms of the history of the field, it seems that Newcomb’s problem prompted a move towards CDT (e.g. in Lewis 1981). I find that pretty surprising because to me Newcomb’s problem provides quite a bit of motivation for EDT, and without weird scenarios like Newcomb’s problem I think I might have taken something like CDT to be the default, obviously correct theory. But it seems like you didn’t need to worry about having a “causal aspect” to decision theories until Newcomb’s problem and other similar problems brought out a divergence between the recommendations of (what became known as) CDT and EDT.

I guess this is a very well-worn area (especially in places like LessWrong) but anyway I can’t resist giving my fairly naive take even though I’m sure I’m just repeating what others have said. When I first heard about things like Newcomb’s problem a few years ago I think I was a pretty ardent CDTer, whereas nowadays I am much more sympathetic to EDT.

In Newcomb’s problem, it seems pretty clear to me that one-boxing is the best option: given a reliable predictor, one-boxers walk away with $1,000,000 and two-boxers with $1,000, and I’d rather have $1,000,000 than $1,000. Seems like a win for EDT.
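To make the divergence concrete, here is a minimal Python sketch of how the two theories score the options. It assumes the standard setup (the opaque box holds $1,000,000 iff one-boxing was predicted; the transparent box always holds $1,000), and the predictor accuracy of 0.99 is a hypothetical number chosen for illustration.

```python
# Toy Newcomb's problem. ACCURACY is a hypothetical illustrative value.
BIG = 1_000_000   # in the opaque box iff the predictor foresaw one-boxing
SMALL = 1_000     # always in the transparent box
ACCURACY = 0.99   # hypothetical probability that the prediction is correct


def edt_value(one_box: bool) -> float:
    """EDT: treat your choice as evidence about what was predicted."""
    p_big = ACCURACY if one_box else 1 - ACCURACY
    return p_big * BIG + (0 if one_box else SMALL)


def cdt_value(one_box: bool, p_big: float) -> float:
    """CDT: the prediction is already fixed, so p_big doesn't depend on the choice."""
    return p_big * BIG + (0 if one_box else SMALL)


print(f"EDT: one-box {edt_value(True):,.0f}, two-box {edt_value(False):,.0f}")
for p in (0.0, 0.5, 1.0):
    gain = cdt_value(False, p) - cdt_value(True, p)
    print(f"CDT with P(big box full) = {p}: two-boxing gains {gain:,.0f}")
```

Whatever fixed credence CDT has about the opaque box, two-boxing comes out exactly $1,000 ahead (dominance), whereas EDT's conditional expectations favour one-boxing by a wide margin.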

Dicing With Death is designed to cause problems for CDT, and in my opinion it does this very effectively. In Dicing With Death, you have to choose between going to Aleppo and going to Damascus, and you know that whichever you choose, Death will have predicted your choice and be waiting for you (a very bad outcome for you). Luckily, a merchant offers you a magical coin which you can toss to decide where to go, in which case Death won’t be able to predict where you go, giving you a 50% chance of avoiding death. The merchant will charge a small fee for this. However, CDT gets into some strange contortions and as a result recommends against paying for the magical coin, even though the outcome if you pay for the magical coin seems clearly better. EDT recommends paying for the coin: another win for EDT.
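Here is a simplified Python sketch of why the two theories come apart. The utilities are hypothetical (surviving = 1, dying = 0), and FEE is an assumed small price in the same units. It only captures a single snapshot of CDT's reasoning: it leaves out the instability where, having settled on a city, CDT's credence about Death's location shifts towards that city.

```python
# Toy Dicing With Death. Hypothetical utilities: survive = 1, die = 0.
FEE = 0.01  # hypothetical price of the magical coin, in utility units


def edt_value(option: str) -> float:
    """EDT: choosing a city is strong evidence that Death predicted that city."""
    if option in ("aleppo", "damascus"):
        return 0.0        # Death is waiting wherever you choose
    if option == "coin":
        return 0.5 - FEE  # Death can't predict the coin toss
    raise ValueError(f"unknown option: {option}")


def cdt_value(option: str, q_aleppo: float) -> float:
    """CDT: hold fixed your current credence q_aleppo that Death is in Aleppo."""
    if option == "aleppo":
        return 1 - q_aleppo
    if option == "damascus":
        return q_aleppo
    if option == "coin":
        return 0.5 - FEE
    raise ValueError(f"unknown option: {option}")
```

For any credence q, the better of the two cities has causal value max(1 - q, q) ≥ 0.5, which beats the coin's 0.5 - FEE, so this snapshot of CDT never pays. EDT, by contrast, sees the coin as the only option with any chance of survival.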

To me, The Smoking Lesion is a somewhat problematic scenario for EDT. Still, I feel like it’s possible for EDT to do fine here if you think carefully enough. 

You could make the following simple model for what happens in The Smoking Lesion: in year 1, no-one knows why some people get cancer and some don’t. In year 2, it’s discovered that everyone who smokes develops cancer, and furthermore there’s a common cause (a lesion) that causes both of these things. Everyone smokes iff they have the lesion, and everyone gets cancer iff they have the lesion. In year 3, following the publication of these results, some people who have the lesion try not to smoke. Either (i) none of them can avoid smoking because the power of the lesion is too strong; or (ii) some of them do avoid smoking, but (since they still have the lesion) they still develop cancer. In case (i), the findings from year 2 remain valid even after everyone knows about them. In case (ii), the findings from year 2 are no longer valid: they just tell you about how the world would have been if the correlation between smoking and cancer wasn’t known.

The cases where you use the knowledge about the year 2 finding to decide not to smoke are exactly the cases where the year 2 finding doesn’t apply. So there’s no point in using the knowledge about the year 2 finding to not smoke: either your not smoking (through extreme self-control etc) is pointless because you still have the lesion and this is a case where the year 2 finding doesn’t apply, or it’s pointless because you don’t have the lesion.
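A toy Python version of the year 2 / year 3 model above might look like this. The population sizes and the `resists` flag are hypothetical; the model assumes, as in case (ii), that some lesion carriers manage to abstain, while cancer depends only on the lesion.

```python
from dataclasses import dataclass


@dataclass
class Person:
    has_lesion: bool
    resists: bool = False  # hypothetical: abstains after learning the year 2 finding

    def smokes(self) -> bool:
        # The lesion produces the urge to smoke; carriers who resist abstain.
        return self.has_lesion and not self.resists

    def gets_cancer(self) -> bool:
        # In this toy model cancer depends only on the lesion, never on smoking.
        return self.has_lesion


def finding_holds(population) -> bool:
    """The year 2 finding: each person smokes iff they get cancer."""
    return all(p.smokes() == p.gets_cancer() for p in population)


# Year 2: nobody knows the finding yet, so nobody resists.
year2 = [Person(True) for _ in range(3)] + [Person(False) for _ in range(3)]
# Year 3, case (ii): two carriers abstain after learning the finding.
year3 = ([Person(True), Person(True, resists=True), Person(True, resists=True)]
         + [Person(False) for _ in range(3)])
```

The finding holds for `year2` but fails for `year3`, and every carrier who abstains still gets cancer. Abstaining changes whether the finding applies to you, not whether you get cancer.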

So it seems to me like the right answer is to smoke if you want to, and I think EDT can recommend this by incorporating the fact that if you choose not to smoke purely because of the year 2 finding, this doesn’t give you any evidence about whether you have the lesion (though this is pretty vague and I wouldn’t be that surprised if making it more precise made me realise it doesn’t work).

In general it seems like these issues arise from treating the agent’s decision making process as being removed from the physical world - a very useful abstraction which causes issues in weird edge cases like the ones considered above.

Ben_Snodin's Shortform

Changing your working to fit the answer

I wrote this last Autumn as a private “blog post” shared only with a few colleagues. I’m posting it publicly now (after mild editing) because I have some vague idea that it can be good to make things like this public. It is quite rambling and doesn't really have a clear point (but I think it's at least an interesting topic).

Say you want to come up with a model for AI timelines, i.e. the probability of transformative AI being developed by year X for various values of X. You put in your assumptions (beliefs about the world), come up with a framework for combining them, and get an answer out. But then you’re not happy with the answer - your framework must have been flawed, or maybe on reflection one of your assumptions needs a bit of revision. So you fiddle with one or two things and get another answer - now it looks much better, close enough to your prior belief that it seems plausible, but not so close that it seems suspicious.

Is this kind of procedure valid? Here’s one case where the answer seems to be yes: if your conclusions are logically impossible, you know that either there’s a flaw in your framework or you need to revise your assumptions (or both). 

A closely related case is where the conclusion is logically possible, but extremely unlikely. It seems like there’s a lot of pressure to revise something then too.

But in the right context revising your model in this way can look pretty dodgy. It seems like you’re “doing things the wrong way round” - what was the point of building the model if you were going to fiddle with the assumptions until you got the answer you expected anyway?

I think this is connected to a lot of related issues / concepts:

  • Model building
    • Option pricing models in finance: you start (both historically and conceptually) with the nice clean Black-Scholes model, which fails to explain actually observed option prices. Due to this, various assumptions are relaxed or modified, adding (arguably, somewhat ad hoc) complexity until, for the right set of parameters, the model gets all (sufficiently important) observed option prices right.
    • Regularisation / overfitting in ML: you might think of overfitting as “placing too much weight on getting the answer you expect”.
  • Arguments
    • “One person's modus ponens is another’s modus tollens”: if we’re presented with a logical argument, usually the person presenting it wants us to accept the premises and agree that the argument is valid, in which case we must accept the conclusion. If we don’t like the conclusion, we often focus on showing that the argument is invalid. But if you think the conclusion is very unlikely, you also have the option of acknowledging the argument as valid, but rejecting one of the premises. There are lots of fun examples of this from science and philosophy on Gwern’s page on the subject.
    • “Begging the question”: a related accusation in philosophy that seems to mean roughly “your conclusion follows trivially from your premises but I reject one of your premises (and by the way it should have been obvious that I’d reject one of your premises so it was a waste of both my time and yours that you made this argument)”
    • Reductio ad absurdum: disprove something by using it as an assumption that leads to an implausible (or maybe logically impossible) conclusion
    • “Proving too much”: an accusation in philosophy that is supposed to count against the argument doing the “proving”.
    • (Not) updating your beliefs from an argument that appears convincing on the face of it: if the conclusions are implausible enough, you might not update your beliefs too much the first time you encounter the argument, even if it appears watertight.
  • Research methods
    • Sanity checking your answer: check that the results of a complex calculation or experiment roughly match the result you get from a quick and crude approach.

Presumably, you could put this question of whether and how much to modify your model into some kind of formal Bayesian framework where on learning a new argument you update all your beliefs based on your prior beliefs in the premises, conclusion, and validity of the argument. I’m not sure whether there’s a literature on this, or whether e.g. highly skilled forecasters actually think like this.
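As one hedged illustration of what such a framework could look like (not a claim about how forecasters actually reason), here is a minimal Python sketch. It assumes independent priors on two premises and the conclusion, and that you become certain the argument is valid, i.e. you learn that (P1 and P2) implies C. Updating then just removes the worlds where both premises hold and the conclusion fails.

```python
from itertools import product


def update_on_valid_argument(p1: float, p2: float, c: float):
    """Condition independent priors on learning that (P1 and P2) implies C.

    Returns posterior probabilities of (P1, P2, C). Assumes the argument's
    validity is known with certainty; uncertainty about validity is ignored.
    """
    joint = {}
    for w1, w2, wc in product((True, False), repeat=3):
        if w1 and w2 and not wc:
            continue  # this world is ruled out by the argument's validity
        joint[(w1, w2, wc)] = ((p1 if w1 else 1 - p1)
                               * (p2 if w2 else 1 - p2)
                               * (c if wc else 1 - c))
    total = sum(joint.values())

    def marginal(i: int) -> float:
        return sum(prob for world, prob in joint.items() if world[i]) / total

    return marginal(0), marginal(1), marginal(2)


p1_post, p2_post, c_post = update_on_valid_argument(0.9, 0.9, 0.05)
print(f"P1: {p1_post:.3f}  P2: {p2_post:.3f}  C: {c_post:.3f}")
```

With fairly plausible premises (0.9 each) and an implausible conclusion (0.05), each premise falls to roughly 0.57 while the conclusion only rises to roughly 0.22: modus tollens and modus ponens happening at the same time, weighted by the priors.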

In general though, it seems (to me) that there’s something important about “following where the assumptions / model takes you”. Maybe, given all the ways we fall short of being perfectly rational, we should (and I think that in fact we do) put more emphasis on this than a perfectly rational Bayesian agent would. Avoiding having a very strong prior on the conclusion seems helpful here.

Intervention options for improving the EA-aligned research pipeline

One (maybe?) low-effort thing that could be nice would be saying "these are my top 5" or "these are listed in order of how promising I think they are" or something (you may well have done that already and I missed it).

Intervention options for improving the EA-aligned research pipeline

Thanks, I think this is a great topic and this seems like a useful list (although I do find reading through 19 different types of options without much structure a bit overwhelming!).

I'll just ~repost a private comment I made before.

Encouraging and facilitating aspiring/junior researchers and more experienced researchers to connect in similar ways

This feels like an especially promising area to me. I'd guess there are lots of cases where this would be very beneficial for the junior researcher and at least a bit beneficial for the experienced researcher. It just needs facilitation (or something else, e.g. a culture change where people try harder to make this happen themselves, some strong public encouragement to juniors to make this happen, ...).

This isn't based on very strong evidence: it's mostly my own (limited) experience, plus the assumption that at least some experienced researchers are similar to me and that there are lots of excellent junior researcher candidates out there (again, from first-hand impressions).

Improving the vetting of (potential) researchers, and/or better “sharing” that vetting

This also seems like a big deal and an area where maybe you could improve things significantly with a relatively small amount of effort. I don't have great context here though.

Ben_Snodin's Shortform

Two papers I read on imprecise probabilities

I wrote this last Autumn as a private “blog post” shared only with a few colleagues. I’m posting it publicly now (after mild editing) because I have some vague idea that it can be good to make things like this public.

In this post I’m going to discuss two papers regarding imprecise probability that I read this week for a Decision Theory seminar. The first paper seeks to show that imprecise probabilities don’t adequately constrain the actions of a rational agent. The second paper seeks to refute that claim.

Just a note on how seriously to take what I’ve written here: I think I’ve got the gist of what’s in these papers, but I feel I could spend a lot more time making sure I’ve understood them and thinking about which arguments I find persuasive. It’s very possible I’ve misunderstood or misrepresented the points the papers were trying to make, and I can easily see myself changing my mind about things if I thought and read more.

Also, a note on terminology: it seems like “sharpness/unsharpness” and “precision/imprecision” are used interchangeably in these papers, as are “probability” and “credence”. There might well be subtle distinctions that I’m missing, but I’ll try to consistently use “imprecise probabilities” here.

Imprecise probabilities

I imagine there are (at least) several different ways of formulating imprecise probabilities. One way is the following: your belief state is represented by a set of probability functions, and your degree of belief in a particular proposition is represented by the set of values those functions assign to it. You then also have imprecise expectations: for each available action, each of your probability functions assigns its own expected utility. Sometimes, all of your probability functions will agree on which action has the highest expected utility. In that case, you are rationally required to take that action. But if there’s no clear winner, there’s more than one permissible action you could take.

Subjective Probabilities should be Sharp

The first paper, Subjective Probabilities should be Sharp, was written in 2010 by Elga. The central claim is that there’s no plausible account of how imprecise probabilities constrain which choices are reasonable for a perfectly rational agent.

The argument centers around a particular betting scenario: someone tells you “I’m going to offer you bet A and then bet B, regarding a hypothesis H”:

Bet A: win $15 if H, else lose $10

Bet B: lose $10 if H, else win $15

You’re free to choose whether to take bet B independently of whether you choose bet A.

Depending on what you believe about H, it could well be that you prefer just one of the bets to both bets. But it seems like you really shouldn’t reject both bets. Taking both bets guarantees you’ll win exactly $5, which is strictly better than the $0 you’ll win if you reject both bets.

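But under imprecise probabilities, it’s rationally permissible to have a range of probabilities for H. If that range is wide enough, each bet has negative expected value under some of your probability functions, so rejecting bet A and rejecting bet B are each permissible, which means it’s permissible to reject both. So imprecise probabilities permit something which seems like it ought to be impermissible.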
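A minimal Python sketch of this. It assumes one common formalisation of the rule described earlier (sometimes called E-admissibility): a choice is permissible if it maximises expected value under at least one probability function in your set. The credal set {0.3, 0.5, 0.7} for H is a hypothetical example, and each bet is judged in isolation.

```python
# Payoffs in dollars: (if H, if not H). Utility is assumed linear in dollars.
BETS = {"A": (15, -10), "B": (-10, 15)}
CREDAL_SET = [0.3, 0.5, 0.7]  # hypothetical set of probabilities for H


def expected_value(bet: str, p: float) -> float:
    win_if_h, win_if_not_h = BETS[bet]
    return p * win_if_h + (1 - p) * win_if_not_h


def permissible(bet: str, credal_set=CREDAL_SET) -> set:
    """Choices ('accept'/'reject') that are optimal under at least one p."""
    choices = set()
    for p in credal_set:
        ev = expected_value(bet, p)  # rejecting a bet always has value 0
        if ev >= 0:
            choices.add("accept")
        if ev <= 0:
            choices.add("reject")
    return choices


print("A:", sorted(permissible("A")), "B:", sorted(permissible("B")))
# Taking both bets pays the same in every state of the world.
print("both bets pay:", [BETS["A"][i] + BETS["B"][i] for i in range(2)])
```

Both bets come out with "reject" among the permissible choices, even though taking both pays $5 whether or not H is true. With a single sharp probability such as `[0.5]`, only accepting bet A is permissible.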

Elga considers various rules that might be added to the initial imprecise probabilities-based decision theory, and argues that none of them are very appealing. I guess this isn’t as good as proving that there are no good rules or other modifications, but I found it fairly compelling on the face of it.

The rules that seemed most likely to work to me were Plan and Sequence. Both rules more or less entail that you should accept bet B if you already accepted bet A, in which case rejecting both bets is impermissible and it looks like the theory is saved. 

Elga tries to show that these don’t work by inviting us to imagine the case where a particular agent called Sally faces the decision problem. Sally has imprecise probabilities, maximises expected utility and has a utility function that is linear in dollars.

Elga argues that in this scenario it just doesn’t make sense for Sally to accept bet B only if she already accepted bet A - the decision to accept bet B shouldn’t depend on anything that came before. It might do if Sally had some risk averse decision theory, or had a utility function that was concave in dollars - but by assumption, she doesn’t. So Plan and Sequence, which had seemed like the best candidates for rescuing imprecise probabilities, aren’t plausible rules for a rational agent like Sally.

Should Subjective Probabilities be Sharp?

The 2014 paper by Bradley and Steele, Should Subjective Probabilities be Sharp? is, as the name suggests, a response to Elga’s paper. The core of their argument is that the assumptions for rationality implied by Elga’s argument are too strong and that it’s perfectly possible to have rational choice with imprecise probabilities provided that you don’t make these too-strong assumptions.

I’ll highlight two objections and give my view.

Objection 1:

  • Bradley and Steele give the label Retrospective Rationality to the idea that an agent’s sequence of decisions should not be dominated by another sequence the agent could have made. They seem to reject Retrospective Rationality as a constraint on rational decision making because “[it] is useless to an agent who is wondering what to do… [the agent] should be concerned to make the best decision possible at [the time of the decision]”. 
  • My view: I don’t find this a very compelling argument, at least in the current context - it seems to me that the agent should avoid foreseeably violating Retrospective Rationality, and in Elga’s betting scenario the irrationality of the “reject both bets” sequence of decisions seems perfectly foreseeable.

Objection 2:

  • Their second objection is that Elga is wrong to think that your current decision about whether to accept bet B should be unaffected by whether you previously accepted or rejected bet A (they make a similar point regarding the decision to take bet A with vs without the knowledge that you’re about to be offered bet B). 
  • My view: it’s true that, because in Elga’s betting scenario the outcomes of the bets are correlated, knowing whether or not you previously accepted bet A might well change your inclination to accept bet B, e.g. because of risk aversion or a non-linear utility function. But to me it seems right that for an agent whose decision theory doesn’t include these features, it would be irrational to change their inclination to accept bet B based on what came before - and Elga was considering such an agent. So I think I side with Elga here.
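To illustrate the point about correlated outcomes, here is a minimal Python sketch. It compares a linear-utility agent like Sally with a hypothetical log-utility agent (concave, so risk averse). The credence of 0.5 in H and the starting wealth of $20 are assumptions chosen for illustration.

```python
import math

P_H = 0.5      # hypothetical precise credence in H
WEALTH = 20.0  # hypothetical starting wealth in dollars
BET_A = {True: 15.0, False: -10.0}  # payoff of bet A, keyed by whether H is true
BET_B = {True: -10.0, False: 15.0}


def expected_utility(utility, took_a: bool, took_b: bool) -> float:
    """Expected utility of final wealth once the chosen bets resolve."""
    total = 0.0
    for h, prob in ((True, P_H), (False, 1 - P_H)):
        wealth = WEALTH + (BET_A[h] if took_a else 0.0) + (BET_B[h] if took_b else 0.0)
        total += prob * utility(wealth)
    return total


def gain_from_b(utility, took_a: bool) -> float:
    """How much accepting bet B adds, given the earlier decision about bet A."""
    return expected_utility(utility, took_a, True) - expected_utility(utility, took_a, False)


def linear(wealth: float) -> float:
    return wealth


for took_a in (False, True):
    print(f"took A={took_a}: linear gain {gain_from_b(linear, took_a):+.3f}, "
          f"log gain {gain_from_b(math.log, took_a):+.3f}")
```

For the linear agent, accepting B is worth the same $2.50 in expectation whether or not A was taken. For the log-utility agent, B is a bad bet on its own but a good one after taking A, because it hedges A's risk.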

Summary and some thoughts

In summary, in Subjective Probabilities should be Sharp, Elga illustrates how imprecise probabilities appear to permit a risk-neutral agent with linear utility to make irrational choices. In addition, Elga argues that there aren’t any ways to rescue things while keeping imprecise probabilities. In Should Subjective Probabilities be Sharp?, Bradley and Steele argue that Elga makes some implausibly strong assumptions about what it takes to be rational. I didn't find these arguments very convincing, although I might well have just failed to appreciate the points they were trying to make.

I think it basically comes down to this: for an agent with decision theory features like Sally’s, i.e. no risk aversion and linear utility, the only way to avoid passing up opportunities like making a risk-free $5 by taking bet A and bet B is if you’re always willing to take one side of any particular bet. The problem with imprecise probabilities is that they permit you to refrain from taking either side, which implies that you’re permitted to decline the risk-free $5.
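For an agent with a single sharp credence p in H and linear utility, the arithmetic behind this is short:

```latex
\begin{aligned}
\mathbb{E}[A] &= 15p - 10(1-p) = 25p - 10,\\
\mathbb{E}[B] &= -10p + 15(1-p) = 15 - 25p,\\
\mathbb{E}[A] + \mathbb{E}[B] &= 5 > 0.
\end{aligned}
```

Rejecting A is only optimal if p ≤ 0.4, and rejecting B only if p ≥ 0.6. So no single sharp credence makes rejecting both bets the expected-utility-maximising choice. A set of probabilities that includes values on both sides of that gap permits each rejection separately.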

The fan of imprecise probabilities can wriggle out of this by saying that you should be allowed to do things like taking bet B only if you just took bet A - but I agree with Elga that this just doesn’t make sense for an agent like Sally. I think the reason this might look overly demanding on the face of it is that we’re not like Sally - we’re risk averse and have concave utility. But agents who are risk averse or have concave utility are allowed both to sometimes decline bets and to take risk-free sequences of bets, even according to Elga’s rationality requirements, so I don’t think this intuition pushes against Elga’s rationality requirements.

It feels kind of useful to have read these papers, because

  • I’ve been kind of aware of imprecise probabilities and had a feeling I should think about them, and this has given me a bit of a feel for what they’re about.
  • It makes further reading in this area easier.
  • It’s good to get an idea of what sort of considerations people think about when deciding whether a decision theory is a good one. Similarly to when I dug more into moral philosophy, I now have more of a feeling along the lines of “there’s a lot of room for disagreement about what makes a good decision theory”.
  • Relatedly, it’s good to get a bit of a feeling of “there’s nothing really revolutionary or groundbreaking here and I should to some extent feel free to do what I want”.

My attempt to think about AI timelines

Thanks for these comments and for the chat earlier!

  • It sounds like to you, AGI means ~"human minds but better"* (maybe that's the case for everyone who's thought deeply about this topic, I don't know). On the other hand, the definition I used here, "AI that can perform a significant fraction of cognitive tasks as well as any human and for no more money than it would cost for a human to do it", falls well short of that on at least some reasonable interpretations. I definitely didn't mean to use an unusually weak definition of AGI here (I was partly basing it on this seemingly very weak definition from LessWrong, i.e. "a machine capable of behaving intelligently over many domains"), but maybe I did.
  • Under at least some interpretations of "AI that can perform a significant fraction of cognitive tasks as well as any human and for no more money than it would cost for a human to do it", you don't (as I understand it) think that AGI strongly implies TAI; but my impression is that you don't think AGI under this definition is the right thing to analyse.
  • Given your AGI definition, I probably want to give a significantly larger probability to "AGI implies TAI" than I did in this post (though on an inside view I'm probably not in "90% seems on the low end" territory, having not thought about this enough to have that much confidence).
  • I probably also want to push back my AGI timelines at least a bit (e.g. by checking what AGI definitions my outside view sources were using; though I didn't do this very thoroughly in the first place so the update might not be very large).

*I probably missed some nuance here, please feel free to clarify if so.

My attempt to think about AI timelines

Thanks, this was helpful as an example of one way I might improve this process.
