This post contains some notes that I wrote after ~1 week of reading about Certificates of Impact as part of my work as a Research Scholar at the Future of Humanity Institute, plus a bit of time after that spent thinking and talking about the idea here and there.
In this post, I
I’m sharing this here in case it’s useful - the intended audience is people who are curious about what Certificates of Impact are, and (to some extent) people who are thinking seriously about Certificates of Impact.
Note that, since I haven’t invested much time thinking about Certificates of Impact, my understanding of this area is fairly shallow. I’ve tried to include appropriate caveats in the text to reflect this, but I might not have always succeeded, so please bear this in mind.
Within this document, I’m using Certificates of Impact to refer to the general idea of creating a market in altruistic impact. I think that the general idea is also referred to as Impact Certificates, Tradeable Altruistic Impact, and Impact Purchases.
Certificates of Impact is an idea that's been floating around in the Effective Altruism community for some time. Paul Christiano and Katja Grace ran an experiment with Certificates of Impact about 5 years ago. I've seen various EA forum posts about Certificates of Impact too (see the final section of this post for some links).
By a market in altruistic impact, I mean something like the following: we imagine a future where there are people who want to donate to charity, and there are people who are doing high impact projects, and rather than them making the effort to seek each other out, they connect through this market. In the market, the individuals or organisations doing the projects issue Certificates of Impact, and donors buy them. And maybe as a donor you don't need to try so hard to find the best project, you just buy some certificates from some marketplace; and as someone doing a high impact project, you don't have to work so hard to connect to donors, because you find that there are profit seeking organisations that are willing to buy your certificates, and that's your source of funding.
Note that the above is quite vague, and that there are probably some aspects you could change and still have something that falls under Certificates of Impact.
There are lots of varieties of Certificates of Impact-type systems that could be tried. To make things easier, from now on in this document I’ll assess a concrete proposal called Certificates of Impact with Dedication (idea due to Owen Cotton-Barratt):
A market is created / exists where someone can issue a Certificate for work they believe to be altruistically impactful. We call this person the Issuer. There is a statute of limitations on issuing Certificates of two years (i.e. Certificates can’t be issued for work more than two years old). The Certificate is assessed by a Validator who confirms that the work specified on the Certificate has in fact been done. The Issuer then sells the Certificate in the market, maybe via an auction mechanism. Note that the Certificate can refer to some percentage of the project, so for example it might represent 40% of the altruistic impact of a project, while the Issuer keeps the other 60%. The Certificate is traded on the secondary market by professional traders and then bought by an Ultimate Buyer who is the ultimate consumer of the Certificate. The Ultimate Buyer then Dedicates the Certificate, possibly to themselves so that they get the credit for the altruistic impact.
Whoever the certificate gets Dedicated to is the one who gets the credit for the counterfactual altruistic impact of the project that the certificate refers to. And once a certificate has been Dedicated it can't be traded anymore, so Dedication is its end point. Importantly (in my view), if you don't have a Dedication mechanism it's not clear whether people who own Certificates of Impact have bought them so that they can resell them at a profit, or because they want to have altruistic impact.
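To make the lifecycle above a bit more concrete, here is a minimal Python sketch of a Certificate moving from issue to validation, trading and Dedication. It is only an illustration under stated assumptions: the class, field and function names are hypothetical, the two-year limit is approximated as 730 days, and the proposal itself doesn't specify any implementation.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Two-year statute of limitations from the proposal (730 days is an assumption).
STATUTE_OF_LIMITATIONS = timedelta(days=2 * 365)


@dataclass
class Certificate:
    project: str
    work_completed: date
    share: float                       # fraction of the project's impact, e.g. 0.4
    owner: str                         # starts out as the Issuer
    validated: bool = False
    dedicated_to: Optional[str] = None

    def validate(self) -> None:
        """A Validator confirms that the work on the Certificate was done."""
        self.validated = True

    def transfer(self, new_owner: str) -> None:
        """Sell or trade the Certificate; not allowed once it is Dedicated."""
        if not self.validated:
            raise ValueError("Certificate has not been validated")
        if self.dedicated_to is not None:
            raise ValueError("Dedicated Certificates cannot be traded")
        self.owner = new_owner

    def dedicate(self, beneficiary: str) -> None:
        """The Ultimate Buyer assigns the credit; this is the end point."""
        if self.dedicated_to is not None:
            raise ValueError("Certificate is already Dedicated")
        self.dedicated_to = beneficiary


def issue(issuer: str, project: str, work_completed: date,
          share: float, today: date) -> Certificate:
    """Issue a Certificate, enforcing the statute of limitations."""
    if today - work_completed > STATUTE_OF_LIMITATIONS:
        raise ValueError("Work is more than two years old")
    if not 0 < share <= 1:
        raise ValueError("Share must be a fraction between 0 and 1")
    return Certificate(project, work_completed, share, owner=issuer)


# Example: the Issuer sells 40% of a project's impact and keeps the other 60%.
cert = issue("Issuer", "example project", date(2020, 1, 1), 0.4,
             today=date(2021, 1, 1))
cert.validate()
cert.transfer("Professional Trader")
cert.transfer("Ultimate Buyer")
cert.dedicate("Ultimate Buyer")
```

The point of the `dedicated_to` field is the one made above: once it is set, the Certificate can no longer change hands, so holding a Dedicated Certificate can't be a bet on reselling it.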
In this section, I list ways Certificates of Impact might go well or badly, or why it might or might not work. Generally, I’ve tried to err on the side of including things even where I consider them to be very speculative.
Note that my opinions, where I give them, are pretty unstable: I can easily imagine myself changing my mind on reflection or after seeing new arguments.
Let’s imagine that a well-functioning Certificates of Impact with Dedication system exists with a large number of active participants including profit-seeking intermediaries. I list the ways this could be good below.
Here is a summary of the possible benefits. For each one, I’ve put my opinion regarding how likely it is in brackets.
Here’s more detail
I list below considerations for thinking about how feasible it is to get to a state where this is big and being used by lots of people (whether it is actually achieving the desired outcomes of improved efficiency etc or not).
I list below the ways a large (but maybe not well-functioning) market in Certificates of Impact with Dedication could fail to have a positive impact, or even be net negative.
Here is a summary of the possible issues / harms. For each one, I’ve put my opinion regarding how likely it is and/or how bad it might be in brackets.
FWIW I think you should make this a top level post.
Kind of surprised that this post doesn't link at all to Paul's post on altruistic equity: https://forum.effectivealtruism.org/posts/r7vmtHZKuosJZ3Xq5/altruistic-equity-allocation
Takeaways from some reading about economic effects of human-level AI
I spent some time reading things that you might categorise as “EA articles on the impact of human-level AI on economic growth”. Here are some takeaways from reading these (apologies for not always providing a lot of context / for not defining terms; hopefully clicking the links will provide decent context).
If you're interested in more on this topic, I'd highlight Holden Karnofsky's recent blog series and Tom Davidson's recent Open Phil report as good places to start.
In case it’s useful for other people, here’s the main stuff I (at least partially) read / listened to:
This from Paul Christiano in 2014 is also very relevant (part of it makes similar points to a lot of the recent stuff from Open Philanthropy, but the arguments are very brief; it's interesting to see how things have evolved over the years): Three impacts of machine intelligence
Here are some notes I made while reading a transcript of a seminar called You and Your Research by Richard Hamming. (I'd previously read this article with the same name, but I feel like I got something out of reading the seminar transcript too, although there's a lot of overlap.)
Things I took away for myself
You might be interested in checking out Ingredients for creating disruptive research teams e.g. on vision, autonomy, spaces for interaction.
Also I noticed that Jess Whittlestone wrote some probably much better notes on this a few years ago
Changing your working to fit the answer
I wrote this last Autumn as a private “blog post” shared only with a few colleagues. I’m posting it publicly now (after mild editing) because I have some vague idea that it can be good to make things like this public. It is quite rambling and doesn't really have a clear point (but I think it's at least an interesting topic).
Say you want to come up with a model for AI timelines, i.e. the probability of transformative AI being developed by year X for various values of X. You put in your assumptions (beliefs about the world), come up with a framework for combining them, and get an answer out. But then you’re not happy with the answer - your framework must have been flawed, or maybe on reflection one of your assumptions needs a bit of revision. So you fiddle with one or two things and get another answer - now it looks much better, close enough to your prior belief that it seems plausible, but not so close that it seems suspicious.
Is this kind of procedure valid? Here’s one case where the answer seems to be yes: if your conclusions are logically impossible, you know that either there’s a flaw in your framework or you need to revise your assumptions (or both).
A closely related case is where the conclusion is logically possible, but extremely unlikely. It seems like there’s a lot of pressure to revise something then too.
But in the right context revising your model in this way can look pretty dodgy. It seems like you’re “doing things the wrong way round” - what was the point of building the model if you were going to fiddle with the assumptions until you got the answer you expected anyway?
I think this is connected to a lot of related issues / concepts:
Presumably, you could put this question of whether and how much to modify your model into some kind of formal Bayesian framework where on learning a new argument you update all your beliefs based on your prior beliefs in the premises, conclusion, and validity of the argument. I’m not sure whether there’s a literature on this, or whether e.g. highly skilled forecasters actually think like this.
In general though, it seems (to me) that there’s something important about “following where the assumptions / model takes you”. Maybe, given all the ways we fall short of being perfectly rational, we should (and I think that in fact we do) put more emphasis on this than a perfectly rational Bayesian agent would. Avoiding having a very strong prior on the conclusion seems helpful here.
I wrote this last Autumn as a private “blog post” shared only with a few colleagues. I’m posting it publicly now (after mild editing) because I have some vague idea that it can be good to make things like this public.
In this post I’m going to discuss two papers regarding imprecise probability that I read this week for a Decision Theory seminar. The first paper seeks to show that imprecise probabilities don’t adequately constrain the actions of a rational agent. The second paper seeks to refute that claim.
Just a note on how seriously to take what I’ve written here: I think I’ve got the gist of what’s in these papers, but I feel I could spend a lot more time making sure I’ve understood them and thinking about which arguments I find persuasive. It’s very possible I’ve misunderstood or misrepresented the points the papers were trying to make, and I can easily see myself changing my mind about things if I thought and read more.
Also, a note on terminology: it seems like “sharpness/unsharpness” and “precision/imprecision” are used interchangeably in these papers, as are “probability” and “credence”. There might well be subtle distinctions that I’m missing, but I’ll try to consistently use “imprecise probabilities” here.
I imagine there are (at least) several different ways of formulating imprecise probabilities. One way is the following: your belief state is represented by a set of probability functions, and your degree of belief in a particular proposition is represented by the set of values assigned to it by the set of probability functions. You also then have an imprecise expectation: each of your probability functions has an associated expected utility. Sometimes, all of your probability functions will agree on the action that has the highest expected value. In that case, you are rationally required to take that action. But if there’s no clear winner, that means there’s more than one permissible action you could take.
The first paper, Subjective Probabilities should be Sharp, was written in 2010 by Elga. The central claim is that there’s no plausible account of how imprecise probabilities constrain which choices are reasonable for a perfectly rational agent.
The argument centers around a particular betting scenario: someone tells you “I’m going to offer you bet A and then bet B, regarding a hypothesis H”:
Bet A: win $15 if H, else lose $10
Bet B: lose $10 if H, else win $15
You’re free to choose whether to take bet B independently of whether you choose bet A.
Depending on what you believe about H, it could well be that you prefer just one of the bets to both bets. But it seems like you really shouldn’t reject both bets. Taking both bets guarantees you’ll win exactly $5, which is strictly better than the $0 you’ll win if you reject both bets.
But under imprecise probabilities, it’s rationally permissible to have some range of probabilities for H, which implies that it’s permissible to reject both bet A and bet B. So imprecise probabilities permit something which seems like it ought to be impermissible.
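The arithmetic behind this can be checked with a short Python sketch. The payoffs are the ones given for bet A and bet B above; the particular range of probabilities for H (0.1 to 0.8) and the rule "a bet is required only if every probability function gives it positive expected value" are assumptions made for illustration.

```python
def expected_value(payoff_if_h, payoff_if_not_h, p_h):
    """Expected dollar payoff of a bet for a single probability of H."""
    return p_h * payoff_if_h + (1 - p_h) * payoff_if_not_h


# Payoffs from the scenario: (payoff if H, payoff if not H).
BET_A = (15, -10)
BET_B = (-10, 15)

# Hypothetical set of probabilities for H (an assumption for illustration).
credal_set = [i / 10 for i in range(1, 9)]  # 0.1, 0.2, ..., 0.8

ev_a = [expected_value(*BET_A, p) for p in credal_set]
ev_b = [expected_value(*BET_B, p) for p in credal_set]

# A bet is rationally required only if every probability function favours it.
a_required = all(v > 0 for v in ev_a)
b_required = all(v > 0 for v in ev_b)

# Taking both bets pays the same amount whether or not H is true.
both_if_h = BET_A[0] + BET_B[0]
both_if_not_h = BET_A[1] + BET_B[1]

print(a_required, b_required)    # neither bet is required on its own
print(both_if_h, both_if_not_h)  # yet the pair is a sure gain either way
```

Bet A has positive expected value only when the probability of H is above 0.4, and bet B only when it is below 0.6, so any range that extends beyond both thresholds leaves each bet individually optional even though the pair is a guaranteed $5.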
Elga considers various rules that might be added to the initial imprecise probabilities-based decision theory, and argues that none of them are very appealing. I guess this isn’t as good as proving that there are no good rules or other modifications, but I found it fairly compelling on the face of it.
The rules that seemed most likely to work to me were Plan and Sequence. Both rules more or less entail that you should accept bet B if you already accepted bet A, in which case rejecting both bets is impermissible and it looks like the theory is saved.
Elga tries to show that these don’t work by inviting us to imagine the case where a particular agent called Sally faces the decision problem. Sally has imprecise probabilities, maximises expected utility and has a utility function that is linear in dollars.
Elga argues that in this scenario it just doesn’t make sense for Sally to accept bet B only if she already accepted bet A - the decision to accept bet B shouldn’t depend on anything that came before. It might do if Sally had some risk averse decision theory, or had a utility function that was concave in dollars - but by assumption, she doesn’t. So Plan and Sequence, which had seemed like the best candidates for rescuing imprecise probabilities, aren’t plausible rules for a rational agent like Sally.
The 2014 paper by Bradley and Steele, Should Subjective Probabilities be Sharp? is, as the name suggests, a response to Elga’s paper. The core of their argument is that the assumptions for rationality implied by Elga’s argument are too strong and that it’s perfectly possible to have rational choice with imprecise probabilities provided that you don’t make these too-strong assumptions.
I’ll highlight two objections and give my view.
In summary, in Subjective Probabilities should be Sharp, Elga illustrates how imprecise probabilities appear to permit a risk-neutral agent with linear utility to make irrational choices. In addition, Elga argues that there aren’t any ways to rescue things while keeping imprecise probabilities. In Should Subjective Probabilities be Sharp?, Bradley and Steele argue that Elga makes some implausibly strong assumptions about what it takes to be rational. I didn't find these arguments very convincing, although I might well have just failed to appreciate the points they were trying to make.
I think it basically comes down to this: for an agent with decision theory features like Sally’s, i.e. no risk aversion and linear utility, the only way to avoid passing up opportunities like making a risk-free $5 by taking bet A and bet B is if you’re always willing to take one side of any particular bet. The problem with imprecise probabilities is that they permit you to refrain from taking either side, which implies that you’re permitted to decline the risk-free $5.
The fan of imprecise probabilities can wriggle out of this by saying that you should be allowed to do things like taking bet B only if you just took bet A - but I agree with Elga that this just doesn’t make sense for an agent like Sally. I think the reason this might look overly demanding on the face of it is that we’re not like Sally - we’re risk averse and have concave utility. But agents who are risk averse or have concave utility are allowed both to sometimes decline bets and to take risk-free sequences of bets, even according to Elga’s rationality requirements, so I don’t think this intuition pushes against Elga’s rationality requirements.
It feels kind of useful to have read these papers, because
Causal vs evidential decision theory
I wrote this last Autumn as a private “blog post” shared only with a few colleagues. I’m posting it publicly now (after mild editing) because I have some vague idea that it can be good to make things like this public. Decision theories are a pretty well-worn topic in EA circles and I'm definitely not adding new insights here. These are just some fairly naive thoughts-out-loud about how CDT and EDT handle various scenarios. If you've already thought a lot about decision theory you probably won't learn anything from this.
The last two weeks of the Decision Theory seminars I’ve been attending have focussed on contrasting causal decision theory (CDT) and evidential decision theory (EDT). This seems to be a pretty active area of discussion in the literature - one of the papers we looked at was published this year, and another is yet to be published.
In terms of the history of the field, it seems that Newcomb’s problem prompted a move towards CDT (e.g. in Lewis 1981). I find that pretty surprising because to me Newcomb’s problem provides quite a bit of motivation for EDT, and without weird scenarios like Newcomb’s problem I think I might have taken something like CDT to be the default, obviously correct theory. But it seems like you didn’t need to worry about having a “causal aspect” to decision theories until Newcomb’s problem and other similar problems brought out a divergence in recommendations from (what became known as) CDT and EDT.
I guess this is a very well-worn area (especially in places like Lesswrong) but anyway I can’t resist giving my fairly naive take even though I’m sure I’m just repeating what others have said. When I first heard about things like Newcomb’s problem a few years ago I think I was a pretty ardent CDTer, whereas nowadays I am much more sympathetic to EDT.
In Newcomb’s problem, it seems pretty clear to me that one-boxing is the best option, because I’d rather have $1,000,000 than $1000. Seems like a win for EDT.
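For readers who want the numbers, here is a small Python sketch of how the two theories evaluate Newcomb’s problem. The dollar amounts are the standard ones mentioned above; the predictor accuracy of 0.99 and the function names are assumptions for illustration.

```python
MILLION = 1_000_000   # contents of the opaque box if one-boxing was predicted
THOUSAND = 1_000      # contents of the transparent box


def edt_values(accuracy):
    """Expected payoff conditional on each action, for a given predictor accuracy."""
    one_box = accuracy * MILLION
    two_box = accuracy * THOUSAND + (1 - accuracy) * (MILLION + THOUSAND)
    return {"one-box": one_box, "two-box": two_box}


def cdt_values(p_full):
    """Expected payoff when the box contents are treated as already fixed.

    p_full is the agent's probability that the opaque box is full,
    held constant across actions because the choice can't cause it.
    """
    one_box = p_full * MILLION
    two_box = p_full * (MILLION + THOUSAND) + (1 - p_full) * THOUSAND
    return {"one-box": one_box, "two-box": two_box}


edt = edt_values(accuracy=0.99)   # assumed accuracy
cdt = cdt_values(p_full=0.5)      # any value gives the same ranking

edt_choice = max(edt, key=edt.get)
cdt_choice = max(cdt, key=cdt.get)
print(edt_choice, cdt_choice)
```

Under the evidential calculation one-boxing comes out ahead whenever the predictor is reasonably accurate, while under the causal calculation two-boxing is ahead by $1000 whatever probability you assign to the box being full.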
Dicing With Death is designed to give CDT problems, and in my opinion it does this very effectively. In Dicing With Death, you have to choose between going to Aleppo or Damascus, and you know that whichever you choose, death will have predicted your choice and be waiting for you (a very bad outcome for you). Luckily, a merchant offers you a magical coin which you can toss to decide where to go, in which case death won’t be able to predict where you go, giving you a 50% chance of avoiding death. The merchant will charge a small fee for this. However CDT gets into some strange contortions and as a result recommends against paying for the magical coin, even though the outcome if you pay for the magical coin seems clearly better. EDT recommends paying for the coin, another win for EDT.
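The evidential side of this can be written down in a few lines of Python. This is only a sketch of the EDT calculation (it doesn't attempt to model CDT's reasoning), and the utilities and fee are made-up numbers chosen for illustration; the 50% figure and the assumption that death always predicts a deliberate choice come from the scenario as described above.

```python
# Hypothetical utilities and fee (assumptions, not part of the original scenario).
U_DEATH = -1000.0
U_SURVIVE = 0.0
COIN_FEE = 1.0

# Probability of meeting death, conditional on each action.
P_DEATH_NO_COIN = 1.0    # death is assumed to predict any deliberate choice
P_DEATH_WITH_COIN = 0.5  # death can't predict the magical coin toss


def evidential_value(p_death, cost=0.0):
    """Expected utility conditional on an action, minus any fee paid."""
    return p_death * U_DEATH + (1 - p_death) * U_SURVIVE - cost


ev_no_coin = evidential_value(P_DEATH_NO_COIN)
ev_coin = evidential_value(P_DEATH_WITH_COIN, cost=COIN_FEE)
pays_for_coin = ev_coin > ev_no_coin
print(ev_no_coin, ev_coin, pays_for_coin)
```

With these numbers the coin is worth buying as long as the fee is less than half the disvalue of death, which matches the intuition that a small fee is clearly worth paying.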
To me, The Smoking Lesion is a somewhat problematic scenario for EDT. Still, I feel like it’s possible for EDT to do fine here if you think carefully enough.
You could make the following simple model for what happens in The Smoking Lesion: in year 1, no-one knows why some people get cancer and some don’t. In year 2, it’s discovered that everyone who smokes develops cancer, and furthermore there’s a common cause (a lesion) that causes both of these things. Everyone smokes iff they have the lesion, and everyone gets cancer iff they have the lesion. In year 3, following the publication of these results, some people who have the lesion try not to smoke. Either (i) none of them can avoid smoking because the power of the lesion is too strong; or (ii) some of them do avoid smoking, but (since they still have the lesion) they still develop cancer. In case (i), the findings from year 2 remain valid even after everyone knows about them. In case (ii), the findings from year 2 are no longer valid: they just tell you about how the world would have been if the correlation between smoking and cancer wasn’t known.
The cases where you use the knowledge about the year 2 finding to decide not to smoke are exactly the cases where the year 2 finding doesn’t apply. So there’s no point in using the knowledge about the year 2 finding to not smoke: either your not smoking (through extreme self-control etc) is pointless because you still have the lesion and this is a case where the year 2 finding doesn’t apply, or it’s pointless because you don’t have the lesion.
So it seems to me like the right answer is to smoke if you want to, and I think EDT can recommend this by incorporating the fact that if you choose not to smoke purely because of the year 2 finding, this doesn’t give you any evidence about whether you have the lesion (though this is pretty vague and I wouldn’t be that surprised if making it more precise made me realise it doesn’t work).
In general it seems like these issues arise from treating the agent’s decision making process as being removed from the physical world - a very useful abstraction which causes issues in weird edge cases like the ones considered above.
Are you familiar with MIRI's work on this? One recent iteration is Functional Decision Theory, though it is unclear to me if they made more recent progress since then. It took me a long time to come around to it, but I currently buy that FDT is superior to CDT in the twin prisoner's dilemma case, while not falling to evidential blackmail (the way EDT does), as well as being notably superior overall in the stylized situation of "how should an agent relate to a world where other smarter agents can potentially read the agent's source code"
Thanks that's interesting, I've heard of it but I haven't looked into it.
Here are some forecasts for near-term progress / impacts of AI on research. They are the results of some small-ish number of hours of reading + thinking, and shouldn't be taken at all seriously. I'm sharing in case it's interesting for people and especially to get feedback on my bottom line probabilities and thought processes. I'm pretty sure there are some things I'm very wrong about in the below and I'd love for those to be corrected.
I realise that "excellent performance" etc is vague; I choose to live with that rather than putting in the time to make everything precise (or not doing the exercise at all).
If you don't know what multi-domain proteins and protein complexes are, I found this blog post by Mohammed AlQuraishi very useful (maybe try ctrl-f for those terms), although you may need some relevant background knowledge first. I don't have a great sense for how big a deal this would be for various areas of biological science, but my impression is that they're both roughly the same order of magnitude of usefulness as getting excellent performance on single-domain proteins was (i.e. what AF2 has already achieved).
As for why:
80% chance that excellent AI performance on multi-domain proteins is announced by end of 2023
70% chance that excellent AI performance on protein complexes is announced by end of 2023
20% chance of widespread adoption of a system like OpenAI Codex for data analysis by end of 2023
(NB this is just about data analysis / "data science" rather than about usage of Codex in general)
"Effective Altruism out of self interest"
I recently finished listening to Kevin Simler and Robin Hanson’s excellent The Elephant in the Brain. Although I’d probably been exposed to the main ideas before, it got me thinking more about people’s hidden motivations for doing things.
In particular, I’ve been thinking a bit about the motives (hidden or otherwise) for being an Effective Altruist.
It would probably feel really great to save someone’s life by rescuing them from a burning building, or to rescue a drowning child as in Peter Singer’s famous drowning child argument, and so you might think that the feeling of saving a life is reward enough. I do think it would feel really great to pull someone from a burning building or to save a drowning child - but does it feel as great to save a life by giving $4500 to AMF? Not to me.
It’s not too hard to explain why saving someone from a burning building would feel better - you get to experience the gratitude from the person, their loved ones and their friends, for example. Simler and Hanson give an additional reason, or maybe the underlying reason, which I find quite compelling: when you perform a charitable act, you experience a benefit by showing others that you’re the kind of person who will look out for them, making people think that you’d make a good ally (friend, romantic partner, and so on). To be clear, this is a hidden, subconscious motive - according to the theory, you will not be consciously aware that you have this motive.
What explains Effective Altruism, then? Firstly I should say that I don’t think Simler and Hanson would necessarily argue that “true altruism” doesn’t exist - I think they’d say that people are complicated, and you can rarely use a single motive (hidden or not) to explain the behaviour of a diverse group of individuals. So true altruism may well be part of the explanation, even on their view as I understand it. Still, presumably true altruism isn’t the only motive even for really committed Effective Altruists.
One thing that seems true about our selfish, hidden motives is that they only work as long as they can remain hidden. So maybe, in the case of charitable behaviour, it’s possible to alert everyone to the selfish hidden motive: “if you’re donating purely because you want to help others, why don’t you donate to the Against Malaria Foundation, and do much more good than you do currently by donating to [some famous less effective charity]?” When everyone knows that there’s a basically solid argument for only donating to effective charities if you want to benefit others, when people donate to ineffective charities it’ll transparently be due to selfish motives.
Thinking along these lines, joining the Effective Altruism movement can be seen as a way to “get in at the ground floor”: if the movement is eventually successful in changing the status quo, you will get brownie points for having been right all along, and the Effective Altruist area you’ve built a career in will get a large prestige boost when everyone agrees that it is indeed effectively altruistic.
And of course many Effective Altruists do want and expect the movement to grow. E.g. the Global Priorities Institute’s mission is (or at least was officially in 2017) to make Effective Altruist ideas mainstream within academia, and Open Philanthropy says it wants to grow the Effective Altruism community.
One fairly obvious (and hardly surprising) prediction you would make from this is that if Effective Altruism doesn’t look like it will grow further (either through community growth or through wider adoption of Effective Altruist ideas), you would expect Effective Altruists to feel significantly less motivated.
This in turn suggests that spreading Effective Altruist ideas might be important purely for maintaining motivation for people already part of the Effective Altruist community. This sounds pretty obvious, but I don’t really hear people talking about it.
Maybe this is a neglected source of interventions. This would make sense given the nature of the hidden motives Simler and Hanson describe - a key feature of these hidden motives is that we don’t like to admit that we have them, which is hard to avoid if we want to use them to justify interventions.
In any case, I don’t think that the existence of this motive for being part of the Effective Altruism movement is a particularly bad thing. We are all human, after all. If Effective Altruist ideas are eventually adopted as common sense partly thanks to the Effective Altruism movement, that seems like a pretty big win to me, regardless of what might have motivated individuals within the movement.
It would also strike me as a pretty Pinker-esque story of quasi-inevitable progress: the claim is that these (true) Effective Altruist beliefs will propagate through society because people like being proved right. Maybe I’m naive, but in this particular case it seems plausible to me.
Thinking along these lines, joining the Effective Altruism movement can be seen as a way to “get in at the ground floor”: if the movement is eventually successful in changing the status quo, you will get brownie points for having been right all along, and the Effective Altruist area you’ve built a career in will get a large prestige boost when everyone agrees that it is indeed effectively altruistic.
Joining EA seems like a very suboptimal way to get brownie points from society at large and even from groups which EA represents the best (students/graduates of elite colleges). Isn't getting into social justice a better investment? What are the subgroups you think EAs try hard to impress?
I guess I'm saying that getting into social justice is more like "instant gratification", and joining EA is more like "playing the long game" / "taking relative pain now for a huge payoff later".
Also / alternatively, maybe getting into social justice is impressing one group of people but making another group of people massively dislike you (and making a lot of people shrug their shoulders), whereas when the correctness of EA is known to all, having got in early will lead to brownie points from everyone.
So maybe the subgroup is "most people at some future time" or something?
(hopefully it's clear, but I'm ~trying to argue from the point of view of the post; I think this is fun to think about but I'm not sure how much I really believe it)
When everyone knows that there’s a basically solid argument for only donating to effective charities if you want to benefit others, when people donate to ineffective charities it’ll transparently be due to selfish motives.
I'm not sure that's necessarily true. People may have motives for donating to ineffective charities that are better characterised as moral but not welfare-maximising (special obligations, expressing a virtue, etc).
Also, if everyone knows that there's a solid argument for only donating to effective charities, then it seems that one would suffer reputationally for donating to ineffective charities. That may, in a sense, rather provide people with a selfish motive to donate to effective charities, meaning that we might expect donations to ineffective charities to be due to other motives.
I also wanted to share a comment on this from Max Daniel (also from last Autumn) that I found very interesting.
But many EAs already have lots of close personal relationships with other EAs, and so they can already get social status by acting in ways approved by those peers. I'm not sure it helps if the number of distant strangers also liking these ideas grows.
I actually think that, if anything, 'hidden motives' on balance cause EAs to _under_value growth: it mostly won't feel that valuable because it has little effect on your day-to-day life, and it even threatens your status by recruiting competitors.
This is particularly true for proposed growth trajectories that would change the social dynamics of the movement. Most EAs enjoy abstract, intellectual discussions with other people who are smart and are politically liberal, so any proposal that would dilute the 'quality' of the movement or recruit a lot of conservatives is harmful for the enjoyment most current EAs derive from community interactions. (There may also be impartial reasons against such growth trajectories of course.)
My reaction to this:
I recently spent some time trying to work out what I think about AI timelines. I definitely don’t have any particular insight here; I just thought it was a useful exercise for me to go through for various reasons (and I did find it very useful!).
As it came out, I "estimated" a ~5% chance of TAI by 2030 and a ~20% chance of TAI by 2050 (the probabilities for AGI are slightly higher). As you’d expect me to say, these numbers are highly non-robust.
When I showed the plots below to a couple of people, they commented that they were surprised that my AGI probabilities are higher than my TAI ones, and I now think I didn’t think enough about non-AGI routes to TAI when I did this. I’d now probably increase the TAI probabilities a bit and lower the AGI ones a bit compared to what I’m showing here (by “a bit” I mean maybe a few percentage points).
I generated these numbers by forming an inside view, an outside view, and making some heuristic adjustments. The inside and outside views are ~weighted averages of various forecasts. My timelines are especially sensitive to how I chose and weighted forecasts for my outside view.
Here are my timelines in graphical form:
And here they are again alongside some other timelines people have made public:
If you want more detail, there’s a lot more in this google doc. I’ll probably write another shortform post with some more thoughts / reflections on the process later.
In the below I give a very rough summary of Will MacAskill’s article Are We Living At The Hinge Of History? and give some very preliminary thoughts on the article and some of the questions it raises.
I definitely don’t think that what I’m writing here is particularly original or insightful: I’ve thought about this for no more than a few days, any points I make are probably repeating points other people have already made somewhere, and/or are misguided, etc. This seems like an incredibly deep topic which I feel like I’ve barely scratched the surface of. Also, this is not a focussed piece of writing trying to make a particular point, it’s just a collection of thoughts on a certain topic. (If you want to just see what I think, skip down to "Some thoughts on the issues discussed in the article".)
(note that the article is an updated version of the original EA Forum post Are we living at the most influential time in history?)
The Hinge of History claim (HH): we are among the most influential people ever (past or future). Influentialness is, roughly, how much good a particular person at a particular time can do through direct expenditure of resources (rather than investment)
Two worldviews prominent in longtermist EA imply that HH is true:
The base rates argument
Claim: our prior should be that we’re as likely as anyone else, past or present, to be the most influential person ever (Bostrom’s Self-Sampling Assumption (SSA)). Under this prior, it’s astronomically unlikely that any particular person is the most influential person ever.
Then the question is how much should we update from this prior
Counterargument 1: we only need to be at an enormously influential time, not the most influential, and the implications are ~the same either way
Counterargument 2: the action-relevant thing is the influentialness of now compared to any time we can pass resources on to
The inductive argument
Argument 1: we’re living on a single planet, implying greater influentialness
Argument 2: we’re now in a period of unusually fast economic and tech progress, implying greater influentialness. We can’t maintain the present-day growth rate indefinitely.
MacAskill seems sympathetic to the argument, but says it implies not that today is the most important time, but that the most important time might be some time in the next few thousand years.
A few other arguments for HH are briefly touched on in a footnote: that existential risk / value lock-in lowers the number of future people in the reference class for the influentialness prior; that we might choose other priors that are more favourable for HH, and that earlier people can causally affect more future people
It kind of feels like there are two somewhat independent things that are most interesting from the article:
Avoiding rejecting the Time of Perils and Bostrom-Yudkowsky views
I think there are a few ways we can go to avoid rejecting the Time of Perils and Bostrom-Yudkowsky views
A nearby alternative is to modify the Time of Perils and Bostrom-Yudkowsky views a bit so that they don’t imply we’re among the most influential people ever. E.g. for the Bostrom-Yudkowsky view we could make the value lock-in a bit “softer” by saying that for some reason, not necessarily known/stated, the lock-in would probably end after some moderate (on cosmological scales) length of time. I’d guess that many people might find a modified view more plausible even independently of the influentialness implications.
I’m not really sure what I think here, but I feel pretty sympathetic to the idea that we should be uncertain about the prior and that this maybe lends itself to having not too strong a prior against the Time of Perils and Bostrom-Yudkowsky views.
On the question of whether to expend resources now or later
The arguments MacAskill discusses suggest that the relevant time frame is the next few thousand years (because the next few thousand years seem, in expectation, to have especially high influentialness, and because it might be effectively impossible to pass our resources further into the future).
It seems like the pivotal importance of priors on influentialness (or similar) then evaporates: it no longer seems that implausible on the SSA prior that now is a good time to expend resources rather than save. E.g. say there’ll be a 20 year period in the next 1000 years where we want to expend philanthropic resources rather than save them to pass on to future generations. Then a reasonable prior might be that we have a 20/1000 = 1 in 50 chance of being in that period. That’s a useful reference point and is enough to make us skeptical about arguments that we are in such a period, but it doesn’t seem overwhelming. In fact, we’d probably want to spend at least some resources now even purely based on this prior.
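To make the arithmetic above explicit, here is a minimal sketch of that prior. It assumes (as the example does) that we are equally likely to be at any point in the time horizon; the function name `prior_in_window` and its parameters are just illustrative names I've made up, and the 20 and 1000 year figures are the hypothetical ones from the example, not estimates.

```python
from fractions import Fraction


def prior_in_window(window_years: int, horizon_years: int) -> Fraction:
    """Uniform prior that "now" falls inside a special window.

    Assumes every year in the horizon is equally likely to be the one
    we find ourselves in (an illustrative simplification).
    """
    if not 0 < window_years <= horizon_years:
        raise ValueError("window must be positive and fit inside the horizon")
    return Fraction(window_years, horizon_years)


# Hypothetical figures from the example: a 20 year window in the next 1000 years.
prior = prior_in_window(20, 1000)
print(prior)         # 1/50
print(float(prior))  # 0.02
```

The point of writing it this way is just to show how sensitive the prior is to the two inputs: a longer horizon or a shorter window makes the prior smaller, but for horizons of a few thousand years it stays far from "astronomically unlikely".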
In particular, it seems like some kind of detailed analysis is needed, maybe along the lines of Trammell’s model or at least using that model as a starting point. I think many of the arguments in MacAskill’s article should be part of that detailed analysis, but, to stress the point, they don’t seem decisive to me.
This comment by Carl Shulman on the related EA Forum post and its replies has some stuff on this.
In the article, the Inductive Argument is supported by the idea of moral progress: MacAskill cites the apparent progress in our moral values over the past 400 years as evidence for the idea that we should expect future generations to have better moral values than we do. Obviously, whether we should expect moral progress in the future is a really complex question, but I’m at least sympathetic to the idea that there isn’t really moral progress, just moral fashions (so societies closer in time to ours seem to have better moral values just because they tend to think more like us).
Of course, if we don’t expect moral progress, maybe it’s not so surprising that we have very high influentialness: if past and future actors don’t share our values, it seems very plausible on the face of it that we’re better off expending our resources now than passing them off to future generations in the hope they’ll carry out our wishes. So maybe MacAskill’s argument about influentialness should update us away from the idea of moral progress?
But if we’re steadfast in our belief in moral progress, maybe it’s not so surprising that we have high influentialness because we find ourselves in a world where we are among the very few with a longtermist worldview, which won’t be the case in the future as longtermism becomes a more popular view. (I think Carl Shulman might say something like this in the comments to the original EA Forum post)
I like the way the article ends, providing some motivation for the Inductive Argument in a way I find appealing on a gut level:
Just as our powers to grow crops, to transmit information, to discover the laws of nature, and to explore the cosmos have all increased over time, so will our power to make the world better — our influentialness. And given how much there is still to understand, we should believe, and hope, that our descendents look back at us as we look back at those in the medieval era, marvelling at how we could have got it all so wrong.
I wrote this last Summer as a private “blog post” just for me. I’m posting it publicly now (after mild editing) because I have some vague idea that it can be good to make things like this public. These thoughts come from my very naive point of view (as it was in the Summer of 2020; not to suggest my present day point of view is much less naive). In particular if you’ve already read lots of moral philosophy you probably won’t learn anything from reading this. Also, I hope my summaries of other people’s arguments aren’t too inaccurate.
Recently, I’ve been trying to think seriously about what it means to do good. A key part of Effective Altruism is asking ourselves how we can do the most good. Often, considering this question seems to be mostly an empirical task: how many lives will be saved through intervention A, and how many through intervention B? Aside from the empirical questions, though, there are also theoretical ones. One key consideration is what we mean by doing good.
There is a branch of philosophy called moral philosophy which is (partly) concerned with answering this question.
It’s important to me that I don’t get too drawn into the particular framings that have evolved within the academic discipline of moral philosophy, which are, presumably, partly due to cultural or historical forces, etc. This is because I really want to try to come up with my own view, and I think that (for me) the best process for this involves not taking other people’s views or existing works too seriously, especially while I try to think about these things seriously for the first time.
Still, it seems useful to get familiar with the major insights and general way of thinking within moral philosophy, because
I’ve read a couple of Stanford Encyclopedia of Philosophy articles, and a series of posts by Lukas Gloor arguing for moral anti-realism.
I found the Stanford Encyclopedia of Philosophy articles fairly tough going but still kind of useful. I thought the Gloor posts were great.
The Gloor posts have kind of convinced me to take the moral anti-realist side, which, roughly, denies the existence of definitive moral truths.
While I suppose I might consider my “inside view” to be moral anti-realist at the moment, I can easily see this changing in the future. For example, I imagine that if I read a well-argued case for moral realism, I might well change my mind.
In fact, prior to reading Gloor’s posts, I probably considered myself to be a moral realist. I think I’d heard arguments, maybe from Will MacAskill, along the lines that i) if moral anti-realism is true, then nothing matters, whereas if realism is true, you should do what the true theory requires you to do, and ii) there’s some chance that realism is true, therefore iii) you should do what the true theory requires you to do.
Gloor discusses an argument like this in one of his posts. He calls belief in moral realism founded on this sort of argument “metaethical fanaticism” (if I’ve understood him correctly).
I’m not sure that I completely understood everything in Gloor’s posts. But the “fanaticism” label does feel appropriate to me. It feels like there’s a close analogy with the kinds of fanaticism that utilitarianism is susceptible to, for example. An example of that might be a Pascal’s wager type argument - if there’s a non-zero probability that I’ll get infinite utility derived from an eternal life in a Christian heaven, I should do what I can to maximise that probability.
It feels like something has gone wrong here (although admittedly it’s not clear what), and this Pascal’s wager argument doesn’t feel at all like a strong argument for acting as if there’s a Christian heaven. Likewise, the “moral realist wager” doesn’t feel like a strong argument for acting as if moral realism is true, in my current view.
Gloor also argues that we don’t lose anything worth having by being moral anti-realists, at least if you’re his brand of moral anti-realist. I think he calls the view he favours “pluralistic moral reductionism”.
On his view, you take any moral view (or maybe combination of views) you like. These can (and maybe for some people, “should”) be grounded in our moral intuitions, and maybe use notions of simplicity of structure etc, just as a moral realist might ground their guess(?) at the true moral theory in similar principles. Your moral view is then your own “personal philosophy”, which you choose to live by.
One unfortunate consequence of this view is that you don’t really have any grounds to argue with someone else who happens to have a different view. Their view is only “wrong” in the sense that it doesn’t agree with yours; there’s no objective truth here.
From this perspective, it would arguably be nicer if everyone believed that there was a true moral view that we should strive to follow (even if we don’t know what it is). Especially if you also believe that we could make progress towards that true moral view.
I’m not sure how big this effect is, but it feels like more than nothing. So maybe I don’t quite agree that we don’t lose anything worth having by being moral anti-realists.
In any case, the fact that we might wish that moral realism is true doesn’t (presumably) have any bearing on whether or not it is true.
I already mentioned that reading Gloor’s posts has caused me to favour moral anti-realism. Another effect, I think, is that I am more agnostic about the correct moral theory. Some form of utilitarianism, or at least consequentialism, seems far more plausible to me as the moral realist “one true theory” than a deontological theory or virtue ethics theory. Whereas if moral anti-realism is correct, I might be more open to non-consequentialist theories. (I’m not sure whether this new belief would stand up to a decent period of reflection, though - maybe I’d be just as much of a convinced moral anti-realist consequentialist after some reflection).
I wrote this last Summer as a private “blog post” just for me. I’m posting it publicly now (after mild editing) because I have some vague idea that it can be good to make things like this public. These rambling thoughts come from my very naive point of view (as it was in the Summer of 2020; not to suggest my present day point of view is much less naive). In particular if you’ve already read lots of moral philosophy you probably won’t learn anything from reading this.
Generally, reading various moral philosophy writings has probably made me (even) more comfortable trusting my own intuitions / reasoning regarding what “morality” is and what the “correct moral theory” is.
I think that, when you start engaging with moral philosophy, there’s a bit of a feeling that when you’re trying to reason about things like what’s right and wrong, which moral theory is superior to the other, etc, there are some concrete rules you need to follow, and (relatedly) certain words or phrases have a solid, technical definition that everyone with sufficient knowledge knows and agrees on. The “certain words or phrases” I have in mind here are things like “morally right”, “blameworthy”, “ought”, “value”, “acted wrongly”, etc.
To me right now, the situation seems a bit more like the following: moral philosophers (including knowledgeable amateurs, etc) have in mind definitions for certain words, but these definitions may be more or less precise, might change over time, and differ from person to person. And in making a “moral philosophy” argument (say, writing down an argument for a certain moral theory), the philosopher can use flexibility of interpretation as a tool to make their argument appear more forceful than it really is. Or, the philosopher’s argument might imply that certain things are self-evidently true, and the reader might be (maybe unconsciously) fooled into thinking that this is the case, when in fact it isn’t.
It seems to me now that genuinely self-evident truths are in very short supply in moral philosophy. And, now that I think this is the case, I feel like I have much more licence to make up my own mind about things. That feels quite liberating.
But it does also feel potentially dangerous. Of course, I don’t think it’s dangerous that *I* have freedom to decide what “doing good” means to me. But I might find it dangerous that others have that freedom. People can consider committing genocide to be “doing what is right” and it would be nice to have a stronger argument against this than “this conflicts with my personal definition of what good is”. And, of course, others might well think it’s dangerous that I have the freedom to decide what doing good means.
Now that we’re in this free-for-all, even defining morality seems problematic.
I suppose I can make some observations about myself, like
I guess these things are all in the region of “wanting to improve the lives of others”. This sounds a lot like wanting to do what is morally good / morally praiseworthy, and seems at least closely related to morality.
In some ways, whether I label some of my goals and beliefs as being to do with “morality” doesn’t matter - either way, it seems clear that the academic field of moral philosophy is pretty relevant. And presumably when people talk about morality outside of an academic context, they’re at least sometimes talking about roughly the thing I’m thinking of.
Here are some thoughts after reading a book called "The Inner Game of Tennis" by Timothy Gallwey. I think it's quite a famous book and maybe a lot of people know it well already. I consider it to be mainly about how to prevent your system 2/conscious mind/analytical mind from interfering with the performance of your system 1/subconscious mind/intuitive mind. This is explained in the context of tennis, but it seems applicable to many other contexts, as the author himself argues. If that sounds interesting, I recommend checking the book out; it's short and quite readable.
My interest in the book comes mainly from thinking about the best way to go about doing research, at a day-to-day level. Although the arguments of the book seem most directly applicable to learning a physical skill/activity and (to some extent) to performing well at key moments, I still think there are lessons for mental activities performed routinely, i.e. for activities like research.
I think reading the book has generally pushed me a bit more in favour of "trusting my system 1/intuitive mind" while doing research, e.g. trusting that my brain is doing some important processing when I feel inclined to just stare into space and not make any apparent progress towards whatever it is I'm trying to achieve at that moment. This feels pretty important.
I think Owen Cotton-Barratt says some interesting things about trusting his intuition for prioritisation in this interview with Lynette Bye, which feels kind of related.
The book predates by many decades Kahneman's Thinking, Fast and Slow, which (I think) popularised the concept of system 1 mind and system 2 mind. The book instead refers to "self 1" and "self 2", which seem to have roughly similar meanings, although unfortunately reversed: Gallwey's self 1 and Kahneman's system 2 refer to the conscious/analytical mind, while Gallwey's self 2 and Kahneman's system 1 refer to the subconscious/intuitive mind.
Here are some disorganised notes on bits that seemed worth highlighting (page numbers refer to 2015 edition published by Pan Books):