This is a short follow-up to my post on the optimal timing of spending on AGI safety work, which, given exact values for the future real interest rate, diminishing returns and other factors, calculated the optimal spending schedule for AI risk interventions.
This has also been added to the post’s appendix and assumes some familiarity with the post.
Here I consider the most robust spending policies, supposing uncertainty over nearly all parameters in the model rather than finding the optimal solutions based on point estimates, and again find that the community’s current spending rate on AI risk interventions is too low.
My distributions over the model parameters imply that
I recommend entering your own distributions for the parameters in the Python notebook here. Further, these preliminary results use few samples: more reliable results would be obtained with more samples (and more computing time).
I allow for post-fire-alarm spending (i.e., we are certain AGI is soon and so can spend some fraction of our capital). Without this feature, the optimal schedules would likely recommend a greater spending rate.
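To make the idea of scoring spending policies against parameter distributions (rather than point estimates) concrete, here is a minimal Python sketch. It is a toy model, not the model in the notebook: the names `sample_params`, `utility` and `best_fixed_rate`, the three parameters and their distributions, and the utility function are all hypothetical. It only shows the structure: sample parameters, score each candidate fixed spending rate on the same samples, and keep the rate that does best in expectation.

```python
import random

# Hypothetical toy model: the parameters, their distributions and the
# utility function are invented for illustration, not taken from the notebook.
def sample_params(rng):
    return {
        "r": rng.gauss(0.05, 0.03),      # real interest rate
        "eta": rng.uniform(0.3, 0.8),    # diminishing-returns exponent
        "agi_year": rng.randint(5, 40),  # years until AGI arrives
    }


def utility(rate, params, horizon=60):
    """Utility of spending a fixed fraction `rate` of capital each year.
    Only spending before AGI arrives counts."""
    capital, total = 1.0, 0.0
    for _ in range(min(params["agi_year"], horizon)):
        spend = rate * capital
        total += spend ** params["eta"]  # concave: diminishing returns
        capital = (capital - spend) * (1 + params["r"])  # unspent capital earns r
    return total


def best_fixed_rate(rates, n_samples=2000, seed=0):
    """Pick the fixed spending rate with the highest expected utility over
    sampled parameters, rather than optimising for point estimates."""
    rng = random.Random(seed)
    samples = [sample_params(rng) for _ in range(n_samples)]
    # Score every candidate on the same samples (common random numbers).
    expected = {
        rate: sum(utility(rate, p) for p in samples) / n_samples
        for rate in rates
    }
    return max(expected, key=expected.get), expected


best, expected = best_fixed_rate([0.01, 0.02, 0.05, 0.1, 0.2])
print("best fixed rate:", best)
```

Scoring every rate on the same samples makes the comparison between rates less noisy than drawing fresh samples for each rate. A fuller version would add the features the post's model includes, such as separate research and influence spending, competition and post-fire-alarm spending.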
Caption: Fixed spending rate. See here for the distributions of utility for each spending rate.
Caption: Simple - two regime - spending rate
Caption: The results from a simple optimiser, when allowing for four spending regimes: 2022-2027, 2027-2032, 2032-2037 and 2037 onwards. This result should not be taken too seriously: more samples should be used, the optimiser should run for more steps, and more intervals should be used. As with other results, this is contingent on the distributions of parameters.
Caption: An example real interest function r(t), cherry-picked to show how our capital can go down significantly. See here for 100 unbiased samples of r(t).
Caption: Example probability-of-success functions. The filled circle indicates the current preparedness and probability of success.
Caption: Example competition functions. They all pass through (2022, 1) since the competition function is the relative cost of one unit of influence compared to the current cost.
This short extension started due to a conversation with David Field and comment from Vasco Grilo; I’m grateful to both for the suggestion.
Inputs that are not considered include historic spending on research and influence and the rate at which the real interest rate changes; post-fire-alarm returns are assumed to be the same as pre-fire-alarm returns.
And supposing a 50:50 split between spending on research and influence
This notebook is less user-friendly than the notebook used in the main optimal spending result (though not user-unfriendly) - let me know if improvements to the notebook would be useful for you.
The intermediate steps of the optimiser are here.
I think there are benefits to thinking about where to give (fun, having engagement with the community, skill building, fuzzies) but I think that most people shouldn’t think too much about it and - if they are deciding where to give - should do one of the following.
1 Give to the donor lottery
I primarily recommend giving through a donor lottery and then only thinking about where to give if you win. There are existing arguments for the donor lottery.
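For readers unfamiliar with the mechanism, here is a minimal Python sketch of a donor lottery, assuming the usual design in which one winner is drawn with probability proportional to their contribution and then directs the whole pot. The function names and the 5,000/95,000 split are hypothetical. In expectation each participant directs exactly what they put in; what changes is that the research effort is concentrated in one person.

```python
import random


def draw_winner(contributions, rng):
    """Draw one winner with probability proportional to contribution.
    The winner then chooses where the whole pot goes."""
    names = list(contributions)
    return rng.choices(names, weights=[contributions[n] for n in names], k=1)[0]


def expected_directed(contributions, trials=100_000, seed=0):
    """Monte Carlo estimate of how much money each participant directs."""
    rng = random.Random(seed)
    pot = sum(contributions.values())
    directed = dict.fromkeys(contributions, 0.0)
    for _ in range(trials):
        directed[draw_winner(contributions, rng)] += pot
    return {name: total / trials for name, total in directed.items()}


# Hypothetical pot: you put in 5,000 alongside 95,000 from others.
print(expected_directed({"you": 5_000, "others": 95_000}))
```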
2 Deliberately funge with funders you trust
Alternatively I would recommend deliberately ‘funging’ with other funders (e.g. Open Philanthropy), such as through GiveWell’s Top Charities Fund.
However, if you have empirical or value disagreements with the large funder you funge with or believe they are mistaken, you may be able to do better by doing your own research.
3 If you work at an ‘effective’ organisation, take a salary cut
Finally, if you work at an organisation whose mission you believe to be effective, or that is funded by a large funder (see previous point on funging), consider taking a salary cut.
(a) Saving now to give later
I would say to just give to the donor lottery and if you win: first, spend some time thinking and decide whether you want to give later. If you conclude yes, give to something like the Patient Philanthropy Fund, set up some new mechanism for giving later or (as you always can) enter/create a new donor lottery.
(b) Thinking too long about it - unless it's rewarding for you
Where rewarding could be any of: fun, interesting, good for the community, gives fuzzies, builds skills or something else. There’s no obligation at all in working out your own cost effectiveness estimates of charities and choosing the best.
(c) Thinking too much about funging, counterfactuals or Shapley values
My guess is that if everyone does the ‘obvious’ strategy of “donate to the things that look most cost effective” and you’re broadly on board with the values, empirical beliefs and donation mindset of the other donors in the community, it’s not worth considering how counterfactual your donation was or who you’ve funged with.
Thanks to Tom Barnes for comments.
Consider goal factoring the activity of “doing research about where to give this year”. It’s possible there are distinct tasks that better achieve your goals (e.g. “give to the donor lottery” and “do independent research on X”).
For example, I write here how - given Metaculus AGI timelines and a speculative projection of Open Philanthropy’s spending strategy - small donors’ donations can go further when not funging with them.
A sufficient (but certainly not necessary) condition could be “receives funding from an EA-aligned funder, such as Open Philanthropy” (if you trust the judgement and share the values of the funder).
This is potentially UK specific (I don’t know about other countries) and for people on relatively high salaries (>£50k, the point at which the marginal income tax rate of 40% is greater than the 20% basic-rate tax that Gift Aid claims back).
With the caveat of making sure opportunities don’t get overfunded.
I’d guess there is a high degree of values overlap in your community: if you donate to a global health organisation and another donor - as a result of your donation - decides to donate elsewhere, it seems reasonably likely they will donate to another global health organisation.
I’d guess this overlap is relatively high for niche EA organisations. I’ve written about how to factor in funging as a result of (implicit) differences of AI timelines. Other such empirical beliefs could include beliefs about the relative importance of different existential risks among longtermists or the value of some global health interventions (e.g. StrongMinds).
For particularly public charitable organisations and causes, I’d guess there is less mindset overlap. That is, the person you’ve funged with may not share the effectiveness mindset (and so their donation may go to a charity you would judge as less cost-effective than where you would donate if accounting for funging).
The “community” is roughly the set of people who donate - or would donate - to the charities you are donating to.
I think you are mistaken on how Gift Aid / payroll giving works in the UK (your footnote 4), it only has an effect once you are a higher rate or additional rate taxpayer. I wrote some examples up here. As a basic rate taxpayer you don't get any benefit - only the charity does.
Thanks for the link to your post! I'm a bit confused about where I'm mistaken. I wanted to claim that:
(ignoring payroll giving or claiming money back from HMRC, as you discuss in your post) taking a salary cut (while at the 40% marginal tax rate) is more efficient (at getting money to your employer) than receiving taxed income and then donating it (with Gift Aid) to your employer
Is this right?
My impression is that people within EA already defer too much in their donation choices and so should be spending more time thinking about how and where to give, and what is being missed by GiveWell/OP etc. Alternatively, they could defer some (large) proportion of their giving to EA causes but still keep a small amount for personal choices.
Fair point. Because I'm somewhat more excited about one person doing a 100-hour investigation than 10 people doing 10-hour investigations, I would still push for people to enter small-to-medium-sized donor lotteries (which is arguably a form of deferral).
Ah gotcha, re: the pay cut thing then yes 100%, not least because employers also pay national insurance of 13.8%!
So your employer is paying 13.8%, then you are paying 40% income tax, and 2% employee national insurance.
And gift aid / payroll giving is pretty good, but not that good!
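The arithmetic behind this thread can be checked with a short Python sketch. It uses the rates quoted in the comments (13.8% employer national insurance, 40% income tax, 2% employee national insurance) plus the standard 25p of Gift Aid per £1 donated and, as in the question above, ignores payroll giving and any higher-rate relief reclaimed from HMRC. The rates are assumptions that may have changed since, and the function names are hypothetical.

```python
# Rates as quoted in the comments above (UK, 2022). They are assumptions
# here and may have changed since; check current HMRC figures.
EMPLOYER_NI = 0.138  # employer national insurance on salary
INCOME_TAX = 0.40    # higher-rate income tax
EMPLOYEE_NI = 0.02   # employee national insurance at this salary level
GIFT_AID = 0.25      # charity claims 25p per £1 donated (basic-rate tax)


def salary_cut(gross):
    """What the employer keeps if you forgo `gross` of salary."""
    return gross * (1 + EMPLOYER_NI)


def salary_then_donate(gross):
    """What the employer gets back if it pays `gross` and you donate all of
    your take-home pay to it with Gift Aid (higher-rate relief ignored)."""
    take_home = gross * (1 - INCOME_TAX - EMPLOYEE_NI)
    return take_home * (1 + GIFT_AID)


gross = 1_000
print(f"salary cut: £{salary_cut(gross):.2f}, "
      f"donate: £{salary_then_donate(gross):.2f}")
# salary cut: £1138.00, donate: £725.00
```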
Epistemic status: I’ve done work suggesting that AI risk funders should be spending at a higher rate, and I'm confident in this result. The other takes are less informed!
In principle I think the effective giving community could be in a situation where we should marginally be saving/investing more than we currently do (being ‘patient’).
However, I don’t think we’re in such a situation and in fact believe the opposite. My main crux is AI timelines; if I thought that AGI was less likely than not to arrive this century, then I would almost certainly believe that the community should marginally be spending less now.
I think patient philanthropy could be thought of as saying one of:
I don’t think we should call (1) patient philanthropy. Large funders (e.g. Open Philanthropy) already do some form of (1) by just not spending all their capital all this year. Doing (1) is instrumentally useful for the community and is necessary in any case where the community is not spending all of its capital this year.
I like (2) a lot more. This definition is relative to the community’s current spending rate and could be intuitively ‘impatient’. Throughout, I’ll use ‘patient’ to refer to (2): thinking the community’s current spending rate is too high (and so we do better by saving more now and spending later).
As an aside, thinking that the most ‘influential’ time is ahead of us is not equivalent to being patient. Non-patient funders can also think this but believe that the last dollar they spend this year goes further than it would in any other year.
A potential third definition could be something like “patience is spending 0 to ~2% per year” but I don’t think it is useful to discuss.
Of course, the large funders and the patient philanthropist may have different beliefs that lead them to disagree on the community’s optimal spending rate. If I believed one of the following, I’d decrease my guess of the community’s optimal spending rate (and become more patient):
Since it seems likely that there are multiple points of disagreement leading to different spending rates, ‘patient philanthropy’ may be a useful term for the cluster of empirical beliefs that imply the community should be spending less. However, it seems better to be specific about which particular beliefs are driving this most.
For example “AI skeptical patient philanthropists” and “better-AI-opportunities-now patient philanthropists” may agree that the community’s current spending rate is too high, but disagree on the optimal (rate of) future spending.
Patient philanthropists can be considered as funders with a very high ‘bar’. That is, they will only spend down on opportunities better than x utils per $ and if none currently exist, they will wait.
Non-patient philanthropists operate similarly but with a lower bar y<x. While the non-patient philanthropist has funds (and funds anything above y utils per dollar, including the opportunities that the patient philanthropist would otherwise fund), the patient philanthropist spends nothing. The patient philanthropist reasons that the counterfactual value of funding something the non-patient philanthropist would fund is zero and so chooses to save.
In this setup, the patient philanthropist is looking to fund and take credit for the ‘best’ opportunities and - while the large funder is around - the patient philanthropist is just funging with them. Once the large funder runs out of funds, the patient philanthropist’s funding is counterfactual.
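The bar model above can be sketched as a toy simulation in Python. Everything here is an illustrative assumption: the function name `allocate`, the all-or-nothing funding of each opportunity, and the stream of opportunities. It reproduces the behaviour described: while the large funder can afford an opportunity above its bar y, the patient philanthropist spends nothing, and patient money only produces counterfactual value once the large funder has run out.

```python
def allocate(opportunities, large_funds, patient_funds, x, y):
    """Toy bar model. Opportunities arrive in order as (utils_per_dollar, cost).
    The large funder funds anything above its bar y while it can afford it;
    the patient philanthropist funds only still-unfunded opportunities above x.
    Returns (utils counterfactually produced by the patient philanthropist,
    patient funds left over). Funding is all-or-nothing for simplicity."""
    if not y < x:
        raise ValueError("the patient bar x must exceed the large funder's bar y")
    patient_utils = 0.0
    for value, cost in opportunities:
        if value > y and large_funds >= cost:
            large_funds -= cost  # large funder covers it; patient money would only funge
        elif value > x and patient_funds >= cost:
            patient_funds -= cost  # counterfactual: nobody else would fund this
            patient_utils += value * cost
    return patient_utils, patient_funds


# Hypothetical stream of opportunities: (utils per $, cost in $).
opps = [(5, 10), (2, 10), (6, 10), (3, 10), (8, 10)]
print(allocate(opps, large_funds=20, patient_funds=20, x=4, y=1))  # (140.0, 0)
```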
If the large funder and patient philanthropist have differences in values or empirical beliefs, it is unsurprising they have different guesses of the optimal spending rate and 'bar'.
However, this should not happen with value- and belief-aligned funders and patient philanthropists: if the funder is acting 'rationally' and spending at the optimal rate, then (by definition) there are no type-2 patient philanthropists with the same beliefs.
There are some opportunities for trade between patient philanthropists and non-patient philanthropists, similar to how people can bet on AI timelines.
Let’s say Alice pledges to give $a/year from her income and thinks that the community should be spending more now. Let’s say that Bob thinks the community should be spending less and saves $b/year from his income in order to give it away later. There’s likely an agreement possible (dependent on many factors) where they both benefit. A simple setup could involve:
This example closely follows similar setups suggested for betting on AI timelines.
Unless the amazing opportunity of z utils/$ appears just after the large funder runs out of funds, where ‘just after’ means within the extra time for which the large funder could have kept going with its existing strategy of funding everything above y utils/$ had it been using the patient philanthropist’s funds.
In this first comment, I stick with the explanations. In sub-comments, I'll give my own takes.
We need the following ingredients:
And finally, we need a choice of anthropic theory. On observation E (that you are your exact current observer moment), the anthropic theories differ in the likelihood P(E|W) they assign to each world W.
Bostrom gives the definition of something like
All other things equal, one should reason as if they are randomly selected from the set of all possible observer moments [in your reference class].
This can be formalised as
Heuristically we can describe SIA as updating towards worlds in direct proportion to the number of instances of "you" that there are.
Updating with SIA has the effect of
All other things equal, one should reason as if they are randomly selected from the set of all actually existent observer moments in their reference class
Heuristically we can describe SSA as updating towards worlds in direct proportion to how relatively common instances of "you" are in the reference class in the world.
This is a special case of SSA, where one takes the minimal reference class that contains the exact copies of your exact observer moment. That is, take $R_i = Y_i$ for every $i$. This means that, for any evidence you receive, you rule out worlds that do not contain a copy of you with this same evidence.
Note that the formula for this update becomes $|Y_i|/|Y_i|$: for worlds where there is at least one copy of us, the likelihood is 1, and otherwise it is 0.
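The three updates can be compared on a toy problem with a short Python sketch. It assumes the likelihoods described above: SIA proportional to $|Y_i|$ (its denominator is the same in every world, so it cancels on normalising), SSA equal to $|Y_i|/|R_i|$, and minimal reference class SSA equal to 1 or 0. The world encoding and the function name are hypothetical. For Sleeping Beauty, it treats her two indistinguishable Tails awakenings as two copies of the same observer moment and takes each world's reference class to be her awakenings in that world.

```python
from fractions import Fraction


def posterior(worlds, theory):
    """worlds maps a name to (prior, |Y_i|, |R_i|): the prior of world W_i, the
    number of copies of your exact observer moment in it, and the size of
    your reference class in it. Returns posteriors after observing E."""
    weights = {}
    for name, (prior, n_you, n_ref) in worlds.items():
        if theory == "SIA":
            likelihood = n_you  # proportional to |Y_i|
        elif theory == "SSA":
            likelihood = Fraction(n_you, n_ref) if n_ref else 0  # |Y_i| / |R_i|
        elif theory == "minimal-SSA":
            likelihood = 1 if n_you else 0  # R_i = Y_i, so |Y_i| / |Y_i|
        else:
            raise ValueError(f"unknown theory: {theory}")
        weights[name] = prior * likelihood
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


half = Fraction(1, 2)
# Sleeping Beauty: one awakening if Heads, two indistinguishable ones if Tails.
beauty = {"heads": (half, 1, 1), "tails": (half, 2, 2)}
for theory in ("SIA", "SSA", "minimal-SSA"):
    print(theory, posterior(beauty, theory)["heads"])
# SIA 1/3, SSA 1/2, minimal-SSA 1/2

# One copy of you among 10 observers vs one copy among 1,000.
crowd = {"small": (half, 1, 10), "big": (half, 1, 1000)}
print(posterior(crowd, "SSA")["small"], posterior(crowd, "minimal-SSA")["small"])
# 100/101 1/2
```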
Updating with SSA has the effect of
One could also easily write this for the continuous case.
Strictly, this is the strong self-sampling assumption in Bostrom's original terminology (the difference being the use of observer moments, rather than observers)
One may choose not to do this, for example, by excluding simulated copies of oneself or Boltzmann brain copies of oneself.
The formal definition I gave for SSA is only for cases where the reference set is non-empty, so there's not something weird going on where we're deciding 0/0 to equal 0.
In SIA, being 'special' is being common and appearing often. In SSA, being 'special' is appearing often relative to other observers
Spoilers: using SIA with a decision theory that supposes you can 'control' all instances of you (e.g. evidential-like theories, or functional-like theories) is Dutch-bookable. This is also the case for non-minimal reference class SSA with a decision theory that supposes you only control a single instance of you (e.g. causal decision theory).
I think we should reason in terms of decisions and not use anthropic updates or probabilities at all. This is what is argued for in Armstrong's Anthropic Decision Theory, which itself is a form of updateless decision theory.
In my mind, this resolves a lot of confusion around anthropic problems when they're reframed as decision problems.
I'd pick, in this order,
I choose this ordering because both minimal reference class SSA and SIA can give the 'best' decisions (ex-ante optimal ones) in anthropic problems, when paired with the right decision theory.
Minimal reference class SSA needs pairing with an evidential-like decision theory, or one that supposes you are making choices for all your copies. SIA needs pairing with a causal-like decision theory (or one that does not suppose your actions give evidence for, or directly control, the actions of your copies). Since I prefer the former set of decision theories, I prefer minimal reference class SSA to SIA.
Non-minimal reference class SSA, meanwhile, cannot be paired with any (standard) decision theory to get ex-ante optimal decisions in anthropic problems.
For more on this, I highly recommend Oesterheld & Conitzer's Can de se choice be ex ante reasonable in games of imperfect recall?
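As an illustration of the claim that SIA paired with a causal-like decision theory recovers the ex-ante optimal decision, here is a Python sketch of the absent-minded driver problem with its standard payoffs from Piccione and Rubinstein (exit at the first intersection: 0, exit at the second: 4, drive past both: 1). It only checks the SIA half of the claim, and the function names are hypothetical. The planning-stage optimum is to continue with probability p = 2/3; at that p, a driver who holds SIA credence 1/(1+p) of being at the first intersection and treats her choice at the other intersection as fixed is exactly indifferent, so she has no reason to deviate.

```python
# Absent-minded driver payoffs (Piccione & Rubinstein): exit at the first
# intersection -> 0, exit at the second -> 4, drive past both -> 1. The driver
# cannot tell the intersections apart, so she continues with the same
# probability p at each.
EXIT_FIRST, EXIT_SECOND, DRIVE_ON = 0.0, 4.0, 1.0


def ex_ante_value(p):
    """Planning-stage expected payoff of the policy 'continue with prob. p'."""
    return (1 - p) * EXIT_FIRST + p * (1 - p) * EXIT_SECOND + p * p * DRIVE_ON


def cdt_sia_value(q, p):
    """Expected payoff of continuing with prob. q at *this* intersection while
    treating the other intersection's choice p as fixed (causal-style), with
    SIA credence 1 / (1 + p) of being at the first intersection (the first is
    always visited, the second with probability p)."""
    at_first = 1 / (1 + p)
    if_first = (1 - q) * EXIT_FIRST + q * ((1 - p) * EXIT_SECOND + p * DRIVE_ON)
    if_second = (1 - q) * EXIT_SECOND + q * DRIVE_ON
    return at_first * if_first + (1 - at_first) * if_second


best_p = max((i / 1000 for i in range(1001)), key=ex_ante_value)
print(round(best_p, 3))  # 0.667, i.e. the ex-ante optimum p = 2/3
# At p = 2/3 the CDT+SIA driver is indifferent between exiting and continuing.
print(round(cdt_sia_value(0.0, 2 / 3), 6), round(cdt_sia_value(1.0, 2 / 3), 6))  # 1.6 1.6
```

Here the ex-ante value is $4p - 3p^2$, maximised at $p = 2/3$, and the CDT+SIA indifference condition $\frac{4 - 3p}{1 + p} = \frac{3p}{1 + p}$ has the same solution.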
For example, the sleeping beauty problem or the absent-minded driver problem
Shortening/lengthening one's AGI timelines decreases/increases the importance of non-AGI existential risks because there is less/more time for them to occur.
Further, as time passes and we get closer to AGI, the importance of non-AI x-risk decreases relative to AI x-risk. This is a particular case of the above claim.
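The first claim can be made concrete with a simple formula, sketched in Python under loud assumptions: a constant annual hazard h of non-AI existential catastrophe (the 0.1% figure below is purely hypothetical) and non-AI risk ceasing to matter once AGI arrives in T years. The chance of a non-AI catastrophe before AGI is then $1 - (1 - h)^T$, which grows with T, so longer timelines raise the importance of non-AI risks, and as time passes (T shrinks) it falls. The function name is hypothetical.

```python
def p_catastrophe_before_agi(annual_hazard, years_to_agi):
    """Probability that a non-AI existential catastrophe occurs before AGI,
    assuming a constant annual hazard and that non-AI risk (and work on it)
    stops mattering once AGI arrives."""
    return 1 - (1 - annual_hazard) ** years_to_agi


# Hypothetical 0.1% annual non-AI hazard under different timelines.
for years in (10, 20, 50):
    print(years, round(p_catastrophe_before_agi(0.001, years), 4))
```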
but not necessarily tractability & neglectedness
If we think that nuclear/bio/climate/other work becomes irrelevant post-AGI, which seems very plausible to me
I've been building a model to calculate the optimal spending schedule on AGI safety and am looking for volunteers to run user experience testing.
Let me know via DM on the forum or email if you're interested :-)
The only requirements are (1) to be happy to call & share your screen for ~20 to ~60 minutes while you use the model (a Colab notebook which runs in your browser) and (2) some interest in AI safety strategy (but certainly no expertise necessary)
A diagram to show possible definitions of existential risks (x-risks) and suffering risks (s-risks)
The (expected) value & disvalue of the entire world’s past and future can be placed on the below axes (assuming both are finite).
By these definitions:
I highly recommend Nick Bostrom's working paper Base Camp for Mt. Ethics. Some excerpts on the idea of the cosmic host that I liked most:
34. At the highest level might be some normative structure established by what we may term the cosmic host. This refers to the entity or set of entities whose preferences and concordats dominate at the largest scale, i.e. that of the cosmos (by which I mean to include the multiverse and whatever else is contained in the totality of existence). It might conceivably consist of, for example, galactic civilizations, simulators, superintelligences, or a divine being or beings.
39. One might think that we could have no clue as to what the cosmic norms are, but in fact we can make at least some guesses:
a. We should refrain from harming or disrespecting local instances of things that the cosmic host is likely to care about.
b. We should facilitate positive-sum cooperation, and do our bit to uphold the cosmic normative order and nudge it in positive directions.
c. We should contribute public goods to the cosmic resource pool, by securing resources and (later) placing them under the control of cosmic norms. Prevent xrisk and build AI?
d. We should be modest, willing to listen and learn. We should not too headstrongly insist on having too much our way. Instead, we should be compliant, peace-loving, industrious, and humble vis-a-vis the cosmic host.
41. Maybe this could itself be part of an alignment goal: to build our AI such that it wants to be a good cosmic citizen and comply with celestial morality.
a. We may also want it to cherish its parents and look after us in our old age. But a little might go a long way in that regard.
Using goal factoring on tasks with ugh fields
Summary: Goal factor ugh tasks (listing the reasons for completing the task) and then generate multiple tasks that achieve each subgoal.
I sometimes am slow to reply to email and develop an ugh-field around doing it. Goal factoring "reply to the email" into
one can see that the first sub-goal may take some time (and maybe is the initial reason for not doing it straight away), but the second sub-goal is easy! One can send an email saying you'll get back to them soon. (Of course, make sure you eventually fulfil the request, and potentially set a reminder to send a polite follow-up email if you're delayed longer!)
Suppose you think only suffering counts* (i.e. you are an absolute negative utilitarian). Then the 'negative totalism' population axiology seems pretty reasonable to me.
The axiology does entail the 'Omelas Conclusion' (OC), an analogue of the Repugnant Conclusion (RC), which states that for any state of affairs there is a better state in which a single life is hellish and everyone else's life is free from suffering. As a form of totalism, the axiology does not lead to an analogue of the sadistic conclusion and is non-anti-egalitarian.
The OC (supposing absolute negative utilitarianism) seems more palatable to me than the RC (supposing classical utilitarianism). I'm curious to what extent, if at all, this intuition is shared.
Further, given a (debatable) meta-intuition for robustness of one's ethical theory, does such a preference suggest one should update slightly towards absolute negative utilitarianism or vice versa?

*or that individual utility is bounded above by 0
Most views in population ethics can entail weird/intuitively toxic conclusions (cf. the large number of 'X conclusion's out there). Trying to weigh these up comparatively is fraught.

In your comparison, it seems there's a straightforward dominance argument if the 'OC' and 'RC' are the things we should be paying attention to. Your archetypal classical utilitarian is also committed to the OC as 'large increase in suffering for one individual' can be outweighed by a large enough number of smaller decreases in suffering for others - aggregation still applies to negative numbers for classical utilitarians. So the negative view fares better as the classical one has to bite one extra bullet.

There's also the worry in a pairwise comparison one might inadvertently pick a counterexample for one 'side' that turns the screws less than the counterexample for the other one. Most people find the 'very repugnant conclusion' (where not only Z > A, but 'large enough Z and some arbitrary number having awful lives > A') even more costly than the 'standard' RC. So using the more or less costly variant on one side of the scales may alter intuitive responses.

By my lights, it seems better to have some procedure for picking and comparing cases which isolates the principle being evaluated. Ideally, the putative counterexamples share counterintuitive features both theories endorse, but differ in one is trying to explore the worst case that can be constructed which the principle would avoid, whilst the other the worst case that can be constructed with its inclusion.

It seems the main engine of RC-like examples is the aggregation - it feels like one is being nickel-and-dimed taking a lot of very small things to outweigh one very large thing, even though the aggregate is much higher. The typical worry a negative view avoids is trading major suffering for sufficient amounts of minor happiness - most typically think this is priced too cheaply, particularly at extremes.
The typical worry of the (absolute) negative view itself is it fails to price happiness at all - yet often we're inclined to say enduring some suffering (or accepting some risk of suffering) is a good deal at least at some extreme of 'upside'.

So with this procedure the putative counterexample to the classical view would be the vRC. Although negative views may not give crisp recommendations against the RC (e.g. if we stipulate no one ever suffers in any of the worlds, but are more or less happy), its addition clearly recommends against the vRC: the great suffering isn't outweighed by the large amounts of relatively trivial happiness (but it would be on the classical view).

Yet with this procedure, we can construct a much worse counterexample to the negative view than the OC - by my lights, far more intuitively toxic than the already costly vRC. (Owed to Carl Shulman). Suppose A is a vast but trivially-imperfect utopia - trillions (or googolplexes, or TREE(TREE(3))) lives of all-but-perfect bliss, but for each enduring an episode of trivial discomfort or suffering (e.g. a pin-prick, waiting in a queue for an hour). Suppose Z is a world with a (relatively) much smaller number of people (e.g. a billion) living like the child in Omelas. The negative view ranks Z > A: the negative view only considers the pinpricks in this utopia, and sufficiently huge magnitudes of these can be worse than awful lives (the classical view, which wouldn't discount all the upside in A, would not). In general, this negative view can countenance any amount of awful suffering if this is the price to pay to abolish a near-utopia of sufficient size.

(This axiology is also anti-egalitarian (consider replacing half the people in A with half the people in Z) and - depending how you litigate - susceptible to a sadistic conclusion. If the axiology claims welfare is capped above by 0, then there's never an option of adding positive welfare lives so nothing can be sadistic.
If instead it discounts positive welfare, then it prefers (given half of A) adding half of Z (very negative welfare lives) to adding the other half of A (very positive lives)).

I take this to make absolute negative utilitarianism (similar to average utilitarianism) a non-starter. In the same way folks look for a better articulation of egalitarian-esque commitments that make one (at least initially) sympathetic to average utilitarianism, so folks with negative-esque sympathies may look for better articulations of this commitment. One candidate could be that what one is really interested in is cases of severe rather than trivial suffering, so this rather than suffering in general should be the object of sole/lexically prior concern. (Obviously there are many other lines, and corresponding objections to each).

But note this is an anti-aggregation move. Analogous ones are available for classical utilitarians to avoid the (v/)RC (e.g. a critical-level view which discounts positive welfare below some threshold). So if one is trying to evaluate a particular principle out of a set, it would be wise to aim for 'like-for-like': e.g. perhaps a 'negative plus a lexical threshold' view is more palatable than classical util, yet CLU would fare even better than either.
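The Shulman example can be checked with a few lines of Python. The numbers of lives and the happiness and suffering magnitudes are hypothetical placeholders (far smaller than TREE(TREE(3)), but enough to make the point), and each world is encoded as a list of (number of lives, happiness per life, suffering per life). Because negative totalism only sums suffering, a large enough number of pinpricks in A outweighs a billion Omelas-like lives in Z, while classical totalism still ranks A far above Z.

```python
# Each world: list of (number_of_lives, happiness_per_life, suffering_per_life).
# All magnitudes are hypothetical placeholders for the example above.
A = [(10**16, 100.0, 0.001)]  # vast near-utopia: bliss plus one pinprick each
Z = [(10**9, 0.0, 1000.0)]    # a billion lives like the child in Omelas


def negative_total(world):
    """Negative totalism: only suffering counts."""
    return -sum(n * suffering for n, _, suffering in world)


def classical_total(world):
    """Classical totalism: happiness minus suffering, summed over lives."""
    return sum(n * (happiness - suffering) for n, happiness, suffering in world)


print("negative view prefers Z:", negative_total(Z) > negative_total(A))    # True
print("classical view prefers A:", classical_total(A) > classical_total(Z))  # True
```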
Thanks for such a detailed and insightful response Gregory.
Your archetypal classical utilitarian is also committed to the OC as 'large increase in suffering for one individual' can be outweighed by a large enough number of smaller decreases in suffering for others - aggregation still applies to negative numbers for classical utilitarians. So the negative view fares better as the classical one has to bite one extra bullet.
Thanks for pointing this out. I think I realised this extra bullet biting after making the post.
There's also the worry in a pairwise comparison one might inadvertently pick a counterexample for one 'side' that turns the screws less than the counterexample for the other one.
This makes a lot of sense; it’s not something I’d considered at all, and it seems pretty important when playing counterexample-intuition-tennis.
By my lights, it seems better to have some procedure for picking and comparing cases which isolates the principle being evaluated. Ideally, the putative counterexamples share counterintuitive features both theories endorse, but differ in one is trying to explore the worst case that can be constructed which the principle would avoid, whilst the other the worst case that can be constructed with its inclusion.
Again, this feels really useful and something I want to think about further.
The typical worry of the (absolute) negative view itself is it fails to price happiness at all - yet often we're inclined to say enduring some suffering (or accepting some risk of suffering) is a good deal at least at some extreme of 'upside'.
I think my slight negative intuition comes from the fact that although I may be willing to endure some suffering for some upside, I wouldn’t endorse inflicting suffering (or risk of suffering) on person A for some upside for person B. I don't know how much work the differences in fairness and personal identity (i.e. whether the being that suffered gets the upside) between the examples are doing, nor in what direction my intuition is 'less' biased.
Yet with this procedure, we can construct a much worse counterexample to the negative view than the OC - by my lights, far more intuitively toxic than the already costly vRC. (Owed to Carl Shulman). Suppose A is a vast but trivially-imperfect utopia - trillions (or googolplexes, or TREE(TREE(3))) lives of all-but-perfect bliss, but for each enduring an episode of trivial discomfort or suffering (e.g. a pin-prick, waiting in a queue for an hour). Suppose Z is a world with a (relatively) much smaller number of people (e.g. a billion) living like the child in Omelas
I like this example a lot, and definitely lean A > Z.
If I reframe the situation, my intuition becomes less clear. Consider A’, in which TREE(TREE(3)) lives are in perfect bliss, but there are also TREE(TREE(3)) beings that momentarily experience a single pinprick before ceasing to exist. This is clearly equivalent to A under the axiology, but my intuition that A’ > Z is less clear (if present at all). As above, I’m unsure how much work personal identity is doing. In my mind, I find population ethics easier to think about by considering ‘experienced moments’ rather than individuals.
(This axiology is also anti-egalitarian (consider replacing half the people in A with half the people in Z) ...
Thanks for pointing out the error. I think I’m right in saying that the ‘welfare capped by 0’ axiology is non-anti-egalitarian, which I conflated with absolute NU in my post (which is anti-egalitarian, as you say). The axiologies are much more distinct than I originally thought.