I suspect I also have a different view from you about which problems are going to be bottlenecks to AI development. For example, I think there's a good chance that the world would steam ahead even if we don't solve any of the current (non-philosophical) problems in alignment (interpretability, shutdownability, reward hacking, etc.).
try to make them "more legible" to others, including AI researchers, key decision makers, and the public
Yes, I agree this is valuable, though I think it's valuable mainly because it increases the probability that people use future AIs to solve these problems, rather than because it will make people slow down AI development or try very hard to solve them pre-TAI.
I don't think philosophical difficulty adds that much to the difficulty of alignment, mainly because I think that AI developers should (and likely will) aim to make AIs corrigible assistants rather than agents with their own philosophical views that they try to impose on the world. And I think it's fairly likely that we can use these assistants (if we succeed in getting them and aren't disempowered by a misaligned AI instead) to help a lot with these hard philosophical questions.
I didn't mean to imply that Wei Dai was overrating the problems' importance. I agree they're very important! I was making the case that they're also very intractable.
If I thought solving these problems pre-TAI would be a big increase to the EV of the future, I'd take their difficulty to be a(nother) reason to slow down AI development. But I think I'm more optimistic than you and Wei Dai about waiting until we have smart AIs to help us on these problems.
I'm a philosopher who's switched to working on AI safety full-time. I also know there are at least a few philosophers at Anthropic working on alignment.
With regards to your Problems in AI Alignment that philosophers could potentially contribute to:
Whether longtermism is a crux will depend on what we mean by 'long,' but I think concern for future people is a crux for x-risk reduction. If future people don't matter, then working on global health or animal welfare is the more effective way to improve the world. The more optimistic of the calculations that Carl and I do suggest that, by funding x-risk reduction, we can save a present person's life for about $9,000 in expectation. But we could save about 2 present people if we spent that money on malaria prevention, or we could mitigate the suffering of ...
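To make the arithmetic behind that comparison explicit (the malaria figure is just what the 'about 2 present people' claim implies):

\[
\text{x-risk reduction: } \frac{\$9{,}000}{1 \text{ life}} = \$9{,}000 \text{ per life}, \qquad \text{malaria prevention: } \frac{\$9{,}000}{2 \text{ lives}} \approx \$4{,}500 \text{ per life}.
\]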
Oops, yes: the fundamentals of my case and Bruce's are very similar. I should have read Bruce's comment!
The claim we're discussing - about the possibility of small steps of various kinds - sounds kinda like a claim that gets called 'Finite Fine-Grainedness'/'Small Steps' in the population axiology literature. It seems hard to convincingly argue for, so in this paper I present a problem for lexical views that doesn't depend on it. I sort of gestured at it above with the point about risk without making it super precise. The one-line summary is that expected welfare levels are finitely fine-grained.
Oh yep nice point, though note that - e.g. - there are uncountably many reals between 1,000,000 and 1,000,001 and yet it still seems correct (at least talking loosely) to say that 1,000,001 is only a tiny bit bigger than 1,000,000.
But in any case, we can modify the argument to say that S* feels only a tiny bit worse than S. Or instead we can modify it so that S is the temperature in degrees Celsius of a fire that causes suffering that just about can be outweighed, and S* is the temperature in degrees Celsius of a fire that causes suffering that just about can't be outweighed.
Nice post! Here's an argument that extreme suffering can always be outweighed.
Suppose you have a choice between:
(S+G): The most intense suffering S that can be outweighed, plus a population G that's good enough to outweigh it, so that S+G is good overall: better than an empty population.
(S*+nG): The least intense suffering S* that can't be outweighed, plus a population that's n times better than the good population G.
If extreme suffering can't be outweighed, we're required to choose S+G over S*+nG, no matter how big n is. But that seems implausible. S* is ...
Note also that you can accept outweighability and still believe that extreme suffering is really bad. You could - e.g. - think that 1 second of a cluster headache can only be outweighed by trillions upon trillions of years of bliss. That would give you all the same practical implications without the theoretical trouble.
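To make the structure of the argument explicit (my own formalisation, writing V(X) for the value of population X relative to the empty population): a view on which S* can't be outweighed says

\[
V(S^* + nG) \le 0 < V(S + G) \quad \text{for every } n,
\]

so it requires choosing S+G over S*+nG no matter how large n is.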
+1 to this. It echoes some earlier discussion we've had privately, and I think it would be interesting to see it fleshed out more, if your current view is to reject outweighability in theory
More importantly I think this points to a potentia...
Nice point, but I think it comes at a serious cost.
To see how, consider a different case. In X, ten billion people live awful lives. In Y, those same ten billion people live wonderful lives. Clearly, Y is much better than X.
Now consider instead Y*, which is exactly the same as Y except that we also add one extra person, also with a wonderful life. As before, Y* is much better than X for the original ten billion people. If we say that the value of adding the extra person is undefined and that this undefined value renders the value of the whole change f...
It's an unfortunate naming clash; there are two different ARC Challenges:
ARC-AGI (Chollet et al) - https://github.com/fchollet/ARC-AGI
ARC (AI2 Reasoning Challenge) - https://allenai.org/data/arc
The benchmark results here are for the second of the two.
LLMs (at least without scaffolding) still do badly on ARC-AGI, and I'd wager Llama 405B is no exception. It's telling that all the big labs release the 95%+ numbers they get on AI2-ARC, and not whatever default results they get on ARC-AGI...
(Or in general, reporting benchmarks where they...
You should read the post! Section 4.1.1 makes the move that you suggest (rescuing PAVs by de-emphasising axiology). Section 5 then presents arguments against PAVs that don't appeal to axiology.
I think my objections still work if we 'go anonymous' and remove direct information about personal identity across different options. We just need to add some extra detail. Let the new version of One-Shot Non-Identity be as follows. You have a choice between: (1) combining some pair of gametes A, which will eventually result in the existence of a person with welfare 1, and (2) combining some other pair of gametes B, which will eventually result in the existence of a person with welfare 100.
The new version of Expanded Non-Identity is then the same as ...
Here's my understanding of the dialectic here:
Me: Some wide views make the permissibility of pulling both levers depend on whether the levers are lashed together. That seems implausible. It shouldn't matter whether we can pull the levers one after the other.
Interlocutor: But lever-lashing doesn't just affect whether we can pull the levers one after the other. It also affects what options are available. In particular, lever-lashing removes the option to create both Amy and Bobby, and removes the option to create neither Amy nor Bobby. So if a wide view has ...
In Parfit's case, we have a good explanation for why you're rationally required to bind yourself: doing so is best for you.
Perhaps you're morally required to bind yourself in Two-Shot Non-Identity, but why? Binding yourself isn't better for Amy. And if it's better for Bobby, it seems that can only be because existing is better for Bobby than not-existing, and then there's pressure to conclude that we're required to create Bobby in Just Bobby, contrary to the claims of PAVs.
And suppose that (for whatever reason) you can't bind yourself in Two-Shot Non-Ident...
Yes, nice points. If one is committed to contingent people not counting, then one has to say that C is worse than B. But it still seems to me like an implausible verdict, especially if one of B and C is going to be chosen (and hence those contingent people are going to become actual).
It seems like the resulting view also runs into problems of sequential choice. If B is best out of {A, B, C}, but C is best out of {B, C}, then perhaps what you're required to do is initially choose B and then (once A is no longer available) later switch to C, even if doing so is costly. And that seems like a bad feature of a view, since you could have costlessly chosen C in your first choice.
Taken as an argument that B isn't better than A, this response doesn't seem so plausible to me. In favour of B being better than A, we can point out: B is better than A for all of the necessary people, and pretty good for all the non-necessary people. Against B being better than A, we can say something like: I'd regret picking B over C. The former rationale seems more convincing to me, especially since it seems like you could also make a more direct, regret-based case for B being better than A: I'd regret picking A over B.
But taken as an argument that A is permissible, this response seems more plausible. Then I'd want to appeal to my arguments against deontic PAVs.
Yes, nice point. I argue against this kind of dependence in footnote 16 of the paper. Here's what I say there:
...Here’s a possible reply, courtesy of Olle Risberg. What we’re permitted to do depends on lever-lashing, but not because lever-lashing precludes pulling the levers one after the other. Instead, it’s because lever-lashing removes the option to create both Amy and Bobby, and removes the option to create neither Amy nor Bobby. If we have the option to create both and the option to create neither, then creating just Amy is permissible. If we don’t have
I'm quite surprised that superforecasters predict nuclear extinction is 7.4 times more likely than engineered pandemic extinction, given that (as you suggest) EA predictions usually go the other way. Do you know if this is discussed in the paper? I had a look around and couldn't find any discussion.
That all sounds approximately right, but I'm struggling to see how it bears on this point:
If we want expected-utility-maximisation to rule anything out, we need to say something about the objects of the agent's preference. And once we do that, we can observe violations of Completeness.
Can you explain?
The only thing that matters is whether the agent's resulting behaviour can be coherently described as maximising a utility function.
If you're only concerned with externals, all behaviour can be interpreted as maximising a utility function. Consider an example: an agent pays $1 to trade vanilla for strawberry, $1 to trade strawberry for chocolate, and $1 to trade chocolate for vanilla. Considering only externals, can this agent be represented as an expected utility maximiser? Yes. We can say that the agent's preferences are defined over entire histories of ...
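To illustrate the kind of representation I have in mind, here's a toy sketch (the utility assignment over histories is my own illustration, not anything canonical): if the agent's utility is defined over trade histories and increases with each completed trade, then paying $1 at every step comes out as utility-maximising.

```python
# Toy sketch: seemingly cyclic trades represented as utility maximisation once
# preferences are defined over entire trade histories (illustrative assignment only).

def utility(history):
    # The agent assigns higher utility to histories containing more trades.
    # Any utility function strictly increasing in the number of trades works here.
    return len(history)

history = []
holding = "vanilla"
for target in ["strawberry", "chocolate", "vanilla"]:
    trade = history + [f"pay $1: {holding} -> {target}"]
    keep = history  # decline the trade and keep the current flavour
    # At each node, the agent picks whichever resulting history has higher utility.
    if utility(trade) > utility(keep):
        history, holding = trade, target

print(history)
# ['pay $1: vanilla -> strawberry', 'pay $1: strawberry -> chocolate', 'pay $1: chocolate -> vanilla']
```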
I think this paper is missing an important distinction between evolutionarily altruistic behaviour and functionally altruistic behaviour.
These two forms of behaviour can come apart.
A parent's care for their child is often functionally altruistic but evolutionarily selfish: it is motivated by an intrinsic concern for the child'...
I wouldn't call a small policy like that 'democratically unacceptable' either. I guess the key thing is whether a policy goes significantly beyond citizens' willingness to pay not only by a large factor but also by a large absolute value. It seems likely to be the latter kinds of policies that couldn't be adopted and maintained by a democratic government, in which case it's those policies that qualify as democratically unacceptable on our definition.
suggests that we are not too far apart.
Yes, I think so!
I guess this shows that the case won't get through with the conservative rounding off that you applied here, so future developments of this CBA would want to go straight for the more precise approximations in order to secure a higher evaluation.
And thanks again for making this point (and to weeatquince as well). I've written a new paragraph emphasising a more reasonable, less conservative estimate of benefit-cost ratios. I expect it'll probably go in the final draft, and I'll edit the post here to incl...
Thanks for this! All extremely helpful info.
Naively, a benefit-cost ratio of >1:1 suggests that a project is worth funding. However, given the overhead costs of government policy, governments' propensity to make even cost-effective projects go wrong, and public preferences for money in hand, it may be more appropriate to apply a higher bar for cost-effective government spending. I remember I used to use a 3:1 ratio, perhaps picked up when I worked in Government, although I cannot find a source for this now.
This is good to know. Our BCR of 1.6 is bas...
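For concreteness, with the numbers already on the table:

\[
\mathrm{BCR} = \frac{\text{expected benefits}}{\text{costs}}, \qquad 1.6 > 1 \ \text{(clears a 1:1 bar)}, \qquad 1.6 < 3 \ \text{(falls short of a 3:1 bar)}.
\]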
Though I agree that refuges would not pass a CBA, I don't think they are an example of something that would impose extreme costs on those alive today - I suspect significant value could be obtained with $1 billion.
I think this is right. Our claim is that a strong longtermist policy as a whole would place extreme burdens on the present generation. We expect that a strong longtermist policy would call for particularly extensive refuges (and lots of them) as well as the other things that we mention in that paragraph.
...We also focus on the risk of global catastrophes,
Maybe an obvious point, but I think we shouldn't lose sight of the importance of providing EA funding for catastrophe-preventing interventions, alongside attempts to influence government. Attempts to influence government may fail / fall short of what is needed / take too long given the urgency of action.
Yep, agreed!
Should we just get on with developing refuges ourselves?
My impression is that this is being explored. See, e.g., here.
Second, the argument overshoots.
The argument we mean to refer to here is the one that we call the ‘best-known argument’ elsewhere: the one that says that the non-existence of future generations would be an overwhelming moral loss because the expected future population is enormous, the lives of future people are good in expectation, and it is better if the future contains more good lives. We think that this argument is liable to overshoot.
I agree that there are other compelling longtermist arguments that don’t overshoot. But my concern is that governments c...
But CBA cares about marginal cost effectiveness and presumably the package can be broken into chunks of differing ex-ante cost-effectiveness (e.g. by intervention type, or by tranches of funding in each intervention). Indeed you suggest this later in the piece. Since the average only just meets the bar, if there is much variation, the marginal work won’t meet the bar, so government funding would cap out at something less than this, perhaps substantially so.
Yes, this is an important point. If we were to do a more detailed cost-benefit analysis of catastroph...
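To illustrate the point with made-up numbers (the 1.6 average is ours; the split into tranches is purely hypothetical): suppose the package divides into two equal-cost tranches, one with a BCR of 2.4 and one with a BCR of 0.8. Then

\[
\text{average BCR} \;=\; \frac{2.4C + 0.8C}{C + C} \;=\; 1.6,
\]

but only the first tranche clears even a 1:1 bar, so funding on the margin would stop well short of the whole package.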
On the first, I think we should use both traditional CBA justifications as well as longtermist considerations
I agree with this. What we’re arguing for is a criterion: governments should fund all those catastrophe-preventing interventions that clear the bar set by cost-benefit analysis and altruistic willingness to pay. One justification for funding these interventions is the justification provided by CBA itself, but it need not be the only one. If longtermist justifications help us get to the place where all the catastrophe-preventing interventions that cl...
at other times you seem to trade on the idea that there is something democratically tainted about political advocacy on behalf of the people of the future — this is something I strongly reject.
I reject that too. We don’t mean to suggest that there is anything democratically tainted about that kind of advocacy. Indeed, we say that longtermists should advocate on behalf of future generations, in order to increase the present generation’s altruistic willingness to pay for benefits to future generations.
What we think would be democratically unacceptable is gov...
Thanks, these comments are great! I'm planning to work through them later this week.
I agree with pretty much all of your bulletpoints. With regards to the last one, we didn't mean to suggest that arguing for greater concern about existential risks is undemocratic. Instead, we meant to suggest that (in the world as it is today) it would be undemocratic for governments to implement policies that place heavy burdens on the present generation for the sake of small reductions in existential risk.
Thanks for the comment!
There are clear moral objections against pursuing democratically unacceptable policies
What we mean by this sentence is that there are clear moral objections against governments pursuing [perhaps we should have said 'instituting'] democratically unacceptable policies. We don't mean to suggest that there's anything wrong with citizens advocating for policies that are currently democratically unacceptable with the aim of making them democratically acceptable.
Great post!
For example, maybe, according to you, you’re an “all men are created equal” type. That is, you treat all men equally. Maybe you even write a fancy document about this, and this document gets involved in the founding of a country, or something.
There’s a thing philosophy can do, here, which is to notice that you still own slaves. Including: male slaves. And it can do that whole “implication” thing, about how, Socrates is a man, you treat all men equally, therefore you treat Socrates equally, except oh wait, you don’t, he’s your slave.
Charles Mills...
I think all of these objections would be excellent if I were arguing against this claim:
But I’m arguing against this claim:
And given that, I think your objections miss the mark.
On your first point, I’m prepared to grant that agents have no reason to rule out option A- at node 2. All I need to claim is that advanced artificial agents might rule out option A- at node 2. And I think my argument makes that...
There's a complication here related to a point that Rohin makes: if we can only see an agent's decisions and we know nothing about its preferences, all behaviour can be rationalised as EU maximisation.
But suppose we set that complication aside. Suppose we know this about an agent's preferences:
Then we can observe violations of Completeness. Suppose that we first offer our agent a choice between A and some other option B, and that the agent chooses A. Then we give the agent the chance to ...
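Here's a toy sketch of the kind of inference I have in mind (the particular options and the assumed known preference are my own illustration, not the exact case above): suppose we know the agent strictly prefers A+ to A, and we then observe it choose A from {A, B} and B from {A+, B}. No complete, transitive preference relation is consistent with those choices (read as revealing weak preference), but incomplete preferences with a gap between B and each of A and A+ are.

```python
# Toy sketch (illustrative): given partial knowledge of the agent's preferences,
# some choice patterns are inconsistent with any complete, transitive preference relation.

from itertools import permutations

options = ["A", "A+", "B"]
known_strict = {("A+", "A")}               # we know A+ is strictly preferred to A
observed_weak = {("A", "B"), ("B", "A+")}  # revealed by choosing A from {A, B} and B from {A+, B}

def some_complete_ranking_fits(strict, weak):
    """Is there any strict ranking of the options respecting the known strict
    preferences and the observed weak preferences? (Allowing ties doesn't help:
    A >= B and B >= A+ would force A >= A+ by transitivity, contradicting A+ > A.)"""
    for ranking in permutations(options):
        pos = {x: i for i, x in enumerate(ranking)}  # lower index = more preferred
        if all(pos[a] < pos[b] for a, b in strict) and \
           all(pos[a] <= pos[b] for a, b in weak):
            return True
    return False

print(some_complete_ranking_fits(known_strict, observed_weak))  # False

# With incomplete preferences - a preference gap between B and each of A and A+ -
# the same choices are unproblematic, so the pattern reveals incompleteness
# rather than indifference.
```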
Nice point. The rough answer is 'Yes, but only once the agent has turned down a sufficiently wide array of options.' Depending on the details, that might never happen or only happen after a very long time.
I've had a quick think about the more precise answer, and I think it is:
I said a little in another thread. If we get aligned AI, I think it'll likely be a corrigible assistant that doesn't have its own philosophical views that it wants to act on. And then we can use these assistants to help us solve philosophical problems. I'm imagining in particular that these AIs could be very good at mapping logical space, tracing all the implications of various views, etc. So you could ask a question and receive a response like: 'Here are the different views on this question. Here's why they're mutually exclusive and jointly exhaustive. He...