In this post:
- Why, in certain circumstances, is it beneficial for people to be nice to weaker agents (i.e. other people and non-human agents)?
Problem
Let's consider the following situation.
Imagine two alien civilizations: civilization A and civilization B.
Civilization B discovers civilization A.
Civilization B is significantly stronger than civilization A.
The civilizations have conflicting interests.
But civilization B is stronger, so they can get what they want.
Civilization B faces the following choice: they can either act selfishly and do what is best for them, or act altruistically and do what is best for the collective happiness of civilizations A and B.
The payoff table looks like this:
| | B acts altruistically | B acts selfishly |
|---|---|---|
| A's payoff | 2 | 0 |
| B's payoff | 2 | 3 |
"Payoff" represents how good the outcome is for the given civilization.
If civilization B chooses to act selfishly, they will benefit from that slightly (3 instead of 2), but civilization A will pay a larger cost (0 instead of 2).
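To make B's incentive concrete, here is a minimal Python sketch of the one-shot choice, using the payoff numbers from the table above (the dictionary and variable names are just my own illustration):

```python
# Payoffs from the table above, as (A's payoff, B's payoff) for each action B can take.
PAYOFFS = {
    "altruistic": (2, 2),
    "selfish": (0, 3),
}

# With no way for A to retaliate, B simply picks whatever maximizes its own payoff.
best_action_for_B = max(PAYOFFS, key=lambda action: PAYOFFS[action][1])
print(best_action_for_B)  # -> "selfish"
```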
Civilization B chooses to act selfishly, because civilization A is weaker and can't retaliate in any way against the stronger civilization. Civilization A ends up harmed by civilization B.
Now, there is civilization C that discovers civilization B.
Civilization C is even stronger than civilization B.
The civilizations have conflicting interests.
Civilization C faces the same choice: they can either act selfishly and do what is best for them, or act altruistically and do what is best for the collective happiness of civilizations B and C.
Civilization C chooses what is best for them. Civilization B ends up harmed by civilization C.
But civilization C already knows that one day, they will be harmed by the next, stronger civilization.
Now comes civilization D, which is even stronger than civilization C. The story repeats: civilization C ends up harmed by civilization D.
Then civilization E comes and the story repeats. Then the next civilization comes, and the next, and so on...
Of course, it doesn't always have to be the case that the civilizations have conflicting interests. Their interests might be perfectly aligned. But we assume that this is not the case.
Now, in that universe, every civilization acted selfishly towards the weaker civilization it met. Every civilization (except maybe the strongest one) would be better off in an alternative universe where civilizations act altruistically towards the weaker one. But it's not in the interest of any individual civilization to act altruistically towards the weaker civilization, because the weaker civilization can't retaliate.
The question is:
How should the civilizations act in that situation?
Strategy
Let's consider that situation from a game-theory perspective.
I will first describe it using game-theory terminology, and later in simpler terms, so if you don't know game theory, stay with me.
We assume here that the probability that there will be a next civilization is always very high - each civilization strongly expects that there will be some civilization stronger than them.
From the game theory perspective, there are two subgame-perfect equilibria in that game:
- Everyone acts according to the strategy: always act selfishly.
- Everyone acts according to the strategy:
  - Treat the weaker civilization at least as well as that civilization treated the even weaker civilization below it. If they acted altruistically towards their weaker civilization, then act altruistically towards them. If they didn't, then feel free to act selfishly towards them.
  - And if you are the second civilization, act altruistically towards the first civilization.
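Below is a toy Python simulation of a finite chain of civilizations under these two strategy profiles, just to compare the realized payoffs. It reuses the payoff numbers from the table above, does not model the uncertainty about whether a stronger civilization actually appears, and all names and structure are my own illustration rather than anything formal:

```python
def run_chain(n, conditional):
    """Total payoff of each of n civilizations. Civilization i+1 is stronger than
    civilization i and decides how to treat it. If conditional=False, every stronger
    civilization always acts selfishly. If conditional=True, it acts altruistically
    iff the weaker civilization was altruistic to the one below it (with the second
    civilization altruistic towards the first by convention)."""
    payoffs = [0] * n
    was_altruistic = [None] * n   # how civilization i treated civilization i - 1
    for i in range(1, n):
        if conditional:
            altruistic = True if i == 1 else was_altruistic[i - 1]
        else:
            altruistic = False
        was_altruistic[i] = altruistic
        # Payoffs per encounter, from the table above: (weaker side, stronger side).
        weaker_gain, stronger_gain = (2, 2) if altruistic else (0, 3)
        payoffs[i - 1] += weaker_gain
        payoffs[i] += stronger_gain
    return payoffs

print(run_chain(6, conditional=False))  # [0, 3, 3, 3, 3, 3]
print(run_chain(6, conditional=True))   # [2, 4, 4, 4, 4, 2]
```

Every civilization except the last one (which never meets anyone stronger in this finite toy) does better under the conditional strategy, which matches the claim above that everyone except maybe the strongest would prefer the altruistic universe.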
The second assignment of strategies is an equilibrium: if a civilization chooses to act selfishly, then the stronger civilization will act selfishly towards them, so every civilization is better off keeping to the strategy.
It's also a subgame-perfect equilibrium, because after any history of actions the above statement still remains true.
The second equilibrium results in a higher payoff for each player than the first one (the second equilibrium is Pareto-optimal).
Put simply, if everyone follows the second strategy, everyone will be better off, and nobody has an incentive to deviate from that strategy, because they will be penalized if they do.
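A quick back-of-the-envelope deviation check, again using the payoff numbers from the table and writing p for the (assumed high) probability that a stronger civilization shows up later - p and the function below are my own notation, not part of the original argument:

```python
def deviation_gain(p):
    """Expected extra payoff from acting selfishly instead of altruistically,
    given probability p that a stronger civilization later retaliates in kind."""
    immediate = 3 - 2       # gain now from taking 3 instead of 2
    later = p * (0 - 2)     # expected loss later: getting 0 instead of 2
    return immediate + later

print(deviation_gain(0.9))  # -0.8: deviating doesn't pay whenever p > 0.5
```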
Textbook game theory suggests that if agents are in a certain equilibrium, then they will stay in that equilibrium, because no agent has an interest in deviating from their strategy.
However, more advanced game theory frameworks, like Reputation Theory or Bayesian Fictitious Play, suggest that if enough agents change their strategy, then the other agents will adapt and change their strategy too. For that reason, it's most likely that people will eventually move to the equilibrium that is collectively better.
When that happens, people who initially followed the strategy of treating weaker agents nicely, as long as those agents treat the ones weaker than them nicely, will end up being better off. Therefore, it's beneficial for people to subscribe to that strategy today.
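As a toy illustration of that adaptation argument: if a fraction q of the stronger agents you might later meet play the conditional strategy and the rest always act selfishly, being nice becomes the better response once q is large enough. The numbers again come from the payoff table; q and the function are my own, purely illustrative simplification:

```python
def expected_payoff(q, you_are_altruistic):
    """Your expected total payoff: the encounter with the weaker agent now, plus a
    later encounter with a stronger agent who plays the conditional strategy with
    probability q (and is simply selfish otherwise, giving you 0 either way)."""
    now = 2 if you_are_altruistic else 3
    later = q * (2 if you_are_altruistic else 0)
    return now + later

for q in (0.2, 0.5, 0.8):
    print(q, expected_payoff(q, True), expected_payoff(q, False))
# Being altruistic becomes strictly better once q > 0.5.
```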
There are some remaining questions, like whom we should consider "an agent" and how much we should value their utility.
It's also important to note that the above reasoning applies only in a situation of uncertainty - the agent must not be certain that there is nobody stronger than them, and the stronger agent must meet the same condition.
Conclusion
Assuming that there is uncertainty about whether an agent will interact with an agent stronger than them (which is true in our world), and assuming that agents will follow the strategies from the Pareto-optimal subgame-perfect equilibrium (which should happen in our world at some point), agents have a selfish incentive to be good to weaker agents. Many people think that if someone has nothing to offer them, is weaker, and can't retaliate, then it's in their selfish interest to exploit them. But that is not true, because it might be in the interest of a stronger agent to treat you at least as well as you treated an agent weaker than yourself.

I've explored very similar ideas before in things like this simulation based on the Iterated Prisoner's Dilemma but with Death, Asymmetric Power, and Aggressor Reputation. Long story short, the cooperative strategies do generally outlast the aggressive ones in the long run. It's also an idea I've tried to discuss (albeit less rigorously) before as The Alpha Omega Theorem and Superrational Signalling. The first of those was from 2017 and got downvoted to oblivion, while the second was probably too long-winded and got mostly ignored.
There are a bunch of random people like James Miller and A.V. Turchin and Ryo who have had similar ideas that can broadly be categorized under Bostrom's concept of Anthropic Capture, or Game Theoretic Alignment, or possibly a subset of Agent Foundations. The ideas are mostly not taken very seriously by the greater LW and EA communities, so I'd be prepared for a similar reception.