Good idea, I reposted the article itself here: https://forum.effectivealtruism.org/posts/GyenLpfzRKK3wBPyA/the-simple-case-for-ai-catastrophe-in-four-steps
I've been trying to keep the "meta" posts and the main posts mostly separate, so hopefully their discussions don't get tangled together.
The bets I've seen you post seem rather disadvantageous to the other side, and I thought so at the time. That's fine (good business, even) from your perspective, given that you managed to find takers, but it makes me more pessimistic about finding a deal that's good by both of our lights.
Here's my current four-point argument for AI risk/danger from misaligned AIs.
Request for feedback: I'm curious whether there are points that people think I'm critically missing, and/or ways that these arguments would not be convincing to "normal people" (per the original goal).
What are people's favorite arguments/articles/essays trying to lay out the simplest possible case for AI risk/danger?
Every single argument for AI danger/risk/safety I’ve seen seems to overcomplicate things. Either they have too many extraneous details, or they appeal to overly complex analogies, or they seem to spend much of their time responding to insider debates.
I might want to try my hand at writing the simplest possible argument that is still rigorous and clear, without being trapped by common pitfalls. To do that, I want to quickly survey the field so I can learn from the best existing work as well as avoid the mistakes they make.
I often see people advocate that others sacrifice their souls. People often justify lying, political violence, coverups of "your side's" crimes and misdeeds, or professional misconduct by government officials and journalists, because their cause is sufficiently True and Just. I'm overall skeptical of this entire class of arguments.
This is not because I intrinsically value “clean hands” or seeming good over actual good outcomes. Nor is it because I have a sort of magical thinking common in movies, where things miraculously work out well if you just ignore tradeoffs.
Rather, it’s because I think the empirical consequences of deception, violence, criminal activity, and other norm violations are often (not always) quite bad, and people aren’t smart or wise enough to tell the exceptions apart from the general case, especially when they’re ideologically and emotionally compromised, as is often the case.
Instead, I think it often helps to be interpersonally nice, conduct yourself with honor, and overall be true to your internal and/or society-wide notions of ethics and integrity.
I’m especially skeptical of galaxy-brained positions where to be a hard-nosed consequentialist or whatever, you are supposed to do a specific and concrete Hard Thing (usually involving harming innocents) to achieve some large, underspecified, and far-off positive outcome.
I think it's like those thought experiments about torturing a terrorist (or a terrorist's child) to find the location of a ticking nuclear bomb under Manhattan, where somehow you know the torture would work.
I mean, sure, if presented that way I'd think it's a good idea, but has anybody here checked the literature on the reliability of evidence extracted under torture? Is that really the most effective interrogation technique?
So many people seem eager to rush to sell their souls, without first checking to see if the Devil’s willing to fulfill his end of the bargain.
Thanks! I agree the math isn't exactly right. The point about x^2 on the rationals is especially sharp.
The problem with calling it "the paradox of the heap" is that it makes it sound like an actual paradox, instead of a fairly trivial connection to tipping points. I wish I had better terminology/phrasing for the connection I want to make.
I like Scott's Mistake Theory vs Conflict Theory framing, but I don't think this is a complete model of disagreements about policy, nor do I think the complete models of disagreement will look like more advanced versions of Mistake Theory + Conflict Theory.
To recap, here are my short summaries of the two theories:
Mistake Theory: I disagree with you because one or both of us are wrong about what we want, or about how to achieve what we want.
Conflict Theory: I disagree with you because ultimately I want different things from you. The Marxists, whom Scott was originally arguing against, natively see this as being about individual or class material interests, but this can be smoothly updated to include values and ideological conflict as well.
I polled several people about alternative models for political disagreement at the same level of abstraction as Conflict vs Mistake, and people usually landed on "some combination of mistakes and conflicts." To that obvious model, I want to add two other theories (this list is incomplete).
Consider the opening of Thomas Schelling's 1960 book, The Strategy of Conflict.
I claim that this "rudimentary/obvious idea," that the conflicting and cooperative elements of many human disagreements are structurally inseparable, is central to a secret third thing distinct from Conflict vs Mistake. If you grok the "obvious idea," we can derive something like:
Negotiation Theory(?): I have my desires. You have yours. I sometimes want to cooperate with you, and sometimes not. I take actions maximally good for my goals and respect you well enough to assume that you will do the same; however, in practice a "hot war" is unlikely to be in either of our best interests.
In the Negotiation Theory framing, disagreement/conflict arises from dividing the goods in non-zerosum games. I think the economists'/game theorists' "standard models" of negotiation are natively closer to "conflict theory" than to "mistake theory" (e.g., their models often assume rationality, which means the "can't agree to disagree" theorems apply), so disagreements are due to different interests rather than different knowledge. But unlike Marxist/naive conflict theory, we see that conflicts are far from desirable or inevitable: there are usually deals that are better by both parties' lights than failing to coordinate, or war.
(Failures, from Negotiation Theory's perspective, often centrally look like coordination failures, though the theory is broader than that and includes failing to make peace with adversaries.)
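To make the non-zerosum point concrete, here's a minimal sketch in Python. The payoffs and outcome names are entirely made up for illustration (not taken from any model in the literature); the point is just that both sides can prefer several negotiated splits over war, while still disagreeing about which split to pick.

```python
# Toy, hypothetical payoffs illustrating the Negotiation Theory framing:
# a non-zerosum "divide the surplus" situation where both sides prefer
# some negotiated split to open conflict, even though they disagree
# about which split they'd most like.

# Outcomes: (payoff to A, payoff to B). Numbers are made up for illustration.
outcomes = {
    "war":              (-5, -5),   # destructive conflict
    "no_deal":          (0, 0),     # fail to coordinate; the surplus is lost
    "split_favoring_A": (7, 3),     # A's preferred division of a surplus of 10
    "split_favoring_B": (3, 7),     # B's preferred division
}

def pareto_dominates(x, y):
    """True if outcome x is at least as good for both players as y, and strictly better for at least one."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))

for name, payoff in outcomes.items():
    if name != "war" and pareto_dominates(payoff, outcomes["war"]):
        print(f"{name} Pareto-dominates war: {payoff} vs {outcomes['war']}")

# Both splits (and even no_deal) beat war for both sides, yet A and B still
# "disagree": each prefers a different split. That residual bargaining over
# which mutually beneficial deal to pick is the core of the framing.
```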
Another framing, which is in some ways a synthesis and in some ways a different view altogether, and which can be simultaneously true with any of the previous theories:
Motivated Cognition: People disagree because their interests shape their beliefs. Political disagreements happen because one or both parties are mistaken about the facts, and those mistakes are downstream of material or ideological interests shading one's biases. Upton Sinclair: “It is difficult to get a man to understand something, when his salary depends on his not understanding it.”
Note the word "difficult," not impossible. This is Sinclair's view and I think it's correct. Getting people to believe (true) things that go against their material interests is possible, but the skill level required is higher than for a neutral presentation of the facts to a neutral third party.
Interestingly, the Motivated Cognition framing suggests that there might not be a complete truth of the matter about whether Mistake Theory, Conflict Theory, or Negotiation Theory is more correct for a given political disagreement. Instead, your preferred framing has a viewpoint-dependent and normative element to it.
Suppose your objective is just to get a specific policy passed (no meta-level preferences like altruism), and you believe this policy is in your interests and those of many others, and people who oppose you are factually wrong.
Someone who's suited to explanations, like Scott (or like me?), might naturally fall into a Mistake Theory framing and write clear-headed blogposts about why the people who disagree with you are wrong. If the Motivated Cognition theory is correct, most people are at least somewhat sincere, and at a sufficiently high level of simplicity, people can update to agree with you even if it's not immediately in their interests (smart people in democracies usually don't believe 2+2=5, even in situations where it'd be advantageous for them to do so).
Someone who's good at negotiations and cooperative politics might more naturally adopt a Negotiation Theory framing, and come up with a deal that gets everybody (or enough people) what they want while having their preferred policy passed.
Finally, someone who's good at (or temperamentally suited to) non-cooperative, more Machiavellian politics might identify the people most likely to oppose their preferred policies and destroy their political influence enough that the preferred policy gets passed.
Anyway, here are my four models of political disagreement (Mistake, Conflict, Negotiation, Motivated Cognition). I definitely don't think these four models (or linear combinations of them) explain all disagreements, or that they're the only good frames for thinking about disagreement. Excited to hear alternatives [1]!
[1] In particular, I'm wondering if there's a distinct case for ideological/memetic theories that operate at a similar level of abstraction to the existing theories, as opposed to thinking of ideologies as primarily giving us different goals (which would make them slot in well with all the existing theories, except maybe Mistake Theory).