Being nicer than Clippy

Joe_Carlsmith

Comments 3

Vasco Grilo🔸

Thanks for the post, Joe. Relatedly, readers may want to check Brian Tomasik's posts on cooperation and peace.

SummaryBot

Executive summary: The post argues that human values related to "niceness," boundaries, and liberalism are importantly different from the values of a "paperclip maximizer" AI, and suggests incorporating those human values into how we think about relating to AIs with different values.

Key points:

Unlike a paperclip maximizer AI that just cares about making paperclips, human values incorporate notions of "niceness" like respecting others' autonomy and not violently overthrowing them even if they have different values.
Concepts from political liberalism around tolerance, diversity, and respecting individual rights and boundaries are also relevant to how humans should ideally interact with AIs with different values.
These human values likely have practical benefits too in terms of cooperation, attractiveness of society, etc. that are worth preserving with AIs.
However, some minimal versions of liberalism may not guarantee a flourishing future, so we still need to empower agents who care about human values like love and beauty.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

anormative

"But the AIs-with-different-values – even: the cooperative, nice, liberal-norm-abiding ones – might not even be sentient! Rather, they might be mere empty machines. Should you still tolerate/respect/etc them, then?"

The flavor of discussion here on AI sentience that follows what I've quoted above always reminds me of, and I think is remarkably similar to, the content of this scene from the Star Trek: The Next Generation episode "The Measure of a Man." It's a courtroom drama-style scene in which Data, an android, is threatened by a scientist who wants to make copies of Data and argues he's property of the Federation. Patrick Stewart, playing Jean-Luc Picard, defending Data, makes an argument similar to Joe's.

You see, he's met two of your three criteria for sentience, so what if he meets the third. Consciousness in even the smallest degree. What is he then? I don't know. Do you? (to Riker) Do you? (to Phillipa) Do you? Well, that's the question you have to answer. Your Honour, the courtroom is a crucible. In it we burn away irrelevancies until we are left with a pure product, the truth for all time. Now, sooner or later, this man or others like him will succeed in replicating Commander Data. And the decision you reach here today will determine how we will regard this creation of our genius. It will reveal the kind of a people we are, what he is destined to be. It will reach far beyond this courtroom and this one android. It could significantly redefine the boundaries of personal liberty and freedom, expanding them for some, savagely curtailing them for others. Are you prepared to condemn him and all who come after him to servitude and slavery?

In particular, vibes related to the "fragility of value," "extremal Goodhart," and "the tails come apart." ↩︎
Though in fairness, forms of "threshold deontology" that introduce constraints that can only be violated if the stakes are high enough – e.g., you can only push the fat man if it will save x lives, where x is quite a bit larger than utilitarianism would suggest – face this issue, too. E.g., the onium at stake can quickly become more-than-x. Thanks to Will MacAskill for discussion here. ↩︎
See here for some debate. Part of my argument, in this essay, is that we should not do the "teach the aliens the value of friendship" thing that Soares seems to endorse here. ↩︎
Though: I don't think it disappears. ↩︎
Remember: caring about an agent's preferences is conceptually distinct from caring about her welfare. ↩︎
And I think we should be open to doing this even if they aren't sentient – more below. ↩︎
Hanson's critique of the alignment discourse emphasizes the distinction. ↩︎
As a maybe-clearer example: if a team of five people breaks into your house trying to kill you, you can kill all of them if necessary to save yourself. But if you are on the way to the hospital and the only way to save yourself is to run over five people on the road, you aren't permitted to do it. ↩︎
Though note that we're creating them – and doing so, in the AI risk story, without adequate care to avoid the relevant sorts of aggressions, for the sake of other not-always-fully-laudatory motives. This complicates the moral narrative. ↩︎
Maybe something about "consequentialism" in AIs-that-get-things-done is to blame? But even if you add in deontological constraints, Yudkowsky (as I understand him) predicts that the AIs will simply pursue the "nearest unblocked neighbor" of those constraints. ↩︎
Though: human society today often also puts adequate hard power behind its walls, given the current attempted-invasions. And let's keep it that way, even as the invasions get oomphier. ↩︎
Thanks to Howie Lempel for discussion of this point. ↩︎
We can wonder why the existing political order lets this happen, but let's set this aside for now. ↩︎
Roughly twenty billion galaxies, according to Toby Ord's The Precipice, p. 233. ↩︎
"Like the colonialists?" Well: the "uninhabited" bit is really important – at least if you're a boundary-respecter. But let's not pretend that colonialist vibes are so far off in the distance, here. ↩︎
In particular: lots of human and animal lineages have suffered, died, and disappeared for lack of land (and this is not to mention: having their land actively stolen, invaded, and so on). And what are most wars fought over? Thanks to Carl Shulman for discussion here. ↩︎
Though I remain pretty uncertain/confused about various of the issues here. And obviously, it would be great to first get a bunch more ethical clarity about this sort of thing before having to make decisions about it. ↩︎
More seriously than e.g. the illusionists. ↩︎
E.g., I worry it'll end up looking like people saying "if an agent doesn't have phlogiston, it doesn't deserve any moral weight." ↩︎
Game theory works regardless of whether the agents you're interacting with are conscious. ↩︎
In the context of choosing-what-to-build-in-your-backyard, I feel much happier to focus directly on getting the "thing-that-matters-in-the-vicinity-what-we-currently-call-consciousness" thing right. But here I'm talking about the bits of ethics that are about relating-to-other-backyards (but: still in a terminal-values sense, not a game-theory sense). ↩︎
We're assuming that you're not running any slaves on the laptop. ↩︎
Thanks to Howie Lempel for discussion. ↩︎
And note that just because it's sentient doesn't mean the world it creates involves a lot of sentience. ↩︎
Though perhaps not: that Yudkowsky would advise them to do the same. ↩︎
For some of these rationales, note that it's not actually clear how this gets him away from the programmers just extrapolating their own volitions. After all, if their own extrapolated volitions would value fairness, not being a jerk, golden-ruling, etc in the manner in question, then the output of the extrapolation process would presumably reflect this (Yudkowsky uses this sort of dynamic to respond to various other objections to his proposal: e.g., "if that's a good objection, our extrapolated volitions will notice and adjust for it"). And if not, they would have avoided a mistake by their own lights by keeping the circle narrow.

Indeed, in a simple version of Yudkowsky's ontology, it's unclear how the programmers could possibly do better than just extrapolating their own volitions. Their own extrapolated volitions, after all, set the standard (on Yudkowsky's anti-realist ethics) for what the right choice would be. Is Yudkowsky imagining programmers who face the option to make a correct-by-definition choice, and advising them to maybe make a mistake instead?

Well, let's be careful. Some choices can't be unmade – including choices to find out what-you-should-have-done. Suppose, at t1, that your mother is about to drown, and you have a choice between saving her, or asking a genie for advice/service. If you ask the genie "what is the right decision at t1?", it might well answer at t2, "you should have saved your mother, who just drowned." And if you ask it "figure out what I should have done at t1, and then do it," it might be too late. So, too, with the choice to seek power. Power is useful for many values, yes, but famously, obviously, seeking power can compromise your values too. Indeed, it often does, given how many of our ethical values are specifically about regulating who gets what sort of power (cf "boundaries" above) – plus, you know, the power-corrupts thing, the biased-in-favor-of-yourself thing, and so on. And this holds true even if the power in question will grant you arbitrary insight into the values you compromised. If you take-over-the-world in the process of finding out whether you should've taken-over-the-world – well, you can still have fucked up.

And beyond this, certain kinds of cooperation, coordination, and commitment often involve making choices that might seem at the time, from the perspective of a certain kind of narrow rational calculation, like "mistakes." The way, for example, cooperating in a prisoner's dilemma – or paying in the city in "Parfit's hitchhiker" – is a "mistake." The type of mistake that seems, mysteriously, to get made by agents who end up rich, or alive-at-all. Is it a mystery? Sometimes, being the sort of person that others can trust, coordinate with, rely on, get-to-the-pareto-frontier-with, and so on requires being such that you don't just grab power for yourself (or lie, or steal, or crush the outgroup, or throw out the procedural norms of your democracy, or...) when you can get away with it, or think you can – even if that's what would get you the most (extrapolated) utility at the time (at least, for some notion of "would").

And we can talk about other possible reasons why Yudkowsky's programmers might use a wider "extrapolation base" than their own volitions as well (see e.g. Yudkowsky's original paper, and discussion on Arbital here, for longer discussion). ↩︎
I'm not counting the "Comp sci in 2027" as really laying out a position re: what to do. ↩︎
For example, in the context of whether animals should be empowered, Yudkowsky worries: what happens if you "uplift" a bear, or a chimp, or an ichneumonid wasp, and it just wants to eat babies, or to sit atop some violent and oppressive dominance hierarchy, or to lay parasitic eggs inside of everyone? And Yudkowsky worries about humans in this respect as well – see, e.g., his discussion of the "selfish bastards" problem here, in which so many present-day humans want sentient, suffering slaves that humanity's CEV says yes. But as I've tried to emphasize: these aren't just any old values differences. Rather, these are precisely the sort of values differences that liberalism/niceness/boundaries gets fussed about. ↩︎
Though: he was always less of an atheist than Yudkowsky. ↩︎
And blind hope that blah sort of deontological-seeming behavior will somehow lead to the best consequences can easily fail to grapple with the trade-offs that actual-deontology actually implies. ↩︎
If you think of libertarianism as encoding a minimal form of niceness/liberalism/boundaries, then a libertarian-ish, Age-of-Em-ish world where eventually all the sentient agents die/lose their property/get outcompeted, but through legal and minimal-ethical-constraint-respecting processes, might be one example here. ↩︎
And of course, even working on behalf of liberalism/niceness/boundaries is a form of yang in its own right. ↩︎
