Orthogonality is Expensive

𝕮𝖎𝖓𝖊𝖗𝖆


1 min read · Apr 3, 2023

Comments 4


yefreitor

From the linked post

A common assumption about AGI is the orthogonality thesis, which argues that goals/utility functions and the core intelligence of an AGI system are orthogonal or can be cleanly factored apart.

This may or may not be an "orthogonality thesis" (I haven't seen this usage before, but I also haven't looked for it), but the orthogonality thesis I'm familiar with has the quantifiers the other way around: for any goal, there exists a possible AGI that will attempt to carry it out.^[1] Even if mixing-and-matching a dumb industrial control system with a smart friendly AI is safe, that doesn't mean that a smart industrial control system arrived at through some other route won't paperclip you.

[1] Though in practice what people making likely-doom arguments really mean is that a generic human-created AI will not have recognizably human values, which is a somewhat stronger claim.

titotal

I think the way the orthogonality thesis is typically used in arguments might be closer to his definition than to yours.

Your definition is trivially true: all it requires is that an AGI having a specified goal is not physically impossible. But that doesn't prove that all goals are equally likely to occur, or even that AGI will have "goals" at all.

The way I see it deployed in practice is to say that a "dumb" AI will have some silly goal like "build squiggles", will go through an intelligence scale-up, and will keep that goal in hyper-intelligent form (and then pursuing that goal will result in disaster).

This argument doesn't necessarily work if goals and intelligence at tasks are highly correlated, as they currently are for deep learning systems. It may be that, in practical terms, scaling up in intelligence requires at least partially giving up on your initial goals. Or conversely, that only AIs with certain types of goals will ever succeed at scaling themselves up in intelligence.

yefreitor

Your definition is trivially true: all it requires is that an AGI having a specified goal is not physically impossible. But that doesn't prove that all goals are equally likely to occur, or even that AGI will have "goals" at all.

Yes, of course (hence the footnote).

The way I see it deployed in practice is to say that a "dumb" AI will have some silly goal like "build squiggles", will go through an intelligence scale-up, and will keep that goal in hyper-intelligent form (and then pursuing that goal will result in disaster).

My reading of the doomer view (which I don't necessarily endorse) is quite different: a dumb AI starts with some useful goal and goes through an intelligence scale-up that slightly perturbs its goal in some direction. Because goals compatible with human life are a tiny thread winding its way through a stupidly high-dimensional manifold of all possible goals, the AI ends up misaligned by default.

This doesn't especially hinge on whether these perturbations can be in any direction or only a few (as is the case if goals are strongly constrained by architecture), except in the case where they run only along the human-survival curve. Any transverse component whatsoever means you get pushed off-manifold almost always. And this is plausible (I think) only in the case where human values are not a tiny golden thread, but actually rather large and fuzzily full-dimensional.
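The "tiny thread" geometry can be illustrated with a minimal Monte Carlo sketch. This is a toy model under assumptions that are not in the original: goals are points in R^n, the human-compatible set is a tube of radius `radius` around a straight line (the x0-axis), and a scale-up is a perturbation of fixed length `step` in a uniformly random direction starting from a point on that line. The function names and parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def random_direction(n, rng):
    """Uniform random unit vector in R^n (a normalized Gaussian sample)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def stays_in_thread(n, step, radius, rng):
    """Hypothetical model: start on the 'thread' (the x0-axis) and move `step`
    in a random direction. The transverse component is everything orthogonal
    to axis 0; return True if its length is still within `radius`."""
    d = random_direction(n, rng)
    transverse_sq = sum(x * x for x in d[1:]) * step * step
    return transverse_sq <= radius * radius


def fraction_on_thread(n, step=1.0, radius=0.5, trials=10000, seed=0):
    """Monte Carlo estimate of the probability that one perturbation
    leaves the goal inside the tube around the thread."""
    rng = random.Random(seed)
    hits = sum(stays_in_thread(n, step, radius, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for n in (2, 3, 10, 100):
        print(n, fraction_on_thread(n))
```

With `step=1.0` and `radius=0.5`, the exact probabilities are 1/3 for n = 2 and 1 − √3/2 ≈ 0.13 for n = 3, and the estimate is essentially zero by n = 100: once the perturbation has any freedom to move transversely, high dimension makes staying on the thread very unlikely. If the target set were instead a full-dimensional ball whose radius exceeds `step`, every perturbation from its center would stay inside regardless of n, which corresponds to the "large and fuzzily full-dimensional" case.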

titotal

I think there are different variations of the doomer argument out there; your version is probably the strongest version of the argument, while mine is more common in introductory texts.

I think the OP does point out one possible way the argument could fail: if there turned out to be a sufficiently high correlation between human-aligned values and AI performance. One plausible mechanism would be a very slow takeoff where the AI is not deceptive and is deleted if it tries to do misaligned things, creating evolutionary pressure towards friendliness.

Really, though, my main objections to the doomerists are with other points. I simply do not believe that "misalignment = death". For example, a suicidal AI that developed the urge to shut itself down at all costs would be misaligned but not fatal to humanity.
