Corporate AI Labs' Odd Role in Their Own Governance

15 min read · Jul 29, 2024

Comments

Tyler Johnston

Thank you for writing this! It's probably the most clear and rigorous way I've seen these arguments presented, and I think a lot of the specific claims here are true and important to notice.

That being said, I want to offer some counterarguments, both for their own sake and to prompt discussion in case I'm missing something. I should probably add the disclaimer that I'm currently working at an organization advocating for stronger self-governance among AI companies, so I may have some pre-existing biases toward defending this strategy. But it also makes this question very relevant to me and I hope to learn something here.

Addressing particular sections:

Only Profit-Maximizers Stay At The Frontier

This section is interesting and reminds me of some metaphors I've heard comparing the mechanism of free markets to Darwinism... i.e. you have to profit-maximize, and if you don't, someone else will and they'll take your place. It's survival of the fittest, like it or not. Take this naïve metaphor seriously enough and you would expect most market ecosystems to be "red in tooth and claw," with bare-minimum wages, rampant corner-cutting, nothing remotely resembling CSR/ESG, etc.

One problem is: I'm not sure how true this is to begin with. Plenty of large companies act in non-profit-maximizing ways simply out of human error, or passivity, or because the market isn't perfectly competitive (maybe they and their nearest rivals are benefitting from entrenchment and economies of scale that mean they no longer have to), or perhaps most importantly, because they are all responding to non-financial incentives (such as the personal values of the people at the company) that their competitors are equally subject to.

But more convincingly, I think social good / avoiding dangerous accidents really are just more aligned with profit incentives than the metaphor would naively suggest. I know your piece acknowledges this, but you also write it off as having limitations, especially under race conditions aiming toward a particular capabilities threshold.

But that doesn't totally follow to me — under such conditions, while you might be more open to high-variance, high-risk strategies to reach that threshold, you might also be more averse to those strategies since the costs (direct or reputational or otherwise) imposed by accidents before that threshold is reached become so much more salient. In the case of AI, the costs of a major misuse incident from an AI product (threatening investment/employee retention/regulatory scrutiny/etc.) might outweigh the benefits of moving quickly or without regard to safety — even when racing to a critical threshold. A lot of this probably depends on how far off you think such a capability threshold is, and where relative to the frontier you currently are. This is all to say that race dynamics might make high-variance high-risk strategies more attractive, but they also might make them less attractive, and the devil is probably in the details. I haven't heard a good argument for how the AI case shakes out (and I've been thinking about it for a while).

Also, correct me if I'm wrong, but one thing the worldview you write about here would suggest is that we shouldn't trust companies to fulfill their commitments to carbon neutrality, or that if they do, they will soon no longer be on the forefront of their industry — doing so is expensive, nobody is requiring it of them (at least not on the timeline they are committing to), the commitment is easy to abandon, and even if they do it, someone who chooses not to will outcompete them and take their place at the forefront of the market. But I just don't really expect that to happen. I think in 2030 there's a good chance Apple's supply chain will be carbon-neutral, and that they'll still be in the lead for consumer electronics (either because the reputational benefits of the choice, and the downstream effects it has on revenue and employee retention and whatnot, made it the profit-maximizing thing to do, and/or because they were sufficiently large/entrenched that they can just make choices like that due to non-financial personal/corporate values without damaging their competitive position, even when doing so isn't maximally efficient.)

Early in the piece, you write:

"A profit-driven tech corporation seems exceedingly unlikely to hinge astronomical capex on an AI corporation that does not give off the unmistakable impression of pursuing maximal profits."

But we can already prove this isn't true given that OpenAI has a profit cap, their deal with Microsoft had a built-in expiration, and Anthropic is a B-corp. Even if you don't trust that some of these measures will be adhered to (e.g. I believe the details on OpenAI's profit cap quietly changed over time), they certainly do not give off the unmistakable impression of maximal profit seeking. But I think these facts exist because either (1) many of the people at these companies are thinking about social impact in addition to profit, (2) social responsibility is an important intermediate step to being profitable, or (3) the companies are so entrenched that there simply are no alternative, more profit-maximizing firms who can compete, i.e. they have the headroom to make concessions like this, much as Apple can make climate commitments. I'm not sure what the balance between these three explanations is, but #1 and #3 challenge the strong view that only seemingly hard-nosed profit-maximizers are going to win here, and #2 challenges the view that profit-maximizing is mutually exclusive with long-term safety efforts.

All this considered, my take here is instead something like "We should expect frontier AI companies to generally act in profit-maximizing ways, but we shouldn't expect them to always be perfectly profit-maximizing across all dimensions, nor should we expect that profit-maximizing is always opposed to safety."

Constraints from Corporate Structure Are Dangerously Ineffective

I don't have a major counterargument here, aside from the fact that well-documented and legally recognized corporate structures often can be pretty effective thanks in part to the fact that judges/regulators get input on when and how they can be changed, and while I'm no expert, my understanding is that there are ways to optimize for this.

But your idea that companies are exchangeable shells for what really matters under the hood — compute, data, algorithms, employees — seems very true and very underrated to me. I think of this as something like "realpolitik" for AI safety. What really matters, above ideology and figureheads and voluntary commitments, is where the actual power lies (which is also where the actual bottlenecks for developing AI are) and where that power wants to go.

Hope In RSPs Is Misguided

The claim that "RSPs on their own can and will easily be discarded once they become inconvenient" seems far too strong to me — and again, if it were true, we should expect to see this with all costly voluntary safety/CSR measures that are made in other industries (which often isn't the case).

A few things that may make non-binding voluntary commitments like RSPs hard to discard:

It's really hard to abandon them without looking hypocritical and untrustworthy (to the public, to regulators, to employees, to corporate partners, etc.)
In large bureaucracies, lock-in effects make it easy to create new teams/procedures/practices/cultures and much harder to change them.
Abandoning these commitments can open companies up to liability for deceptive advertising, misleading shareholders, or even fraud if the safety practices were used to promote e.g. an AI product, convince investors that they are a good company to support, convince regulators or the public that they can be trusted, etc. I'm not an expert on this by any means, nor do I have specific examples to point to right now, so take this one with a grain of salt.

There's also the fact that RSPs aren't strictly an invention of the AI labs. Plenty of independent experts have been involved in developing and advocating for either RSPs or risk evaluation procedures that look like them.

Here, I think a more defensible claim would be "The fact that RSPs may be easily discarded when inconvenient should be a point in favor of binding solutions like legislation, or at least indicate that they should be considered one of many potentially fallible safeguards for a defense-in-depth strategy."

"An optimistic view of RSPs might be that they are a good way to hold AI corporations accountable - that public and political attention would be able to somehow sanction labs once they did diverge from their RSPs. Not only is this a fairly convoluted mechanism of efficacy, it also seems empirically shaky: Meta is a leading AI corporation with industry-topping amounts of compute and talent and does not publish RSPs. This seems to have garnered neither impactful public and political scrutiny nor hurt the Meta AI business."

Minor factual point: probably worth noting that Meta, as well as most leading AI labs, have now committed to publish an RSP. Time will tell what their policy ends up looking like.

It's true that the presence of, and quality of, RSPs at individual companies doesn't seem to have translated to any public/political scrutiny yet. I'm optimistic this can change (it's what I'm working on), or perhaps even will change by default once models reach a new level of capabilities that make catastrophic risks from AI an ever-more-salient issue among the public.

"The downside of choosing an RSP-based legislative process should be obvious - it limits, or at least frames, the option space to the concepts and mechanisms provided by the AI corporations themselves. But this might be a harmful limitation: As we have argued above, these companies are incentivized to mainly provide mechanisms they might be able to evade, that might fit their idiosyncratic technical advantages, that might strengthen their market position, etc. RSP codification hence seems like a worse way to safe AI legislation than standard regulatory and legislative processes."

This is a question: my understanding is that the RSP model was specifically inspired by regulatory pathways from other industries, where voluntary measures like this got codified into what is now seen (in retrospect) as sensible policy. Is this true? I can't remember where I heard it, and can't find mention of it now, but if so, it seems like those past cases might be informative in terms of how successful we can expect the RSP codification strategy to be today.

That actually brings me to one last meta point that I want to make, which is that I am tempted to think that we are just in a weird situation where there are psychological facts about the people at leading profit-driven AI labs that make the heuristic of profit maximization a poor predictor of their behavior, and a lot of this comes down to genuine, non-financial concern about long-term safety.

Earlier I mentioned how even in a competitive market, you might see multiple corporations collectively acting in non-profit-maximizing ways due to non-financial incentives collectively acting upon the decision-makers at each of those companies. Companies are full of humans who make choices for non-financial reasons, like wanting to feel like a good person, wanting to have a peaceful home life where their loved ones accept and admire them, and genuinely wanting to fix problems in the world. I think the current psychological profile of AI lab leaders (and, indeed, the AI lab employees that hold the "real power" under the hood) is surprisingly biased toward genuine concern about the risks of AI. Many of them correctly recognized, way before anyone else, how important this technology would be.

Sorry for the long comment. I do think AI labs need fierce scrutiny and binding constraints, and that their incentives are largely not pointing in the right place and might bias them toward putting profit over safety (again, this is my main focus right now), but I'm also not ready to totally write off their ability to adopt genuinely valuable and productive voluntary measures to reduce AI risk.
