Cullen_OKeefe's Comments

What are the challenges and problems with programming law-breaking constraints into AGI?

The key difference in my mind is that the AI system does not need to determine the relative authoritativeness of different pronouncements of human value, since the legal authoritativeness of e.g. caselaw is pretty formalized. But I agree that this is less of an issue if the primary route to alignment is just getting an AI to follow the instructions of its principal.

What are the challenges and problems with programming law-breaking constraints into AGI?

Certainly you still need legal accountability -- why wouldn't we have that? If we solve alignment, then we can just have the AI's owner be accountable for any law-breaking actions the AI takes.

I agree that that is a very good and desirable step to take. However, as I said, it also incentivizes the AI agent to obfuscate its actions and intentions to protect its principal. In the human context, human agents do this too, but they are independently disincentivized from breaking the law because they face legal liability for their own actions. I want (and I suspect you also want) AI systems to have that kind of incentive.

If I understand correctly, you identify two ways to do this in the teenager analogy:

  1. Rewiring
  2. Explaining laws and their consequences and letting the agent's existing incentives do the rest.

I could be wrong about this, but ultimately, for AI systems, both seem similarly difficult. As you've said, for 2. to be most effective, you probably need "AI police." Those police will need a way of interpreting the legality of an AI agent's {"mental" state; actions} and mapping them onto existing laws.

But if you need to do that for effective enforcement, I don't see why (from a societal perspective) we shouldn't just do that on the actor's side and not the "police's" side. Baking the enforcement into the agents has the benefits of:

  1. Not incentivizing an arms race
  2. Giving the enforcers a clearer picture of the AI's "mental state"
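As a toy sketch of the distinction (all names here are hypothetical; the `legality_check` function stands in for the hard, unsolved research problem of mapping an agent's actions onto existing laws):

```python
def legality_check(action: str) -> bool:
    """Hypothetical classifier mapping a proposed action onto legal
    predicates. In reality this is the hard ML research problem under
    discussion; here it is just a stub with a toy prohibited set."""
    prohibited = {"deceive_regulator", "destroy_evidence"}
    return action not in prohibited

def agent_side_filter(proposed_actions):
    """Baking enforcement into the agent: illegal actions are screened
    out before they are ever taken."""
    return [a for a in proposed_actions if legality_check(a)]

def police_side_audit(taken_actions):
    """External "AI police" enforcement: the same legality mapping is
    applied, but only after the fact, to detect violations."""
    return [a for a in taken_actions if not legality_check(a)]

actions = ["file_report", "deceive_regulator", "ship_product"]
print(agent_side_filter(actions))  # illegal action never taken
print(police_side_audit(actions))  # same check, applied post hoc
```

The point of the sketch is that both routes depend on the same underlying capability (the legality mapping); the agent-side version applies it pre-hoc, the police-side version post-hoc.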

What are the challenges and problems with programming law-breaking constraints into AGI?

But my real reason for not caring too much about this is that in this story we rely on the AI's "intelligence" to "understand" laws, as opposed to "programming it in"; given that we're worried about superintelligent AI it should be "intelligent" enough to "understand" what humans want as well (given that humans seem to be able to do that).

My intuition is that more formal systems will be easier for AI to understand earlier in the "evolution" of SOTA AI intelligence than less-formal systems. Since law is more formal than human values (both in how it is written and in the formal significance of interpretative texts), we might get good law-following before good value alignment.

I'm not sure what you're trying to imply with this -- does this make the AI's task easier? Harder? Does the generality somehow imply that the AI is safer?

Sorry. I was responding to the "all laws" point. My point was that making a law-following AI that can follow (A) all enumerated laws is not much harder than making one that can follow (B) any given law. That is, the difficulty of construction scales sub-linearly with the number of laws the AI needs to follow. The interpretative tools that get you to (B) should be pretty generalizable to (A).

What are the challenges and problems with programming law-breaking constraints into AGI?

First, it would be hard to do. I am a programmer / ML researcher and I have no idea how to program an AI to follow the law in some guaranteed way. I also have an intuitive sense that it would be very difficult. I think the vast majority of programmers / ML researchers would agree with me on this.

This is valuable information. However, some ML people I have talked about this with have given positive feedback, so I think you might be overestimating the difficulty.

Second, it doesn't provide much value, because you can get most of the benefits via enforcement, which has the virtue of being the solution we currently use.

Part of the reason that enforcement works, though, is that human agents have an independent incentive not to break the law (or, e.g., report legal violations) since they are legally accountable for their actions.

But AI-enabled police would be able to probe actions, infer motives, and detect bad behavior better than humans could. In addition, AI systems could have fewer rights than humans, and could be designed to be more transparent than humans, making the police's job easier.

This seems to require the same type of fundamental ML research that I am proposing: mapping AI actions onto laws.

FHI Report: The Windfall Clause: Distributing the Benefits of AI for the Common Good

I agree that the problem (that investors will prefer to invest in non-signatories, and hence it will reduce the likelihood of pro-social firms winning, if pro-social firms are more likely to sign) does seem like a credible issue. I found the description of the proposed solution rather confusing however. Given that I worked as an equity analyst for five years, I would be surprised if many other readers could understand it!

Apologies that this was confusing, and thanks for trying to deconfuse it :-)

Subsequent feedback on this (not reflected in the report) is that issuing low-value super-junior equity at the time of signing (and then holding it in trust) is probably the best option for this.

FHI Report: The Windfall Clause: Distributing the Benefits of AI for the Common Good

I strongly disagree with this non-sequitur. The fact that we have achieved some level of material success now doesn't mean that the future opportunity isn't very large. Again, Chamley-Judd is the classic result in the space, suggesting that it is never appropriate to tax investment for distributional purposes - if the latter must be done, it should be done with individual-level consumption/income taxation. This should be especially clear to EAs who are aware of the astronomical waste of potentially forgoing or delaying growth.

However, it's very hard to get individuals to sign a WC for a huge number of reasons. See

The pool of potentially windfall-generating firms is much smaller and more stable than the pool of potentially windfall-generating individuals, meaning that securing commitments from firms would probably capture more of the potential windfall than securing commitments from individuals. Thus, targeting firms as such seems reasonable.

FHI Report: The Windfall Clause: Distributing the Benefits of AI for the Common Good

Elsewhere in the document you do hint at another response - namely that by adopting the clause, companies will help avoid future taxation (though I am sceptical): ... However, it seems that the document equivocates on whether or not the clause is to reduce taxes, as elsewhere in the document you deny this:

I think both outcomes are possible. The second point is simply to point out that the WC does not and cannot (as a legal matter) prevent a state from levying taxes on firms. The first two points, by contrast, are a prediction that the WC will make such taxation less likely.

FHI Report: The Windfall Clause: Distributing the Benefits of AI for the Common Good

The report then goes on to discuss externalities:

Secondly, unbridled incentives to innovate are not necessarily always good, particularly when many of the potential downsides of that innovation are externalized in the form of public harms. The Windfall Clause attempts to internalize some of these externalities to the signatory, which hopefully contributes to steering innovation incentives in ways that minimize these negative externalities and compensate their bearers.

Here you approvingly cite Seb's paper, but I do not think it supports your point at all. Firms have both positive and negative externalities, and causing them to internalise them requires tailored solutions - e.g. a carbon tax.

I agree that the WC does not target the externalities of AI development maximally efficiently. However, I think that the externalities of such development are probably significantly correlated with windfall-generation. Windfall-generation seems to me to be very likely to accompany a risk of a huge number of negative externalities—such as those cited in the Malicious Use report and classic X-risks.

A good analogy might therefore be to a gas tax for funding road construction/maintenance, which imperfectly targets the thing we actually care about (wear and tear on roads), but is correlated with it so it's a decent policy.

To be clear, I agree that it's not the best way of addressing those externalities, and that the best possible option is to institute a Pigouvian tax (via insurance on them like Farquhar et al. suggest or otherwise).

'Being very profitable' is not a negative externality

It is if it leads to inequality, which it seems likely to. Equality is a psychological good, and so windfall has negative psychological externalities on the "losers."

FHI Report: The Windfall Clause: Distributing the Benefits of AI for the Common Good

Furthermore, firms can voluntarily but irrationally reduce their incentives to innovate - for example a CEO might sign up for the clause because he personally got a lot of positive press for doing so, even at the cost of the firm.

This same reasoning also shows why firms might seek positional goods. E.g., executives and AI engineers might really care about being the first to develop AGI. Thus, the positional arguments for taxing windfall come back into play to the same extent that this is true.

Additionally, by publicising this idea you are changing the landscape - a firm which might have seen no reason to sign up might now feel pressured to do so after a public campaign, even though their submission is 'voluntary'.

This is certainly true. I think we as a community should discuss (as here) what the tradeoffs are. Reduced innovation in AI is a real cost. So too are the harms identified in the WC report and more traditional X-risk harms. We should set the demands on firms such that the costs to innovation are outweighed by the benefits to long-run wellbeing.

FHI Report: The Windfall Clause: Distributing the Benefits of AI for the Common Good

As a blanket note about your next few points, I agree that the WC would disincentivize innovation to some extent. It was not my intention to claim—nor do I think I actually claimed (IIRC)—that it would have no socially undesirable incentive effects on innovation. Rather, the points I was making were more aimed at illuminating possible reasons why this might not be so bad. In general, my position is that the other upsides probably outweigh the (real!) downsides of disincentivizing innovation. Perhaps I should have been more clear about that.

But corporations are much less motivated by fame and love of their work than individuals, so this does not seem very relevant, and furthermore it does not address the inter-temporal issue which is the main objection to corporation taxes.

Yep, that seems right.
