Navigating Risks from Advanced Artificial Intelligence: A Guide for Philanthropists [Founders Pledge]

Tom Barnes🔸

2 min read · Jun 21, 2024

Comments 6

Matthew_Barnett

(I have not read the full report yet, I'm merely commenting on a section in the condensed report.)

Big tech companies are incentivized to act irresponsibly
Whilst AI companies are set to earn enormous profits from developing powerful AI systems, the costs these systems impose are borne by society at large. These costs are negative externalities, like those imposed on the public by chemical companies that pollute rivers, or large banks whose failure poses systemic risks.
Further, as companies engage in fierce competition to build AI systems, they are more inclined to cut corners in a race to the bottom. In such a race, even well-meaning companies will have fewer and fewer resources dedicated to tackling the harms and threats their systems create. Of course, AI firms may take some action to mitigate risks from their products - but there are well-studied reasons to suspect they will underinvest in such safety measures.

This argument seems wrong to me. While AI does pose negative externalities—like any technology—it does not seem unusual among technologies in this specific respect (beyond the fact that both the positive and negative effects will be large). Indeed, if AI poses an existential risk, that risk is borne by both the developers and general society. Therefore, it's unclear whether there is actually an incentive for developers to dangerously "race" if they are fully rational and informed of all relevant facts.

In my opinion, the main risk of AI does not come from negative externalities, but rather from a more fundamental knowledge problem: we cannot easily predict the results of deploying AI widely, over long time horizons. This problem is real but it does not by itself imply that individual AI developers are incentivized to act irresponsibly in the way described by the article; instead, it implies that developers may act unwisely out of ignorance of the full consequences of their actions.

These two concepts—negative externalities, and the knowledge problem—should be carefully distinguished, as they have different implications for how to regulate AI optimally. If AI poses large negative externalities (and these are not outweighed by their positive externalities), then the solution could look like a tax on AI development, or regulation with a similar effect. On the other hand, if the problem posed by AI is that it is difficult to predict how AI will impact the world in the coming decades, then the solution plausibly looks more like investigating how AI will likely unfold and affect the world.

Matthew_Barnett

From the full report,

It is not merely enough that we specify an “aligned” objective for a powerful AI system, nor just that objective be internalized by the AI system, but that we do both of these on the first try. Otherwise, an AI engaging in misaligned behaviors would be shut down by humans. So to get ahead, the AI would first try to shut down humans.

I dispute that we need to get alignment right on the first try, and otherwise we're doomed. However, this question depends critically on what is meant by "first try". Let's consider two possible interpretations of the idea that we only get "one try" to develop AI:

Interpretation 1: "At some point we will build a general AI system for the first time. If this system is misaligned, then all humans will die. Otherwise, we will not all die."

Interpretation 2: "The decision to build AI is, in a sense, irreversible. Once we have deployed AI systems widely, it is unlikely that we could roll them back, just like how we can't roll back the internet, or electricity."

I expect the first interpretation of this thesis will turn out to be incorrect, because the "first" general AI systems will likely be rather weak and unable to unilaterally disempower all of humanity. This seems evident to me because current AI systems are already fairly general (and increasingly so), yet they remain weak and far from being able to disempower humanity.

These current systems also seem to be increasing in their capabilities somewhat incrementally, albeit at a rapid pace^[1]. I think it is highly likely that we will have many attempts at aligning general AI systems before they become more powerful than the rest of humanity combined, either individually or collectively. This implies that we do not get only "one try" to align AI—in fact, we will likely have many tries, and these attempts will help us accumulate evidence about the difficulty of alignment on the even more powerful systems that we build next.

If you are simply defining the "first try" as the last system developed before humans become disempowered, then this claim seems confused. Building such a system is better viewed as a "last try" than a "first try" at AI, since it would not necessarily be the first general AI system that we develop. It also seems likely that the construction of such a system would be aided substantially by AI-guided R&D, making it unclear to what extent it was really "humanity's try" at AI.

Interpretation 2 appears similarly confused. It may be true that the decision to deploy AI on a wide scale is irreversible, if indeed these systems have a lot of value and are generally intelligent, which would make it hard to "put the genie back in the bottle". However, AI does not seem unusual in this respect among technologies, as it is similarly nearly impossible to reverse the course of technological progress in almost all other domains.

More generally, it is simply a fundamental feature of all decision-making that actions are irreversible, in the sense that it is impossible to go back in time and make different decisions than the ones we in fact made. As a general property of the world, rather than a narrow feature of AI development in particular, this fact in isolation does little to motivate any specific AI policy.

^[1] I do not think the existence of emergent capabilities implies that general AI systems are getting more capable in a discontinuous fashion, as emergent capabilities are generally quite narrow abilities, rather than constituting an average competence level of AI systems. On broad measures of intelligence, such as the MMLU, AI systems appear to be developing more incrementally. Moreover, many apparently emergent capabilities are merely artifacts of the way we measure them, and therefore do not reflect underlying discontinuities in latent abilities.

Matthew_Barnett

From the full report,

Even if power-seeking APS systems are deployed, it’s not obvious that they would permanently disempower humanity. We may be able to stop the system in its tracks (by either literally or metaphorically “pulling the plug”). First, we need to consider the mechanisms by which AI systems attempt to take over (i.e. disempower) humanity. Second, we need to consider various risk factors for a successful takeover attempt.
Hacking computer systems....
Persuading, manipulating or coercing humans....
Gain broad social influence... For instance, AI systems might be able to engage in electoral manipulation, steering voters towards policymakers less willing or able to prevent AI systems being integrated into other key places of power.
Gaining access to money... If misaligned systems are rolled out into financial markets, they may be able to siphon off money without human detection.
Developing advanced technologies... An AI system adept at the science, engineering and manufacturing of nanotechnology, along with access to the physical world, might be able to rapidly construct and deploy dangerous nanosystems, leading to a “gray goo” scenario described by Drexler (1986).

I think the key weakness in this part of the argument is that it overlooks lawful, non-predatory strategies for satisfying goals. As a result, you give the impression that any AI that has non-human goals will, by default, take anti-social actions that harm others in pursuit of their goals. I believe this idea is false.

The concept of instrumental convergence, even if true^[1], does not generally imply that almost all power-seeking agents will achieve their goals through nefarious means. Ordinary trade, compromise, and acting through the legal system (rather than outside of it) are usually rational means of achieving your goals.

Certainly among humans, a desire for resources (e.g. food, housing, material goods) does not automatically imply that humans will universally converge on unlawful or predatory behavior to achieve their goals. That's because there are typically more benign ways of accomplishing these goals than theft or social manipulation. In other words, we can generally get what we want in a way that is not negative-sum and does not hurt other people as a side effect.

To the extent you think power-seeking behavior among humans is usually positive-sum, but will become negative-sum when it manifests in AIs, this premise needs to be justified. One cannot explain the positive-sum nature of the existing human world by positing that humans are aligned with each other and have pro-social values, as this appears to be a poor explanation for why humans obey the law.

Indeed, the legal system itself can be seen as a way for power-seeking misaligned agents to compromise on a framework that allows agents within it to achieve their goals efficiently, without hurting others. In a state of full mutual inter-alignment with other agents, criminal law would largely be unnecessary. Yet it is necessary, because humans in fact do not share all their goals with each other.

It is likely, of course, that AIs will exceed human intelligence. But this fact alone does not imply that AIs will take unlawful actions to pursue their goals, since the legal system could become better at coping with more intelligent agents at the same time AIs are incorporated into it.

We could imagine an analogous case in which genetically engineered humans are introduced into the legal system. As these modified humans get smarter over time, and begin taking on roles within the legal system itself, our institutions would adapt, and likely become more capable of policing increasingly sophisticated behavior. In this scenario, as in the case of AI, "smarter" does not imply a proclivity towards predatory and unlawful behavior in pursuit of one's goals.

^[1] I personally doubt that the instrumental convergence thesis is true as it pertains to "sufficiently intelligent" AIs which were not purposely trained to have open-ended goals. I do not expect, for example, that GPT-5 or GPT-6 will spontaneously develop a desire to acquire resources or preserve their own existence, unless they are subject to specific fine-tuning that would reinforce those impulses.

Caruso

I haven't seen the phrase "Advanced Artificial Intelligence" in use before. How does AAI differ from Frontier AI, AGI, and Artificial Superintelligence?

Tom Barnes🔸

I'm somewhat loose with definitions here, defining it effectively the same way as "Transformative AI" (TAI) but intentionally not being too prescriptive for the most part (except in cases where definitions become important).

Caruso

Thank you.

Separately, I just read your executive summary re the nuclear threat, something that I think is particularly serious and worthy of effort. It read to me like the report suggests that there is such a thing as a limited nuclear exchange. If that's correct, I would offer that you're doing more harm than good by promoting that view, which unfortunately some politicians and military officers share.

If you have not yet read, or listened to, Nuclear War: A Scenario by Annie Jacobsen, I highly encourage you to do so. Your budget for finding ways to prevent that from happening would, in my opinion, be well-spent creating condensed versions of what Jacobsen accomplished and making it go viral. You'll understand what I mean once you've consumed her book. It completely changed how I think about the subject.

Funding Recommendations

Alongside this report, we are sharing some of our latest recommended high-impact funding opportunities: The Centre for Long-Term Resilience, the Institute for Law and AI, the Effective Institutions Project and FAR AI are four promising organizations we have recently evaluated and recommend for more funding, covering our four respective focus areas. We are in the process of evaluating more organizations, and hope to release further recommendations.

Furthermore, Founders Pledge’s Global Catastrophic Risks Fund supports critical work on these issues. If you would like to make progress on a range of catastrophic risks - including from advanced AI - then please consider donating to the Fund!

About Founders Pledge

Founders Pledge is a global non-profit empowering entrepreneurs to do the most good possible with their charitable giving. We equip members with everything needed to maximize their impact, from evidence-led research and advice on the world’s most pressing problems, to comprehensive infrastructure for global grant-making, alongside opportunities to learn and connect. To date, they have pledged over $10 billion to charity and donated more than $950 million. We’re grateful to be funded by our members and other generous donors. founderspledge.com