Complex Systems for AI Safety [Pragmatic AI Safety #3]

TW123; Dan H

Comments 6

Rory Greig

Thanks for writing this. In my opinion, the field of complex systems provides a useful and under-explored perspective and set of tools for AI safety. I particularly like the insights you provide in the "Complex Systems for AI Safety" section, for example that ideas in complex systems foreshadowed inner alignment / mesa-optimisation.

I'd be interested in your thoughts on modelling AGI governance as a complex system, for example race dynamics.

I previously wrote a forum post on how complex systems and simulation could be a useful tool in EA for improving institutional decision making, among other things: https://forum.effectivealtruism.org/posts/kWsRthSf6DCaqTaLS/what-complexity-science-and-simulation-have-to-offer

Jay Bailey🔸

Possibly a newbie question: I noticed I was confused about the paragraph around deep learning vs. reinforcement learning.

"One example of obviously suboptimal resource allocation is that the AI safety community spent a very large fraction of its resources on reinforcement learning until relatively recently. While reinforcement learning might have seemed like the most promising area for progress towards AGI to a few of the initial safety researchers, this strategy meant that not many were working on deep learning."

I thought that reinforcement learning was a type of deep learning. My own understanding is that deep learning is any form of ML using multilayered neural networks, and that reinforcement learning today uses multilayered neural networks, and thus could be called "deep reinforcement learning", but is generally just RL for short. If that were true that would mean RL research was also DL research.

Am I misunderstanding some of the terminology?

TW123

The terminology around AI (AI, ML, DL, RL) is a bit confused sometimes. You're correct that deep reinforcement learning does indeed use deep neural nets, so it could be considered a part of deep learning. However, colloquially deep learning is often taken to mean the parts that aren't RL (so supervised, unsupervised, and self-supervised deep learning). RL is pretty qualitatively different from those in the way it is trained, so it makes sense that there would be a different term, but it can create confusion.

anonymous6

One way I think it is plausible to draw lines between RL/core DL is that post-AlphaGo a lot of people were very bullish on specifically deep networks + reinforcement learning. Part of the idea was that supervised learning required inordinately costly human labeling, whereas RL would be able to learn from cheap simulations and even improve itself online in the world. OpenAI was originally almost 100% RL-focused. That thread of research is far from dead but it has certainly not panned out the way people hoped at the time (e.g. OpenAI has shifted heavily away from RL).

Meanwhile non-RL deep learning methods, especially generative models that kind of sidestep the labeling issue, have seen spectacular success.

philgoetz

I was hoping for an essay about deliberately using nonlinear systems in constructing AI, because they can be more stable than the most stable linear systems if you know how to do a good stability analysis. This was instead an essay on using ideas about nonlinear systems to critique the AI safety research community. This is a good idea, but it would be very hard to apply nonlinear methods to a social community. The closest thing I've seen to doing that was the epidemiological models used to predict the course of Covid-19.

The essay says, "The central lesson to take away from complex systems theory is that reductionism is not enough. It’s often tempting to break down a system into isolated events or components, and then try to analyze each part and then combine the results. This incorrectly assumes that separation does not distort the system’s properties." I hear this a lot, but it's wrong. It assumes that reductionism is linear--that you want to break a nonlinear system into isolated components, then relate them to each other with linear equations.

Reductionism can work on nonlinear systems if you use statistics, partial differential equations, and iteration. Epidemiological models and convergence proofs for neural networks are examples. Both use iteration, and may give only statistical claims, so you might still say "reductionism is not enough" if you want absolute certainty, e.g., strict upper bounds on distributions. But absolute certainty is only achievable in formal systems (unapplied math and logic), not in real life.
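To make the point about iteration concrete, here is a minimal sketch of an epidemiological model of the kind mentioned above. It assumes the standard SIR compartmental model (a system of ordinary differential equations, stepped forward with simple Euler iteration); neither the essay nor this comment specifies a particular model, and the function name `simulate_sir` and all parameter values are purely illustrative.

```python
def simulate_sir(beta, gamma, s0, i0, r0=0.0, dt=0.1, steps=2000):
    """Iterate a basic SIR model with forward-Euler steps.

    s, i, r are population fractions (susceptible, infected, recovered).
    beta is the transmission rate and gamma the recovery rate; both are
    illustrative assumptions, not fitted values.
    """
    s, i, r = s0, i0, r0
    history = [(s, i, r)]
    for _ in range(steps):
        # The nonlinear term: new infections depend on the product s * i.
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical parameters giving a basic reproduction number of 3.
    trajectory = simulate_sir(beta=0.3, gamma=0.1, s0=0.99, i0=0.01)
    peak_infected = max(i for _, i, _ in trajectory)
    print(f"peak infected fraction: {peak_infected:.3f}")
```

The model is reductionist in that it breaks the population into three components, yet the interaction term `beta * s * i` is nonlinear, and the behavior of the whole (an epidemic that rises, peaks, and burns out) only appears when the components are iterated together.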

The above essay seems to me to be trying to use linear methods to understand a nonlinear system, decomposing it into separable heuristics and considerations to be attended to, such as the line-items in the flow charts and bulleted lists above. That was about the best you could do, given the goal of managing the AI safety community.

I'd really like to see you use your understanding of complex systems either to try to find some way of applying stability analysis to different AI architectures, or to study the philosophical foundations of AI safety as it exists today. The latter use assumptions of linearity, analytic solvability, distrust of noise and evolution, and a classical (i.e., ancient Greek) theory of how words work, which expects words to necessarily have coherent meanings, and for those meanings to have clear and stable boundaries, and requires high-level foundational assumptions because the words are at a high level of abstraction. This is all especially true of ideas that trace back to Yudkowsky. I think these can all be understood as stemming from over-simplifications required for linear analysis. They're certainly strongly correlated with it.

I dumped a rant that's mostly about the second issue (the metaphysics of the AI safety community today) onto this forum recently, here, which is a little more specific, though I fear perhaps still not specific enough to be better than saying nothing.

Artyom K

Thanks for publishing this and your research! A few discussion points:

1. It is unclear how we can apply a safety culture in our current highly competitive environment (Google vs Facebook, China vs USA). What, concretely, should the incentives or policies for adopting a safety culture be? And who enforces them? If one actor adopts it, another will gain a competitive advantage, since they will spend more on capabilities and then 'kill you' (Yudkowsky, AGI Ruin).

2. Extremely high stakes, i.e. x-risk. While systems theory was developed for dangerous, mission-critical systems, it didn’t deal with systems that might disempower all of humanity forever. We don’t get a second try. So is systems theory of no use here? It is supposed to be an iterative process, but wouldn't a misaligned AI kill us on the first wrong try?

3. Systems theory was developed for systems built by humans for humans, and humans have a certain limited level of intelligence. Why should we expect systems theory to be applicable to an intelligence above the human level?

4. Systems theory implies control by a better entity over a worse entity: a government issues policies to control organizations, an AI lab stops research on a dangerous path, an electrician complies with instructions, etc. Now, isn’t AGI a better entity to give control to? Does that imply humanity's disempowerment? In particular, when we introduce a moral parliament (discussed in PAIS #4), won’t it mean that this parliament is now in power, not humanity?
