


"for superhuman rogue AIs to be catastrophic for humanity, they need to not only be catastrophic for 2023_Humanity but also for humanity even after we also have the assistance of superhuman or near-superhuman AIs." 

This is a very interesting argument, and definitely worthy of discussion. I realise you have only sketched your argument here, so I won't try to poke holes in it. 

Briefly, I see two objections that need to be addressed:

1. One fear is that rogue AIs may well be released on 2023_Humanity, or a version very close to it, because of the exponential capability growth we could see if we create an AI that can itself develop better AI. In short, it may be enough that the outcome is catastrophic for 2023_Humanity. 

2. The challenge of developing aligned superhuman AIs which would defend us against rogue AIs while themselves offering no threat is not trivial, and I'm not sure how many major labs are working on that right now, or if they can even write a clear problem-statement about what such an AI system should be.
From first principles, the concern is that this AI would necessarily be more limited (it needs to be aligned and safe) than a potential rogue AI, so why should we believe we could develop such an AI faster and enable it to stay ahead of potential rogue AIs? 

Far from disagreeing with your comment, I'm just thinking about how it would work and what tangible steps need to be taken to create the kind of well-aligned AIs which could protect humanity. 

Answer by Denis, May 25, 2023

This is a timely post. It feels like funding is a critical obstacle for many organisations. 

One idea: Given the recent calls by many tech-industry leaders for rapid work on AI governance, is there an opportunity to request direct funding from them for independent work in this area? 

To be very specific: Has someone contacted OpenAI and said: "Hey, we read with great interest your recent article about the need for governance of superintelligence. We have some very specific work (list specific items) in that area which we believe can contribute to making this happen. But we're massively understaffed and underfunded. With $1m from you, we could have 10 researchers working on these questions for a year. Would you be willing to fund this work?"

What's in it for them? Two things:

  1. If they are sincere (as I believe they are), then they will want this work to happen, and some groups in the EA sphere are probably better placed to make it happen than they themselves are.
  2. We can offer independence (any results will be from the EA group, not from OpenAI and not influenced or edited by OpenAI) but at the same time we can openly credit them with funding this work, which would be good PR and a show of good faith on their part. 

Forgive me if this is something that everyone is already doing all the time! I'm still quite new to EA! 


IMHO this is quite an accurate and helpful statement, not a euphemism. I offer this perspective as someone who has worked many years in a corporate research environment - actually, in one of the best corporate research environments out there. 

There are three threads to the comment:

  1. Even before we reach AGI, it is very realistic to expect AI to become stronger than humans in many specific domains. Today, we have that in very narrow domains, like chess, Go and protein-folding. These domains will broaden. For example, a lot of chemistry research these days is done with simulations, which are then just confirmed by experiment. An AI managing such a system could develop better chemicals, and eventually better drugs, more efficiently than humans can. This will happen, if it hasn't happened already. 
  2. One domain which is particularly susceptible to this kind of advance is IT, and so it's reasonable to assume that AI systems will get very good at IT very quickly - which can quickly lead to a point where AI is working on improving AI, leading to exponential progress (in the literal sense of the word "exponential") relative to what humans can do.
  3. Once AI has a given capability, its capacity is far less limited than humans'. Humans need to be educated, to undergo 4-year university courses and PhDs, just to become competent researchers in their chosen field. With AI, you just copy the software 100 times and you have 100 researchers who can work 24/7, who never forget any data, who can instantaneously keep abreast of all the progress in their field, and who will never fall victim to internal politics or "not invented here" mentality, but will collaborate perfectly and flawlessly. 

Put all that together, and it's logical that once we have an AI that can do a specific domain task as well as a human (e.g. design and interpret simulated research into potentially interesting candidate molecules for drugs to fight a given disease), it is almost a no-brainer to reach the point where a corporation could use AI to massively accelerate their progress. 

As AI gets closer to AGI, the domains in which AI can work independently will grow, the need for human involvement will decrease, and the pace of innovation will accelerate. Yes, there will be some limits, like physical testing, where AI will still need humans, but even there robots already do much of the work, so human involvement is decreasing every day.

It's also important to consider who was saying this: OpenAI. Their message was NOT that AI is bad. What they wanted us to take away was that AI has huge potential for good - the way it can accelerate the development of medical cures, for example - BUT that it is moving forward so fast, and most people do not realise how fast this can happen, that we (in the know) need to keep pushing the regulators (mostly not experts) to regulate it while we still can. 


Just reading this now. I love the approach and especially the tangibility of it - identifying specific institutions, eventually developing specific plans to influence them in specific ways. 

Of course there are people spending billions of dollars to influence some of these institutions, but that doesn't mean that a small, well-organised group cannot have an important impact with a well-designed, focused campaign. 

My first reaction to the list itself is that it feels quite narrow (government institutions and IT companies). I wonder if this reflects reality or is just a result of the necessary limitations of a first iteration with quite limited resources. Nobody could question that every institution on this list is an important, influential institution, and if you manage to influence any of them in a positive way, that will be a great success. 

My point is more that there seem to be whole classes of organisations and institutions with huge impact which are not represented at all in the list. And while one could perhaps argue that any one of the institutions listed is more important than any one of the institutions I suggest below, I'm not sure that necessarily means that the most effective way forward is to focus on just the types of institutions listed, vs. looking at what potential there is in a much broader group - especially focusing on the question of influenceability (on which I'd suggest several of the institutions listed above might score quite poorly). 

Here are a few of the categories I would have expected to see in the list, but didn't: 

  • Oil companies, whose decisions are critical to the climate crisis. Also OPEC, Gazprom and others. 
  • Pharma companies who have a critical role to play in global health - for example their decisions about enforcing patents in developing countries or allowing low-cost copies. 
  • FMCG Companies (Procter & Gamble, Unilever, ...) whose products are used by billions of people every single day, so anything they decide about health, or waste, or water-quality impacts so many people across the globe immediately. They also directly and indirectly (including supply-chain and distribution and sales) employ vast numbers of people all around the world, and their employment practices and policies directly impact many millions. 
  • Weapons manufacturers. Probably not realistic to hope they all just decide to stop making weapons, but they could do a lot by just not selling weapons so freely to authoritarian and inhumane regimes. 
  • Scientific bodies. So much of the world is driven by technology, which isn't just IT. It's true that bodies like governments and the EU do influence scientific research, but there may be many much-more-easily-influenced bodies which can also impact this. Maybe something as simple as requesting a speaking slot at an annual conference in which we'd highlight the gaps and opportunities could help create a new body of relevant research and development. 
  • Etc.

I'm not criticising the work - I think it's fantastic! - but rather wondering if it would be worth someone doing a second iteration and looking at some of these bodies which may not have been so obvious. All the ones I've listed above just reflect my personal expertise and experience as a chemical engineer, so probably is also far too narrow. Would be interesting to see what ideas we might get from a group of others, maybe historians, geologists, teachers, lawyers, biologists, astronomers, ... 

Really looking forward to following this and seeing where it leads! 


Thanks Jessica and Sean for this powerful and inspiring post. 

I would add one more, perhaps even more important, way in which engineers can contribute here - which is more or less what you have just done! 

As engineers, you probably don't even consciously realise this anymore, but the type of analysis you've shared here is pure engineering. It's a way of thinking that gets so drilled into us that we don't realise we didn't always think that way. 

For example, the three-layer model (prevention, response, resilience) is almost perfectly analogous to how chemical engineers study explosion safety when handling solvents. First, you ask how to prevent an explosion (safe procedures, no ignition sources, nitrogen blanketing, ...). Second, you ask how to minimise the risk of harm if there is an explosion (safe enclosures, PPE, minimum people present, ...). And third, you study how to minimise the extent of harm (fire-evacuation procedures, emergency help, first-aid training, ...). 

From this perspective, I'd also add one comment: the assumption that a 50% reduction in each layer translates into a 50% reduction in overall risk is only true if the layers are independent. One of the most difficult challenges in a safety analysis is to identify cases where one accident can break two barriers at once. The stereotypical example (from my youth) was the nuclear-war scenario in which the electromagnetic pulse from the explosion destroyed the communication infrastructure and, with it, much of the response capability. We know about that now and can design around it, but there are probably other such factors, like a viral infection that makes it impossible for the vaccine designers to do their work. 
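
The independence point can be made concrete with a toy calculation (every probability below is made up purely to illustrate the arithmetic, not an estimate of anything):

```python
# Three defence layers (prevention, response, resilience), each
# assumed to let a catastrophe through with probability 0.5.
# All numbers are illustrative, not estimates.
p_fail = [0.5, 0.5, 0.5]

# If the layers are independent, the failure probabilities multiply.
p_catastrophe_indep = 1.0
for p in p_fail:
    p_catastrophe_indep *= p
print(p_catastrophe_indep)  # 0.125

# Correlated failure: suppose a single shock (an EMP, say) disables
# response and resilience in exactly the cases where prevention
# fails. The second and third barriers then add nothing.
p_catastrophe_corr = p_fail[0]
print(p_catastrophe_corr)  # 0.5
```

Three independent 50% barriers cut the risk to 12.5%; the same three barriers with a common failure mode may only cut it to 50%.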

I look forward to seeing more and more engineers working on civilisation resilience! Thanks for getting the ball rolling with Hi-Eng! 


Really great discussion. How can we get this kind of information out into the general population? 

IMHO the biggest challenge we face is convincing people that the default outcome, if we do nothing, is more likely to be that we get an AI which is much more powerful than humans. Tom and Luisa, you do a great job of making this case. If someone disagrees, it is up to them to demonstrate where the flaw in the logic lies. 

I think we face three critical challenges in getting people to act on this as urgently as we need to:

  1. Our brains have evolved to learn from experience. As with climate change and even nuclear war, we have a deep-seated belief that things will change only gradually, and especially that they will not change dramatically in negative ways, because in our lifetimes that is what we've always experienced - even the scariest crises have tended to work out OK*, or where they haven't (e.g. climate change), we've somehow convinced ourselves that they have. And this belief is OK, until suddenly it's not, and then it's too late. 
  2. Most people know the word "exponential" but don't really understand what it means. The exponential growth you describe here, where each generation of AI can develop a new generation that is X% better and faster, is just beyond most of our experience. It reminded me of chemistry experiments with acid-base titrations and the virtual impossibility of titrating to exactly pH 7 with strong acids and bases. If we imagine human-level AGI being at pH 7, we might feel comfortable that the pH has risen from 1.1 to 1.2 to 1.3 to 1.5 to 1.8, and not be aware that the next drop of NaOH will make it 10.4. 
  3. Our brains have evolved to live in denial. This is a vital and mostly positive trait. For example, we all know that we're going to die, but we put this out of our mind almost all of the time. When faced with something really frightening (and even out-of-control AI isn't quite as scary as death), we're able to put it to the back of our minds. Not consciously deny it, not intellectually deny it, but rather accept that it is logically true, and then ignore it and act like it wasn't true. Of course, it helps that there will always be those who will take advantage of this and make (usually flawed) arguments that the concerns are unwarranted, which then give us plausible deniability and make us feel even less worried. 
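
To put a number on point 2, here is a toy model of recursive self-improvement. The 30% per-generation gain and the 12-month first cycle are invented purely for illustration, not predictions:

```python
# Toy model: each AI generation designs a successor that is 30%
# more capable, and is also 30% faster at doing the designing,
# so the wall-clock time per generation shrinks as capability grows.
# All numbers are made up for illustration.
capability = 1.0       # human-researcher level = 1.0
time_per_gen = 12.0    # months for the first self-improvement cycle
elapsed = 0.0

for gen in range(1, 11):
    capability *= 1.3
    elapsed += time_per_gen
    time_per_gen /= 1.3
    print(f"gen {gen:2d}: {capability:5.1f}x capability after {elapsed:5.1f} months")
```

At a constant human pace, ten such generations would take 120 months; with the design speed compounding too, they arrive in under 50 months, and each generation lands sooner than the last - the final few "drops of NaOH" come very fast.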

All this means that most of us (I include myself in this) read this article, fully accept that Tom's arguments are compelling, realise that we absolutely must do something, but somehow do not rush out and storm the parliament demanding immediate action. Instead, we go on to the next item on our to-do list, maybe laundry or grocery shopping ... I'm really determined to figure out a way to overcome this inertia. 


*Obviously this is true for those of us in the current generation, in the West. I'm sure those who lived through world wars or famines or national wars, even those today in Syria or Ukraine or Sudan, will have a better understanding of how things can suddenly go wrong. But most of the people taking the decisions about AI have never experienced anything like that. 


Great article!

The analogy to the economy at the end is wonderful. A lot of us don't realise how badly the economy works. But it's easy to see by just thinking about AI and what's happening right now. People are speculating that AI might one day do as much as 50% of the work now done by humans. A naive outsider might expect us to be celebrating in the streets and introducing a 3-day work-week for everyone. But instead, because our economy works the way it does, with almost all of most people's income directly tied to their "jobs", the reaction is mostly fear that it will eliminate jobs and leave people without any income. 

I'm guessing that the vast majority of people would love to move to a condition (which AI could enable) where everyone works only 50% as much but we keep the same benefits. But there is no realistic way to get there with our economy, at least not quickly. Even if we know what we want to achieve, we just cannot overcome all the barriers and Nash equilibria and individual interests. We understand the principles of each different part of the economy, but the whole picture is just far too complex for anyone to understand or for us, even with total collaboration, to manipulate effectively. 

I'm sure that if we were trying to design the economy from scratch, we would not want to create a system in which a hedge-fund manager can earn 1000 times as much as a teacher, for example. But that's what we have created. If we cannot control the incentives for humans within a system that we fundamentally understand, how well can we control the incentives for an AI system working in ways that we don't understand? 

It's worrying. And yet, AI can do so much good in so many ways for so many people, we have to find the right way forward. 


This is a great post. 

The ideal job is the one where you're doing things you love to do anyway, and getting paid for it. But it's hard to know in advance what that will be. So I would encourage people to test the waters a bit too. 

For example, lab research is exciting, but for each 3-hour experiment you might spend a week preparing: doing risk analyses, ordering materials, running QA checks, requesting budget, booking lab space. And you may end up in a lab where you need to spend 15 minutes every time you enter or exit just putting on and taking off protective clothing. Some people thrive in this environment: they love the details and the precision and the perfectionism of getting everything just right. Others find it stifling. 

Likewise, doing literature research and learning about the leading edge of the field is fascinating. But are you sure you're the kind of person who will look forward to having a 40-page technical document in highly concise, technical language to read every morning and every afternoon? 

Being involved in policy-setting feels like an incredibly important role - and it is. But it also requires keeping your own ego and opinions very much in check, being able to push enthusiastically for incremental gains, being able to compromise strategically, being willing to accept things that you don't like, listening respectfully to opinions you find repulsive, and so on. (For example: imagine you're negotiating with China about cutting methane emissions and they say "we can agree to your proposal, but in return, we want you to endorse our one-China policy on Taiwan".)
The people who are good at this will talk about the times they succeeded, but you need to be aware of how much effort it takes and how many failed attempts they have to deal with. It requires huge degrees of resilience and grit. It is not for everyone. 


Really like this post. Simple, clear and very provocative. It would be great to see it shared more widely. 

If we could get people to ask themselves "which camp do I belong to?" and then to act accordingly ... 

Most of us look back at history and assume we would have been the exceptions - the people opposing slavery, protecting Jews, giving to the poor, etc. But the reality of our actions today (my own absolutely included) belies this for most of us. 

Your post is a timely reminder for us to ask ourselves some questions. 


There is a corporate motto: "10% of decisions need to be right. 90% of decisions just need to be taken!" which resonates perfectly with this post. 

To put this in an EA context - if you're unsure which of two initiatives to work on, that probably means that (to the best of your available knowledge) they are likely to have similar impacts. So, in the grand scheme of things, it probably doesn't matter which you choose. But the time you spend deciding is time that you are NOT dedicating to either of the initiatives. 

This is a good rule of thumb, but you need to be wary of exceptions. There are those 10% of cases where your decision matters a lot. In my case, as a chemical engineer, decisions about safety would typically be in that 10%. In an EA context, the 10% might be decisions where you really are not sure whether a particular initiative is doing more harm than good. 

How do you decide whether you can already take a decision?

  1. Does any option have potentially very bad consequences? Not just wasted time, but actual harm or major wasted investments. 
  2. How much of a difference is there likely to be depending on which decision you take?
  3. What new information are you likely to get (and when) which could help you make a better decision? 
  4. Put the pros and cons on a sheet of paper and discuss them with a friend or colleague. Often, this exercise alone, even before you discuss, will enable you to make a decision. 
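
The trade-off behind the motto can be sketched with a toy expected-value calculation (all numbers made up for illustration):

```python
# Two options whose estimated impacts are close.
impact_a, impact_b = 100.0, 105.0   # made-up impact units per year of work
weeks_deciding = 4
impact_per_week = impact_b / 52     # what a week of actual work is worth

# Deliberate for 4 weeks, then pick the better option:
careful = impact_b - weeks_deciding * impact_per_week

# Pick the (slightly) worse option immediately and start working:
quick = impact_a

print(f"careful: {careful:.1f}, quick: {quick:.1f}")
```

When the gap between the options is smaller than the value of the time spent deliberating, deciding quickly wins even if you pick the worse option - exactly the 90% case the motto describes.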