All of Stephen McAleese's Comments + Replies

I wrote a blog post in 2022 (1.5 years ago) estimating that there were about 400 people working on technical AI safety and AI governance.

In the same post, I also created a mathematical model which said that the number of technical AI safety researchers was increasing by 28% per year.

Using this model for all AI safety researchers, we can estimate that there are now about 400 × 1.28^1.5 ≈ 580 people working on AI safety.
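For concreteness, here is a minimal sketch of that extrapolation in Python (the baseline, growth rate, and elapsed time are just the figures quoted above):

```python
# Rough extrapolation of the 2022 estimate under the 28%-per-year growth model.
baseline_2022 = 400    # estimated AI safety researchers in the 2022 post
annual_growth = 1.28   # 28% growth per year
years_elapsed = 1.5    # time since the original estimate

current_estimate = baseline_2022 * annual_growth ** years_elapsed
print(f"Estimated researchers now: {current_estimate:.0f}")  # roughly 580
```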

I personally suspect that the number of people working on AI safety in academia has grown faster than the number of people in new EA orgs so ... (read more)

One argument for continued technological progress is that our current civilization is not particularly stable or sustainable. One of the lessons from history is that seemingly stable empires such as the Roman or Chinese empires eventually collapse after a few hundred years. If there isn't more technological progress so that our civilization reaches a stable and sustainable state, I think it will eventually collapse because of climate change, nuclear war, resource exhaustion, political extremism, or some other cause.

1
Hayven Frienby
3mo
I agree that our civilization is unstable, and climate change, nuclear war, and resource exhaustion are certainly important risks to be considered and mitigated. With that said, societal collapse—while certainly bad—is not extinction. Resource exhaustion and nuclear war won’t drive us to extinction, and even climate change would have a hard time killing us all (in the absence of other catastrophes, which is certainly not guaranteed). Humans have recovered from societal collapses several times in the past, so you would have to make some argument as to why this couldn’t happen again should the same thing happen to Western techno-capitalist society. As an example, even if P(collapse) given that AGI is never achieved were 1, that would still be a preferable outcome to pursue versus creating AGI with P(extinction) of > .05 (this probability as cited in the recent AI expert survey). I’m willing to accept a very high level of s-risk(s) to avoid an x-risk with a sufficiently high probability of occurrence, because extinction would be a uniquely tragic event.

Thanks for the writeup. I like how it's honest and covers all aspects of your experience. I think a key takeaway is that there is no obvious fixed plan or recipe for working on AI safety and instead, you just have to try things and learn as you go along. Without these kinds of accounts, I think there's a risk of survivorship bias and positive selection effects where you see a nice paper or post published and you don't get to see experiments that have failed and other stuff that has gone wrong.

7
Jay Bailey
3mo
This is exactly right, and the main reason I wrote this up in the first place. I wanted this to serve as a data point for people to be able to say "Okay, things have gone a little off the rails, but things aren't yet worse than they were for Jay, so we're still probably okay." Note that it is good to have a plan for when you should give up on the field, too - it should just allow for some resilience and failures baked in. My plan was loosely "If I can't get a job in the field, and I fail to get funded twice, I will leave the field".

Also contributing to positive selection effects is that you're more likely to see the more impressive results in the field, because they're more impressive. That gives your brain a skewed idea of what the median person in the field is doing. Our brain thinks "Average piece of alignment research we see" is "Average output of alignment researchers".

The counterargument to this is "Well, shouldn't we be aiming for better than median? Shouldn't these impressive pieces be our targets to reach?" I think so, yes, but I believe in incremental ambition as well - if one is below-average in the field, aiming to be median first, then good, then top-tier rather than trying to immediately be top-tier seems to me a reasonable approach.

I'm sad to hear that AISC is lacking in funding and somewhat surprised given that it's one of the most visible and well-known AI safety programs. Have you tried applying for grant money from Open Philanthropy since it's the largest AI safety grant-maker?

6
Linda Linsefors
3mo
My current understanding is that OpenPhil is very unlikely to give us money. 
1
Remmelt
3mo
Thanks, we’ll give it a go. Linda is working on sending something in for the “Request for proposals for projects to grow our capacity for reducing global catastrophic risks”. Note, though, that AISC does not really fit OpenPhil’s grant programs because we are not affiliated with a university and because we don’t select heavily on our own conceptions of who are “highly promising young people”.

"In brief, the book [Superintelligence] mostly assumed we will manually program a set of values into an AGI, and argued that since human values are complex, our value specification will likely be wrong, and will cause a catastrophe when optimized by a superintelligence"

Superintelligence describes exploiting hard-coded goals as one failure mode, which we would probably now call specification gaming. But the book is quite comprehensive: other failure modes are described, and I think the book is still relevant.

For example, the book describes what we would ... (read more)

Some information not included in the original post:

  • In April 2023, the UK government announced £100m in initial funding for a new AI Safety Taskforce.
  • In June 2023, UKRI awarded £31m to the University of Southampton to create a new responsible and trustworthy AI consortium named Responsible AI UK.

I think work on near-term issues like unemployment, bias, fairness and misinformation is highly valuable and the book The Alignment Problem does a good job of describing a variety of these kinds of risks. However, since these issues are generally more visible and near-term, I expect them to be relatively less neglected than long-term risks such as existential risk. The other factor is importance or impact. I believe the possibility of existential risk greatly outweighs the importance of other possible effects of AI though this view is partially conditional... (read more)

Good question. I haven't done much research on this but a paper named Understanding AI alignment research: A Systematic Analysis found that the rate of new Alignment Forum and arXiv preprints grew from less than 20 per year in 2017 to over 400 per year in 2022. However, the number of Alignment Forum posts has grown much faster than the number of arXiv preprints. 

The Superalignment team currently has about 20 people according to Jan Leike. Previously I think the scalable alignment team was much smaller and probably only 5-10 people.

1
Tristan Williams
6mo
Good update, thanks for sourcing! 

At OpenAI, I'm pretty sure there are far more people working on near-term problems than on long-term risks. Though the Superalignment team now has over 20 people from what I've heard.

1
Tristan Williams
6mo
Oh really? I thought it was far smaller, like in the range of 5-10. 

Thanks for the post. It was an interesting read.

According to The Case For Strong Longtermism, 10^36 people could ultimately inhabit the Milky Way. Under this assumption, one micro-doom is equal to 10^30 expected lives.

If a 50th-percentile AI safety researcher reduces x-risk by 31 micro-dooms, they could save about 10^31 expected lives during their career or about 10^29 expected lives per year of research. If the value of their research is spread out evenly across their entire career, then each second of AI safety research could be worth about 10^22 expected... (read more)
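Here is a rough sketch of that arithmetic; the 40-year career length is my own illustrative assumption, so the per-year and per-second figures are only order-of-magnitude estimates:

```python
# Back-of-the-envelope expected-value arithmetic for the figures above.
future_people = 1e36                          # potential Milky Way population
lives_per_microdoom = future_people * 1e-6    # 1 micro-doom = 1e-6 of the future = 1e30 lives

microdooms_per_researcher = 31                # 50th-percentile researcher (from the post)
career_years = 40                             # illustrative assumption
seconds_per_year = 365.25 * 24 * 3600

lives_per_career = microdooms_per_researcher * lives_per_microdoom   # ~3e31
lives_per_year = lives_per_career / career_years                     # ~8e29
lives_per_second = lives_per_year / seconds_per_year                 # ~2e22

print(f"{lives_per_career:.1e} per career, {lives_per_year:.1e} per year, "
      f"{lives_per_second:.1e} per second")
```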

5
Nikola
6mo
In the case of EV calculations where the future is part of the equation, I think using microdooms as a measure of impact is pretty practical and can resolve some of the problems inherent in dealing with enormous numbers, because many people have cruxes which are downstream of microdooms. Some think there'll be 10^40 people, some think there'll be 10^20. Usually, if two people disagree on how valuable the long-term future is, they don't have a common unit of measurement for what to do today. But if they both use microdooms, they can compare things 1:1 in terms of their effect on the future, without having to flesh out all of the post-AGI cruxes.

Thanks for pointing this out. I didn't know there was a way to calculate the exponentially moving average (EMA) using NumPy.

Previously I was using alpha = 0.33 for weighting the current value. When that value is plugged into the formula alpha = 2 / (N + 1), it means I was averaging over the past 5 years.

I've now decided to average over the past 4 years so the new alpha value is 0.4.
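As a sanity check, here is a minimal sketch of the alpha-to-window relationship and a plain NumPy implementation of the recurrence (the example growth-rate series is made up for illustration):

```python
import numpy as np

def ema(values, alpha):
    # Standard exponential moving average recurrence:
    # s[t] = alpha * x[t] + (1 - alpha) * s[t-1]
    out = np.empty(len(values))
    out[0] = values[0]
    for t in range(1, len(values)):
        out[t] = alpha * values[t] + (1 - alpha) * out[t - 1]
    return out

# alpha = 2 / (N + 1): N = 5 years gives alpha ~= 0.33, N = 4 years gives alpha = 0.4.
for n_years in (5, 4):
    print(n_years, "years ->", 2 / (n_years + 1))

growth_rates = np.array([0.20, 0.35, 0.25, 0.30, 0.28])  # made-up yearly values
print(ema(growth_rates, alpha=0.4))
```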

I recommend this web page for a narrative on what's happening in our world in the 21st century. It covers many themes such as the rise of the internet, the financial crisis, covid, global warming, AI and demographic decline.

Thanks for the post. Until now, I used to learn about what LTFF funds by manually reading through its grants database. It's helpful to know what the funding bar looks like and how it would change with additional funding.

I think increased transparency is helpful because it's valuable for people to have some idea of how likely their applications are to be funded if they're thinking of making major life decisions (e.g. relocating) based on them. More transparency is also valuable for funders who want to know how their money would be used.

According to Price's Law, the square root of the number of contributors contributes half of the progress. If there are 400 people working on AI safety full-time then it's quite possible that just 20 highly productive researchers are making half the contributions to AI safety research. I expect this power law to apply to both the quantity and the quality of research.

I'm surprised that GPT-4 can't play tic tac toe given that there's evidence that it can play chess pretty well (though it eventually makes illegal moves).

1
Charlie_Guthmann
8mo
Curious how it would do on chess 960.

Thanks for spotting that. I updated the post.

I like the AI Alignment Wikipedia page because it provides an overview of the field that's well-written, informative, and comprehensive.

Excellent story! I believe there's strong demand for scenarios explaining how current AI systems could go on to have a catastrophic effect on the world and the story you described sounds very plausible.

I like how the story combines several key AI safety concepts such as instrumental convergence and deceptive alignment with a description of the internal dynamics of the company and its interaction with the outside world. 

AI risk has been criticized as implausible given the current state of AI (e.g. chatbots) but your realistic story describes how AI in its present form could eventually cause a catastrophe if it's not developed safely.

1
Karl von Wendt
9mo
Thank you! I hope this story will have a similar effect outside of the EA community.

Thanks for writing the post.

I know the sequence is about criticisms of labs, but I personally think I would get more value if the post focused mainly on describing what the lab is doing, with less evaluation of the organization, because readers can form their own opinions given an informative description. To use more technical language, I would be more interested in a descriptive post than a normative one.

My high-level opinion is that the post is somewhat more negative than I would like. My general sentiment on Conjecture is that ... (read more)

I think only one person can do this every year because any other 0-word post would be a duplicate.

Great post. What I find most surprising is how small the scalable alignment team at OpenAI is. Though similar teams in DeepMind and Anthropic are probably bigger.

I added them to the list of technical research organizations. Sorry for the delay.

Inspiring progress! This post is a positive update for me.

Good point. It's important to note that black swans are subjective and depend on the person. For example, a Christmas turkey's slaughter is a black swan for it but not for its butcher.

I disagree because I think these kinds of post hoc explanations are invalidated by the hindsight fallacy. I think the FTX crash was a typical black swan because it seems much more foreseeable in retrospect than it was before the event.

To use another example, the 2008 financial crisis made sense in retrospect, but the Big Short movie shows that, before the event, even the characters shorting the mortgage bonds had strong doubts about whether they were right and most other people were completely oblivious.

Although the FTX crisis makes sense in retrospect, I have to admit that I had absolutely no idea that it was about to happen before the event.

4
Jason
1y
By definition, a black swan event is highly improbable based on foresight. I don't think this qualifies -- it was generally known that crypto ventures have a meaningful risk of collapsing and that a significant number of those collapses are tainted with fraud or misconduct. Many people, including myself, were wary enough of the crypto sector to stay out entirely for this and other reasons. I do not think I am a superforecaster or anything. As an example, the possibility of my wife and I both dying before our son grows up is quite low but is not unforeseeable. I certainly don't expect the train we are on right now to crash... but our possible death is not an "unknown unknown." Nor was FTX's. The specific risk of FTX exploding could have been specifically prepared for, and I think framing it as a black swan event obscures that.

Thanks! I used that format because it was easy for me to write. I'm glad to see that it improves the reading experience too.

I really like this post and I think it's now my favorite post so far on the recent collapse of FTX.

Many recent posts on this subject have focused on topics such as Sam Bankman-Fried's character, what happened at FTX and how it reflects on EA as a whole.

While these are interesting subjects, I got the impression that a lot of the posts were too backward-looking and not constructive enough.

I was looking for a post that was more reflective and less sensational, focused on what we can learn from the experience and how we can adjust the strategy of EA going forward, and I think this post meets these criteria better than most of the previous ones.

2
Jack Lewars
1y
Thanks Stephen, I really appreciate this feedback

This reminds me of Nick Bostrom's story, "The Fable of the Dragon-Tyrant". Maybe somebody will write a story like this about ageing instead of smallpox in the future.

I think microgrants are a great idea! Because they're small, you can make lots of investments to different people with relatively little risk and cost.

1
Ben Yeoh
1y
Thanks for the support!

One way of doing automated AI safety research is for AI safety researchers to create AI safety ideas on aisafetyideas.com and then use the titles as prompts for a language model. Here is GPT-3 generating a response to one of the ideas:
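If someone wanted to automate this, a rough sketch might look like the following; the idea title, model name, and API key are placeholders, and the snippet uses the pre-1.0 OpenAI Completion API as an illustration of the approach rather than a record of what was actually run:

```python
import openai  # pre-1.0 openai-python Completion API

openai.api_key = "YOUR_API_KEY"  # placeholder

# A hypothetical idea title of the kind listed on aisafetyideas.com.
idea_title = "Use interpretability tools to detect deceptive behaviour in language models"

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=f"AI safety research idea: {idea_title}\n\nA possible way to approach this idea:",
    max_tokens=200,
)
print(response["choices"][0]["text"])
```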

1
Esben Kran
1y
Uuh, interesting! Maybe I'll do that as a weekend project for fun. An automatic comment based on the whole idea as a prompt.

This question would have been way easier if I had just estimated the number of AI safety researchers in my city (1?) instead of the whole world.

Here is a model that involves taking thousands of trials of the product of six variables randomly set between 10% and 90% (e.g. 0.5^6 ≈ 0.016 = 1.6%).

As other people have noted, conjunctive models tend to produce low probabilities (<5%).
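For reference, a minimal sketch of that kind of conjunctive Monte Carlo model (six independent step probabilities drawn uniformly between 10% and 90%, multiplied together over many trials):

```python
import numpy as np

rng = np.random.default_rng(0)

n_trials = 10_000
n_steps = 6  # number of conjunctive steps, each with probability drawn from U(0.1, 0.9)

samples = rng.uniform(0.1, 0.9, size=(n_trials, n_steps))
products = samples.prod(axis=1)  # probability that all six steps occur, per trial

print(f"mean P(catastrophe):   {products.mean():.3f}")
print(f"median P(catastrophe): {np.median(products):.3f}")
# Multiplying several mid-range probabilities drives the result well below 5%,
# which is why conjunctive models tend to produce low headline numbers.
```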

7
Froolow
1y
Thanks, this is really interesting - in hindsight I should have included something like this when describing the SDO mechanism, because it illustrates it really nicely.

Just to follow up on a comment I made somewhere else, the concept of a 'conjunctive model' is something I've not seen before and implies a sort of ontology of models which I haven't seen in the literature. A reasonable definition of a model is that it is supposed to reflect an underlying reality, and this will sometimes involve multiplying probabilities and sometimes involve adding two different sources of probabilities.

I'm not an expert in AI Risk so I don't have much of a horse in this race, but I do note that if the one published model of AI Risk is highly 'conjunctive' / describes a reality where many things need to occur in order for AI Catastrophe to occur then the correct response from the 'disjunctive' side is to publish their own model, not argue that conjunctive models are inherently biased - in a sense 'bias' is the wrong term to use here because the case for the disjunctive side is that the conjunctive model accurately describes a reality which is not our own.

(I'm not suggesting you don't know this, just that your comment assumes a bit of background knowledge from the reader I thought could potentially be misinterpreted!)

Great post. This is possibly the best explanation of the relationship between capabilities and safety I've seen so far.

The whole section on Price's Law has been replaced with a section on Lotka's Law.

Great talk. I think it breaks down the problem of AI alignment well. It also reminds me of the more recent breakdown by Dan Hendrycks which decomposes ML safety into three problems: robustness, monitoring and alignment.

I've noticed that a lot of good ideas seem to come from talks. For example, Richard Hamming's famous talk on working on important problems. Maybe there should be more of them.

Thanks, I think you're right. I'll have to edit that section.

I went through all the authors from the Alignment Forum from the past ~6 months, manually researched each person and came up with a new estimate named  'Other' of about 80 people which includes independent researchers, other people in academia and people in programs such as SERI MATS.

More edits:
- DeepMind: 5 -> 10.
- OpenAI: 5 -> 10.
- Moved GoodAI from the non-technical to technical table.
- Added technical research organization: Algorithmic Alignment Group (MIT): 4-7.
- Merged 'other' and 'independent researchers' into one group named 'other' with new manually created (accurate) estimate.

Great point. The decline of religion has arguably left a cultural vacuum that new organizations can fill.


Edit: updated OpenAI from 5 to 10.

From their website, AI Impacts currently has 2 researchers and 2 support staff (the current total estimate is 3).

The current estimate for Epoch is 4 which is similar to most estimates here.

I'm trying to come up with a more accurate estimate for independent researchers and 'Other' researchers.

New estimates:

CSER: 2-5-10 -> 2-3-7
FLI: 5-5-20 -> 3-4-6
Leverhulme: 2-5-15 -> 3-4-10

Edit: added Rethink Priorities to the list of non-technical organizations.

I re-estimated the number of researchers in these organizations and the edits are shown in the 'EDITS' comment below.

Copied from the EDITS comment:

- CSER: 5-5-10 -> 2-5-15
- FLI: 5-5-20 -> 3-5-15
- Leverhulme Centre: 5-10-70 (Low confidence) -> 2-5-15 (Medium confidence)

My counts for CSER:

- full-time researchers: 3
- research affiliates: 4

FLI: counted 5 people working on AI policy and governance.

Leverhulme Centre:

- 7 senior research fellows
- 14 research fellows

Many of them work at other organizations. I think 5 is a good conservative estimate.

New... (read more)

3
Mau
1y
[Edit: I think the following no longer makes sense because the comment it's responding to was edited to add explanations, or maybe I had just missed those explanations in my first reading. See my other response instead.] Thanks for this. I don't see how the new estimates incorporate the above information. (The medians for CSER, Leverhulme, and FLI seem to still be at 5 each.) (Sorry for being a stickler here--I think it's important that readers get accurate info on how many people are working on these problems.)

I re-estimated counts for many of the non-technical organizations and here are my conclusions:

  • I didn't change the CSET estimate (10) because there seems to be a core group of about 5 researchers there and many others (20-30). Their productivity also seems to be high: I counted over 20 publications so far this year though it seems like only about half of them are related to AI governance (list of publications).
  • I deleted BERI and SERI from the list because they don't seem to have any full-time researchers.
  • Epoch:  decreased estimate from 10 to 4.
  • Good AI
... (read more)
3
Mau
1y
Thanks for the updates! I have it on good word that CSET has well under 10 safety-focused researchers, but fair enough if you don't want to take an internet stranger's word for things. I'd encourage you to also re-estimate the counts for CSER, Leverhulme, and the Future of Life Institute.
* CSER's list of team members related to AI lists many affiliates, advisors, and co-founders but only ~3 research staff.
* The Future of Life Institute seems more focused on policy and field-building than on research; they don't even have a research section on their website. Their team page lists ~2 people as researchers.
* Of the 5 people listed in Leverhulme's relevant page, one of them was already counted for CSER, and another one doesn't seem safety-focused.

I also think the number of "Other" is more like 4.

Edits based on feedback from LessWrong and the EA Forum:

EDITS:
- Added new 'Definitions' section to introduction to explain definitions such as 'AI safety', 'researcher' and the difference between technical and non-technical research.

UPDATED ESTIMATES  (lower bound, estimate, upper bound):

TECHNICAL
- CHAI: 10-30-60 -> 5-25-50
- FHI: 10-10-40 -> 5-10-30
- MIRI: 10-15-30 -> 5-10-20

NON-TECHNICAL
- CSER: 5-5-10 -> 2-5-15
- Delete BERI from the list of non-technical research organizations
- Delete SERI from the list of non-technical research organizat... (read more)

2
Zach Stein-Perlman
1y
More minor suggestions:
* OpenAI non-technical: there are more than 5.
* AI Impacts non-technical: there are exactly 5.
* I would have said Epoch is at least 5 FTEs (disagreeing with Mauricio).
* Better estimating the number of independent technical researchers seems pretty important and tractable.

(Edit: also many people on the CHAI website don't actually do AI safety research, but definitely more than 5 do.)

Thanks for the information! Your estimate seems more accurate than mine.

In the case of Epoch, I would count every part-time employee as roughly half a full-time employee to avoid underestimating their productivity.

It's true that universe B might never fully catch up because 99% of a single generation was lost. But over 1 billion years, we would expect about 40 million generations to live. Even if a few generations were lost, the total loss won't be high as long as there is a recovery.

1
Pablo Rosado
2y
Whether and to what extent the survivors could catch up with the counterfactual universe strongly depends on the boundary conditions. Universe A could have expanded to other planets by the time B fully recovers. We are comparing the potential of a full, and fully developed humanity with a small post-apocalyptic fraction of humanity. I agree with you that planet boundaries (and other physical constraints) could reduce the potential of a random 1% in A with respect to B. But I suppose it can also go the other way: The survivors in B could produce less humans than any 1% of A, and keep this trend for many (even all) future generations. My intuition here is very limited.

"The catastrophe that takes place in scenario B removes 99% of all humans alive, which in turn removes around 99% of all humans that could have lived at the end of time."

That would only happen if the population never recovered. But since I would expect the world to rapidly repopulate, I therefore would expect the long-term difference to be insignificant.

1
Pablo Rosado
2y
The survivors in B would eventually catch up with the living population of the world today, yes. However, the survivors in B would never catch up with the cumulative population of the universe where there was no catastrophe. While the survivors in B were recovering, the counterfactual universe has been creating more humans (as well as new pieces of art, scientific discoveries, etc.). It is impossible for B to catch up, regardless of how much you wait. All the human potential of the 99% who died in the catastrophe is lost forever.

I agree that the total number of humans who will ever live at the end of time is similar in A and B. Therefore I think there is almost no difference between A and B in the long term.

1
Pablo Rosado
2y
The number of humans who will ever live is similar in scenarios A and B. But keep in mind that in scenario A we have randomly picked only 1% of all existing humans. The catastrophe that takes place in scenario B removes 99% of all humans alive, which in turn removes around 99% of all humans that could have lived at the end of time. That is an enormous difference in the long term. And that is the main point of that section: Saving lives now has an enormous impact in the long term.