One argument for continued technological progress is that our current civilization is not particularly stable or sustainable. One of the lessons of history is that seemingly stable empires, such as the Roman or Chinese empires, eventually collapse after a few hundred years. Without further technological progress that brings our civilization to a stable and sustainable state, I think it will eventually collapse because of climate change, nuclear war, resource exhaustion, political extremism, or some other cause.
Thanks for the writeup. I like how it's honest and covers all aspects of your experience. I think a key takeaway is that there is no obvious fixed plan or recipe for working on AI safety and instead, you just have to try things and learn as you go along. Without these kinds of accounts, I think there's a risk of survivorship bias and positive selection effects where you see a nice paper or post published and you don't get to see experiments that have failed and other stuff that has gone wrong.
I'm sad to hear that AISC is lacking in funding and somewhat surprised given that it's one of the most visible and well-known AI safety programs. Have you tried applying for grant money from Open Philanthropy since it's the largest AI safety grant-maker?
"In brief, the book [Superintelligence] mostly assumed we will manually program a set of values into an AGI, and argued that since human values are complex, our value specification will likely be wrong, and will cause a catastrophe when optimized by a superintelligence"
Superintelligence describes exploiting hard-coded goals as one failure mode, which we would probably now call specification gaming. But the book is quite comprehensive: other failure modes are described as well, and I think the book is still relevant.
For example, the book describes what we would ...
Some information not included in the original post:
I think work on near-term issues like unemployment, bias, fairness and misinformation is highly valuable and the book The Alignment Problem does a good job of describing a variety of these kinds of risks. However, since these issues are generally more visible and near-term, I expect them to be relatively less neglected than long-term risks such as existential risk. The other factor is importance or impact. I believe the possibility of existential risk greatly outweighs the importance of other possible effects of AI though this view is partially conditional...
Good question. I haven't done much research on this but a paper named Understanding AI alignment research: A Systematic Analysis found that the rate of new Alignment Forum and arXiv preprints grew from less than 20 per year in 2017 to over 400 per year in 2022. However, the number of Alignment Forum posts has grown much faster than the number of arXiv preprints.
At OpenAI, I'm pretty sure there are far more people working on near-term problems than on long-term risks. Though the Superalignment team now has over 20 people, from what I've heard.
Thanks for the post. It was an interesting read.
According to The Case For Strong Longtermism, 10^36 people could ultimately inhabit the Milky Way. Under this assumption, one micro-doom is equal to 10^30 expected lives.
If a 50th-percentile AI safety researcher reduces x-risk by 31 micro-dooms, they could save about 10^31 expected lives during their career, or about 10^29 expected lives per year of research. If the value of their research is spread out evenly across their entire career, then each second of AI safety research could be worth about 10^22 expected...
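As a rough sketch, the arithmetic above can be reproduced in a few lines of Python. The career length is an assumption I've added (it isn't stated above), so the exact per-year and per-second figures depend on it; everything is rounded to orders of magnitude:

```python
import math

# Assumptions (chosen to match the orders of magnitude quoted above):
FUTURE_LIVES = 1e36          # potential people in the Milky Way (The Case For Strong Longtermism)
MICRO_DOOM = 1e-6            # one micro-doom = a one-in-a-million reduction in x-risk
CAREER_YEARS = 40            # assumed career length, not stated in the original comment
SECONDS_PER_YEAR = 365.25 * 24 * 3600

lives_per_micro_doom = FUTURE_LIVES * MICRO_DOOM        # ~1e30
lives_per_career = 31 * lives_per_micro_doom            # ~3e31
lives_per_year = lives_per_career / CAREER_YEARS        # ~1e30
lives_per_second = lives_per_year / SECONDS_PER_YEAR    # ~1e22

for label, value in [("per micro-doom", lives_per_micro_doom),
                     ("per career", lives_per_career),
                     ("per year", lives_per_year),
                     ("per second", lives_per_second)]:
    print(f"expected lives {label}: ~10^{math.log10(value):.1f}")
```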
Thanks for pointing this out. I didn't know there was a way to calculate the exponentially moving average (EMA) using NumPy.
Previously I was using alpha = 0.33 for weighting the current value. When that value is plugged into the formula alpha = 2 / (N + 1), it means I was averaging over the past 5 years.
I've now decided to average over the past 4 years so the new alpha value is 0.4.
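For reference, here is a minimal sketch of one way to compute the EMA with NumPy arrays. This is just an illustrative recursive loop with made-up sample data, not necessarily the method referred to above:

```python
import numpy as np

def ema(values, n):
    """Exponential moving average with smoothing factor alpha = 2 / (N + 1)."""
    alpha = 2 / (n + 1)              # N = 4 gives alpha = 0.4
    out = np.empty(len(values))
    out[0] = values[0]               # seed with the first observation
    for i in range(1, len(values)):
        out[i] = alpha * values[i] + (1 - alpha) * out[i - 1]
    return out

# Illustrative data: yearly values smoothed over roughly the past 4 years.
print(ema(np.array([10.0, 12.0, 15.0, 14.0, 20.0]), n=4))
```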
Thanks for the post. Until now, I've learned about what the LTFF funds by manually reading through its grants database. It's helpful to know what the funding bar looks like and how it would change with additional funding.
I think increased transparency is helpful because it's valuable for people to have some idea of how likely their applications are to be funded if they're thinking of making major life decisions (e.g. relocating) based on them. More transparency is also valuable for funders who want to know how their money would be used.
According to Price's Law, the square root of the number of contributors contributes half of the progress. If there are 400 people working on AI safety full-time then it's quite possible that just 20 highly productive researchers are making half the contributions to AI safety research. I expect this power law to apply to both the quantity and the quality of research.
I like the AI Alignment Wikipedia page because it provides an overview of the field that's well-written, informative, and comprehensive.
Excellent story! I believe there's strong demand for scenarios explaining how current AI systems could go on to have a catastrophic effect on the world and the story you described sounds very plausible.
I like how the story combines several key AI safety concepts such as instrumental convergence and deceptive alignment with a description of the internal dynamics of the company and its interaction with the outside world.
AI risk has been criticized as implausible given the current state of AI (e.g. chatbots) but your realistic story describes how AI in its present form could eventually cause a catastrophe if it's not developed safely.
Thanks for writing the post.
I know the sequence is about criticisms of labs, but I personally think I would get more value if the post focused mainly on describing what the lab is doing, with less evaluation of the organization, because the reader can form their own opinion given an informative description. To use more technical language, I would be more interested in a descriptive post than a normative one.
My high-level opinion is that the post is somewhat more negative than I would like. My general sentiment on Conjecture is that ...
Great post. What I find most surprising is how small the scalable alignment team at OpenAI is. Though similar teams in DeepMind and Anthropic are probably bigger.
Good point. It's important to note that black swans are subjective and depend on the person. For example, a Christmas turkey's slaughter is a black swan for it but not for its butcher.
I disagree because I think these kinds of post hoc explanations are invalidated by the hindsight fallacy. I think the FTX crash was a typical black swan because it seems much more foreseeable in retrospect than it was before the event.
To use another example, the 2008 financial crisis made sense in retrospect, but the Big Short movie shows that, before the event, even the characters shorting the mortgage bonds had strong doubts about whether they were right and most other people were completely oblivious.
Although the FTX crisis makes sense in retrospect, I have to admit that I had absolutely no idea that it was about to happen before the event.
Thanks! I used that format because it was easy for me to write. I'm glad to see that it improves the reading experience too.
I really like this post and I think it's my favorite so far on the recent collapse of FTX.
Many recent posts on this subject have focused on topics such as Sam Bankman-Fried's character, what happened at FTX, and how it reflects on EA as a whole.
While these are interesting subjects, I got the impression that a lot of the posts were too backward-looking and not constructive enough.
I was looking for a post that was more reflective and less sensational, focused on what we can learn from the experience and how to adjust EA's strategy going forward, and I think this post meets these criteria better than most of the previous posts.
This reminds me of Nick Bostrom's story, "The Fable of the Dragon-Tyrant". Maybe somebody will write a story like this about ageing instead of smallpox in the future.
I think microgrants are a great idea! Because they're small, you can make lots of investments in different people with relatively little risk and cost.
One way of doing automated AI safety research is for AI safety researchers to create AI safety ideas on aisafetyideas.com and then use the titles as prompts for a language model. Here is GPT-3 generating a response to one of the ideas:
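For illustration, here is a minimal sketch of how an idea title could be used as a prompt via the legacy OpenAI completions API (the GPT-3-era interface). The model name and prompt text are placeholders, not the actual ones used above:

```python
import openai  # legacy completions API, as available in the GPT-3 era

# Hypothetical example: an idea title taken from aisafetyideas.com used as a prompt.
idea_title = "How could we detect deceptive alignment in large language models?"

response = openai.Completion.create(
    model="text-davinci-002",          # placeholder GPT-3 model name
    prompt=f"AI safety research idea: {idea_title}\n\nProposed approach:",
    max_tokens=256,
    temperature=0.7,
)
print(response.choices[0].text)
```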
This question would have been way easier if I had just estimated the number of AI safety researchers in my city (1?) instead of the whole world.
Here is a model that involves taking thousands of trials of the product of six variables, each randomly set between 10% and 90% (e.g. 0.5^6 ≈ 0.016 = 1.6%).
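A minimal sketch of what such a simulation might look like (the trial count and random seed are arbitrary choices, not taken from the original model):

```python
import numpy as np

# Product of six independent probabilities, each drawn uniformly from [0.1, 0.9].
rng = np.random.default_rng(0)
samples = rng.uniform(0.1, 0.9, size=(100_000, 6)).prod(axis=1)

print(f"mean:   {samples.mean():.1%}")      # around 1-2%
print(f"median: {np.median(samples):.1%}")  # lower than the mean
```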
As other people have noted, conjunctive models tend to produce low probabilities (<5%).
Great post. This is possibly the best explanation of the relationship between capabilities and safety I've seen so far.
Great talk. I think it breaks down the problem of AI alignment well. It also reminds me of the more recent breakdown by Dan Hendrycks which decomposes ML safety into three problems: robustness, monitoring and alignment.
I've noticed that a lot of good ideas seem to come from talks. For example, Richard Hamming's famous talk on working on important problems. Maybe there should be more of them.
I went through all the authors from the Alignment Forum from the past ~6 months, manually researched each person and came up with a new estimate named 'Other' of about 80 people which includes independent researchers, other people in academia and people in programs such as SERI MATS.
More edits:
- DeepMind: 5 -> 10.
- OpenAI: 5 -> 10.
- Moved GoodAI from the non-technical to technical table.
- Added technical research organization: Algorithmic Alignment Group (MIT): 4-7.
- Merged 'other' and 'independent researchers' into one group named 'other' with a new, manually created (more accurate) estimate.
Great point. The decline of religion has arguably left a cultural vacuum that new organizations can fill.
Edit: updated OpenAI from 5 to 10.
From their website, AI Impacts currently has 2 researchers and 2 support staff (the current total estimate is 3).
The current estimate for Epoch is 4 which is similar to most estimates here.
I'm trying to come up with a more accurate estimate for independent researchers and 'Other' researchers.
I re-estimated the number of researchers in these organizations and the edits are shown in the 'EDITS' comment below.
Copied from the EDITS comment:
- CSER: 5-5-10 -> 2-5-15
- FLI: 5-5-20 -> 3-5-15
- Leverhulme Centre: 5-10-70 (Low confidence) -> 2-5-15 (Medium confidence)
My counts for CSER:
- full-time researchers: 3
- research affiliates: 4
FLI: counted 5 people working on AI policy and governance.
Leverhulme Centre:
- 7 senior research fellows
- 14 research fellows
Many of them work at other organizations. I think 5 is a good conservative estimate.
New...
I re-estimated counts for many of the non-technical organizations and here are my conclusions:
Edits based on feedback from LessWrong and the EA Forum:
EDITS:
- Added new 'Definitions' section to introduction to explain definitions such as 'AI safety', 'researcher' and the difference between technical and non-technical research.
UPDATED ESTIMATES (lower bound, estimate, upper bound):
TECHNICAL
- CHAI: 10-30-60 -> 5-25-50
- FHI: 10-10-40 -> 5-10-30
- MIRI: 10-15-30 -> 5-10-20
NON-TECHNICAL
- CSER: 5-5-10 -> 2-5-15
- Delete BERI from the list of non-technical research organizations
- Delete SERI from the list of non-technical research organizat...
Thanks for the information! Your estimate seems more accurate than mine.
In the case of Epoch, I would count every part-time employee as roughly half a full-time employee to avoid underestimating their productivity.
It's true that universe B might never fully catch up because 99% of a single generation was lost. But over 1 billion years, we would expect about 40 million generations to live (roughly 25 years per generation). Even if a few generations were lost, then as long as there is a recovery, the total loss won't be high.
"The catastrophe that takes place in scenario B removes 99% of all humans alive, which in turn removes around 99% of all humans that could have lived at the end of time."
That would only happen if the population never recovered. But since I would expect the world to rapidly repopulate, I therefore would expect the long-term difference to be insignificant.
I agree that the total number of humans who will ever live at the end of time is similar in A and B. Therefore I think there is almost no difference between A and B in the long term.
I wrote a blog post in 2022 (1.5 years ago) estimating that there were about 400 people working on technical AI safety and AI governance.
In the same post, I also created a mathematical model which said that the number of technical AI safety researchers was increasing by 28% per year.
Using this model for all AI safety researchers, we can estimate that there are now 400 × 1.28^1.5 ≈ 580 people working on AI safety.
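A quick sketch of that extrapolation, using only the figures quoted above:

```python
# Extrapolating the 2022 estimate forward by 1.5 years at 28% annual growth.
base = 400          # estimated AI safety researchers in the 2022 post
growth_rate = 1.28  # 28% growth per year
years_elapsed = 1.5

print(round(base * growth_rate ** years_elapsed))  # ~580
```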
I personally suspect that the number of people working on AI safety in academia has grown faster than the number of people in new EA orgs so ...