Cameron Holmes

Senior Research Manager @ MATS
61 karma · Joined · Working (6-15 years) · London, UK

Bio

Participation: 4

Amplifying the AI Alignment research projects of MATS scholars in London.

Formerly a Director of Product Management in Capital Markets / Data Analytics at Coalition Greenwich (S&P Global) and McLagan (Aon).

Interested in prediction markets and semiconductors. AMF monthly donor for 9 years.

How others can help me

I am looking for potential MATS scholar candidates, as well as potential collaborators, including mentors.

How I can help others

Managing career transitions from broader technology/finance into high-impact careers, particularly for those mid-way through their careers, who are parents, or who are moving into AI Safety.

Anything related to MATS, in particular the extension research phase in London

Comments: 11

Thanks! 

I think I've leaned into the people-management aspects of the role here and away from the object-level research skills. So the thing I'd flag as important is having a broad understanding of the research landscape (and an interest in keeping up with it). Deep technical knowledge isn't essential, but it is important to be able to place projects within that landscape.

This seems great, although I expect most of the impact from this could come from broader public awareness rather than from new AI Safety talent as such, so it might be worth leaning into that framing/goal a bit more?

I don't know exactly how to weigh 10k people being aware of AIS (enough to consider it when voting) against one additional person working on it full-time, but I feel like the trade-off could be of roughly that magnitude.

Tangent: I really like adding AIS topics to existing venues (e.g. Chana's video on Computerphile) for converting talent to AIS, as I think it skips a lot of significant filters, although I do get that this could still be the first step on that path (through multiple exposures).

What makes you think this? Every technique we have is statistical in nature (due to the nature of the deep learning paradigm), and none is even approaching 3 9s of safety, while we need something like 13 9s if we are going to survive more than a few years of ASI.
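(By "n 9s" I mean the usual reliability shorthand: a failure rate of at most roughly $10^{-n}$, i.e.)

$$3\ \text{9s} \approx 99.9\%\ \text{reliability}, \qquad 13\ \text{9s} \approx 99.99999999999\%\ \text{reliability}$$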


I also don't see how it's less foomy. SWE-bench and ML researcher automation are still improving; what happens when the models are drop-in replacements for top researchers?


The gap between weak-AGI and strong-AGI/ASI timeline predictions seems to have ticked up a bit. It doesn't seem like intra-token reasoning/capabilities are scaling as hard as I'd previously feared. The models themselves are not getting so scarily capable and agentic in each forward pass; instead we are increasingly eliciting those capabilities and that agency in context, with the models remaining myopic and largely indifferent.

If the new paradigm holds, with a significant focus on scaling inference, it seems to be both less aggressive (in terms of scaling intelligence) and more conducive to 'passing' safety.

The current paradigm likely places a much lower burden on hard interpretability than I expected ~1 year ago; it feels much more like a verification problem than a full solve. With current rates of interpretability progress (and AI accelerating safety roughly in line with capabilities), we could actually be able to verify that a CoT is faithful and legible, and that might be ~sufficient.

Agreed. I still think there's a reasonable chance that ML research falls within the set of capabilities that quickly reach superhuman levels, so foom is still on the cards; more RL in general is also just inherently quite scary.

The 9s-of-safety framing makes sense from a control perspective, but I think there's another angle: the possibility of a model that is aligned enough to actually not want to pursue human extinction.

What is the eventual end result after total disempowerment? Extinction, right?

Potentially, but I think there's still room for scenarios where humans are broadly disempowered yet not extinct: worlds where we get a passing grade on safety, where we effectively avoid strongly agentic systems and achieve sufficient alignment that human lives are valued, but fall short of the full fidelity necessary for a flourishing future.

Still, this point has updated me slightly; I've reduced my disagreement.

My model looks something like this:

There are a bunch of increasingly hard questions on the Alignment Test. We need to get enough of the core questions right to avoid the ASI -> everyone-quickly-dies scenario; this is the 'passing grade'. There are also some bonus/extra-credit questions that we need to get right to earn an A (a flourishing future).

We don't know exactly which questions will be included or in which section. We also don't know the thresholds for these grades, and we are (rightly) focusing the vast majority of our efforts on the expected fundamental questions to maximise our chance of the passing grade.

Relative to ~1 year ago, the 'passing grade' for alignment feels a bit easier and we've got a bit more study time. I've also become aware of just how much more difficult the A grade might be, and that a pass might not be very valuable at all. I don't think anything has changed there; I was just somewhat ignorant of risks from gradual disempowerment.

It might make sense to dedicate, say, 5-20% of our effort to studying for questions we expect in the bonus/extra-credit section. I think we currently do less than that (perhaps 1-5%). So the vast majority of the effort should still be spent on avoiding extinction, but I'm less sure about effort at the margin.


 

Digital sentience could also dominate this equation.

71% ➔ 50% disagree

AI NotKillEveryoneism is the first-order approximation of x-risk work.
 

I think we probably will manage to make enough AI alignment progress to avoid extinction. AI capabilities advancement seems to be on a relatively good path (less foomy), and AI Safety work is starting to make real progress on avoiding the worst outcomes (although a new RL paradigm or illegible/unfaithful CoT could make this scarier).

Yet gradual disempowerment risks seem extremely hard to mitigate, very important, and pretty neglected. The AI Alignment/Safety bar for good outcomes could be significantly higher than merely avoiding extinction.

Most fundamentally, human welfare currently seems highly contingent on our productivity, and decoupling the two could be very hard.

 

I'm completely sold on the arguments in general EV terms (the vast suffering, tractability, importance, neglect, even within EA), up to the limits of how confident I can be about anything this complex. The remainder is basically the fringe possibilities: weird second- and third-order impacts from the messiness of life that mean I couldn't be >98% on something like this.

The deontological point was that maybe there is a good reason, through some moral obligation, that I should care only about humans or weight them vastly above animals. I don't currently believe that, but I'm hedging for it because I could be convinced.

I realise now I'm basically saying I 90% agree that rolling a D20 for 3+ is a good idea, when it would also be fair to interpret that as 100% agreement that it's a good idea ex ante.
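(For concreteness, a quick sketch of the arithmetic behind that analogy: a d20 roll of 3 or higher succeeds on 18 of the 20 faces.)

$$P(\text{roll} \geq 3) = \frac{18}{20} = 0.9$$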

(Also, my first comment was terrible; sorry, I just wanted to get on the board with my priors before reading the debate.)

I think my reservations are mostly deontological, plus a few fringe possibilities.

Thank you, Jonny. Admittedly I only made it to one event, but it was my first in-person interaction with an EA group, and I really enjoyed it and found you very welcoming.

I just find it delightful that HPMOR is the start of so many people's EA origin stories, partly just as a curiosity, since I took the opposite path to so many people (AMF > EA > LW > HPMOR).

Presumably there are many people alive today because of a chain of events that started with EY writing a fanfic, of all things.

Great post

Regarding the second point about how EAs (or anyone else) might exploit an inefficiency in this space, I think it's tricky simply because of the number of other risks that inform the pricing of long-dated bonds. Many of these (climate, demographics, geopolitics, populism, etc.) could wipe out any short (or especially leveraged short) position before TAI is realised.

As noted in my other comment, I expect that for someone with high-conviction views on short TAI timelines there are bets that are:

  • Much higher in expected returns
  • Less capital intensive
  • Less susceptible to other risks

Examples of these bets are broadly discussed elsewhere, but they often involve long/short equity positions on disrupting/disrupted companies and companies in the supply chain (semiconductor design/fab/tooling, datacentres, data aggregators, communications, etc.).

At best, I think shorting long-dated bonds could form part of a short-timelines TAI bet, used to hedge long positions elsewhere or to maintain neutrality against other factors, rather than being the core position. It feels likely there are considerably better options for someone taking such a bet (as you allude to in the opportunities for future work).
