Reinforcement learning scaling might incentivise hidden reasoning architectures for AI

Note that whole fields of explainable AI and AI interpretability exist, with many open research agendas. People are trying, bless them! They have made nonzero progress! But neural networks are still basically impenetrable.

I say ‘encouraging’ to encompass all of training, prompting via input, forcing via injected context, and other steering injections.

The company serving the AI might hide it from you. Whoever runs the AI may hide it from you (perhaps just showing you final outputs, or even passing them off as their own production). But at least the reasoning is out there, externalised for someone to look over in principle… if they can be bothered. (Maybe they’ll get their AI to do it!)

Perhaps something like a few minutes’ worth of mathematical reasoning without making notes, and perhaps a few ‘steps’ of logic.

There are reasons even this comes apart — records of human writing and speech are not usually records of human thought. Nowadays, some of the records are of AI writing from earlier generations! But there are enough basically faithful examples of ‘thinking out loud’ and ‘reasoning clearly’ that when you encourage a mostly-pretrained AI to reason out loud really really comprehensively, it seems to mostly do that in a human-readable way.

There are some partial ways around this, but on the whole it’s right.

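Consider Amdahl’s law: the self-supervised part is ridiculously parallelisable (it’s ‘optimised’) and the RL part isn’t. When the RL part is a small share of training, the overall speedup from parallelism is very large; as that share grows, the serial RL part increasingly caps it.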
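To make the Amdahl’s law point concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that training splits cleanly into a parallelisable self-supervised share and a strictly serial RL share; real RL training is not perfectly serial. The function name `amdahl_speedup`, the 1000x parallel speedup, and the example RL shares are all made up for the illustration, not figures from the post.

```python
def amdahl_speedup(parallel_fraction: float, parallel_speedup: float) -> float:
    """Overall speedup under Amdahl's law.

    parallel_fraction: share of the original work that can be sped up
        (here, hypothetically, the self-supervised share of training).
    parallel_speedup: how much faster that share runs when parallelised.
    The remaining share (here, the RL part) is assumed to stay serial.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if parallel_speedup <= 0.0:
        raise ValueError("parallel_speedup must be positive")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)


# Illustrative RL shares of total training work (assumed values).
for rl_share in (0.01, 0.1, 0.5):
    speedup = amdahl_speedup(1.0 - rl_share, parallel_speedup=1000.0)
    print(f"RL share {rl_share:.0%}: overall speedup about {speedup:.1f}x")
```

However large the parallel speedup gets, the overall speedup can never exceed one over the serial share: with a 1% serial RL share the ceiling is 100x, and with a 50% share it is only 2x.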

Hidden reasoning

A dash (em-dash?) of luck: ‘thinking out loud’

Aside: keeping visible reasoning faithful

The cake is a lie: reinforcement learning back in style

Reinforcement learning’s serial training penalty

Return of the recurrent network?

Does ‘thinking out loud’ go away?

Potential saving benefits of visible reasoning

Does it even matter?