
Jan Wehner🔸

AI Governance Researcher
116 karma · Working (0-5 years) · London, UK

Comments (14)

I quite like the "Importance of quantity over quality" factor and hadn't thought of this before! It's slightly entangled with verifiability (if you can cheaply assess whether something is good, then it's fine to have a large quantity of attempts even if many are bad). But I think quantity vs quality also matters beyond that. I'll add it as a possible additional factor in the post.

I agree that this factor advantages Data Generation and AI Control. I think Dangerous Capability Evals also benefits from quantity a lot.

Thanks for the input!

On Scheming: I actually don't think scheming risk is the most important factor. Even removing it completely doesn't change my final conclusion. I agree that a bimodal distribution with scheming/non-scheming would be appropriate for a more sophisticated model. I just ended up halving the weight I assign to the scheming factor to account for my uncertainty about whether scheming will be an issue.
In my analysis, the ability to get good feedback signals/success criteria is the factor that moves me most towards thinking that capabilities get sped up before safety.

On Task length: You have more visibility into this, so I'm happy to defer. But I'd love to hear more about why you think capabilities research involves longer tasks. Is it because you have to run large evals or do pre-training runs? Do you think this argument applies to all areas of capabilities research?
 

It's a great question. I see Safety Cases more as a meta-framework in which you can use different kinds of evidence. Other risk management techniques can be used as evidence in a Safety Case (eg this paper uses a Delphi method).

Also I think Safety Cases are attractive to people in AI Safety because:
1) They offer flexibility in the kinds of evidence and reasoning that are allowed. From skimming, it seems to me that many of the other risk management practices you linked are stricter about the kinds of arguments or evidence that can be brought.
2) They strive to comprehensively prove that overall risk is low. I think most of the other techniques don't let you make claims such as "overall risk from a system is <x%" (which AI Safety people want).
3) I might be wrong here, but it seems to me that many other risk management techniques require you to understand the system and its environment decently well, whereas this is very difficult for AI Safety.

Overall, you might well be right that other risk management techniques have been overlooked and we shouldn't just focus on Safety Cases.

Hi Ben, sorry about the mistake and thanks for letting me know. I'll update it here and on LessWrong immediately.

Great, I would be keen to read your next post! Especially because I think that the ability of attackers to remove many kinds of safeguards is a fundamental challenge in open-source safety.

Hi Mayowa, I agree that open-source safety is a big (and, I think, too overlooked) AI Safety problem. We can be sure that hacker groups are already using, or will soon use, local LLMs for cyber-attacks.

Your idea is neat, but I'm worried that in practice it would be easy for actors (especially ones with decent technical capabilities) to circumvent these defences. You already mention the potential to just alter the data stored in plaintext, but I think the same would be possible with other methods of tracking the safety state. Eg with steganography, the attacker could occasionally use a different model to reword/rewrite outputs and thus remove the hidden information about the safety state.
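To make that failure mode concrete, here is a toy sketch (purely illustrative, not your actual scheme; the synonym-pair encoding is a made-up stand-in for whatever steganographic channel is used): the safety state is hidden in word choices, and a single rewording pass, simulated here by normalising synonyms, is enough to erase it.

```python
# Toy illustration: hide one bit per word via synonym choice, then show that
# a paraphrase (simulated by normalising synonyms) destroys the hidden payload.
SYNONYMS = {"big": "large", "fast": "quick", "smart": "clever"}
PAIRS = list(SYNONYMS.items())  # bit 0 -> first word of a pair, bit 1 -> second word

def encode(bits):
    # One word per bit: pick the "0" or "1" member of each synonym pair.
    return " ".join(pair[b] for pair, b in zip(PAIRS, bits))

def decode(text):
    return [0 if word == pair[0] else 1 for pair, word in zip(PAIRS, text.split())]

def paraphrase(text):
    # Stand-in for an attacker rewording the output with a different model:
    # every synonym is mapped to one canonical choice, erasing the hidden bits.
    return " ".join(SYNONYMS.get(word, word) for word in text.split())

hidden_safety_state = [0, 1, 0]
stego_text = encode(hidden_safety_state)
print(decode(stego_text))              # [0, 1, 0] -> state survives verbatim copying
print(decode(paraphrase(stego_text)))  # [1, 1, 1] -> state destroyed by a simple reword
```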

I'd say it's a medium-sized deal. Academics can often propose ideas and show that they work on smaller (eg 7B) models. But it then requires someone with a larger compute budget to like the idea and results and implement it at a larger scale.

There are some areas where access to compute is less important, like MechInterp, red-teaming, creating benchmarks, or more theoretical areas of AI research. Areas are more amenable to academic research if they don't require training frontier models. Eg inference or small fine-tuning runs on frontier models are actually not super expensive and can be done by academic labs. Also, some areas of research can be done well on smaller models (eg MechInterp), so it's fine if your uni doesn't have many GPUs.

But my experience (and also that of some others I know) was that I would regularly think of experiments or research ideas that I didn't end up running or pursuing because I didn't think I had the necessary compute.

I appreciate the Counterbalance!

Nice to see that we're thinking along similar lines. I really like your thinking on finding a status-based game that still gives people something to strive for; it could help give people meaning in a post-work society!

Thanks for the pointer Henry! It motivated me to look into culling more and I just wanted to share some EU-specific facts I found:

A laying hen produces ~350 eggs, so consuming one egg corresponds to ~1/350th of one culled male chick. 28% of chickens in Europe are covered by in-ovo sexing, with Germany at ~80%. The numbers are lower for organic eggs because, for some reason, in-ovo sexing was forbidden for organic eggs until this year (which seems pretty absurd).
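To make the back-of-the-envelope arithmetic explicit (just a rough sketch using the figures above; it assumes roughly one culled male chick per laying hen hatched without in-ovo sexing):

```python
eggs_per_hen = 350        # approximate egg production per laying hen (figure quoted above)
in_ovo_share_eu = 0.28    # share of European production covered by in-ovo sexing (quoted above)

# Assuming ~1 culled male chick per laying hen hatched without in-ovo sexing:
chicks_per_egg = 1 / eggs_per_hen                          # ~0.0029 culled chicks per egg
chicks_per_egg_eu = (1 - in_ovo_share_eu) / eggs_per_hen   # ~0.0021 per average EU egg

print(f"Culled male chicks per egg (no in-ovo sexing): {chicks_per_egg:.4f}")
print(f"Culled male chicks per average EU egg:         {chicks_per_egg_eu:.4f}")
```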

Overall, I find it difficult to weigh male-chick culling morally. Do they have strong conscious experiences at that time? How much suffering is involved in their deaths?
