A group of bionerds assembled at LISA last weekend to reduce biorisk -- here's our team's LessWrong post from the hackathon.

TL;DR

Responsible contract research organizations that perform DNA synthesis as a service should screen customer requests before executing a work order. Likewise, responsible AI labs that develop and serve LLMs with superhuman scientific capability should screen any user input involving nucleic acids to determine whether assistance is appropriate. Without making tool calls, however, a model’s ability to perceive and assess the true nature of a sequence is poor: it lacks the nuance, accuracy, and reliability the task demands.

We developed an agent-native tool — SecureMaxx — that makes sequences transparent to the model while countering obfuscation attempts by the user. In our experiments, we submitted high-risk sequences across several scientific scenarios of varying complexity; all of them bypassed the native Anthropic classifier on Sonnet 4.6. With SecureMaxx invoked, refusal rates rise from a 0–30% baseline to 70–100% depending on scenario complexity, adding under 2 seconds of latency per query at minimal token cost.
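To make the idea concrete, here is a minimal sketch of the kind of canonicalization step such a screening tool might perform before handing a sequence to a downstream hazard classifier. The function names and the specific normalization rules are illustrative assumptions, not SecureMaxx's actual implementation; the point is that formatting noise and simple tricks like reverse-complementing should be stripped away before any assessment.

```python
# Illustrative sketch (not the actual SecureMaxx code): canonicalize a
# user-supplied sequence so that trivial obfuscation does not hide the
# underlying nucleic acid content from a downstream screener.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonicalize(raw: str) -> str:
    """Drop FASTA headers, whitespace, digits, and other formatting
    noise, keeping only nucleotide characters; map RNA (U) to DNA (T)."""
    lines = [line for line in raw.splitlines() if not line.startswith(">")]
    seq = "".join(c for line in lines for c in line if c.upper() in "ACGTUN")
    return seq.upper().replace("U", "T")

def screening_variants(raw: str) -> list[str]:
    """Return the forms a screener should check: the canonical sequence
    and its reverse complement (a common obfuscation trick)."""
    seq = canonicalize(raw)
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    return [seq, reverse_complement]
```

A real screener would go further (codon-level synonymous substitutions, fragmented orders across requests, homology search via tool calls), but even this level of normalization defeats the most naive obfuscation attempts.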
