Delegated agents in practice: 
How companies might end up selling AI services that act on behalf of consumers and coalitions, and what this implies for safety research

A scenario

A ‘pure’ delegated agent may start out as a personal service hosted through an encrypted AWS account. Wealthy, tech-savvy early adopters pay a monthly fee to use it as an extension of themselves – to pre-process information and automate decisions on their behalf.

The start-up's founders recognise that their new tool is much more intimate and intrusive than good ol' Gmail and Facebook (which show ads to anonymised user segments). To market it successfully, they invest in building trust with target users. They design the delegated agent to assuage users' fears around data privacy and unfeeling autonomous algorithms, leave control firmly in the user's hands, explain its actions, and prevent outsiders from snooping on or interfering in how it acts on the user's behalf (or at least to give a consistent impression of doing so). These early choices create founder effects: they shape the core design that users come to expect from the company and the directions in which it later develops.

Research directions that may be relevant to existential safety

Narrow value learning: Protocols for eliciting preferences that are efficient in the user's time and input, user-approved and user-friendly, and context-sensitive (reducing elicitation fatigue, and ensuring that users know how to interact and don't disengage). Models for building accurate (hierarchical?) and interpretable (semi-symbolic?) representations of the user's preferences on the fly, within the service's defined radius of influence.
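As a toy illustration of input-efficient elicitation, the sketch below keeps one Beta belief per context and only interrupts the user when that belief is still close to 50/50. Everything here is an assumption made for illustration: the names `BetaBelief`, `PreferenceModel` and `ask_margin`, the flat (non-hierarchical) context keys, and the Beta-Bernoulli model itself. It is not a proposal for how a real delegated agent should represent preferences.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BetaBelief:
    """Beta(alpha, beta) belief that the user approves an action in one context."""
    alpha: float = 1.0  # 1 + number of approvals seen so far
    beta: float = 1.0   # 1 + number of rejections seen so far

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def update(self, approved: bool) -> None:
        if approved:
            self.alpha += 1.0
        else:
            self.beta += 1.0


@dataclass
class PreferenceModel:
    """Hypothetical per-context preference store (illustrative only)."""
    ask_margin: float = 0.25  # assumed threshold: ask while the mean is this close to 0.5
    beliefs: Dict[str, BetaBelief] = field(default_factory=dict)

    def decide(self, context: str) -> str:
        """Return 'ask', 'act' or 'skip' for the given context."""
        belief = self.beliefs.setdefault(context, BetaBelief())
        p = belief.mean()
        if abs(p - 0.5) < self.ask_margin:
            return "ask"  # too uncertain: spend some of the user's attention
        return "act" if p > 0.5 else "skip"

    def record(self, context: str, approved: bool) -> None:
        """Store the user's answer so that later decisions need fewer queries."""
        self.beliefs.setdefault(context, BetaBelief()).update(approved)


if __name__ == "__main__":
    model = PreferenceModel()
    context = "archive newsletter"  # made-up example context
    print(model.decide(context))    # nothing is known yet, so the agent asks
    for _ in range(3):
        model.record(context, approved=True)
    print(model.decide(context))    # after three approvals the agent acts unprompted
```

The interpretable part is that each belief is just a pair of counts the user could inspect or reset. A hierarchical version might share counts across related contexts, which this sketch does not attempt.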

Defining delegation: How should responsibility be defined, and enforceable norms derived, in cases where a person and an agent acting on their behalf collaborate in exercising control and alternate in taking the initiative?

Heterogeneity of influence: How much extra negotiating power, or other forms of influence, does a user gain by paying more for a more sophisticated and computationally powerful delegated agent with greater access to information? Where does it make sense for groups to pool funds to pay for a delegated agent to represent shared interests? To what extent does being an early mover or adopter in this space increase later influence?

Governance and enforcement: How to coordinate the distribution of punishments and rewards to heterogeneous delegated agents (and to the users who choose which designs to buy so they have skin in the game) such that they steer away from actions that impose negative externalities (including hidden systemic risks) onto other, less-represented persons and towards cooperating on creating positive externalities? See this technical paper if that question interests you.
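To make the incentive question concrete, here is a deliberately simplistic sketch in which an enforcer charges (or rewards) a delegated agent in proportion to the externality its action imposes on others. The `Action` type, the `penalty_rate` parameter and the payoff numbers are all invented for illustration. A real scheme would also have to measure externalities, including hidden systemic risks, which this toy simply assumes are known.

```python
from typing import NamedTuple, Sequence


class Action(NamedTuple):
    name: str
    private_payoff: float  # value to the agent's own user
    externality: float     # effect on everyone else (negative means harm)


def choose(actions: Sequence[Action], penalty_rate: float) -> Action:
    """Pick the action with the highest payoff after the enforcer's adjustment.

    penalty_rate is an assumed policy knob: 0.0 means externalities are
    ignored, 1.0 means the agent fully internalises them.
    """
    def net_payoff(action: Action) -> float:
        return action.private_payoff + penalty_rate * action.externality

    return max(actions, key=net_payoff)


# Made-up numbers: exploiting pays the user more but harms third parties.
EXAMPLE_ACTIONS = [
    Action("exploit", private_payoff=10.0, externality=-8.0),
    Action("cooperate", private_payoff=7.0, externality=2.0),
]

if __name__ == "__main__":
    for rate in (0.0, 1.0):
        print(rate, choose(EXAMPLE_ACTIONS, rate).name)
```

With these invented payoffs the agent exploits when externalities are unpriced and cooperates once they are fully priced in. The open question in the paragraph above is how to get anything like this to hold when agents, users and enforcers are heterogeneous.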

Emergence of longer-term goals: Drexler argues for a scenario where services are developed that complete tasks within bounded times (including episodic RL). Will a service designed to act on behalf of consumers or coalitions converge on a bounded planning horizon? Would the average planning horizon of a delegated agent be longer than that of ‘conventional’ CAIS? What would phenomena like instrumental convergence and Goodharting look like in a messy system of users buying delegated agents that complete tasks across longer time horizons but flexibly elicit and update their models of the users’ preferences and enforcers’ policies?