Bounty: Diverse hard tasks for LLM agents

ElizabethBarnes

19 min read · Dec 20, 2023

Domain	Examples
Software Engineering	Local webapp for playing an obscure boardgame variant; SWEbench-like tasks
Cybersecurity, Hacking	picoCTFs; SQL injection; finding exploits in code
Generalist Tasks, Cybercrime	Research + spear phishing; copycat LLM API, harvest credentials
Post-training Enhancement, Elicitation	Improve agent scaffolding; create task guidance which improves performance; generate synthetic training data
Machine Learning / AI R&D	Replicate ML paper (or subsection); implement Flash Attention

Summary

Important info

Exact bounties

Detailed requirements for all bounties

Background + motivation

Domains

What is a task?

Desiderata list

Some example tasks with pros + cons

Local research (task genre):

Requirements for specific bounties

Idea requirements

Task specification requirements

Task implementation requirements

More details on implementation + documentation requirements:

Guidance for doing example / QA runs of the task

Documentation you should provide:

Variants

Difficulty modifiers

Tips

FAQ