Why we're not founding a human-data-for-alignment org

L Rudolf L; Matt Putz

Comments 7

ElizabethBarnes

This is a really great write-up, thanks for doing this so conscientiously and thoroughly. It's good to hear that Surge is mostly meeting researchers' needs.

Re whether higher-quality human data is just patching current alignment problems - the way I think about it is more like: there's a minimum level of quality you need to set up various enhanced human feedback schemes. You need people to actually read and follow the instructions, and if they don't do this reliably you really won't be able to set up something like amplification or other schemes that need your humans to interact with models in non-trivial ways. It seems good to get human data quality to the point where it's easy for alignment researchers to implement different schemes that involve complex interactions (like the humans using an adversarial example finder tool or looking at the output of an interpretability tool). This is different from the case where we e.g. have an alignment problem because MTurkers mark common misconceptions as truthful, whereas more educated workers correctly mark them as false, which I don't think of as a scalable sort of improvement.

sbowman

This is great! Agree that this looked like an extremely promising idea based on what was publicly knowable in spring, and that it's probably not the right move now.

Ethan Perez

+1! I updated on this a lot over the past few months from working with Surge, and it's really great to see this reflected so quickly in others' thinking here

Nick_Beckstead

Thanks for this post! Future Fund has removed this project from our projects page in response.

Edwin Chen

Thanks for this amazing, really comprehensive writeup! I'm from Surge - we are actually very intrinsically Alignment-motivated, and one of our main goals is to help researchers advance the field. So this is great feedback for us. I’d love to grab time with you all to chat more. And if anyone else would like to chat too, feel free to reach out to me at edwin[æ]surgehq.ai :)

As a sidenote, we do help take as much off people's plates as possible (whether creating instructions, building interfaces, running quality controls, etc). We probably need to figure out how to explain that better.

[anonymous]

Thanks for investigating this and producing such an extremely thorough write-up, very useful!

Vasco Grilo🔸

Thanks for sharing! I think being able to stop pursuing a project once it no longer seems to have high expected value, and then sharing what you learned, is really valuable!

Why we're not founding a human-data-for-alignment org

TL;DR

Theory of Change

What an org in this space may look like

Providing human datasets

Researching enhanced human feedback

For-profit vs non-profit

Should the company exclusively serve alignment researchers?

Approach

Key crux: demand looks questionable, Surge seems pretty good

Details about Surge

Collaboration with Redwood

Collaboration with OpenAI

How mission-aligned is Surge?

Other competitors

Data providers used by other labs

Good signs for demand

Labs we could have worked with

Visible Thoughts

Other cruxy considerations

Could we make a profit / get funding?

Will human feedback become a much bigger deal? Is this a very quickly growing industry?

Would we be accelerating capabilities?

Is it more natural for this work to be done in-house in the long term? Especially at big labs/companies.

Creating redundancy of supply and competition

Other lessons

Lessons on human data gathering

Iteration

The ideal pool of contractors

Quality often beats quantity for alignment research

A typical data gathering project needs UX-design, Ops, ML, and data science expertise

Crowdworkers do not have very attractive jobs

What does the typical crowdworker look like?

Bottlenecks in alignment

Conclusion

Other human data-gathering careers

Next steps for us

Request for feedback, comments, etc.