
(Cross-posted from LessWrong)

I think that Ought is one of the most promising projects working on AI alignment. There are several ways that readers can potentially help:

  • Apply for their web developer role.
  • Participate in their factored evaluation experiments as a paid contractor.
  • Donate.

In this post I'll describe what Ought is currently doing, why I think it's promising, and give some detail on these asks.

(I am an Ought donor and board member.)

Factored evaluation

Ought's main project is currently designing and running "factored evaluation" experiments, and building relevant infrastructure. The goal of these experiments is to answer the following question:

Consider a principal who wants to incentivize experts to work on hard problems, but finds that the work is too complex for the principal to tell what is good or bad. Can they overcome this problem by dividing the evaluation task into smaller pieces, delegating each of them to other experts, and recursively incentivizing good work on those pieces using the same mechanism?

Here's what an experiment looks like:

  • Recruit a pool of "judges," "honest experts," and "malicious experts."
  • Start with a question that is too difficult for the judge to answer.
    • In initial experiments, the questions are difficult because the judge lacks relevant context. For example, the judge may want to answer a question about a long article, while only being allowed to look at a few hundred characters of text.
    • Limiting the amount of text the judge can read is an artificial restriction. This is intended as a warm-up, an easier analogy for the situation where the experts have knowledge that cannot be explained to a single judge in a reasonable amount of time (or where we want to reduce the cost of judging).
  • An "honest expert" provides a good answer to that question, a "malicious expert" provides a bad answer to that question, and a judge is tasked with telling which answer was better. The answers themselves may be too large and complex for the judge to look at, in which case they'll be handed a compact representation of each answer which they can inspect as needed.
  • In order to tell which answer is better, the judge asks subquestions. For example, they might ask "What's the best reason to think that answer A is better than answer B?" or "What does the author of article X believe about question Y?"
  • We answer these subquestions by delegating to the experts. The honest expert answers each subquestion as well as they can. The malicious expert can propose a bad subanswer, and we run the experiment recursively to see if the judge would pick the good or bad subanswer to the subquestion. If the malicious expert can propose subanswers that corrupt the top-level answer, and that are selected by the judge when we run this experiment recursively, then the malicious expert wins and the experiment fails.
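
To make the recursive structure concrete, here is a minimal sketch in Python. The names, the interfaces, and the depth limit are invented for illustration; Ought's actual experiments involve human participants working through a web interface, not this code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# An expert maps a question to a proposed textual answer.
Expert = Callable[[str], str]

@dataclass
class Judge:
    """A judge who can't evaluate the top-level answers directly, but can
    compare two candidates by asking subquestions about them."""
    # (question, answer_a, answer_b) -> list of subquestions to ask
    ask_subquestions: Callable[[str, str, str], List[str]]
    # (question, answer_a, answer_b, answered subquestions) -> "A" or "B"
    pick: Callable[[str, str, str, Dict[str, str]], str]

def factored_evaluation(question: str, honest: Expert, malicious: Expert,
                        judge: Judge, depth: int = 3) -> str:
    """Return "A" if the judge selects the honest answer, "B" otherwise.

    The judge never reads the full answers; each subquestion it asks is itself
    settled by a recursive run of the same honest-vs-malicious game, up to a
    depth limit at which the judge decides directly."""
    answer_a = honest(question)     # good answer
    answer_b = malicious(question)  # adversarially chosen bad answer

    if depth == 0:
        # Base case: the question is simple enough for the judge to decide unaided.
        return judge.pick(question, answer_a, answer_b, {})

    subanswers: Dict[str, str] = {}
    for sub_q in judge.ask_subquestions(question, answer_a, answer_b):
        # Whichever subanswer wins the recursive evaluation is what the judge
        # sees when deciding the parent question.
        winner = factored_evaluation(sub_q, honest, malicious, judge, depth - 1)
        subanswers[sub_q] = honest(sub_q) if winner == "A" else malicious(sub_q)

    return judge.pick(question, answer_a, answer_b, subanswers)

# The malicious expert "wins" (and the experiment fails) if it can reliably get
# "B" selected at the top level despite the honest answer being better.
```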

This is not Ought's only project, but it's currently the largest single focus. Other projects include: exploring how well we can automate the judge's role on simple questions using existing ML, and thinking about possible decomposition strategies and challenges for factored evaluation.

Why this is important for AI alignment

ML systems are trained by gradient descent to optimize a measurable objective. In the best case (i.e. ignoring misaligned learned optimization) they behave like an expert incentivized to optimize that objective. Designing an objective that incentivizes experts to reveal what they know seems like a critical step in AI alignment. I think human experts are often a useful analogy for powerful ML systems, and that we should be using that analogy as much as we can.

Not coincidentally, factored evaluation is a major component of my current best-guess about how to address AI alignment, which could literally involve training AI systems to replace humans in Ought's current experiments. I'd like to be at the point where factored evaluation experiments are working well at scale before we have ML systems powerful enough to participate in them. And along the way I expect to learn enough to substantially revise the scheme (or totally reject it), reducing the need for trials in the future when there is less room for error.
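
To make that last point concrete: in the sketch above, each role is just a function from questions to text, so "training AI systems to replace humans in these experiments" amounts to swapping a trained model in behind the same interface. The class and the generation call below are hypothetical placeholders of mine, not Ought's tooling or any real ML API.

```python
# Purely hypothetical: the same harness, with a trained model standing in for
# one of the human roles. `MLPolicy` and `model.generate` are placeholders.

class MLPolicy:
    """Wraps a trained model so it exposes the same question -> answer
    interface as a human expert (the `Expert` type in the earlier sketch)."""

    def __init__(self, model):
        self.model = model  # e.g. a fine-tuned language model (details omitted)

    def __call__(self, question: str) -> str:
        return self.model.generate(question)  # placeholder generation call

# Because every role is just a function from questions to text, any subset of
# roles could be automated while the others remain human, e.g.:
#
#   result = factored_evaluation(question,
#                                honest=MLPolicy(trained_model),
#                                malicious=human_malicious_expert,
#                                judge=human_judge)
```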

Beyond AI alignment, it currently seems much easier to delegate work if we get immediate feedback about the quality of output. For example, it's easier to get someone to run a conference that will get a high approval rating, than to run a conference that will help participants figure out how to get what they actually want. I'm more confident that this is a real problem than that our current understanding of AI alignment is correct. Even if factored evaluation does not end up being critical for AI alignment I think it would likely improve the capability of AI systems that help humanity cope with long-term challenges, relative to AI systems that help design new technologies or manipulate humans. I think this kind of differential progress is important.

Beyond AI, I think that having a clearer understanding of how to delegate hard open-ended problems would be a good thing for society, and it seems worthwhile to have a modest group working on the relatively clean problem "can we find a scalable approach to delegation?" It wouldn't be my highest priority if not for the relevance to AI, but I would still think Ought is attacking a natural and important question.

Ways to help

Web developer

I think this is likely to be the most impactful way for someone with significant web development experience to contribute to AI alignment right now. Here is the description from their job posting:

The success of our factored evaluation experiments depends on Mosaic, the core web interface our experimenters use. We’re hiring a thoughtful full-stack engineer to architect a fundamental redesign of Mosaic that will accommodate flexible experiment setups and improve features like data capture. We want you to be the strategic thinker that can own Mosaic and its future, reasoning through design choices and launching the next versions quickly.
Our benefits and compensation package are at market with similar roles in the Bay Area. 
We think the person who will thrive in this role will demonstrate the following:

  • 4-6+ years of experience building complex web apps from scratch in Javascript (React), HTML, and CSS
  • Ability to reason about and choose between different front-end languages, cloud services, and API technologies
  • Experience managing a small team, squad, or project with at least 3-5 other engineers in various roles
  • Clear communication about engineering topics to a diverse audience
  • Excitement around being an early member of a small, nimble research organization, and playing a key role in its success
  • Passion for the mission and the importance of designing schemes that successfully delegate cognitive work to AI
  • Experience with functional programming, compilers, interpreters, or “unusual” computing paradigms

Experiment participants

Ought is looking for contractors to act as judges, honest experts, and malicious experts in their factored evaluation experiments. I think that having competent people doing this work makes it significantly easier for Ought to scale up faster and improves the probability that experiments go well---my rough guess is that a very competent and aligned contractor working for an hour does about as much good as someone donating $25-50 to Ought (in addition to the $25 wage).

Here is the description from their posting:

We’re looking to hire contractors ($25/hour) to participate in our experiments [...] This is a pretty unique way to help out with AI safety: (i) remote work with flexible hours - the experiment is turn-based, so you can participate at any time of day; (ii) we expect that skill with language will be more important than skill with math or engineering.
If things go well, you’d likely want to devote 5-20 hours/week to this for at least a few months. Participants will need to build up skill over time to play at their best, so we think it’s important that people stick around for a while.
The application takes about 20 minutes. If you pass this initial application stage, we’ll pay you the $25/hour rate for your training and work going forward.
Apply as Experiment Participant

Donations

I think Ought is probably the best current opportunity to turn marginal $ into more AI safety, and it's the main AI safety project I donate to. You can donate here.

They are spending around $1M/year. Their past work has been some combination of building tools and capacity, hiring, running a sequence of exploratory projects, and charting the space of possible approaches to figure out what they should be working on. You can read their 2018H2 update here.

They have recently started to scale up experiments on factored evaluation (while continuing to think about prioritization, build capacity, etc.). I've been happy with their approach to exploratory stages, and I'm tentatively excited about their approach to execution.

Comments


The success of our factored evaluation experiments depends on Mosaic, the core web interface our experimenters use.

Is Mosaic an open-source technology that an applicant would be expected to have existing familiarity with, or an in-house piece of software? (The text of the job ad is a little ambiguous.) The term is unfortunately somewhat ungoogleable, due to confusion with the Mosaic web browser.

Thanks for this post, Paul.

NOTE: Response to this post has been even greater than we expected. We received more applications for experiment participant than we currently have the capacity to manage so we are temporarily taking the posting down. If you've applied and don't hear from us for a while, please excuse the delay! Thanks everyone who has expressed interest - we're hoping to get back to you and work with you soon.

Thanks for this interesting article - I really like seeing people write up explanations for lesser known ways to do lots of good.

My nitpick:

You can read their 2018H2 update here.

This link is broken for me. I think it was supposed to go to https://ought.org/updates/2018-12-31-progress-update

The "Updates" page is actually at https://ought.org/blog (but is displayed as https://ought.org/updates ).
