stuhlmueller

Ought's theory of change

We're also only reporting our current best guess about how things will turn out. We're monitoring how Elicit is used, we'll study its impacts and the anticipated impacts of future features, and if it turns out that the costs outweigh the benefits, we'll adjust our plans.

Ought's theory of change

Are you worried that your work will be used for more likely regrettable things like

  • improving the competence of actors who are less altruistic and less careful about unintended consequences (e.g. many companies, militaries, and government institutions), and

Less careful actors: Our goal is for Elicit to help people reason better. We want less careful people to use it and reason better than they would have without Elicit, recognizing more unintended consequences and finding actions that are more aligned with their values. The hope is that if we can make good reasoning cheap enough, people will use it. In a sense, we're all less careful actors right now.

Less altruistic actors: We favor more altruistic actors in deciding who to work with, give access to, and improve Elicit for. We also monitor use so that we can prevent misuse.

  • speeding up AI capabilities research, and speeding it up more than AI safety research?

I expect the overall impact on x-risk to be a net reduction, by (a) causing more and better x-risk reduction thinking to happen and (b) shifting ML efforts to a more alignable paradigm, even if (c) Elicit makes a non-zero contribution to ML capabilities.

The implicit claim in the concern about speeding up capabilities is that Elicit has a large impact on capabilities because it is so useful. If that is true, we'd expect it to be similarly useful in other domains, e.g. AI safety. The larger Elicit's impact on (c), the larger the corresponding impacts on (a) and (b).

To shift the balance away from (c) we’ll focus on supporting safety-related research and researchers, especially conceptual research. We're not doing this very well today but are actively thinking about it and moving in that direction. Given that, it would be surprising if Elicit helped a lot with ML capabilities relative to tools and organizations that are explicitly pushing that agenda.

Have you considered deemphasizing trying to offer a commercially successful product that will find broad application in the world, and focusing more strongly on designing systems that are safe and aligned with human values?

We’re a non-profit, so we have no obligation to make a commercially successful product; we’ll only focus on commercial success to the extent that it furthers aligned reasoning. That said, I think the best outcome is that we make a widely adopted product that makes it easier for everyone to think through the consequences of their actions and act in alignment with their values.

2021 AI Alignment Literature Review and Charity Comparison

Ought co-founder here. There are two ways Elicit relates to alignment broadly construed:

1 - Elicit informs how to train powerful AI through decomposition

Roughly speaking, there are two ways of training AI systems:

  1. End-to-end training
  2. Decomposition of tasks into human-understandable subtasks

We think decomposition may be a safer way to train powerful AI if it can scale as well as end-to-end training.

Elicit is our bet on the compositional approach. We’re testing how feasible it is to decompose large tasks like “figure out the answer to this science question by reading the literature” into subtasks like:

  • Brainstorm subquestions that inform the overall question
  • Find the most relevant papers for a (sub-)question
  • Answer a (sub-)question given an abstract for a paper
  • Summarize answers into a single answer

Over time, more of this decomposition will be done by AI assistants.
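To make the workflow above concrete, here is a minimal sketch of what orchestrating those subtasks might look like. This is not Elicit's actual implementation; `ask_model` is a hypothetical stand-in for a language-model call, and the subtask structure simply mirrors the bullet list above.

```python
def ask_model(prompt: str) -> str:
    """Hypothetical placeholder for a language-model call.

    In a real system this would query a model API; here it just
    returns canned text so the pipeline structure is runnable.
    """
    return f"[model answer to: {prompt}]"


def answer_by_decomposition(question: str, num_subquestions: int = 3) -> str:
    """Answer a question by decomposing it into human-understandable subtasks."""
    # 1. Brainstorm subquestions that inform the overall question.
    subquestions = [
        ask_model(f"Subquestion {i + 1} informing: {question}")
        for i in range(num_subquestions)
    ]
    # 2. Answer each subquestion independently
    #    (in practice, e.g. from the abstracts of relevant papers).
    subanswers = [ask_model(f"Answer briefly: {sq}") for sq in subquestions]
    # 3. Summarize the subanswers into a single overall answer.
    summary_prompt = "Combine into one answer:\n" + "\n".join(subanswers)
    return ask_model(summary_prompt)
```

Each step is an independent model call, so the same orchestration code could be reused as underlying models change, which is the architecture-independence mentioned below.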

At each point in time, we want to push the compositional approach to the limits of current language models, and keep up with (or exceed) what’s possible through end-to-end training. This requires that we overcome engineering barriers in gathering human feedback and orchestrating calls to models in a way that doesn’t depend much on current architectures.

I view this as the natural continuation of our past work where we studied decomposition using human participants. Unlike then, it’s now possible to do this work using language models, and the more applied setting has helped us a lot in reducing the gap between research assumptions and deployment.

2 - Elicit makes AI differentially useful for AI & tech policy, and other high-impact applications

In a world where AI capabilities scale rapidly, I think it’s important that these capabilities can support research aimed at guiding AI development and policy, and more generally help us figure out what’s true and make good plans as much as they help persuade and optimize goals with fast feedback or easy specification.

Ajeya mentions this point in The case for aligning narrowly superhuman models:

"Better AI situation in the run-up to superintelligence: If at each stage of ML capabilities progress we have made sure to realize models’ full potential to be helpful to us in fuzzy domains, we will be going into the next stage with maximally-capable assistants to help us navigate a potentially increasingly crazy world. We’ll be more likely to get trustworthy forecasts, policy advice, research assistance, and so on from our AI assistants. Medium-term AI challenges like supercharged fake news / clickbait or AI embezzlement seem like they would be less severe. People who are pursuing more easily-measurable goals like clicks or money seem like they would have less of an advantage over people pursuing hard-to-measure goals like scientific research (including AI alignment research itself). All this seems like it would make the world safer on the eve of transformative AI or AGI, and give humans more powerful and reliable tools for dealing with the TAI / AGI transition."

Beth mentions the more general point in Risks from AI persuasion under possible interventions: 

“Instead, try to advance applications of AI that help people understand the world, and advance the development of truthful and genuinely trustworthy AI. For example, support API customers like Ought who are working on products with these goals, and support projects inside OpenAI to improve model truthfulness.”

I'll write more about how we view our role in the space in Q1 2022.

Andreas Stuhlmüller: Training ML systems to answer open-ended questions

Speaker here. I haven't reviewed this transcript yet, but shortly after the talk I wrote up these notes (slides + annotations) which I probably endorse more than what I said at the time.