All of stuhlmueller's Comments + Replies

Another potential windfall I just thought of: the kind of AI scientist system discussed by Bengio in this talk (older writeup). The idea is to build a non-agentic system that uses foundation models and amortized Bayesian inference to create and do inference on compositional and interpretable world models. One way this would be used is for high-quality estimates of p(harm|action) in the context of online monitoring of AI systems, but if it could work it would likely have other profitable use cases as well.
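
To make the monitoring idea concrete, here is a minimal sketch (not Bengio's actual system) of estimating p(harm|action) by Monte Carlo over a sampled world model; `sample_world_model` and `is_harmful` are hypothetical stand-ins for the amortized inference machinery and the harm predicate, and the dynamics are hard-coded purely for illustration.

```python
# Minimal sketch: estimate p(harm | action) by sampling outcomes from a
# (here hard-coded, in reality learned and compositional) world model.
import random


def sample_world_model(action, rng):
    """Hypothetical: sample one outcome given an action."""
    harm_prob = {"deploy_untested_model": 0.3, "run_sandboxed_eval": 0.01}[action]
    return {"harm": rng.random() < harm_prob}


def is_harmful(outcome):
    return outcome["harm"]


def estimate_p_harm(action, n_samples=10_000, seed=0):
    rng = random.Random(seed)
    hits = sum(is_harmful(sample_world_model(action, rng)) for _ in range(n_samples))
    return hits / n_samples


if __name__ == "__main__":
    for action in ["deploy_untested_model", "run_sandboxed_eval"]:
        print(action, estimate_p_harm(action))
```

In the actual proposal the world model would be inferred from data and interpretable, so the monitor's harm estimates could be audited step by step rather than trusted as a black box.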

4
[anonymous]
1y
Why does Anthropic have a valuation at all? "Anthropic is a public benefit corporation", according to its homepage. Is it still allowed then to distribute profits?

In that case, FTX and other series B funders held about a 14% stake in Anthropic. If FTX is liquidated and someone ends up owning their share, what does it get them? A seat on the board?

A concrete version of this I've been wondering about the last few days: To what extent are the negative results on Debate (single-turn, two-turn) intrinsic to small-context supervision vs. a function of relatively contingent design choices about how people get to interact with the models?

I agree that misuse is a concern. Unlike alignment, I think it's relatively tractable because it's more similar to problems people are encountering in the world right now.

To address it, we can monitor and restrict usage as needed. The same tools that Elicit provides for reasoning can also be used to reason about whether a use case constitutes misuse.

This isn't to say that we might not need to invest a lot of resources eventually, and it's interestingly related to alignment ("misuse" is relative to some values), but it feels a bit less open-ended.

2
Yonatan Cale
2y
[debugging further] Do you think misuse is a concern - to the point that if you couldn't monitor and restrict usage - you'd think twice about this product direction? Or is this more "this is a small issue, and we can even monitor and restrict usage, but even if we couldn't then we wouldn't really mind"?
1
Lorenzo Buonanno
2y
What are your views on whether speeding up technological development is, in general, a good thing? I'm thinking of arguments like https://forum.effectivealtruism.org/posts/gB2ad4jYANYirYyzh/a-note-about-differential-technological-development, which make me wonder if we should try to slow research instead of speeding it up. Or do you think that Elicit will not speed up AGI capabilities research in a meaningful way? (Maybe because it will count as misuse?) This is something I'm really uncertain about personally, and it's going to heavily influence my decisions/life, so I'm really curious about your thoughts!

Elicit is using the Semantic Scholar Academic Graph dataset. We're working on expanding to other sources. If there are particular ones that would be helpful, message me?

Have you listened to the 80k episode with Nova DasSarma from Anthropic? They might have cybersecurity roles. The closest we have right now is devops—which, btw, if anyone is reading this comment, we are really bottlenecked on and would love intros to great people.

No, it's that our case for alignment doesn't rest on "the system is only giving advice" as a step. I sketched the actual case in this comment.

Oh, forgot to mention Jonathan Uesato at DeepMind, who's also very interested in advancing the ML side of factored cognition.

The properties we're aiming for that make submodels easier to align:

  • (Inner alignment) Smaller models, making it less likely that there’s scheming happening that we’re not aware of; making the bottom-up interpretability problem easier
  • (Outer alignment) More well-specified tasks, making it easier to generate a lot of in-distribution feedback data; making it easier to do targeted red-teaming
2
Yonatan Cale
2y
Would you share some typical example tasks that you'd give a submodel, and typical good responses it might give back? (As a vision, so I'll know what you're talking about when you say things like "well-specified tasks" - I'm not sure we're imagining the same thing there. It doesn't need to be something that already works today.)

For AGI there isn't much of a distinction between giving advice and taking actions, so this isn't part of our argument for safety in the long run. But in the time between here and AGI it's better to focus on supporting reasoning to help us figure out how to manage this precarious situation.

2
Yonatan Cale
2y
Do I understand correctly: "safety in the long run" is unrelated to what you're currently doing in any negative way - you don't think you're advancing AGI-relevant capabilities (and so there is no need to try to align-or-whatever your forever-well-below-AGI system)? Please feel free to correct me!

To clarify, here’s how I’m interpreting your question:

“Most technical alignment work today looks like writing papers or theoretical blog posts and addressing problems we’d expect to see with more powerful AI. It mostly doesn’t try to be useful today. Ought claims to take a product-driven approach to alignment research, simultaneously building Elicit to inform and implement its alignment work.  Why did Ought choose this approach instead of the former?”

First, I think it’s good for the community to take a portfolio approach and for different teams to pur... (read more)

2
goodgravy
2y
Another benefit of our product-driven approach is that we aim to provide a positive contribution to the alignment community. By which I mean: Thanks to amazing prior work in straight alignment research, we already have some idea of anti-patterns and risks that we all want to avoid. What we're still lacking are safety attractors: i.e. alternative approaches which are competitive with and safer than the current paradigm. We want Elicit to be an existence proof that there is a better way to solve certain complex tasks, and for our approach to go on to be adopted by others – because it's in their self-interest, not because it's safe.

We're aiming to shift the balance towards supporting high-quality reasoning. Every tool has some non-zero usefulness for non-central use cases, but it seems unlikely that it will be as useful as tools that were made specifically for those use cases.

7
Yonatan Cale
2y
I agree!

This sounds to me like almost the most generic-problem-solving thing someone could aim for, capable of doing many things without going outside the general use case. As a naive example, couldn't someone use "high quality reasoning" to plan how to make military robotics? (Though the examples I'm actually worried about are more like "use high quality reasoning to create paperclips", but I'm happy to use your one.)

In other words, I'm not really worried about a chess robot being used for other things [update: wait, Alpha Zero seems to be more general purpose than expected], but I wouldn't feel as safe with something intentionally meant for "high quality reasoning".

[Again, just sharing my concern, feel free to point out all the ways I'm totally missing it!]

I found your factored cognition project really interesting, is anyone still researching this? (besides the implementation in Elicit)

Some people who are explicitly interested in working on it: Sam Bowman at NYU, Alex Gray at OpenAI. On the ML side there’s also work like Selection-Inference that isn’t explicitly framed as factored cognition but also avoids end-to-end optimization in favor of locally coherent reasoning steps.

2
Lorenzo Buonanno
2y
Wow, super happy to hear that, thanks!

I’d say what we’re afraid of is that we’ll have AI systems that are capable of sophisticated planning but that we don’t know how to channel those capabilities into aligned thinking on vague complicated problems. Ought’s work is about avoiding this outcome.

At this point we could chat about why it’s plausible that we’ll have such capable but unaligned AI systems, or about how Ought’s work is aimed at reducing the risk of such systems. The former isn’t specific to Ought, so I’ll point to Ajeya’s post Without specific countermeasures, the easiest path to trans... (read more)

4
Yonatan Cale
2y
I simply agree, no need to convince me there 👍

Ought's approach:

  • Instead of giving a training signal after the entire AI gives an output,
  • Do give a signal after each sub-module gives an output.

Yes?

My worry: The sub-modules will themselves be misaligned.

Is your suggestion: Limit compute and neural memory of sub-models in order to lower the risk?

We built Ergo (a Python library for integrating model-based and judgmental forecasting) as part of our work on forecasting. In the course of this work we realized that for many forecasting questions the bottleneck isn’t forecasting infrastructure per se, but the high-quality research and reasoning that goes into creating good forecasts, so we decided to focus on that aspect.

I’m still excited about Ergo-like projects (including Squiggle!). Developing it further would be a valuable contribution to epistemic infrastructure. Ergo is an MIT-licensed open-source... (read more)
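
For a sense of what "integrating model-based and judgmental forecasting" can look like, here is a hedged sketch in plain numpy rather than Ergo's actual API: a toy generative model and a human forecaster's 90% interval are both turned into samples and pooled into one mixture forecast. The question, numbers, and function names are illustrative assumptions only.

```python
# Hedged sketch (not Ergo's API): pool model-based and judgmental samples.
import numpy as np

rng = np.random.default_rng(0)

def model_based_samples(n):
    """Toy generative model: cases = base_rate * population, with uncertainty."""
    base_rate = rng.normal(0.02, 0.005, n).clip(min=0)
    population = rng.normal(1e6, 5e4, n)
    return base_rate * population

def judgmental_samples(n):
    """Human forecast given as a 90% interval, fit to a lognormal."""
    low, high = 10_000, 40_000   # forecaster's 5th/95th percentiles (assumed)
    mu = (np.log(low) + np.log(high)) / 2
    sigma = (np.log(high) - np.log(low)) / (2 * 1.645)
    return rng.lognormal(mu, sigma, n)

n = 10_000
mixture = np.concatenate([model_based_samples(n), judgmental_samples(n)])
print("median:", np.percentile(mixture, 50))
print("90% interval:", np.percentile(mixture, [5, 95]))
```

The point of Ergo-style tooling is to make this kind of pooling routine; the hard part, as noted above, is the research and reasoning that produces good inputs in the first place.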

1
niplav
2y
That sounds promising! I might (60%) get back to you on that :-)

Ought is an applied machine learning lab, hiring for:

Our mission is to automate and scale open-ended reasoning. To that end, we’re building Elicit, the AI research assistant. Elicit's architecture is based on supervising reasoning processes, not outcomes. This is better for supporting open-ended reasoning in the short run and better for alignment in the long run.

Over the last year, we built Elicit to support broad reviews of empirical l... (read more)
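
As an illustration only (not Elicit's implementation), the contrast between supervising outcomes and supervising reasoning processes can be sketched like this; the `grade_*` raters and the example steps are hypothetical stand-ins for human or model feedback.

```python
# Illustrative sketch: outcome supervision grades only the final answer;
# process supervision grades each intermediate reasoning step.

def outcome_supervision(question, answer, grade_answer):
    # One signal per task: did the final answer come out right?
    return [grade_answer(question, answer)]

def process_supervision(question, steps, grade_step):
    # One signal per step: is each step locally valid, regardless of
    # whether the final answer happens to be correct?
    feedback = []
    context = question
    for step in steps:
        feedback.append(grade_step(context, step))
        context = context + "\n" + step
    return feedback

# Usage with trivial stand-in raters:
steps = ["Find relevant papers", "Extract the sample sizes", "Summarize the range"]
print(process_supervision("What sample sizes are typical?", steps,
                          grade_step=lambda ctx, s: ("ok", s)))
```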

2
Yonatan Cale
2y
Hey! I'm very interested in a public AMA with Ought, if you'd open one. (Speaking for myself and at least 2 other EA developers that I can easily remember)

We're also only reporting our current guess for how things will turn out. We're monitoring how Elicit is used and we'll study its impacts and the anticipated impacts of future features, and if it turns out that the costs outweigh the benefits we will adjust our plans.

Are you worried that your work will be used for more likely regrettable things like

  • improving the competence of actors who are less altruistic and less careful about unintended consequences (e.g. many companies, militaries and government institutions), and

Less careful actors: Our goal is for Elicit to help people reason better. We want less careful people to use it and reason better than they would have without Elicit, recognizing more unintended consequences and finding actions that are more aligned with their values. The hope is that if we can make go... (read more)

2
MaxRa
2y
Thanks a lot for elaborating, makes sense to me. I was fuzzy about what I wanted to communicate with the term "careful", thanks for spelling out your perspective here. I'm still a little uneasy about the idea that generally improving the ability to plan better will also make sufficiently many actors more careful about avoiding problems that are particularly risky for our future. It just seems so rare that important actors care enough about such risks, even for things that humanity is able to predict and plan for reasonably well, like pandemics.

Ought co-founder here. There are two ways Elicit relates to alignment broadly construed:

1 - Elicit informs how to train powerful AI through decomposition

Roughly speaking, there are two ways of training AI systems:

  1. End-to-end training
  2. Decomposition of tasks into human-understandable subtasks

We think decomposition may be a safer way to train powerful AI if it can scale as well as end-to-end training.

Elicit is our bet on the compositional approach. We’re testing how feasible it is to decompose large tasks like “figure out the answer to this science question by ... (read more)
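
A hedged sketch of the decomposition idea, with `run_model` as a hypothetical stand-in for a language-model call and made-up subtasks: a broad question is answered by composing narrow, human-checkable subtasks instead of training one system end-to-end on the whole thing.

```python
# Hedged sketch (not Elicit's actual pipeline): answer a broad question by
# composing answers to smaller, human-inspectable subquestions.

def run_model(prompt: str) -> str:
    """Hypothetical model call; returns a short answer to a narrow prompt."""
    return f"<answer to: {prompt!r}>"

def answer_science_question(question: str) -> str:
    # Each subtask is narrow enough that a human could inspect its input/output.
    subtasks = [
        f"List search queries for: {question}",
        f"Summarize the findings of the top papers for: {question}",
        "State the main disagreements among those papers",
    ]
    intermediate = [run_model(t) for t in subtasks]  # supervisable step outputs
    synthesis_prompt = f"Combine into one answer to {question!r}: {intermediate}"
    return run_model(synthesis_prompt)

print(answer_science_question("Does creatine improve cognition?"))
```

The bet described above is that pipelines of this shape can be made competitive with end-to-end training while keeping every intermediate product available for feedback and red-teaming.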

Speaker here. I haven't reviewed this transcript yet, but shortly after the talk I wrote up these notes (slides + annotations) which I probably endorse more than what I said at the time.