Chris Leong

Organiser @ AI Safety Australia and NZ

Bio

Currently doing local AI safety Movement Building in Australia and NZ.

Comments

Why don't we ask ChatGPT? (In case you're wondering, I've read every word of this answer and I fully endorse it, though I think there are better analogies than the journalism example ChatGPT used).

Hopefully, this clarifies a possible third option (one that my original answer was pointing at).

I think there is a third option, though it’s messy and imperfect. The third option is to:

Maintain epistemic pluralism at the level of research methods and internal debate, while being selective about value alignment on key downstream behaviors.

In other words:

  • You hire researchers with a range of views on timelines, takeoff speeds, and economic impacts, so long as they are capable of good-faith engagement and epistemic humility.
  • But you also have clear social norms, incentives, and possibly contractual commitments around what counts as harmful conflict of interest — e.g., spinning out an acceleratory startup that would directly undermine the mission of your forecasting work.

This requires drawing a distinction between research belief diversity and behavioral alignment on high-stakes actions. That’s tricky! But it’s not obviously incoherent.

The key mechanism that makes this possible (if it is possible) is something like:

“We don’t need everyone to agree on the odds of doom or the value of AGI automation in theory. But we do need shared clarity on what types of action would constitute a betrayal of the mission or a dangerous misuse of privileged information.”

So you can imagine hiring someone who thinks timelines are long and AGI risk is overblown but who is fully on board with the idea that, given the stakes, forecasting institutions should err on the side of caution in their affiliations and activities.

This is analogous to how, say, journalists might disagree about political philosophy but still share norms about not taking bribes from the subjects they cover.


Caveats and Challenges:

  1. Enforceability is hard.
    Noncompetes are legally dubious in many jurisdictions, and “cash in on the AI boom” is vague enough that edge cases will be messy. But social signaling and community reputation mechanisms can still do a lot of work here.
  2. Self-selection pressure remains.
    Even if you say you're open to diverse views, the perception that Epoch is “aligned with x-risk EAs” might still screen out applicants who don’t buy the core premises. So you risk de facto ideological clustering unless you actively fight against that.
  3. Forecasting bias could still creep in via mission alignment filtering.
    Even if you welcome researchers with divergent beliefs, if the only people willing to comply with your behavioral norms are those who already lean toward the doomier end of the spectrum, your epistemic diversity might still collapse in practice.

Summary:

The third option is:

Hire for epistemic virtue, not belief conformity, while maintaining strict behavioral norms around acceleratory conflict of interest.

It’s not a magic solution — it requires constant maintenance, good hiring processes, and clarity about the boundaries between “intellectual disagreement” and “mission betrayal.” But I think it’s at least plausible as a way to square the circle.

A lot of these considerations feel more compelling if AI timelines are long, or at least not short (with capital being the one consideration going the other way).

I wasn't suggesting only hiring people who believe in short timelines. I believe that my original post adequately lays out my position, but if any points are ambiguous, feel free to request clarification.

The one thing that matters more for this than anything else is setting up an EA hub in a low cost-of-living area with decent visa options. The second most important thing is setting up group houses in high cost-of-living cities with good networking opportunities.

Agreed. "Value alignment" is a simplified framing.

Maybe I should have. I honestly don't know. I didn't think deeply about it.

To be honest, I wouldn't personally recommend this Epoch article to people.

It has some strong analysis at points, but unfortunately, it's undermined by some poor choices of framing/focus that mean most readers will probably leave more confused than when they came.

• For a start, this article focuses almost purely on economic impacts. However, if an AI cures cancer, the value to humanity will likely significantly exceed the economic value. Similarly, reducing emissions isn't going to directly lead to productivity growth in the short term, but in the longer term, excessive emissions could damage the economy. Their criticism that Dario, Demis and Sam haven't engaged with their (purely economic) reframing is absurd.
• Similarly, the CEOs mostly aren't even attempting to address the question of how to apportion impacts between R&D and automation. They were trying to convince folks that the impact of AI could be massive, wherever it comes from. So whilst there is a significant difference in worldviews, with the CEOs being more bullish on automated R&D than the authors, the CEOs have been awkwardly shoehorned into the role of opposition in a subtly different debate.
• Their criticism of R&D focuses quite strongly on a binary automated/non-automated frame. The potential for humans and AI to work together on tasks is mostly neglected. They observe that R&D coding assistants haven't led to explosive R&D. However, automated coding only became really good recently. Many folks still aren't aware of how good these tools are, or haven't adopted them because their organisation hasn't prioritised spending money on AI coding agents. Training pipelines and hiring practices haven't really adapted to the new reality yet, and organisations that have adopted these practices more fully haven't had sufficient time to outcompete those that are slower. Unfortunately, their argument requires this point to hold in order to go through. In other words, their entire argument is hanging by a thread.

So it was an interesting article. I learned things by reading it and reflecting on it, but I can't honestly recommend it to others as I had to waste a bit too much time trying to disentangle some of the poorly chosen frames.

I guess orgs need to be more careful about who they hire as forecasting/evals researchers in light of a recently announced startup.

Sometimes things happen, but three people at the same org...

This is also a massive burning of the commons. It is valuable for forecasting/evals orgs to be able to hire people with a diversity of viewpoints in order to counter bias. It is also valuable for people to be able to share information freely with those at such forecasting orgs without having to worry about them going off and doing something like this.

However, this only works if those less worried about AI risks who join such a collaboration don't use the knowledge they gain to cash in on the AI boom in an acceleratory way. Doing so undermines the very point of such a project, namely, to try to make AI go well. Doing so is incredibly damaging to trust within the community.

Now let's suppose you're an x-risk funder considering whether to fund their previous org. This org does really high-quality work, but the argument for them being net-positive is now significantly weaker. This is quite likely to make finding future funding harder for them.

This is less about attacking those three folks and more about noting that we need to strive to avoid situations where things like this happen in the first place. This requires us to be more careful about who gets hired.

There have been some discussions on the EA Forum along the lines of "why do we care about value alignment; shouldn't we just hire whoever can best do the job?". My answer is that it's myopic to only consider what happens whilst they're working for you. Hiring someone or offering them an opportunity empowers them; you need to consider whether they're someone you want to empower[1].

  1. ^

    Admittedly, this isn't quite the same as value alignment. Suppose someone were diligent, honest, wise and responsible. You might want to empower them even if their views were extremely different from yours. Stronger still: even if their views were in many ways the opposite of yours. But in the absence of these qualities, value alignment matters.

This article is extremely well written and I really appreciated how well he supported his positions with facts.

However, this article seems to suggest that he doesn't quite understand the argument for making alignment the priority. This is understandable, as it's rarely articulated clearly. The core limitation of differential tech development/d/acc/coceleration is that these kinds of imperfect defenses only buy time (a judgment that can be justified using the sources he cites in his article). An aligned ASI, if it were possible, would be capable of a degree of perfection beyond that of human institutions. This would give us a stable long-term solution. Plans that involve less powerful AIs or a more limited degree of alignment mostly do not.
