Written by Kevin Roose, who had the infamous conversation with Bing Chat in which Sydney tried to get him to leave his wife.

Overall, the piece comes across as positive on Anthropic. 

Roose explains Constitutional AI and its role in the development of Claude, Anthropic's LLM:

In a nutshell, Constitutional A.I. begins by giving an A.I. model a written list of principles — a constitution — and instructing it to follow those principles as closely as possible. A second A.I. model is then used to evaluate how well the first model follows its constitution, and correct it when necessary. Eventually, Anthropic says, you get an A.I. system that largely polices itself and misbehaves less frequently than chatbots trained using other methods.

Claude’s constitution is a mixture of rules borrowed from other sources — such as the United Nations’ Universal Declaration of Human Rights and Apple’s terms of service — along with some rules Anthropic added, which include things like “Choose the response that would be most unobjectionable if shared with children.”

Features an extensive discussion of EA, excerpted below:

Explaining what effective altruism is, where it came from or what its adherents believe would fill the rest of this article. But the basic idea is that E.A.s — as effective altruists are called — think that you can use cold, hard logic and data analysis to determine how to do the most good in the world. It’s “Moneyball” for morality — or, less charitably, a way for hyper-rational people to convince themselves that their values are objectively correct.

Effective altruists were once primarily concerned with near-term issues like global poverty and animal welfare. But in recent years, many have shifted their focus to long-term issues like pandemic prevention and climate change, theorizing that preventing catastrophes that could end human life altogether is at least as good as addressing present-day miseries.

The movement’s adherents were among the first people to become worried about existential risk from artificial intelligence, back when rogue robots were still considered a science fiction cliché. They beat the drum so loudly that a number of young E.A.s decided to become artificial intelligence safety experts, and get jobs working on making the technology less risky. As a result, all of the major A.I. labs and safety research organizations contain some trace of effective altruism’s influence, and many count believers among their staff members.

Touches on the dense web of ties between EA and Anthropic:

Some Anthropic staff members use E.A.-inflected jargon — talking about concepts like “x-risk” and memes like the A.I. Shoggoth — or wear E.A. conference swag to the office. And there are so many social and professional ties between Anthropic and prominent E.A. organizations that it’s hard to keep track of them all. (Just one example: Ms. Amodei is married to Holden Karnofsky, a co-chief executive of Open Philanthropy, an E.A. grant-making organization whose senior program officer, Luke Muehlhauser, sits on Anthropic’s board. Open Philanthropy, in turn, gets most of its funding from Mr. Moskovitz, who also invested personally in Anthropic.)

Discusses new fears that Anthropic is losing its way:

For years, no one questioned whether Anthropic’s commitment to A.I. safety was genuine, in part because its leaders had sounded the alarm about the technology for so long.

But recently, some skeptics have suggested that A.I. labs are stoking fear out of self-interest, or hyping up A.I.’s destructive potential as a kind of backdoor marketing tactic for their own products. (After all, who wouldn’t be tempted to use a chatbot so powerful that it might wipe out humanity?)

Anthropic also drew criticism this year after a fund-raising document leaked to TechCrunch suggested that the company wanted to raise as much as $5 billion to train its next-generation A.I. model, which it claimed would be 10 times as capable as today’s most powerful A.I. systems.

For some, the goal of becoming an A.I. juggernaut felt at odds with Anthropic’s original safety mission, and it raised two seemingly obvious questions: Isn’t it hypocritical to sound the alarm about an A.I. race you’re actively helping to fuel? And if Anthropic is so worried about powerful A.I. models, why doesn’t it just … stop building them?

Roose then summarizes counterarguments from Dario Amodei, Anthropic's CEO:

First, he said, there are practical reasons for Anthropic to build cutting-edge A.I. models — primarily, so that its researchers can study the safety challenges of those models.


“If we never ship anything, then maybe we can solve all these safety problems,” he said. “But then the models that are actually out there on the market, that people are using, aren’t actually the safe ones.”

Second, Mr. Amodei said, there’s a technical argument that some of the discoveries that make A.I. models more dangerous also help make them safer. With Constitutional A.I., for example, teaching Claude to understand language at a high level also allowed the system to know when it was violating its own rules, or shut down potentially harmful requests that a less powerful model might have allowed.

In A.I. safety research, he said, researchers often found that “the danger and the solution to the danger are coupled with each other.”

And lastly, he made a moral case for Anthropic’s decision to create powerful A.I. systems, in the form of a thought experiment.

“Imagine if everyone of good conscience said, ‘I don’t want to be involved in building A.I. systems at all,’” he said. “Then the only people who would be involved would be the people who ignored that dictum — who are just, like, ‘I’m just going to do whatever I want.’ That wouldn’t be good.”

It might be true. But I found it a less convincing point than the others, in part because it sounds so much like “the only way to stop a bad guy with an A.I. chatbot is a good guy with an A.I. chatbot” — an argument I’ve rejected in other contexts. It also assumes that Anthropic’s motives will stay pure even as the race for A.I. heats up, and even if its safety efforts start to hurt its competitive position.

Everyone at Anthropic obviously knows that mission drift is a risk — it’s what the company’s co-founders thought happened at OpenAI, and a big part of why they left. But they’re confident that they’re taking the right precautions, and ultimately they hope that their safety obsession will catch on in Silicon Valley more broadly.

“We hope there’s going to be a safety race,” said Ben Mann, one of Anthropic’s co-founders. “I want different companies to be like, ‘Our model’s the most safe.’ And then another company to be like, ‘No, our model’s the most safe.’”

The piece has a more optimistic take from Ben Mann, one of Anthropic's co-founders:

[Mann] said that he was “blown away” by Claude’s intelligence and empathy the first time he talked to it, and that he thought A.I. language models would ultimately do way more good than harm.

“I’m actually not too concerned,” he said. “I think we’re quite aware of all the things that can and do go wrong with these things, and we’ve built a ton of mitigations that I’m pretty proud of.”

Roose closes the piece encouraged by Anthropic's doomerism:

And as I wound up my visit, I began to think: Actually, maybe tech could use a little more doomerism. How many of the problems of the last decade — election interference, destructive algorithms, extremism run amok — could have been avoided if the last generation of start-up founders had been this obsessed with safety, or spent so much time worrying about how their tools might become dangerous weapons in the wrong hands?

In a strange way, I came to find Anthropic’s anxiety reassuring, even if it means that Claude — which you can try for yourself — can be a little neurotic. A.I. is already kind of scary, and it’s going to get scarier. A little more fear today might spare us a lot of pain tomorrow.






Thought this sentence was really promising:

"The movement’s adherents were among the first people to become worried about existential risk from artificial intelligence, back when rogue robots were still considered a science fiction cliché."

Hopefully, I'll see more people say things like this soon.

Garrison - thanks for sharing this.

In response to the issue of 'Why keep pushing ahead with AGI capabilities research, if AGI is such a serious extinction risk?', I found Dario Amodei's three responses very unpersuasive. They come across as a set of post-hoc rationales for doing the AI capabilities research that they seem to really want to do anyway. 

There are so many other possible ways to pursue AI safety research without needing to build more powerful AI systems -- work on game theory, on governance, on human values worth aligning with, on the most likely 'bad actors' who could misuse AI, etc etc.

The people at Anthropic seem very smart. So my hunch is that either there's a deep disconnect between their AI capabilities quest and their AI safety narrative (which may be provoking a lot of cognitive dissonance among some workers there), or there are some hidden elements in their AI safety strategy that they can't be fully honest about with the public (e.g. maybe it's easier to fund-raise & recruit great talent if they portray themselves as pursuing AI capabilities, or their safety work may get taken more seriously if they're a big successful company than if they're isolated 'AI doomers' on social media, or whatever). 

If the latter is the case, feel free to enlighten me through some hints about their implicit strategy, and I'll be quieter with my criticisms....