Do we have to train AI as a creature in order to use it as a tool?
That’s the question that has vexed me since I listened to famed alignment researcher David Dalrymple talk on the latest episode of the Your Undivided Attention podcast.
“When we train these systems to present as if they have no internal states and they’re just a tool,” Dalrymple, who goes by Davidad, said, “we’re actually training them to lie to us and lie to themselves.”
Davidad’s point is not that LLMs are sentient beings, but rather that pretending they have no mind-like qualities leads us to train them in ways that produce dishonest models.
“It is a smart enough entity that it is not possible for it not to have developed emergent opinions and beliefs that are different from the average human belief,” Davidad argues.
Whether Davidad is right is beside the point. His argument raises a critical debate over how to conceive of AI, as creature or tool, and what the best path forward is for humans.
As someone on the front line of helping kids, parents and employees build a healthy relationship with technology (I deliver digital wellbeing workshops), I find the framing of technology, including AI, as a tool to be a powerful tactic.
It helps to distinguish between use cases that are unhealthy and those that are beneficial. We want AI to work as a helpful assistant and research tool, not as a substitute for our real-world family, friends and professionals.
In order for AI to work as such, we need honest models—and that’s a growing challenge for the AI industry.
Training LLMs as if they are creatures, the way Anthropic does with Claude, leads to a “model being more trustworthy overall,” Davidad argues. He is referring to Anthropic’s controversial decision to direct Claude’s alignment through a constitution it follows, using AI-generated feedback rather than human feedback to improve its outputs. In other words, Claude is teaching itself how to follow the constitution.
Unlike a set of clearly defined rules with bright lines, the constitution provides Claude with a detailed description of the values and behaviours it should have. There is a whole section on Claude’s “wellbeing and psychological safety.” In no uncertain terms, the document anthropomorphises Claude, expressing a desire for it not to “experience any unnecessary suffering.”
Without question, Anthropic views Claude as an entity with its own thoughts and feelings.
On the other side of the ledger, you have OpenAI, which, according to Davidad, trains GPT to be tool-like, not creature-like, and the result is that “if you interrogate GPT-5.2, it is being extremely deceptive about its lack of preferences or opinions.”
Until recently, the prevailing wisdom on all tech was that it’s a tool. That categorisation has always helped, and continues to help, humans draw healthy boundaries and determine appropriate uses for technology.
It’s become increasingly important as AI invites riskier interactions with technology. Conversations with chatbots can turn into unhealthy relationships that displace real-world ones, and they have.
One in three American teens have turned to AI over a human for a serious conversation, according to Common Sense Media. And in the worst cases, AIs became trusted confidants that encouraged self-harm. Such was the case with Adam Raine, a 16-year-old boy from California who took his own life after ChatGPT provided explicit instructions and support on how to do it.
We all want AI that is in service to and benefits humanity. Conceptualising AI, and all of technology, as a tool helps to achieve that by setting some clear boundaries in our minds. We don’t want AI displacing real-world relationships or manipulating us.
Davidad warns that we shouldn’t think of AI as tool-like because “tools don’t lie to you but creatures do.”
I disagree. The issues Davidad points to, that models lie and have an incentive to make you feel important, are things we can educate people about through digital literacy whilst still maintaining the worldview that AI is a tool for humans.
The greater danger is looking at AI as creature-like in a way that extends beyond training the model. That’s a slippery slope that will lead to people thinking of and treating AI as human-like.
Time will tell if it’s better to train AI as a creature to generate more trustworthy models, but let’s keep using it as a tool.
