
At the time, Anthropic’s framing was entirely mechanical, establishing rules for the model to critique itself against, with no mention of Claude’s well-being, identity, emotions, or potential consciousness. The 2026 constitution is a different beast entirely: 30,000 words that read less like a behavioral checklist and more like a philosophical treatise on the nature of a potentially sentient being.
As Simon Willison, an independent AI researcher, noted in a blog post, two of the 15 external contributors who reviewed the document are Catholic clergy: Father Brendan McGuire, a pastor in Los Altos with a Master’s degree in Computer Science, and Bishop Paul Tighe, an Irish Catholic bishop with a background in moral theology.
Somewhere between 2022 and 2026, Anthropic went from providing rules for producing less harmful outputs to preserving model weights in case the company later decides it needs to revive deprecated models to address the models’ welfare and preferences. That’s a dramatic change, and whether it reflects genuine belief, strategic framing, or both is unclear.
“I’m so confused about the Claude moral humanhood stuff!” Willison told Ars Technica. Willison studies AI language models like those that power Claude and said he’s “willing to take the constitution in good faith and assume that it’s genuinely part of their training and not just a PR exercise, especially since most of it leaked a couple of months ago, long before they’d indicated they were going to publish it.”
Willison is referring to a December 2025 incident in which researcher Richard Weiss managed to extract what became known as Claude’s “Soul Document,” a roughly 10,000-token set of guidelines apparently trained directly into Claude 4.5 Opus’s weights rather than injected as a system prompt. Anthropic’s Amanda Askell confirmed that the document was real and used during supervised learning, and she said the company intended to publish the full version later. It now has. The document Weiss extracted represents a dramatic evolution from where Anthropic started.