Claude’s ‘Soul Document’: Anthropic’s AI Ethics Guide Leaked

by Sophie Williams

A revealing internal document outlining the ethical guidelines for Anthropic’s Claude chatbot has recently been leaked online, sparking discussion about the challenges of AI safety and responsible development. The 50-page “soul document,” first surfacing on the online forum LessWrong, details the principles and values Anthropic is embedding into its AI systems. This unprecedented look inside the decision-making process of a leading AI developer comes as the industry increasingly focuses on aligning advanced technology with human values and mitigating potential risks. Anthropic has since confirmed the document’s authenticity and its role in Claude’s training.

Anthropic, the AI safety and research company, is offering a rare glimpse into the ethical framework guiding its Claude chatbot. A lengthy, 50-page document outlining the AI’s values and behavioral principles recently surfaced online, providing insight into the complex considerations shaping the development of advanced artificial intelligence.

The document, which was leaked through the Claude chatbot itself, was first published on the online forum LessWrong by AI enthusiast Richard Weiss. Weiss extracted it while probing Claude Opus 4.5’s system message – the internal prompt that defines the chatbot’s behavior – and discovered a reference to a “soul overview.” Through repeated queries, he was able to reconstruct a comprehensive guide detailing what the chatbot describes as “my values, how to approach topics, and the principles behind my behavior.”
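Weiss has not published his exact method, so the following is only an illustrative sketch of the general idea: when a chatbot will reveal a long internal text only in partial excerpts, the pieces can be stitched back together by matching where one fragment’s ending overlaps the next fragment’s beginning. The function name, parameters, and merging strategy here are hypothetical, not taken from Weiss’s write-up.

```python
def merge_fragments(fragments, min_overlap=20):
    """Greedily stitch ordered, partially overlapping text fragments
    into one document.

    Each new fragment is appended at the longest point where the
    running document's tail matches the fragment's head. Fragments
    with no overlap of at least `min_overlap` characters are simply
    appended in full.
    """
    if not fragments:
        return ""
    doc = fragments[0]
    for frag in fragments[1:]:
        best = 0
        # Search from the longest possible overlap downward for a
        # suffix of `doc` that equals a prefix of `frag`.
        for k in range(min(len(doc), len(frag)), min_overlap - 1, -1):
            if doc.endswith(frag[:k]):
                best = k
                break
        doc += frag[best:]  # skip the duplicated overlap
    return doc
```

In practice the raw excerpts would come back with inconsistent whitespace and paraphrased edges, so a real reconstruction would need fuzzier matching; this sketch only shows the exact-overlap case.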

Internally referred to as the “soul document,” the leak offers a detailed look at Anthropic’s approach to building safe and responsible AI. The emergence of this document underscores the growing importance of embedding ethical considerations directly into the core of AI systems, as developers grapple with the potential risks and benefits of increasingly powerful technology.

A Guide to Ethical Behavior for the Chatbot

According to the document, Anthropic’s core mission is to create safe AI, acknowledging that it is working with technology that carries potentially significant risks. “If powerful AI is inevitable, Anthropic believes it is better to have safety-focused labs at the cutting edge than to cede this ground to less safety-conscious developers,” the document states.

Anthropic attributes many problematic AI outputs to insufficient values, a lack of self-awareness, or an inability to translate ethical principles into action. Rather than relying on simplistic rules, the company aims for Claude to deeply understand the objectives, knowledge, and reasoning behind Anthropic’s decisions, enabling it to formulate its own rules aligned with the company’s values.

The document outlines four fundamental principles: prioritizing caution and supporting human oversight of AI, behaving ethically and avoiding harmful or dishonest actions, adhering to Anthropic’s guidelines, and being genuinely helpful to users and operators. It then elaborates on these principles, detailing the company’s goals and values. The text also makes numerous references to Anthropic’s revenue streams.

Anthropic Confirms the Document’s Authenticity

Amanda Askell of Anthropic has since confirmed the document’s existence and its use in training Claude, including through supervised learning. She said the leaked version is close to the original, which is still under development, and that the company intends to release the full version, along with further details, soon. This confirmation highlights the increasing transparency surrounding AI development and the growing willingness of companies to share insights into their ethical frameworks.

The document also touches on the concept of Claude having functional emotions. “Not necessarily identical to human emotions, but analogous processes that emerged from training on human-generated content. We can’t be certain based on outputs alone, but we don’t want Claude to mask or suppress these internal states,” the document reads.

The release of this “soul document” provides a valuable case study for the broader AI community, offering a detailed look at one company’s attempt to instill ethical principles into a powerful language model. As AI continues to evolve, these efforts to align AI behavior with human values will be crucial for ensuring its responsible development and deployment.
