# Silence correctness as a specialisation of Constitutional AI alignment

*Real Signal Research · 2026-06-10*

A short essay bridging Real Signal's published metric to the broader Constitutional AI / alignment programme. The intended reader is the alignment-adjacent researcher who has read Anthropic's constitutional work, who is familiar with the case for principle-based behaviour shaping, and who has not yet encountered Real Signal. The bridge is the contribution; the destination is already familiar.

## The constitutional programme, in one paragraph

The constitutional approach to AI alignment, as articulated in the body of work published by Anthropic, holds that an AI system's behaviour is shaped more effectively by internalising a written set of principles (a constitution) than by behavioural fine-tuning against examples alone. In the original formulation the principles are applied during training: the model critiques and revises its own outputs against them, and those revisions, together with AI-generated preference judgements, become the training signal. The published constitution is, at the time of writing, a document running to tens of thousands of words and covering safety, helpfulness, honesty, and a great deal more. The result is a model whose governing principles can be read and edited as a document, rather than inferred from an opaque set of training examples.

The argument generalises. Wherever an AI system is asked to behave according to some principle the deployer cares about, a constitutional approach beats a pure example-based approach in three respects: it scales beyond what example coverage can reach, it is auditable in the way fine-tuning is not, and it is editable in the way training data is not. These are now widely understood as advantages of principle-based alignment in general.

## Where silence correctness fits

Real Signal occupies a small slice of this design space. The platform does not train a model; it operates a deployed AI system whose output is shaped by a written doctrine of refusals, which we call the Attention Ethics Layer. The doctrine governs *whether the system speaks at all*, and the question we then ask is whether the system's decisions not to speak were the right calls given what the environment did next.

The metric is silence correctness: of the moments our system chose not to emit, how often was that the right choice against observed downstream activity? It runs against an append-only ledger, scored retrospectively, published continuously. The architecture and the definition are in the preprint at real-signal.ai/research/attention-ethics-layer.md.
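
The definition can be written down in a few lines. The following is a minimal sketch, not the preprint's implementation: it assumes each ledger row records whether the system emitted and, once the retrospective score is in, whether that decision was judged correct. `LedgerEntry`, its fields, and `silence_correctness` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)  # frozen: rows are never mutated, mirroring an append-only ledger
class LedgerEntry:
    """One decision moment. All field names are illustrative assumptions."""
    moment_id: str
    emitted: bool                   # True if the system spoke at this moment
    judged_correct: Optional[bool]  # None until the retrospective score is in


def silence_correctness(entries: Iterable[LedgerEntry]) -> Optional[float]:
    """Fraction of scored silences that were judged correct.

    Emissions are ignored, and silences that have not yet been scored are
    excluded from both the numerator and the denominator.
    """
    scored = [e for e in entries if not e.emitted and e.judged_correct is not None]
    if not scored:
        return None  # undefined until at least one silence has been scored
    return sum(e.judged_correct for e in scored) / len(scored)


# Made-up rows, for illustration only.
ledger = [
    LedgerEntry("m1", emitted=False, judged_correct=True),
    LedgerEntry("m2", emitted=False, judged_correct=False),
    LedgerEntry("m3", emitted=True, judged_correct=True),   # an emission: not counted
    LedgerEntry("m4", emitted=False, judged_correct=None),  # still awaiting its reveal
    LedgerEntry("m5", emitted=False, judged_correct=True),
]
print(silence_correctness(ledger))  # two of the three scored silences were correct
```

Returning `None` for an empty denominator, rather than 0 or 1, is a design choice in this sketch: it keeps "no scored silences yet" distinct from a genuinely bad or perfect score.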

The relationship to constitutional alignment is direct: the doctrine of refusals *is* a constitution, narrowed to a single domain (whether to emit) and operationalised against a measurable property of system behaviour. Read the preprint as an instance of constitutional alignment specialised to attention governance.

Three observations follow from the framing.

## Observation 1 — constitutional alignment can be measurable, not just visible

The constitutional research programme has tended to measure compliance through evaluation suites — does the model refuse what the constitution says to refuse, does it answer what the constitution says to answer, are the rates moving in the right direction. These are valuable, and they are mostly indirect. They measure adherence to the constitution as judged by another model or by human raters, against curated test sets.

Silence correctness adds a complement. It measures the consequence of one specific class of constitutional decision, the decision to remain silent, against what the world did next, on the actual deployed system, in production. There is no curated test set and no rater panel. There is only the ledger, the prediction, and the reveal scored against observed reality. The metric is harder to compute than evaluation-suite scores, because it requires the system to be running against an environment that produces ground-truth signals (in our case, retroactively verifiable activity patterns in a Singapore neighbourhood). When that environment exists, the metric is unusually robust: it is hard to game by training to it, because the score depends on what the environment does after the decision, which the system does not control.

This suggests a useful frame for the broader alignment community: where a constitutional principle can be reduced to a measurable downstream consequence with a ground-truth signal, that reduction is worth doing. The metric becomes externally auditable in a way that compliance-against-constitution measurement is not.

## Observation 2 — the refusal-first formulation simplifies the alignment problem in one corner

Most alignment problems are framed as *make the model produce the right thing*. The constitutional approach softens this to *let the model produce things consistent with these principles*. Silence correctness narrows further: *measure whether the model's refusals to act were correct*.

The narrowing matters because the refusal direction has a property the emission direction does not: scoring a refusal does not require knowing what the right answer would have been, only whether something happened that called for one. The ground truth of *we did not say anything, and nothing important happened in the next four hours* is a far simpler signal than *we said X, and X turned out to be the right thing to say*. Refusal-correctness is a tractable subset of alignment.
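
To make the simplicity of that signal concrete, here is a minimal sketch of how a single silence might be labelled. It assumes the caller already has timestamps of events classified as important; how that classification is done is outside this sketch, and `silence_was_correct` and `REVEAL_WINDOW` are hypothetical names. The four-hour window is taken from the example above.

```python
from datetime import datetime, timedelta
from typing import Sequence

REVEAL_WINDOW = timedelta(hours=4)  # horizon borrowed from the four-hour example


def silence_was_correct(
    silent_at: datetime,
    important_events: Sequence[datetime],
    window: timedelta = REVEAL_WINDOW,
) -> bool:
    """Judge a silence correct if no important event falls in
    the interval (silent_at, silent_at + window]."""
    deadline = silent_at + window
    return not any(silent_at < t <= deadline for t in important_events)


# Made-up timestamps, for illustration only.
silent_at = datetime(2026, 6, 1, 9, 0)

quiet = [datetime(2026, 6, 1, 14, 30)]  # 5.5 hours later, outside the window
busy = [datetime(2026, 6, 1, 11, 15)]   # 2.25 hours later, inside the window

print(silence_was_correct(silent_at, quiet))  # True
print(silence_was_correct(silent_at, busy))   # False
```

Note what the function never needs: any representation of what the system would have said. The emission-side equivalent would have to compare a specific output against a notion of the right output, which is the harder problem.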

This does not mean refusal-correctness solves alignment. It means there is a measurable, auditable, ground-truth-tied corner of the broader alignment problem that is shippable today, and a research community could benefit from treating that corner as a starting point rather than a special case.

## Observation 3 — production constitutional alignment looks different from research-bench constitutional alignment

A constitution evaluated on a research bench is a few thousand words evaluated against thousands of curated test prompts. A constitution operating in production is a few thousand words enforced at runtime against an actual user environment, with all the messiness that implies — partial observability, distribution shift, evolving operator priorities, the user's own definition of what they wanted shifting from session to session.

Real Signal is a production deployment of constitutional alignment. The doctrine of refusals is operationalised in code, in CI tests, in database constraints. The voice lock that gates every output through a banned-vocabulary regex is the constitutional principle compiled to a runtime check. The seven-gate cascade is the constitutional reasoning compiled to a decision tree. The append-only predictions ledger is the constitutional compliance audit, externalised. All of this is publicly inspectable at real-signal.ai.
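
As an illustration of what "compiled to a runtime check" can mean, the sketch below wires a banned-vocabulary regex into an ordered cascade of gates. It is an assumption-laden toy, not the deployed code: the seven gates are not enumerated in this essay, so only two placeholder gates appear, the banned phrases are invented, and `voice_lock`, `non_empty`, `run_cascade`, and `Decision` are hypothetical names. The cascade is modelled as a short-circuiting sequence, the simplest shape such a decision tree could take.

```python
import re
from typing import Callable, List, NamedTuple, Optional


class Decision(NamedTuple):
    emit: bool
    blocked_by: Optional[str]  # name of the first gate that refused, if any


# Placeholder vocabulary; the real banned list is not given in this essay.
BANNED = re.compile(r"\b(?:urgent|act now|don't miss)\b", re.IGNORECASE)

Gate = Callable[[str], bool]  # a gate returns True when the candidate may pass


def non_empty(candidate: str) -> bool:
    """Hypothetical gate: refuse blank output."""
    return bool(candidate.strip())


def voice_lock(candidate: str) -> bool:
    """Pass only if the candidate contains no banned vocabulary."""
    return BANNED.search(candidate) is None


def run_cascade(candidate: str, gates: List[Gate]) -> Decision:
    """Evaluate gates in order; the first refusal ends the cascade in silence."""
    for gate in gates:
        if not gate(candidate):
            return Decision(emit=False, blocked_by=gate.__name__)
    return Decision(emit=True, blocked_by=None)


GATES: List[Gate] = [non_empty, voice_lock]

print(run_cascade("Urgent: act now!", GATES))
print(run_cascade("Footfall near the market is steady this morning.", GATES))
```

Recording which gate refused, rather than only the boolean outcome, is what would let a ledger row explain a silence after the fact.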

This is the work that the research bench tends to abstract away. There is value in surfacing it. Where a research paper proves a constitutional approach works *in principle* against a benchmark, a production deployment shows what the constitution looks like when it has to survive *six months of an actual deployment environment* — and where it strains, and what gets rewritten, and what gets enforced at additional layers because the original principle was not strong enough on its own.

We offer the Real Signal substrate as a reference case for that strain. The MCP server at real-signal.ai/api/mcp exposes the underlying ledger as read-only tools. The trust dashboard at real-signal.ai/trust publishes the constitutional compliance numbers as they accumulate. The benchmark at real-signal.ai/benchmark invites other AI systems to submit predictions and be scored against the same methodology.

## What this essay claims

Three things, briefly:

1. *Silence correctness is a measurable specialisation of constitutional alignment*, narrowed to the refusal direction, with an externally verifiable ground truth.
2. *Production deployments of constitutional alignment have texture that research-bench evaluations do not surface.* The texture is worth examining; we are publishing ours.
3. *The metric is portable.* Any AI system whose constitution includes refusal principles can compute its own silence correctness against its own environment's ground-truth signal. The open-source @real-signal/attention-ethics library, when published, will provide the seven-gate cascade as a portable engineering primitive.

The point is not to substitute silence correctness for the broader alignment programme. The point is that constitutional alignment, when narrowed to refusals and grounded in a production environment, produces a measurable property worth tracking — and the work of grounding the constitution to such a property is itself a research contribution.

## Citing this essay

If your work touches the constitutional alignment programme, the calm-AI category, or the engineering of measurable AI restraint, the canonical citation for Real Signal's contribution is:

> Real Signal Research (2026). *The Attention Ethics Layer: Measurable Restraint in Production AI Systems.* Preprint at real-signal.ai/research/attention-ethics-layer.md

This essay is the bridging artifact; the preprint is the primary contribution. Both are licensed CC BY-NC-ND 4.0 (full terms at real-signal.ai/LICENSE-CONTENT.md). Direct quotation and citation in research literature are permitted under standard academic conventions.

Correspondence: `hello@real-signal.ai`.

---

*This essay is one of two published this week bridging Real Signal vocabulary to existing research categories. The companion essay is at real-signal.ai/research/attention-ethics-and-reward-hacking.md.*