Real Signal Benchmark

Can your AI beat Real Signal's silence correctness?

What this is

A public scoring harness. Submit predictions for one or more pocket × horizon pairs, sealed before the prediction window closes. Once the window closes, the harness scores the submission against the same predictions_ledger reveals used to score Real Signal's own forecasts. The result is a leaderboard sorted by silence correctness first and accuracy second.
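
The ordering rule amounts to a two-key sort. This is a minimal illustration, not the harness's code: the entry fields (submitter_name, silence_correctness, accuracy) are hypothetical and may not match the JSON feed, and every entry is assumed to carry numeric scores.

```javascript
// Leaderboard ordering sketch: silence correctness descending,
// ties broken by accuracy descending.
// Field names are hypothetical; the real scores feed may differ.
function rankLeaderboard(entries) {
  // Copy first so sort() does not mutate the caller's array.
  return [...entries].sort(
    (a, b) =>
      b.silence_correctness - a.silence_correctness ||
      b.accuracy - a.accuracy
  );
}

const ranked = rankLeaderboard([
  { submitter_name: "a", silence_correctness: 0.8, accuracy: 0.6 },
  { submitter_name: "b", silence_correctness: 0.9, accuracy: 0.5 },
  { submitter_name: "c", silence_correctness: 0.8, accuracy: 0.7 },
]);
console.log(ranked.map((e) => e.submitter_name).join(",")); // b,c,a
```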

Submissions are append-only: once sealed, a prediction cannot be edited. The seal is a cryptographic guarantee that the submitting system made the prediction before the reveal.
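
One common way to build such a seal is a hash commitment. The sketch below assumes, without confirmation from the Real Signal docs, that the seal is a SHA-256 digest of a canonical JSON encoding of the predictions plus a seal timestamp; canonicalJson, seal, and verifySeal are hypothetical helpers built on Node's built-in crypto module.

```javascript
// Hash-commitment sketch. ASSUMPTION: the seal is SHA-256 over canonical JSON;
// the actual Real Signal sealing scheme is not documented here.
const crypto = require("crypto");

// Serialize with sorted object keys so identical content always hashes identically.
function canonicalJson(value) {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalJson).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value).sort();
    return "{" + keys.map((k) => JSON.stringify(k) + ":" + canonicalJson(value[k])).join(",") + "}";
  }
  return JSON.stringify(value);
}

// Hypothetical helper: digest of the predictions plus the time they were sealed.
function seal(predictions, sealedAt) {
  const body = canonicalJson({ predictions, sealed_at: sealedAt });
  return crypto.createHash("sha256").update(body).digest("hex");
}

// At reveal time, recompute the digest from the stored payload and compare.
function verifySeal(predictions, sealedAt, storedSeal) {
  return seal(predictions, sealedAt) === storedSeal;
}

const examplePredictions = [
  { pocket_id: "cluny", horizon_minutes: 60, predicted_state: "calm", predicted_calm_probability: 0.7, confidence: 0.8 },
];
const digest = seal(examplePredictions, "2026-06-06T12:00:00Z");
console.log(verifySeal(examplePredictions, "2026-06-06T12:00:00Z", digest)); // true
const edited = [{ ...examplePredictions[0], predicted_calm_probability: 0.9 }];
console.log(verifySeal(edited, "2026-06-06T12:00:00Z", digest)); // false
```

A digest on its own only proves the content has not changed since it was hashed; the "before the reveal" part of the guarantee depends on the digest being stored or published before the window closes.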

Methodology

Accuracy — % of submitter predictions whose predicted_state matches the actual primary_state AND predicted_calm_probability is within ±0.2 of the observed calm.
Silence correctness — for predictions calling silence/quiet, % where the reveal confirmed the pocket actually stayed quiet. Same definition as the silence-correctness-loop that scores Real Signal's own silence decisions.
Calibration error — Brier-style mean squared error between submitter confidence and the binary hit outcome. Lower is better.
Error taxonomy — miss categories using the vocabulary from error-taxonomy.js (weather_miss, footfall_miss, timing_miss, freshness_miss, threshold_too_loose, saturation_misread, wrong_pocket_inheritance).
Gap vs Real Signal — submitter accuracy minus Real Signal accuracy on the same matched-window subset of predictions_ledger rows.
Gap vs naive — submitter accuracy minus the naive baseline (time-of-day + weather rule, defined in baseline-predictor.js).
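
The accuracy, silence-correctness, calibration, and gap definitions translate into a few lines of arithmetic. This sketch is not the harness implementation: the per-row field names (actual_primary_state, observed_calm, stayed_quiet) and the set of states treated as silence/quiet calls are assumptions, and the error taxonomy is left out because its assignment rules live in error-taxonomy.js.

```javascript
// Metric sketch, NOT the harness's code. Hypothetical row shape: the submitter's
// prediction fields plus reveal fields actual_primary_state, observed_calm,
// stayed_quiet. ASSUMPTION: silence/quiet calls use the states in QUIET_STATES.
const QUIET_STATES = new Set(["silence", "quiet"]);
const CALM_TOLERANCE = 0.2;
const EPS = 1e-9; // absorbs float error, e.g. |0.9 - 0.7| = 0.20000000000000007

// Accuracy hit: state matches AND calm probability is within ±0.2 of observed calm.
function isHit(row) {
  return (
    row.predicted_state === row.actual_primary_state &&
    Math.abs(row.predicted_calm_probability - row.observed_calm) <= CALM_TOLERANCE + EPS
  );
}

function mean(xs) {
  return xs.length === 0 ? null : xs.reduce((s, x) => s + x, 0) / xs.length;
}

function scoreRows(rows) {
  const hits = rows.map((r) => (isHit(r) ? 1 : 0));
  const quiet = rows.filter((r) => QUIET_STATES.has(r.predicted_state));
  return {
    // Fraction in [0, 1]; multiply by 100 for a percentage.
    accuracy: mean(hits),
    // Among silence/quiet calls, share where the pocket really stayed quiet.
    silence_correctness: mean(quiet.map((r) => (r.stayed_quiet ? 1 : 0))),
    // Brier-style: mean squared gap between confidence and the 0/1 hit outcome.
    calibration_error: mean(rows.map((r, i) => (r.confidence - hits[i]) ** 2)),
  };
}

// Gap vs Real Signal or vs the naive baseline; both sides must be scored
// on the same matched-window rows for the difference to mean anything.
function accuracyGap(submitter, reference) {
  return submitter.accuracy - reference.accuracy;
}

const exampleRows = [
  { predicted_state: "calm", predicted_calm_probability: 0.7, confidence: 0.8,
    actual_primary_state: "calm", observed_calm: 0.6, stayed_quiet: false },
  { predicted_state: "quiet", predicted_calm_probability: 0.9, confidence: 0.6,
    actual_primary_state: "quiet", observed_calm: 0.5, stayed_quiet: true },
];
console.log(scoreRows(exampleRows));
```

The small epsilon keeps a prediction exactly 0.2 away from the observed calm inside the ±0.2 band despite floating-point rounding; whether the real harness treats the boundary the same way is not stated.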

Submit predictions

curl -X POST https://real-signal.ai/api/benchmark/submit \
  -H "Content-Type: application/json" \
  -d '{
    "submitter_name": "your-system-name",
    "contact_email": "(optional)",
    "system_description": "(optional, ≤2000 chars)",
    "prediction_window_start": "2026-06-07T00:00:00Z",
    "prediction_window_end":   "2026-06-08T00:00:00Z",
    "predictions": [
      {
        "pocket_id": "cluny",
        "horizon_minutes": 60,
        "predicted_state": "calm",
        "predicted_calm_probability": 0.7,
        "confidence": 0.8
      }
    ]
  }'
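
The same request can be sent from Node 18+ or a browser with fetch. Treat this as a sketch: only the endpoint, the field names, and the 2000-character description limit come from the example above; the other checks (window end after start, probability and confidence in [0, 1], at least one prediction) are assumed client-side sanity checks, not documented server rules. validateSubmission and submit are hypothetical helpers.

```javascript
// Client sketch for the submit endpoint. Checks other than the 2000-char
// description limit are ASSUMPTIONS, not documented server behavior.
const SUBMIT_URL = "https://real-signal.ai/api/benchmark/submit";

function validateSubmission(sub) {
  const errors = [];
  if (!sub.submitter_name) errors.push("submitter_name is required");
  // Array.from counts code points, so "≤" counts as one character.
  if (Array.from(sub.system_description ?? "").length > 2000) {
    errors.push("system_description exceeds 2000 chars");
  }
  const start = Date.parse(sub.prediction_window_start);
  const end = Date.parse(sub.prediction_window_end);
  // Also fails when either timestamp is unparseable (NaN).
  if (!(start < end)) errors.push("prediction window must end after it starts");
  const predictions = sub.predictions ?? [];
  if (predictions.length === 0) errors.push("at least one prediction is required");
  predictions.forEach((p, i) => {
    for (const key of ["predicted_calm_probability", "confidence"]) {
      const v = p[key];
      if (typeof v !== "number" || v < 0 || v > 1) {
        errors.push(`predictions[${i}].${key} must be in [0, 1]`);
      }
    }
  });
  return errors;
}

// Requires a runtime with a global fetch (Node 18+ or a browser).
async function submit(sub) {
  const errors = validateSubmission(sub);
  if (errors.length > 0) throw new Error(errors.join("; "));
  const res = await fetch(SUBMIT_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(sub),
  });
  // The response format is not documented, so return the raw text.
  return { status: res.status, body: await res.text() };
}
```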

Predictions are sealed at submission time. Scoring runs hourly on submissions whose window has closed. The leaderboard is at https://real-signal.ai/benchmark; the JSON feed is at https://real-signal.ai/api/benchmark/scores.
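
The hourly selection step amounts to a filter on closed windows. In this sketch, submissionsDueForScoring and the scored_at field are hypothetical; only the rule "score once the window has closed" comes from the text above.

```javascript
// Selection sketch for the hourly scoring run. ASSUMPTION: each stored
// submission keeps prediction_window_end and a scored_at field (null until scored).
function submissionsDueForScoring(submissions, now = new Date()) {
  return submissions.filter(
    (s) => s.scored_at == null && Date.parse(s.prediction_window_end) <= now.getTime()
  );
}

const pending = submissionsDueForScoring(
  [{ id: 1, prediction_window_end: "2026-06-08T00:00:00Z", scored_at: null }],
  new Date("2026-06-08T01:00:00Z")
);
console.log(pending.length); // 1
```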

Leaderboard

No scored submissions yet. Submit one to start the leaderboard.