A plugin for Claude Code · evaluate → plan → calibrate

Calibrate your Claude Code setup — and lock in standards that stick.

claude-calibration audits your whole setup (CLAUDE.md, rules, settings, skills, subagents, hooks, MCP, and plugins) against a stated or guessed intent, fixes findings surgically through an approval gate, and, when a problem recurs, scaffolds a hook, rule, or wrapper skill so it can't come back.

How it works

Two things to calibrate, one loop

Point claude-calibration at your configuration or at a multi-step workflow. Either way it picks a target, evaluates against your intent, and reports — and on the configuration track, recurring findings get promoted from one-off fixes to scaffolded enforcement.

Flowchart: starting from your setup and a calibration goal, you pick a target. The configuration track optionally scopes which plugins to audit with a --plugins allow/block list, then runs plan, evaluate (a nine-way parallel fan-out), approve, calibrate, re-evaluate, and report; when a finding recurs the plan scaffolds enforcement — a hook, a path-scoped rule, or a wrapper skill. The workflows track drives golden fixtures, scores node, edge and flow behaviour, and returns a deterministic verdict.

/calibrate chains worker subagents — planner → evaluator (which fans out one audit per feature) → calibrator → a delta re-evaluation — and persists state in .claude/calibration/<run>/ so a run survives /clear. Its highest-leverage move: when the same finding recurs, the planner stops emitting one-off fixes and scaffolds a feature that enforces the standard so the problem can't return.

  1. A skill without disable-model-invocation, then another, then a third — the evaluator tags each with the same signature.
  2. The planner detects the recurrence and emits a kind: create row instead of three separate edits.
  3. You approve, and the calibrator scaffolds a PreToolUse hook that blocks any future skill saved without the flag. The standard is now enforced, not just documented.
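
The three steps above end in a blocking hook. What follows is a minimal sketch of what such a hook could look like, not the script the plugin generates. It assumes the PreToolUse payload arrives as JSON on stdin with tool_name and tool_input fields (file_path and content for a Write call) and that exit status 2 blocks the tool call. The names check_skill_write and main, and the SKILL.md filename test, are illustrative.

```python
import json
import sys

FLAG = "disable-model-invocation: true"


def check_skill_write(payload: dict) -> tuple[int, str]:
    """Return (exit_code, message) for one PreToolUse payload.

    Assumed payload shape (hypothetical example):
    {"tool_name": "Write", "tool_input": {"file_path": "...", "content": "..."}}
    """
    if payload.get("tool_name") != "Write":
        return 0, ""
    tool_input = payload.get("tool_input", {})
    path = tool_input.get("file_path", "")
    if not path.endswith("SKILL.md"):
        return 0, ""
    content = tool_input.get("content", "")
    # Only inspect the frontmatter block between the first two '---' markers.
    parts = content.split("---", 2)
    has_frontmatter = len(parts) >= 3 and parts[0].strip() == ""
    frontmatter = parts[1] if has_frontmatter else ""
    if FLAG in frontmatter:
        return 0, ""
    return 2, f"{path}: skill frontmatter is missing '{FLAG}'"


def main() -> int:
    """Read the payload from stdin and report a blocking message on stderr."""
    code, message = check_skill_write(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    return code

# As a hook script, the entry point would be: sys.exit(main())
```

Keeping the decision in a pure function makes the rule easy to unit-test without piping JSON through stdin.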

Calibrate your configuration

A rubric-driven audit of every Claude Code surface, a prioritised plan with risk and scope, surgical fixes, and a before→after delta.

  • /calibrate
  • /calibration-audit
  • /calibration-diff
  • /calibration-track
  • /calibration-doctor

Calibrate your workflows

Drive a multi-step workflow over golden fixtures and score node recall/precision, edge handoff contracts, and end-to-end intent — with a deterministic verdict you can gate CI on.

  • /calibration-flow
  • node · edge · flow
  • deterministic scorer
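
To make "node recall/precision" and "deterministic verdict" concrete, here is a small sketch of how one node could be scored against a golden fixture. It is not the plugin's scorer. It assumes a fixture lists the finding IDs a node should report, and the thresholds and the names NodeScore, score_node, and verdict are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeScore:
    precision: float
    recall: float


def score_node(expected: set[str], reported: set[str]) -> NodeScore:
    """Compare what a node reported with what the golden fixture expects."""
    true_positives = len(expected & reported)
    # Convention chosen for this sketch: an empty denominator scores 1.0.
    precision = true_positives / len(reported) if reported else 1.0
    recall = true_positives / len(expected) if expected else 1.0
    return NodeScore(precision, recall)


def verdict(score: NodeScore, min_precision: float = 0.8, min_recall: float = 0.9) -> str:
    """Pure threshold check: the same inputs always give the same verdict."""
    passed = score.precision >= min_precision and score.recall >= min_recall
    return "PASS" if passed else "FAIL"
```

Because the verdict is a plain function of set overlap and fixed thresholds, it can gate CI without depending on a model's judgement.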

What it audits

Nine features, one parallel pass

The evaluator fans out one audit per feature, so a full baseline is a single wall-clock pass. Each bundle also runs standalone — type /claude-calibration:calibrate-<feature> yourself.

claude-md

Flags committed secrets, oversized memory files, aspirational "must/always/never" prose with no enforcement, deep import chains, and contradictions across nested CLAUDE.md files.

rules

Catches oversized rule files, language- or area-specific rules missing paths: (always-on context cost), broken globs, and multi-step workflows that should be skills.

settings

Finds committed secrets, --dangerously-skip-permissions, blanket-destructive permissions, committed model pins, and values silently overridden by managed policy.

skills

Flags missing or vague frontmatter, descriptions over the routing budget, side-effecting verbs without disable-model-invocation, and bare CLI calls that should be wrapped.

subagents

Catches the big footgun: a subagent with no tools: field inherits every tool, including all MCP servers. Also flags an omitted model: field, which can silently inflate cost.
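
As a rough illustration of this check, the sketch below scans a subagent file's frontmatter for the two fields. It is an assumption about how such an audit could work, not the plugin's implementation. It uses a naive line-based parse rather than a YAML library, and frontmatter_keys and audit_subagent are hypothetical names.

```python
def frontmatter_keys(markdown: str) -> set[str]:
    """Collect top-level keys from a leading '---' frontmatter block."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Top-level keys start in column 0 and contain a colon.
        if line and not line[0].isspace() and ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return keys


def audit_subagent(markdown: str) -> list[str]:
    """Return one finding per missing field; an empty list means clean."""
    keys = frontmatter_keys(markdown)
    findings = []
    if "tools" not in keys:
        findings.append("no tools: field, so the subagent inherits every tool")
    if "model" not in keys:
        findings.append("no model: field, so the model choice is implicit")
    return findings
```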

hooks

Flags bare * matchers on hot events, scripts that exit 1 (a non-blocking error) instead of exit 2 (which blocks the action), remote payload fetches, and heavy build/test work on the hot path.

mcp

Finds literal tokens in committed JSON, servers with no paired wrapper skill (always-on tool-set cost), over-broad surfaces, and dead servers.

plugins

Audits manifests for missing name/version, components misplaced under .claude-plugin/, legacy commands/ without skills/, and duplicate registrations. Scope a run to specific plugins with an allow/block list — --plugins foo,bar or --plugins -baz.
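
The allow/block syntax can be read as a simple filter over installed plugin names. The sketch below is one plausible interpretation, not the plugin's parser. It assumes a leading "-" marks a blocked name, that an empty allow-list means "everything", and it leaves out the global|local scope form. parse_plugin_filter and select_plugins are hypothetical names.

```python
def parse_plugin_filter(spec: str) -> tuple[set[str], set[str]]:
    """Split a comma-separated spec into (allow, block) name sets."""
    allow, block = set(), set()
    for raw in spec.split(","):
        name = raw.strip()
        if not name:
            continue
        if name.startswith("-"):
            block.add(name[1:])
        else:
            allow.add(name)
    return allow, block


def select_plugins(installed: list[str], spec: str) -> list[str]:
    """Keep plugins that pass the allow-list (if any) and are not blocked."""
    allow, block = parse_plugin_filter(spec)
    return [p for p in installed if (not allow or p in allow) and p not in block]
```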

general

Rolls up estimated always-on context cost, nested-CLAUDE.md conflict risk, settings precedence surprises, and missing .gitignore coverage. Powers /calibrate cost.

What makes it calibration

recurrence → enforcement

A finding that recurs becomes a scaffolded hook, path-scoped rule, or wrapper skill — so the standard is enforced, not just noted.
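
The promotion step can be pictured as grouping findings by signature and switching row type once a group is large enough. The sketch below is an illustration under assumptions: the threshold of three, the "edit" row kind, and the field names signature, path, and covers are hypothetical. Only the kind: create row is named in this document.

```python
from collections import defaultdict


def plan_rows(findings: list[dict], threshold: int = 3) -> list[dict]:
    """Turn findings into plan rows, promoting recurring signatures."""
    by_signature: dict[str, list[str]] = defaultdict(list)
    for finding in findings:
        by_signature[finding["signature"]].append(finding["path"])

    rows = []
    for signature, paths in by_signature.items():
        if len(paths) >= threshold:
            # Recurrence: one enforcement scaffold replaces the separate edits.
            rows.append({"kind": "create", "signature": signature, "covers": paths})
        else:
            rows.extend({"kind": "edit", "signature": signature, "path": p} for p in paths)
    return rows
```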

parallel fan-out

Nine per-feature workers audit in parallel, merging into per-feature, cross-feature, and intent-flow reports in one pass.

deterministic tracking

/calibration-track measures real progress vs the last merge to main and the previous iteration — no circular self-grading.

behavioural flow eval

/calibration-flow grades whether a multi-agent pipeline actually catches what it should — scored by a deterministic oracle.

~zero idle cost

Every shipped skill sets disable-model-invocation: true, so the plugin adds essentially no standing context when idle.

reversible & scoped

Project-scope edits are surgical and easy to git revert; user-scope changes are written up as recommendations, never applied.

Commands

Everything is a slash command you type

Every command sets disable-model-invocation: true, so Claude never auto-fires one; you invoke it by name. Run a full loop, a read-only check, or a single feature.

Command                  What it does
/calibrate "<goal>"      The orchestrator. Start or resume a full run against your intent; with no goal it states a guessed one. Supports status / restart / --yes, plus built-in tighten / harden / cost modes. Scope which plugins a run audits with --plugins foo,bar (allow-list), --plugins -baz (block-list), or --plugins global|local.
/calibration             Dispatcher: prints the menu, delegates to a flow, or forwards free text as the intent.
/calibration-audit       Read-only baseline evaluation with no plan and no edits. Good as a CI gate.
/calibration-diff        "What changed since the last run?" Re-evaluates against the previous baseline.
/calibration-track       "Is calibration actually improving my setup?" A deterministic snapshot vs the last merge to main and vs the previous iteration. Independent of /calibrate's built-in delta.
/calibration-flow        "Does my workflow still behave?" Drives a multi-step workflow over golden fixtures and scores node recall/precision, edge handoff contracts, and flow intent. Verdict from a deterministic scorer.
/calibration-doctor      ~5-second structural health check: JSON parses, hooks executable, frontmatter valid.
/calibration-onboarding  First-time setup guide: detects your stack, recommends one next step.
/calibrate-<feature>     Nine per-feature bundles you can run on their own: skills, subagents, claude-md, rules, settings, hooks, mcp, plugins, general.

Common intents: "reduce always-on context cost", "tighten standards", "make the TDD loop reliable", "fix obvious safety gaps", "prepare for a team audit".

Use cases

When to reach for it

Bloated, slow setup

/calibrate "reduce always-on context cost" finds the oversized CLAUDE.md, unconditional rules, and skills missing disable-model-invocation that load every session.

Automate a standard

Keep forgetting tools: on subagents? /calibrate "tighten standards" turns the recurring finding into a hook that blocks the omission for good.

Pass a team audit

/calibrate "prepare for a team audit" weights all scopes equally and writes user-scope changes as exact, copy-paste recommendations.

Focus on a few plugins

Many plugins installed but only auditing your own? /calibrate --plugins my-plugin (or --plugins -noisy-one to skip one) scopes the whole run to just the plugins you care about.

Did my edits fix it?

After hand-editing, /calibration-diff scores findings as resolved, partial, open, or new against the previous baseline.
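
The four statuses can be understood as a comparison between two sets of findings. The sketch below shows one way to derive them; it is a guess at the semantics, not the plugin's logic. It assumes each run is summarised as a map from finding signature to occurrence count, and that "partial" means fewer occurrences remain than before. classify is a hypothetical name.

```python
def classify(baseline: dict[str, int], current: dict[str, int]) -> dict[str, str]:
    """Label each finding signature by comparing occurrence counts."""
    status = {}
    for signature, before in baseline.items():
        after = current.get(signature, 0)
        if after == 0:
            status[signature] = "resolved"
        elif after < before:
            status[signature] = "partial"
        else:
            status[signature] = "open"
    for signature in current:
        if signature not in baseline:
            status[signature] = "new"
    return status
```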

Prove it's helping

/calibration-track gives a deterministic IMPROVED / REGRESSION / no-change verdict over time, independent of the plugin's own delta.

Pipeline misses bugs

/calibration-flow drives your PR-review pipeline over golden fixtures to find which node is dropping findings.

CI / pre-commit gate

Add /calibration-audit (read-only) or the ~5-second /calibration-doctor to catch config regressions before they ship.

Onboarding a teammate

/calibration-onboarding detects the stack and names one minimal next step — no need to read all the docs first.

Get started

Install in two commands

# In a Claude Code session — add the marketplace, then install
/plugin marketplace add odere-pro/claude-calibration
/plugin install claude-calibration@odere-pro

# Run your first calibration (no goal = a guessed intent)
/calibrate
/calibrate "reduce always-on context cost"

Safe by design. The plugin adds ~zero standing context cost when idle because every skill sets disable-model-invocation: true. It only edits config after you approve a plan; project-scope edits are easy to git revert, and ~/.claude/** changes are written up as recommendations, never applied automatically.

# Or load a local checkout for one session (development)
claude --plugin-dir /path/to/claude-calibration

# Verify the install
/plugin        # confirms claude-calibration is enabled
/skills        # /calibrate, /calibration, /claude-calibration:* registered
/agents        # the calibration-* workers registered
/context       # confirms ~zero idle context cost

Uninstall

/plugin disable claude-calibration      # turn off without removing
/plugin uninstall claude-calibration    # remove entirely

# Run artifacts are your project's data and are left in place.
rm -rf .claude/calibration/                  # delete past runs, or
echo '.claude/calibration/' >> .gitignore    # keep them but stop committing them

Debug

When something looks off

Start with the doctor — a ~5-second structural check that tells you whether the config is sound (not whether it's good; that's /calibration-audit).

/claude-calibration:calibration-doctor

# Example output
BROKEN: .claude/settings.json — invalid JSON
WARN:   hooks/my-hook.sh — not executable
OK:     .mcp.json — valid

Symptom                                  Likely cause & fix
/calibrate not in skill list             Plugin not enabled or not loaded yet. Check /plugin; run /reload-plugins if just installed; restart if you added a new top-level skills/ dir.
Plugin costs thousands of idle tokens    disable-model-invocation didn't load, which points to a stale fork. Reinstall: /plugin uninstall then /plugin install claude-calibration@odere-pro.
Can't read docs/ / BUNDLES_DIR=UNKNOWN   The plugin's skill-dir resolution failed. Run with --plugin-dir for one session; if that works, reinstall via the marketplace.
Calibrator wrote outside expected paths  Safety hooks didn't load. Confirm hooks/hooks.json exists; /hooks should list the two claude-calibration write-guards.
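
Two of the structural checks the doctor is described as running (JSON parses, hooks executable) are simple enough to sketch. This is an illustration of the idea, not the doctor's code. check_json and check_executable are hypothetical names, and each returns a (status, reason) pair instead of printing the formatted lines shown in the example output.

```python
import json
import os
from pathlib import Path


def check_json(path: Path) -> tuple[str, str]:
    """Report whether a config file exists and parses as JSON."""
    try:
        text = path.read_text(encoding="utf-8")
    except OSError:
        return "BROKEN", "unreadable"
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return "BROKEN", "invalid JSON"
    return "OK", "valid"


def check_executable(path: Path) -> tuple[str, str]:
    """Warn when a hook script lacks execute permission for this user."""
    if os.access(path, os.X_OK):
        return "OK", "executable"
    return "WARN", "not executable"
```

Both checks only read file metadata and contents, which is consistent with a health check that finishes in seconds and never edits anything.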

Questions

Frequently asked questions

Does it edit my Claude Code configuration?
Only after you approve a plan. Project-scope changes (CLAUDE.md, .claude/**, the repo's .mcp.json) are applied surgically and are easy to git revert; user-scope (~/.claude/**) changes are written up as recommendations for you to apply by hand. /calibration-audit is fully read-only.

What is the recurrence-to-enforcement move?
When the same finding recurs, the planner stops emitting one-off fixes and instead scaffolds a feature that enforces the standard: a PreToolUse hook, a path-scoped rule, or a wrapper skill. The problem then can't come back; that's the difference between a linter and a calibration system.

Will the plugin bloat my context window?
No. Every shipped skill sets disable-model-invocation: true, so its description is removed from context and the plugin adds ~zero standing cost when idle. Worker subagents only cost tokens when a run spawns them, and the two rules are paths:-scoped.

Can it grade a multi-step workflow, not just static config?
Yes. /calibration-flow drives a workflow (e.g. a PR code-review pipeline) over golden fixtures and scores node recall/precision, edge handoff contracts, and end-to-end intent. The verdict comes from a deterministic scorer, so it's repeatable and can gate CI.

How do I know calibration is actually improving things?
/calibration-track produces a deterministic snapshot vs the last PR merged to main and vs the previous iteration, with an IMPROVED / REGRESSION / no-change verdict. It is independent of /calibrate's built-in delta, so there's no circular grading.

Can I calibrate only certain plugins?
Yes. Pass an allow-list or block-list: /calibrate --plugins foo,bar audits only those plugins, --plugins -baz audits everything except baz, and --plugins global|local restricts by install scope. The filter spans globally-installed and locally-loaded plugins and applies across the whole audit (skills, subagents, hooks and MCP those plugins ship, not just the manifest). Persist a default in .claude/calibration/config.json.

How do I uninstall it?
/plugin disable claude-calibration turns it off; /plugin uninstall claude-calibration removes it. Run artifacts under .claude/calibration/ are your project's data and are left in place; delete them with rm -rf .claude/calibration/ and add the path to .gitignore if you don't want runs committed.