Does claude-calibration edit my Claude Code configuration?

Yes, but only after you approve a plan. Project-scope changes (CLAUDE.md, .claude/**, the repo's .mcp.json) are applied surgically and are easy to git revert; user-scope (~/.claude/**) changes are written up as recommendations for you to apply by hand. You can also run /claude-calibration:calibration-audit for a read-only pass that never edits anything.

What is the recurrence-to-enforcement move?

When the same finding recurs — say several skills missing disable-model-invocation, or several subagents missing tools — the planner stops emitting one-off fixes and instead scaffolds a feature that enforces the standard: a PreToolUse hook, a path-scoped rule, or a wrapper skill. The problem then can't come back. That is the difference between a linter and a calibration system.

Will the plugin bloat my context window?

No. Every shipped skill is disable-model-invocation: true, so its description is removed from context and the plugin adds ~zero standing context cost when idle. The worker subagents only cost tokens when a run spawns them, and the two rules are path-scoped so they load only when a run folder or bundle directory is open.

Can it grade a multi-step workflow, not just static config?

Yes. /claude-calibration:calibration-flow drives a multi-step workflow (for example a PR code-review pipeline) over a case set of golden fixtures and scores node recall/precision, edge handoff contracts, and end-to-end flow intent. The verdict comes from a deterministic scorer, not the model, so it is repeatable and can be a CI gate.

How do I know calibration is actually improving my setup?

/claude-calibration:calibration-track produces a deterministic snapshot compared against two baselines: the last PR merged onto main (long-term progress) and the previous iteration (immediate change). Its verdict is IMPROVED, REGRESSION, or no change of note — independent of /calibrate's own built-in delta, so there is no circular grading.

How do I uninstall claude-calibration?

Run /plugin disable claude-calibration to turn it off, or /plugin uninstall claude-calibration to remove it entirely. Run artifacts under .claude/calibration/ are your project's data and are left in place; delete them with rm -rf .claude/calibration/ and add that path to .gitignore if you don't want runs committed.

claude-calibration — audit, fix & enforce your Claude Code setup

How it works

Two things to calibrate, one loop

Point claude-calibration at your configuration or at a multi-step workflow. Either way it picks a target, evaluates against your intent, and reports — and on the configuration track, recurring findings get promoted from one-off fixes to scaffolded enforcement.

flowchart TD
    start(["your setup + a calibration goal"])
    pick{"pick a target"}
    start --> pick

    pick -->|configuration| cfg["calibrate configuration
9 features, parallel audit"]
    pick -->|workflows| flow["calibrate workflows
/calibration-flow"]

    cfg --> scope["scope plugins
--plugins allow/block (optional)"]
    scope --> plan["plan"]
    plan --> evaluate["evaluate · 9-way fan-out"]
    evaluate --> approve["approve"]
    approve --> calibrateStep["calibrate"]
    calibrateStep --> reEvaluate["re-evaluate"]
    reEvaluate --> report(["report"])
    plan -.->|recurring finding?| scaffold["scaffold enforcement
hook · rule · wrapper skill"]

    flow --> fixtures["drive golden fixtures"]
    fixtures --> score["score node · edge · flow"]
    score --> verdict(["deterministic verdict"])

    classDef accent fill:#3a2a20,stroke:#e0855f,color:#f3e7df;
    class scaffold,verdict accent;

Flowchart: starting from your setup and a calibration goal, you pick a target. The configuration track optionally scopes which plugins to audit with a --plugins allow/block list, then runs plan, evaluate (a nine-way parallel fan-out), approve, calibrate, re-evaluate, and report; when a finding recurs the plan scaffolds enforcement — a hook, a path-scoped rule, or a wrapper skill. The workflows track drives golden fixtures, scores node, edge and flow behaviour, and returns a deterministic verdict.

/calibrate chains worker subagents — planner → evaluator (which fans out one audit per feature) → calibrator → a delta re-evaluation — and persists state in .claude/calibration/<run>/ so a run survives /clear. Its highest-leverage move: when the same finding recurs, the planner stops emitting one-off fixes and scaffolds a feature that enforces the standard so the problem can't return.

A skill without disable-model-invocation, then another, then a third — the evaluator tags each with the same signature.
The planner detects the recurrence and emits a kind: create row instead of three separate edits.
You approve, and the calibrator scaffolds a PreToolUse hook that blocks any future skill saved without the flag. The standard is now enforced, not just documented.

Calibrate your configuration

A rubric-driven audit of every Claude Code surface, a prioritised plan with risk and scope, surgical fixes, and a before→after delta.

/calibrate
/calibration-audit
/calibration-diff
/calibration-track
/calibration-doctor

Calibrate your workflows

Drive a multi-step workflow over golden fixtures and score node recall/precision, edge handoff contracts, and end-to-end intent — with a deterministic verdict you can gate CI on.

/calibration-flow
node · edge · flow
deterministic scorer

What it audits

Nine features, one parallel pass

The evaluator fans out one audit per feature, so a full baseline is a single wall-clock pass. Each bundle also runs standalone — type /claude-calibration:calibrate-<feature> yourself.

claude-md

Flags committed secrets, oversized memory files, aspirational "must/always/never" prose with no enforcement, deep import chains, and contradictions across nested CLAUDE.md files.

rules

Catches oversized rule files, language- or area-specific rules missing paths: (always-on context cost), broken globs, and multi-step workflows that should be skills.

settings

Finds committed secrets, --dangerously-skip-permissions, blanket-destructive permissions, committed model pins, and values silently overridden by managed policy.

skills

Flags missing or vague frontmatter, descriptions over the routing budget, side-effecting verbs without disable-model-invocation, and bare CLI calls that should be wrapped.

subagents

Catches the big footgun — no tools: means it inherits every tool, including all MCP servers — plus omitted model: that silently inflates cost.

hooks

Flags bare-* matchers on hot events, scripts that exit 1 instead of exit 2, remote payload fetches, and heavy build/test work on the hot path.

mcp

Finds literal tokens in committed JSON, servers with no paired wrapper skill (always-on tool-set cost), over-broad surfaces, and dead servers.

plugins

Audits manifests for missing name/version, components misplaced under .claude-plugin/, legacy commands/ without skills/, and duplicate registrations. Scope a run to specific plugins with an allow/block list — --plugins foo,bar or --plugins -baz.

general

Rolls up estimated always-on context cost, nested-CLAUDE.md conflict risk, settings precedence surprises, and missing .gitignore coverage. Powers /calibrate cost.

What makes it calibration

recurrence → enforcement

A finding that recurs becomes a scaffolded hook, path-scoped rule, or wrapper skill — so the standard is enforced, not just noted.

parallel fan-out

Nine per-feature workers audit in parallel, merging into per-feature, cross-feature, and intent-flow reports in one pass.

deterministic tracking

/calibration-track measures real progress vs the last merge to main and the previous iteration — no circular self-grading.

behavioural flow eval

/calibration-flow grades whether a multi-agent pipeline actually catches what it should — scored by a deterministic oracle.

~zero idle cost

Every shipped skill is disable-model-invocation: true, so the plugin adds essentially no standing context when idle.

reversible & scoped

Project-scope edits are surgical and easy to git revert; user-scope changes are written up as recommendations, never applied.

Commands

Everything is a slash command you type

Every command is disable-model-invocation: true — Claude never auto-fires one; you invoke it by name. Run a full loop, a read-only check, or a single feature.

Command	What it does
`/calibrate "<goal>"`	The orchestrator. Start or resume a full run against your intent; with no goal it states a guessed one. Supports `status` / `restart` / `--yes`, plus built-in `tighten` / `harden` / `cost` modes. Scope which plugins a run audits with `--plugins foo,bar` (allow-list), `--plugins -baz` (block-list), or `--plugins global\|local`.
`/calibration`	Dispatcher — prints the menu, delegates to a flow, or forwards free text as the intent.
`/calibration-audit`	Read-only baseline evaluation — no plan, no edits. Good as a CI gate.
`/calibration-diff`	"What changed since the last run?" — re-evaluates against the previous baseline.
`/calibration-track`	"Is calibration actually improving my setup?" — a deterministic snapshot vs the last merge to `main` and vs the previous iteration. Independent of `/calibrate`'s built-in delta.
`/calibration-flow`	"Does my workflow still behave?" — drives a multi-step workflow over golden fixtures and scores node recall/precision, edge handoff contracts, and flow intent. Verdict from a deterministic scorer.
`/calibration-doctor`	~5-second structural health check: JSON parses, hooks executable, frontmatter valid.
`/calibration-onboarding`	First-time setup guide — detects your stack, recommends one next step.
`/calibrate-<feature>`	Nine per-feature bundles you can run on their own: `skills`, `subagents`, `claude-md`, `rules`, `settings`, `hooks`, `mcp`, `plugins`, `general`.

Common intents: "reduce always-on context cost", "tighten standards", "make the TDD loop reliable", "fix obvious safety gaps", "prepare for a team audit".

Use cases

When to reach for it

Bloated, slow setup

/calibrate "reduce always-on context cost" finds the oversized CLAUDE.md, unconditional rules, and skills missing disable-model-invocation that load every session.

Automate a standard

Keep forgetting tools: on subagents? /calibrate "tighten standards" turns the recurring finding into a hook that blocks the omission for good.

Pass a team audit

/calibrate "prepare for a team audit" weights all scopes equally and writes user-scope changes as exact, copy-paste recommendations.

Focus on a few plugins

Many plugins installed but only auditing your own? /calibrate --plugins my-plugin (or --plugins -noisy-one to skip one) scopes the whole run to just the plugins you care about.

Did my edits fix it?

After hand-editing, /calibration-diff scores findings as resolved, partial, open, or new against the previous baseline.

Prove it's helping

/calibration-track gives a deterministic IMPROVED / REGRESSION verdict over time — independent of the plugin's own delta.

Pipeline misses bugs

/calibration-flow drives your PR-review pipeline over golden fixtures to find which node is dropping findings.

CI / pre-commit gate

Add /calibration-audit (read-only) or the ~5-second /calibration-doctor to catch config regressions before they ship.

Onboarding a teammate

/calibration-onboarding detects the stack and names one minimal next step — no need to read all the docs first.

Get started

Install in two commands

# In a Claude Code session — add the marketplace, then install
/plugin marketplace add odere-pro/claude-calibration
/plugin install claude-calibration@odere-pro

# Run your first calibration (no goal = a guessed intent)
/calibrate
/calibrate "reduce always-on context cost"

Safe by design. The plugin adds ~zero standing context cost when idle — every skill is disable-model-invocation. It only edits config after you approve a plan; project-scope edits are easy to git revert, and ~/.claude/** changes are written up as recommendations, never applied automatically.

# Or load a local checkout for one session (development)
claude --plugin-dir /path/to/claude-calibration

# Verify the install
/plugin        # confirms claude-calibration is enabled
/skills        # /calibrate, /calibration, /claude-calibration:* registered
/agents        # the calibration-* workers registered
/context       # confirms ~zero idle context cost

Uninstall

/plugin disable claude-calibration      # turn off without removing
/plugin uninstall claude-calibration    # remove entirely

# Run artifacts are your project's data — left in place. Clean up if you like:
rm -rf .claude/calibration/
echo '.claude/calibration/' >> .gitignore

Debug

When something looks off

Start with the doctor — a ~5-second structural check that tells you whether the config is sound (not whether it's good; that's /calibration-audit).

/claude-calibration:calibration-doctor

# Example output
BROKEN: .claude/settings.json — invalid JSON
WARN:   hooks/my-hook.sh — not executable
OK:     .mcp.json — valid

Symptom	Likely cause & fix
`/calibrate` not in skill list	Plugin not enabled or not loaded yet. Check `/plugin`; run `/reload-plugins` if just installed; restart if you added a new top-level `skills/` dir.
Plugin costs thousands of idle tokens	`disable-model-invocation` didn't load — a stale fork. Reinstall: `/plugin uninstall` then `/plugin install claude-calibration@odere-pro`.
Can't read `docs/` / `BUNDLES_DIR=UNKNOWN`	The plugin's skill-dir resolution failed. Run with `--plugin-dir` for one session; if that works, reinstall via the marketplace.
Calibrator wrote outside expected paths	Safety hooks didn't load. Confirm `hooks/hooks.json` exists; `/hooks` should list the two `claude-calibration` write-guards.

Questions

Frequently asked questions

Does it edit my Claude Code configuration?: Only after you approve a plan. Project-scope changes (CLAUDE.md, .claude/**, the repo's .mcp.json) are applied surgically and are easy to git revert; user-scope (~/.claude/**) changes are written up as recommendations for you to apply by hand. /calibration-audit is fully read-only.
What is the recurrence-to-enforcement move?: When the same finding recurs, the planner stops emitting one-off fixes and instead scaffolds a feature that enforces the standard: a PreToolUse hook, a path-scoped rule, or a wrapper skill. The problem then can't come back — that's the difference between a linter and a calibration system.
Will the plugin bloat my context window?: No. Every shipped skill is disable-model-invocation: true, so its description is removed from context and the plugin adds ~zero standing cost when idle. Worker subagents only cost tokens when a run spawns them, and the two rules are paths:-scoped.
Can it grade a multi-step workflow, not just static config?: Yes. /calibration-flow drives a workflow (e.g. a PR code-review pipeline) over golden fixtures and scores node recall/precision, edge handoff contracts, and end-to-end intent. The verdict comes from a deterministic scorer, so it's repeatable and can gate CI.
How do I know calibration is actually improving things?: /calibration-track produces a deterministic snapshot vs the last PR merged to main and vs the previous iteration, with an IMPROVED / REGRESSION / no-change verdict — independent of /calibrate's built-in delta, so there's no circular grading.
Can I calibrate only certain plugins?: Yes. Pass an allow-list or block-list: /calibrate --plugins foo,bar audits only those plugins, --plugins -baz audits everything except baz, and --plugins global|local restricts by install scope. The filter spans globally-installed and locally-loaded plugins and applies across the whole audit (skills, subagents, hooks and MCP those plugins ship, not just the manifest). Persist a default in .claude/calibration/config.json.
How do I uninstall it?: /plugin disable claude-calibration turns it off; /plugin uninstall claude-calibration removes it. Run artifacts under .claude/calibration/ are your project's data and are left in place — delete with rm -rf .claude/calibration/ and add the path to .gitignore if you don't want runs committed.

Calibrate your Claude Code setup — and lock in standards that stick.