Calibrate your configuration
A rubric-driven audit of every Claude Code surface, a prioritised plan with risk and scope, surgical fixes, and a before→after delta.
A plugin for Claude Code · evaluate → plan → calibrate
claude-calibration audits your whole setup (CLAUDE.md, rules, settings, skills, subagents, hooks, MCP and plugins) against a stated or guessed intent, then fixes findings surgically through an approval gate. When a problem recurs, it scaffolds a hook, rule, or wrapper skill so it can't come back.
How it works
Point claude-calibration at your configuration or at a multi-step workflow. Either way it picks a target, evaluates against your intent, and reports. On the configuration track, recurring findings are promoted from one-off fixes to scaffolded enforcement.
/calibrate chains worker subagents — planner → evaluator
(which fans out one audit per feature) → calibrator → a delta
re-evaluation — and persists state in
.claude/calibration/<run>/ so a run survives
/clear. Its highest-leverage move: when the same finding
recurs, the planner stops emitting one-off fixes and
scaffolds a feature that enforces the standard so the
problem can't return.
A skill is found missing disable-model-invocation, then another, then a third; the evaluator tags each with the same signature.
The planner emits one kind: create row instead of three separate edits.
The result is a PreToolUse hook that blocks any future skill saved without the flag. The standard is now enforced, not just documented.
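The promotion step described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's planner: the finding dictionaries, the `path` and `covers` fields, the `edit` kind, and the threshold of three are all assumptions made for the example. Only the idea of a shared signature and a `kind: create` row comes from the text.

```python
from collections import defaultdict

# Assumed threshold: the text only says "when the same finding recurs".
PROMOTE_AT = 3


def plan_rows(findings: list[dict]) -> list[dict]:
    """Turn findings into plan rows, promoting recurring signatures.

    Each finding is a hypothetical dict with "signature" and "path" keys.
    """
    by_signature: dict[str, list[dict]] = defaultdict(list)
    for finding in findings:
        by_signature[finding["signature"]].append(finding)

    rows: list[dict] = []
    for signature, group in by_signature.items():
        if len(group) >= PROMOTE_AT:
            # One scaffolded feature replaces the individual one-off edits.
            rows.append({
                "kind": "create",
                "signature": signature,
                "covers": [f["path"] for f in group],
            })
        else:
            # Below the threshold, keep one edit row per finding.
            rows.extend(
                {"kind": "edit", "signature": signature, "path": f["path"]}
                for f in group
            )
    return rows
```

Grouping by signature rather than by file is what lets three unrelated skills collapse into a single enforcement row.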
Configuration track: a rubric-driven audit of every Claude Code surface, a prioritised plan with risk and scope, surgical fixes, and a before→after delta.
Workflow track: drive a multi-step workflow over golden fixtures and score node recall/precision, edge handoff contracts, and end-to-end intent, with a deterministic verdict you can gate CI on.
What it audits
The evaluator fans out one audit per feature, so a full baseline is a
single wall-clock pass. Each bundle also runs standalone — type
/claude-calibration:calibrate-<feature> yourself.
CLAUDE.md
Flags committed secrets, oversized memory files, aspirational "must/always/never" prose with no enforcement, deep import chains, and contradictions across nested CLAUDE.md files.
Rules
Catches oversized rule files, language- or area-specific rules missing paths: (always-on context cost), broken globs, and multi-step workflows that should be skills.
Settings
Finds committed secrets, --dangerously-skip-permissions, blanket-destructive permissions, committed model pins, and values silently overridden by managed policy.
Skills
Flags missing or vague frontmatter, descriptions over the routing budget, side-effecting verbs without disable-model-invocation, and bare CLI calls that should be wrapped.
Subagents
Catches the big footgun, where omitting tools: means the subagent inherits every tool, including all MCP servers, plus an omitted model: that silently inflates cost.
Hooks
Flags bare-* matchers on hot events, scripts that exit 1 instead of exit 2, remote payload fetches, and heavy build/test work on the hot path.
MCP
Finds literal tokens in committed JSON, servers with no paired wrapper skill (always-on tool-set cost), over-broad surfaces, and dead servers.
Plugins
Audits manifests for missing name/version, components misplaced under .claude-plugin/, legacy commands/ without skills/, and duplicate registrations. Scope a run to specific plugins with an allow/block list: --plugins foo,bar or --plugins -baz.
General
Rolls up estimated always-on context cost, nested-CLAUDE.md conflict risk, settings precedence surprises, and missing .gitignore coverage. Powers /calibrate cost.
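To make the hook findings concrete, here is a rough sketch of a PreToolUse hook that blocks a skill written without the flag. It is not the hook this plugin scaffolds. It assumes the Claude Code hook contract as I understand it: the payload arrives as JSON on stdin with `tool_name` and `tool_input`, and exit code 2 blocks the call and returns stderr to Claude, whereas exit 1 is treated as a non-blocking error. The helper names and the choice to inspect only `Write` calls to `SKILL.md` files are assumptions for illustration.

```python
import json
import re
import sys

# Matches the flag on its own line inside the frontmatter.
FLAG = re.compile(r"^disable-model-invocation:\s*true\s*$", re.MULTILINE)


def frontmatter(text: str) -> str:
    """Return the YAML frontmatter between the leading '---' fences, or ''."""
    if not text.startswith("---"):
        return ""
    parts = text.split("---", 2)
    return parts[1] if len(parts) == 3 else ""


def decide(payload: dict) -> tuple[int, str]:
    """Return (exit_code, stderr_message) for one PreToolUse payload."""
    if payload.get("tool_name") != "Write":
        return 0, ""
    tool_input = payload.get("tool_input", {})
    path = tool_input.get("file_path", "")
    if not path.endswith("SKILL.md"):
        return 0, ""
    if FLAG.search(frontmatter(tool_input.get("content", ""))):
        return 0, ""
    # Exit 2, not 1: only 2 blocks the tool call.
    return 2, f"Blocked: {path} is missing 'disable-model-invocation: true'."


def run_hook(stdin_text: str) -> int:
    """Parse the payload, print any message to stderr, return the exit code."""
    code, message = decide(json.loads(stdin_text))
    if message:
        print(message, file=sys.stderr)
    return code
```

A script wired into a hook would end with `sys.exit(run_hook(sys.stdin.read()))`; the call is left out here so the functions can be imported and tested on their own.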
What makes it calibration
A finding that recurs becomes a scaffolded hook, path-scoped rule, or wrapper skill — so the standard is enforced, not just noted.
Nine per-feature workers audit in parallel, merging into per-feature, cross-feature, and intent-flow reports in one pass.
/calibration-track measures real progress vs the last
merge to main and the previous iteration — no circular
self-grading.
/calibration-flow grades whether a multi-agent pipeline
actually catches what it should — scored by a deterministic oracle.
Every shipped skill sets disable-model-invocation: true, so
the plugin adds essentially no standing context when idle.
Project-scope edits are surgical and easy to git revert;
user-scope changes are written up as recommendations, never applied.
Commands
Every command sets disable-model-invocation: true, so Claude
never auto-fires one; you invoke it by name. Run a full loop, a read-only
check, or a single feature.
| Command | What it does |
|---|---|
| /calibrate "<goal>" | The orchestrator. Start or resume a full run against your intent; with no goal it states a guessed one. Supports status / restart / --yes, plus built-in tighten / harden / cost modes. Scope which plugins a run audits with --plugins foo,bar (allow-list), --plugins -baz (block-list), or --plugins global\|local. |
| /calibration | Dispatcher: prints the menu, delegates to a flow, or forwards free text as the intent. |
| /calibration-audit | Read-only baseline evaluation: no plan, no edits. Good as a CI gate. |
| /calibration-diff | "What changed since the last run?" Re-evaluates against the previous baseline. |
| /calibration-track | "Is calibration actually improving my setup?" A deterministic snapshot vs the last merge to main and vs the previous iteration. Independent of /calibrate's built-in delta. |
| /calibration-flow | "Does my workflow still behave?" Drives a multi-step workflow over golden fixtures and scores node recall/precision, edge handoff contracts, and flow intent. Verdict from a deterministic scorer. |
| /calibration-doctor | ~5-second structural health check: JSON parses, hooks executable, frontmatter valid. |
| /calibration-onboarding | First-time setup guide: detects your stack, recommends one next step. |
| /calibrate-<feature> | Nine per-feature bundles you can run on their own: skills, subagents, claude-md, rules, settings, hooks, mcp, plugins, general. |
Common intents: "reduce always-on context cost",
"tighten standards",
"make the TDD loop reliable",
"fix obvious safety gaps",
"prepare for a team audit".
Use cases
/calibrate "reduce always-on context cost" finds the
oversized CLAUDE.md, unconditional rules, and skills missing
disable-model-invocation that load every session.
Keep forgetting tools: on subagents?
/calibrate "tighten standards" turns the recurring
finding into a hook that blocks the omission for good.
/calibrate "prepare for a team audit" weights all scopes
equally and writes user-scope changes as exact, copy-paste
recommendations.
Many plugins installed but only auditing your own?
/calibrate --plugins my-plugin (or
--plugins -noisy-one to skip one) scopes the whole run to
just the plugins you care about.
After hand-editing, /calibration-diff scores findings as
resolved, partial, open, or new against the previous baseline.
/calibration-track gives a deterministic IMPROVED /
REGRESSION verdict over time — independent of the plugin's own delta.
/calibration-flow drives your PR-review pipeline over
golden fixtures to find which node is dropping findings.
Add /calibration-audit (read-only) or the ~5-second
/calibration-doctor to catch config regressions before
they ship.
/calibration-onboarding detects the stack and names one
minimal next step — no need to read all the docs first.
Get started
# In a Claude Code session — add the marketplace, then install
/plugin marketplace add odere-pro/claude-calibration
/plugin install claude-calibration@odere-pro
# Run your first calibration (no goal = a guessed intent)
/calibrate
/calibrate "reduce always-on context cost"
Every shipped skill sets disable-model-invocation. The plugin only edits config
after you approve a plan; project-scope edits are easy to
git revert, and ~/.claude/** changes are
written up as recommendations, never applied automatically.
# Or load a local checkout for one session (development)
claude --plugin-dir /path/to/claude-calibration
# Verify the install
/plugin # confirms claude-calibration is enabled
/skills # /calibrate, /calibration, /claude-calibration:* registered
/agents # the calibration-* workers registered
/context # confirms ~zero idle context cost
Uninstall
/plugin disable claude-calibration # turn off without removing
/plugin uninstall claude-calibration # remove entirely
# Run artifacts are your project's data and are left in place. Delete them if you like:
rm -rf .claude/calibration/
# Keep future runs out of version control:
echo '.claude/calibration/' >> .gitignore
Debug
Start with the doctor — a ~5-second structural check that tells you
whether the config is sound (not whether it's good; that's
/calibration-audit).
/claude-calibration:calibration-doctor
# Example output
BROKEN: .claude/settings.json — invalid JSON
WARN: hooks/my-hook.sh — not executable
OK: .mcp.json — valid
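For a sense of what a structural check of this kind involves, the sketch below tests the two conditions shown in the example output: whether a file parses as JSON and whether a hook script is executable. It is a guess at the shape of such a check, not the doctor's actual code. The function names and the `(status, detail)` return shape are hypothetical, and only the BROKEN / WARN / OK labels come from the output above.

```python
import json
import os
from pathlib import Path


def check_json(path: Path) -> tuple[str, str]:
    """Return (status, detail) for a file that must contain valid JSON."""
    try:
        json.loads(path.read_text(encoding="utf-8"))
    except OSError:
        return "BROKEN", "unreadable"
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return "BROKEN", "invalid JSON"
    return "OK", "valid"


def check_executable(path: Path) -> tuple[str, str]:
    """Return (status, detail) for a hook script that must be executable."""
    if os.access(path, os.X_OK):
        return "OK", "executable"
    return "WARN", "not executable"
```

Both checks only read metadata or parse a small file, which is consistent with a check that finishes in seconds and says nothing about whether the config is good.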
| Symptom | Likely cause & fix |
|---|---|
| /calibrate not in skill list | Plugin not enabled or not loaded yet. Check /plugin; run /reload-plugins if just installed; restart if you added a new top-level skills/ dir. |
| Plugin costs thousands of idle tokens | disable-model-invocation didn't load (a stale fork). Reinstall: /plugin uninstall then /plugin install claude-calibration@odere-pro. |
| Can't read docs/ / BUNDLES_DIR=UNKNOWN | The plugin's skill-dir resolution failed. Run with --plugin-dir for one session; if that works, reinstall via the marketplace. |
| Calibrator wrote outside expected paths | Safety hooks didn't load. Confirm hooks/hooks.json exists; /hooks should list the two claude-calibration write-guards. |
Questions
Project-scope edits (.claude/**, the repo's .mcp.json) are applied
surgically and are easy to git revert; user-scope
(~/.claude/**) changes are written up as recommendations
for you to apply by hand. /calibration-audit is fully
read-only.
A finding that recurs is promoted to a scaffolded PreToolUse hook, a path-scoped rule, or a wrapper skill. The
problem then can't come back, and that's the difference between a linter and
a calibration system.
Every shipped skill sets disable-model-invocation: true, so
its description is removed from context and the plugin adds ~zero standing
cost when idle. Worker subagents only cost tokens when a run spawns them,
and the two rules are paths:-scoped.
/calibration-flow drives a workflow (e.g. a PR
code-review pipeline) over golden fixtures and scores node
recall/precision, edge handoff contracts, and end-to-end intent. The
verdict comes from a deterministic scorer, so it's repeatable and can gate
CI.
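Node recall and precision against a golden fixture can be computed along these lines. This is a sketch of the standard definitions, not the plugin's scorer. The thresholds, the PASS/FAIL strings, and the convention of scoring an empty set as 1.0 are assumptions chosen for the example.

```python
def score_node(expected: set[str], reported: set[str]) -> dict[str, float]:
    """Precision and recall of one node's findings against a golden fixture."""
    true_positives = len(expected & reported)
    # Convention (assumed): nothing reported means no false positives,
    # and nothing expected means nothing was missed.
    precision = true_positives / len(reported) if reported else 1.0
    recall = true_positives / len(expected) if expected else 1.0
    return {"precision": precision, "recall": recall}


def verdict(
    scores: dict[str, dict[str, float]],
    min_recall: float = 0.9,      # hypothetical gate
    min_precision: float = 0.8,   # hypothetical gate
) -> str:
    """Deterministic verdict: same scores in, same string out."""
    failing = sorted(
        node
        for node, s in scores.items()
        if s["recall"] < min_recall or s["precision"] < min_precision
    )
    return "PASS" if not failing else "FAIL: " + ", ".join(failing)
```

Because the verdict is a pure function of set arithmetic and fixed thresholds, rerunning it on the same fixtures gives the same answer, which is the property a CI gate needs. A node with low recall is the one dropping findings.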
/calibration-track produces a deterministic snapshot vs the
last PR merged to main and vs the previous iteration, with an
IMPROVED / REGRESSION / no-change verdict — independent of
/calibrate's built-in delta, so there's no circular grading.
/calibrate --plugins foo,bar audits only those plugins,
--plugins -baz audits everything except baz, and
--plugins global|local restricts by install scope. The filter
spans globally-installed and locally-loaded plugins and applies across the
whole audit (skills, subagents, hooks and MCP those plugins ship, not just
the manifest). Persist a default in
.claude/calibration/config.json.
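The allow-list, block-list, and scope forms of the filter could be resolved roughly like this. It is an illustrative sketch only: the `installed` mapping from plugin name to scope is a hypothetical data shape, and the handling of a spec that mixes allowed and blocked names is an assumption, since the text does not say how that case behaves.

```python
def select_plugins(spec: str, installed: dict[str, str]) -> list[str]:
    """Resolve a --plugins value to the plugin names a run would audit.

    installed maps plugin name -> "global" or "local" (hypothetical shape).
    """
    tokens = [t.strip() for t in spec.split(",") if t.strip()]

    # Scope form: --plugins global or --plugins local
    if len(tokens) == 1 and tokens[0] in ("global", "local"):
        return sorted(n for n, scope in installed.items() if scope == tokens[0])

    allow = {t for t in tokens if not t.startswith("-")}
    block = {t[1:] for t in tokens if t.startswith("-")}

    # With no allow-list, start from everything installed (assumed behaviour),
    # then drop blocked names and anything that is not installed.
    names = allow if allow else set(installed)
    return sorted(n for n in names if n in installed and n not in block)
```

For example, with `foo` installed globally and `bar` and `baz` loaded locally, both `foo,bar` and `-baz` resolve to `bar` and `foo`, while `local` resolves to `bar` and `baz`.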
/plugin disable claude-calibration turns it off;
/plugin uninstall claude-calibration removes it. Run
artifacts under .claude/calibration/ are your project's data
and are left in place — delete with rm -rf .claude/calibration/
and add the path to .gitignore if you don't want runs
committed.