Agentic Engineering — Implementation Checklist

1. Set up the rule file

Create a CLAUDE.md or AGENTS.md in the repo root. Start with about 10 lines.
Cover four things:
- Stack and versions
- Conventions (folder structure, naming, patterns you actually use)
- Hard rules the agent must never break (forbidden packages, secrets handling, layering)
- Workflow to follow before generating code
Add a new rule every time the agent does something you don't want repeated.
List the tools the agent may call and when to use each (APIs, scripts, DB schemas).
Make architecture decisions yourself; let the agent implement them, not choose them.
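
One way to keep the rule file honest is a small check that the four areas above are present. This is a minimal sketch: the section names in REQUIRED_SECTIONS and the Markdown-style "#" headings are assumptions, so rename them to match your own file.

```python
# Hypothetical section headings; rename to match your own rule file.
REQUIRED_SECTIONS = ("Stack", "Conventions", "Hard rules", "Workflow")


def missing_sections(rule_text: str) -> list[str]:
    """Return the required sections that have no heading in the rule file."""
    headings = [
        line.lstrip("#").strip().lower()
        for line in rule_text.splitlines()
        if line.startswith("#")
    ]
    return [
        section
        for section in REQUIRED_SECTIONS
        if not any(h.startswith(section.lower()) for h in headings)
    ]


# Illustrative rule file content, not a recommendation.
example = """\
# Stack
Python 3.12, FastAPI, PostgreSQL 16
# Conventions
One module per feature under src/
# Hard rules
Never commit secrets; never import from src/internal
"""

print(missing_sections(example))  # ['Workflow']
```

Running this in CI is optional; the point is that a missing section gets noticed before the agent does.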

Decide what is static (always loaded) vs dynamic (loaded on demand):
- Static: rule files, system instructions, global memory
- Dynamic: skills, tool results, retrieved docs, recent history
Keep static context short and high-signal. Cut anything the agent doesn't need every call.
Move repeatable know-how into skills that load only when the task matches.
Never paste a whole repo into the prompt. Retrieve what's relevant.
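
The static/dynamic split can be sketched as a context builder that always includes the short static block and adds a skill only when the task mentions one of its trigger words. The Skill class, the trigger words, and the skill bodies are all hypothetical; a real harness would likely use retrieval rather than keyword matching.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    triggers: tuple[str, ...]  # words that signal the skill is relevant
    body: str


# Static context: always loaded, so keep it short.
STATIC_CONTEXT = "Rules: use the repo's lint config. Never commit secrets."

# Dynamic context: hypothetical skills, loaded only on a match.
SKILLS = [
    Skill("db-migration", ("migration", "schema"), "How to write a migration..."),
    Skill("release", ("release", "changelog"), "How to cut a release..."),
]


def build_context(task: str) -> str:
    """Combine static context with only the skills the task triggers."""
    words = set(task.lower().split())
    matched = [s.body for s in SKILLS if words & set(s.triggers)]
    return "\n\n".join([STATIC_CONTEXT, *matched])
```

A task like "Add a schema migration for orders" pulls in the db-migration skill; "Fix typo" gets the static context only.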

Pick a mode per task:
- Conductor (real-time, in-IDE) for complex logic, debugging, unfamiliar code
- Orchestrator (async, delegate and review) for bug fixes, migrations, test generation
Pick the agent location per task:
- Editor agent — in-flow edits and suggestions
- Terminal agent — multi-file work, run-and-react
- Background agent — paragraph-spec tasks you can walk away from
Run code generation inside a sandbox, using only approved tools.
Handle the last 20% yourself: edge cases, error handling, integration points, business logic. The code that "looks right" is where the bugs hide.
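
The "only approved tools" rule can be enforced at the dispatch point. The sketch below covers the allowlist only, not the sandbox itself; read_file and run_tests are stand-in tools invented for illustration.

```python
from typing import Callable


def read_file(path: str) -> str:
    """Stand-in tool; a real one would read from the sandboxed workspace."""
    return f"<contents of {path}>"


def run_tests(target: str) -> str:
    """Stand-in tool; a real one would invoke the test runner."""
    return f"ran tests for {target}"


# Only tools listed here can be called by the agent.
APPROVED_TOOLS: dict[str, Callable[[str], str]] = {
    "read_file": read_file,
    "run_tests": run_tests,
}


def call_tool(name: str, argument: str) -> str:
    """Dispatch an agent tool call, rejecting anything not on the allowlist."""
    if name not in APPROVED_TOOLS:
        raise PermissionError(f"tool {name!r} is not approved")
    return APPROVED_TOOLS[name](argument)
```

Failing closed is deliberate here: a tool the agent invents or misnames raises an error instead of running.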

Use the agent as a first-pass reviewer (bugs, style, security, performance).
Review every line that ships:
- Be skeptical of clever code
- Confirm imported packages are real
- Check error handling for realistic failures
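
For Python code, "confirm imported packages are real" can be partly automated. The sketch below parses generated source and reports top-level modules that do not resolve in the current environment. It assumes the environment already has the project's dependencies installed, and it does not tell you whether a package that does exist is trustworthy.

```python
import ast
import importlib.util


def unresolved_imports(source: str) -> list[str]:
    """Return top-level modules imported by `source` that are not installed."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports, which are local to the repo
            modules.add(node.module.split(".")[0])
    return sorted(m for m in modules if importlib.util.find_spec(m) is None)


# The second module name is made up to stand in for a hallucinated package.
generated = "import json\nfrom zz_hypothetical_pkg_does_not_exist import dumps\n"
print(unresolved_imports(generated))
```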
Add hooks at commit/edit points (e.g. block commits with hard-coded secrets).
Turn on observability: traces, eval results, token usage, latency, cost, and drift.
Point the agent at legacy work you've been avoiding: refactors, migrations, deprecated APIs.
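
The commit hook for hard-coded secrets mentioned above could start as small as this. The three patterns are illustrative assumptions, not a complete rule set; a dedicated secret scanner will catch far more.

```python
import re

# Illustrative patterns only; real scanners use much larger rule sets.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"(?i)(password|secret|api_key|token)\s*=\s*['\"][^'\"]{8,}['\"]"),
]


def find_secrets(text: str) -> list[int]:
    """Return 1-based line numbers that match any secret pattern."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if any(p.search(line) for p in SECRET_PATTERNS)
    ]


def commit_allowed(staged_text: str) -> bool:
    """A hook would exit non-zero when this returns False."""
    return not find_secrets(staged_text)
```

Wire commit_allowed into whatever hook mechanism you use and feed it the staged diff; the function itself stays independent of the tooling.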

Measure total cost of ownership, not just speed.
Raise the first-pass success rate with a tight rule file; every retry loop costs tokens and time.
Route models by task:
- Frontier models for architecture and hard implementation
- Cheap models for test generation, review, CI monitoring
Use dynamic context and skills so you only pay for tokens you need.
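
Routing and first-pass success interact, which a rough cost model makes visible. In this sketch the model names and prices are placeholders, and expected_cost assumes every attempt succeeds independently with the same probability and that a failure triggers a full retry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million_tokens: float  # placeholder price, not a real quote


# Hypothetical model tiers and prices.
FRONTIER = Model("frontier-model", 15.0)
CHEAP = Model("cheap-model", 0.5)

ROUTES = {
    "architecture": FRONTIER,
    "hard_implementation": FRONTIER,
    "test_generation": CHEAP,
    "review": CHEAP,
    "ci_monitoring": CHEAP,
}


def route(task_type: str) -> Model:
    """Pick a model for the task; unknown task types fall back to frontier."""
    return ROUTES.get(task_type, FRONTIER)


def expected_cost(task_type: str, tokens_per_attempt: int, first_pass_rate: float) -> float:
    """Expected spend when each failed attempt triggers a full retry."""
    if not 0 < first_pass_rate <= 1:
        raise ValueError("first_pass_rate must be in (0, 1]")
    expected_attempts = 1 / first_pass_rate  # mean of a geometric distribution
    price = route(task_type).usd_per_million_tokens
    return expected_attempts * tokens_per_attempt * price / 1_000_000
```

Under these assumptions, halving the first-pass rate doubles the expected spend, so a tighter rule file can matter as much as a cheaper model.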

Decide what you're building:
- A script — the agent is the endpoint
- A product for real users — the agent needs a substrate
For products, add: persistent memory, scoped permissions, eval coverage in CI, full-run tracing.
Use a skills bundle so your existing coding agent handles build → evaluate → deploy → observe.
For multi-agent setups, coordinate through shared state, use MCP (Model Context Protocol) for tool access, and use A2A (Agent2Agent) for delegation.
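
Full-run tracing, one of the product requirements above, can begin as a plain record of each step with its latency and token count. Span and RunTrace are hypothetical names for this sketch; a production setup would more likely export to an existing tracing backend.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Span:
    step: str
    seconds: float
    tokens: int


@dataclass
class RunTrace:
    run_id: str
    spans: list[Span] = field(default_factory=list)

    def record(self, step: str, tokens: int, started: float) -> None:
        """Append one step, timed from `started` (a time.perf_counter() value)."""
        self.spans.append(Span(step, time.perf_counter() - started, tokens))

    def total_tokens(self) -> int:
        """Sum token usage across every recorded step of the run."""
        return sum(span.tokens for span in self.spans)
```

Call time.perf_counter() before each agent step and pass the value to record() afterwards; the per-step spans are what let you see where tokens and latency go.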

Version rule files, prompts, eval suites, and skills. Review them in PRs. Assign owners.
Gate shipping on a passing eval suite with a clear rubric, not a working demo.
Train reviewers on how generated code fails.
Make the prototype-vs-production boundary explicit (which repos, branches, environments).
Build the harness once and keep refining it.
Hire and promote for judgment: specification, evaluation, architecture.
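
The eval gate described above can be sketched as a function that compares per-criterion pass rates against a rubric. The criteria and thresholds in RUBRIC are hypothetical. A criterion with no eval cases counts as failing, on the assumption that missing coverage should block shipping.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    case_id: str
    criterion: str  # rubric criterion the case belongs to
    passed: bool


# Hypothetical rubric: minimum pass rate required per criterion.
RUBRIC = {"correctness": 1.0, "safety": 1.0, "style": 0.8}


def gate(results: list[EvalResult]) -> list[str]:
    """Return the rubric criteria that fall below their required pass rate."""
    failures = []
    for criterion, required in RUBRIC.items():
        cases = [r for r in results if r.criterion == criterion]
        # No cases means no coverage, which is treated as a failure.
        rate = sum(r.passed for r in cases) / len(cases) if cases else 0.0
        if rate < required:
            failures.append(criterion)
    return failures
```

An empty list from gate() means the release may proceed; anything else names the criteria to fix first.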