An on-call morning digest that compresses hand-overs from an hour to minutes.

Hub composition: weighted toward Tools and Background across /it-ops.

THE SITUATION

What the team was already living with.

A managed-services team ran a defined set of services across multiple customers. Its runbooks lived in three different drives, and every on-call shift started with a half-hour archaeology dig before any real work could begin.

Leadership wanted runbook execution to be auditable, on-call hand-overs to compress, and triage decisions to be visible after the fact.

WHAT WE BUILT

The agents that shipped.

01
/it-ops Tools in Claude Code. Executes named runbooks with parameters; logs the run with the policy applied and the steps taken.
02
/it-ops Q&A on Slack. Answers questions about service health, recent incidents, and ownership.
03
/it-ops Background. A morning digest summarising overnight activity: what fired, what was suppressed, and what is open at hand-over.
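
To make the first agent's logging concrete, here is a minimal sketch of what a logged runbook run could look like. It is an illustration under stated assumptions, not the shipped implementation: the names RunRecord and execute_runbook, the record fields, and the stop-on-first-failure behaviour are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class RunRecord:
    """Audit record for one runbook execution (hypothetical shape)."""
    run_id: str
    runbook: str
    version: int
    params: dict
    policy: str
    steps: list = field(default_factory=list)
    status: str = "running"


def execute_runbook(run_id, runbook, version, params, policy, steps):
    """Run each (name, action) step in order and log the outcome.

    `steps` is a list of (step_name, callable) pairs. Each callable takes
    the params dict and returns a truthy value on success. Execution stops
    at the first failing step, so the record shows how far the run got.
    """
    record = RunRecord(run_id, runbook, version, dict(params), policy)
    total = len(steps)
    for index, (name, action) in enumerate(steps, start=1):
        ok = bool(action(params))
        record.steps.append({
            "step": f"{index}/{total}",
            "name": name,
            "ok": ok,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        if not ok:
            record.status = "failed"
            return record
    record.status = "succeeded"
    return record
```

Keeping the policy name and the per-step results in one record is what lets a reviewer later see both what was done and under which rule it was allowed.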

A WORKING EXCHANGE

Real questions. Real answers.

$ /it-ops run failover-drill on cluster-east
runbook → Loaded failover-drill v3. Pre-checks pass. Step 1/6: drained traffic. Step 2/6: promoted standby. Health check 200. Run logged as DRL-2904.

$ /it-ops what changed overnight
digest → 2 alerts fired, both auto-suppressed per policy. 1 ticket open at hand-over: customer cluster-west, latency spike 0240-0312 UTC, currently green.
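
The digest answer above boils down to three counts: what fired, what was suppressed, and what is still open. The sketch below shows one way such a summary could be assembled. The Alert and Ticket shapes, the build_digest function, and the output wording are assumptions made for illustration; the source does not describe how the real digest is generated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    name: str
    suppressed_by: Optional[str] = None  # policy name, or None if not suppressed


@dataclass
class Ticket:
    customer: str
    summary: str
    is_open: bool
    current_state: str


def count(n, noun):
    """Format a count with a naive English plural, e.g. '1 ticket', '2 tickets'."""
    return f"{n} {noun}" if n == 1 else f"{n} {noun}s"


def build_digest(alerts, tickets):
    """Summarise overnight alerts and the tickets still open at hand-over."""
    suppressed = [a for a in alerts if a.suppressed_by is not None]
    open_tickets = [t for t in tickets if t.is_open]

    lines = [
        f"{count(len(alerts), 'alert')} fired, "
        f"{len(suppressed)} auto-suppressed per policy."
    ]
    # Name the policy next to each suppressed alert so the decision is reviewable.
    for alert in suppressed:
        lines.append(f"  suppressed: {alert.name} (policy: {alert.suppressed_by})")

    lines.append(f"{count(len(open_tickets), 'ticket')} open at hand-over.")
    for ticket in open_tickets:
        lines.append(
            f"  open: {ticket.customer}, {ticket.summary}, "
            f"currently {ticket.current_state}"
        )
    return "\n".join(lines)
```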

THE OUTCOME

What changed, concretely.

Outcome 01
Runbook auditability
Every runbook execution carries a logged record of the steps taken and the policy applied.
Outcome 02
On-call hand-over time
Compressed from a manual archaeology dig to a generated digest at the start of each shift. [TBD: average minutes saved per shift.]
Outcome 03
Triage visibility
Triage decisions are reviewable after the fact; alerts that were auto-suppressed are visible in the digest with the policy that suppressed them.
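
For readers who want to picture how a suppression stays reviewable, the sketch below records the policy name alongside each triage decision. Everything in it is hypothetical: the SuppressionPolicy and TriageDecision shapes, the triage function, and the duration-based rule are stand-ins for whatever rules the real policies encode.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SuppressionPolicy:
    name: str
    alert_name: str      # alert this policy applies to
    max_duration_s: int  # suppress only if the alert lasted at most this long


@dataclass(frozen=True)
class TriageDecision:
    alert_name: str
    action: str            # "suppressed" or "escalated"
    policy: Optional[str]  # name of the policy that suppressed it, if any


def triage(alert_name, duration_s, policies):
    """Apply the first matching policy; escalate when none matches."""
    for policy in policies:
        if policy.alert_name == alert_name and duration_s <= policy.max_duration_s:
            return TriageDecision(alert_name, "suppressed", policy.name)
    return TriageDecision(alert_name, "escalated", None)
```

Because the decision carries the policy name rather than just a boolean, a digest or audit view can show which rule suppressed which alert.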
