The single biggest engineering choice in a production internal agent is what runs on code and what runs on a model.
Get it right and the agent is auditable, predictable, and trusted. Get it wrong and the agent drifts under load, hallucinates a policy, pages the wrong owner, fabricates a number.
The rule we use: anything that maps to a written company policy runs on deterministic code; anything that requires judgment over unstructured context runs on the LLM. The agent itself is the orchestration layer between the two.
Pricing exceptions. Code checks the discount against the auto-approval band. The model summarises the request context for the approver. Splitting these keeps the policy enforcement provable and the summary helpful.
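The deterministic half of that split can be sketched in a few lines. This is illustrative only: `AUTO_APPROVAL_BAND` and `within_auto_approval` are hypothetical names, and the 15% band is an assumed value, not a real policy.

```python
# Hypothetical sketch of the code side of a pricing-exception check.
# The policy lives here, in code; the model never decides approval.
AUTO_APPROVAL_BAND = 0.15  # assumed policy: discounts up to 15% auto-approve

def within_auto_approval(discount: float) -> bool:
    """Deterministic policy check. Provable, testable, no model in the loop."""
    return 0.0 <= discount <= AUTO_APPROVAL_BAND
```

Everything outside this function — summarising why the rep asked for the discount — is the model's job.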
On-call paging. Code evaluates the error-rate threshold and the duration. The model writes the human-readable incident summary. Splitting these keeps the page logic auditable and the writeup useful.
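The page decision follows the same pattern. A minimal sketch, assuming a 5% error-rate threshold sustained for 10 minutes; the names and numbers here are placeholders, not a real alerting config.

```python
from dataclasses import dataclass

@dataclass
class Window:
    error_rate: float       # fraction of failed requests in the window
    duration_minutes: int   # how long the rate has held

# Assumed thresholds for illustration only
ERROR_RATE_THRESHOLD = 0.05
MIN_DURATION_MINUTES = 10

def should_page(w: Window) -> bool:
    """Deterministic page logic: auditable, replayable against history."""
    return (w.error_rate >= ERROR_RATE_THRESHOLD
            and w.duration_minutes >= MIN_DURATION_MINUTES)
```

The model only writes the incident summary after this function has already decided to page.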
Vendor amendments. Code parses the amendment into structured deltas. The model assesses whether each delta is material. Splitting these keeps the structural extraction reliable and the materiality call defensible.
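The structural extraction here is a plain diff — code, not a model. A sketch under the assumption that both contract versions have already been normalised into clause-keyed dicts; `Delta` and `parse_amendment` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Delta:
    clause: str
    old_text: str
    new_text: str

def parse_amendment(old: dict[str, str], new: dict[str, str]) -> list[Delta]:
    """Deterministic diff of clause texts. Whether each delta is
    material is the model's call, made downstream on this output."""
    return [Delta(c, old[c], new[c])
            for c in old
            if c in new and old[c] != new[c]]
```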
Where teams fail is in inverting this split. They put policy on the model because the model is faster to ship. The agent passes a demo and fails an audit. Or they put judgment on code because code is more comfortable. The agent is rigid where it needs to compose context.
The two questions we run on every internal capability before shipping it. One: is there a written policy this maps to? If yes, deterministic. Two: does the answer require composing unstructured context? If yes, agentic. Most capabilities have both, cleanly separated.
Code runs the policy. The model runs the judgment. The agent stitches them together. That is the production shape.
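That production shape can be sketched as one seam. This is a stand-in skeleton, not a prescribed framework: `handle_capability` and `llm_summarise` are hypothetical names, and the discount fields mirror the pricing example above.

```python
def handle_capability(request: dict, llm_summarise) -> dict:
    """The agent as orchestration layer: code decides, model describes.
    `llm_summarise` is an injected stand-in for any model call, so the
    policy path stays testable without a model."""
    # Code runs the policy: deterministic, provable.
    auto_approved = request["discount"] <= request["auto_approval_band"]
    # The model runs the judgment: composing unstructured context.
    summary = llm_summarise(request["context"])
    return {"auto_approved": auto_approved, "summary": summary}
```

Injecting the model call keeps the policy branch covered by ordinary unit tests, which is what makes the audit story hold.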