Guardrails
Guardrails are the limits and verification steps that make a run stop for a known reason instead of running until a person notices.
Use guardrails on every loop that can spend money, wait on another system, or change something outside Mobius. Start with a run budget and a wall-clock timeout, then add checks and breakers when the run can make risky decisions.
The guardrail model
Every guardrail answers one operator question:
| Question | Mechanism | What you inspect |
|---|---|---|
| What can this run spend? | Run budgets and loop daily budgets | Budget rail, usage.recorded, run.budget_warning, run.budget_exceeded |
| How long can this run occupy runtime state? | wall_clock_timeout, wait timeouts, retries | Stop reason, step status, wait events |
| How much agent work is allowed? | max_agent_turns, step max_turns | Turn count and stop reason turn_limit_reached |
| Did the run prove its result? | check steps with verdicts and evidence | check.passed, check.failed, proof rows, gate interactions |
| Is this loop failing repeatedly? | Duplicate-tool-call and consecutive-failure breakers | run.progress_stalled, loop.auto_paused, loop status paused |
The run timeline is the source of truth. A guardrail stop is not a generic
failure. budget_exceeded, turn_limit_reached, progress_stalled, and
wall_clock_exceeded mean the configured bound worked.
Configure run budgets
Run budgets live in spec.limits. Use dollars when you think in account spend,
or credits when you think in Mobius usage. One credit is $0.01, and you set
exactly one unit.
Dollars:

    limits:
      budget_usd: 10

Credits:

    limits:
      credit_budget: 1000

Mobius stores the ceiling as milli-credits and shows both units in the run
budget rail. A run emits run.budget_warning once when it crosses 80 percent
of the run budget. When spend reaches the ceiling, Mobius halts at the next
checkpoint with stop_reason: budget_exceeded and emits
run.budget_exceeded.
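The unit rules above reduce to simple arithmetic. The following is an illustrative Python sketch, not Mobius code: the function names are hypothetical, and it assumes one credit equals 1000 milli-credits, which is the usual meaning of the prefix.

```python
from typing import Optional

CREDITS_PER_USD = 100            # one credit is $0.01
MILLICREDITS_PER_CREDIT = 1000   # assumption: "milli" means 1/1000 of a credit
WARNING_PERCENT = 80             # run.budget_warning fires at 80 percent


def ceiling_millicredits(budget_usd: Optional[float] = None,
                         credit_budget: Optional[int] = None) -> int:
    """Normalize a run budget to milli-credits; exactly one unit must be set."""
    if (budget_usd is None) == (credit_budget is None):
        raise ValueError("set exactly one of budget_usd or credit_budget")
    if budget_usd is not None:
        credits = budget_usd * CREDITS_PER_USD
    else:
        credits = credit_budget
    return round(credits * MILLICREDITS_PER_CREDIT)


def warning_threshold(ceiling: int) -> int:
    """Spend level, in milli-credits, at which the 80 percent warning would fire."""
    return ceiling * WARNING_PERCENT // 100
```

Under these assumptions, `budget_usd: 10` and `credit_budget: 1000` describe the same ceiling, and setting both is rejected.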
Enforcement granularity is one model call or one metered action. Mobius does not corrupt in-flight work by killing a call halfway through. That means a run with a 1000-credit budget can finish at 1000 credits plus the one call or action that was already in flight.
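To see why the overshoot is bounded by one call, consider this small simulation. It is a sketch under stated assumptions, not the Mobius scheduler: `run_with_budget` is a hypothetical helper, costs are in credits, and the checkpoint is modeled as a check that runs after each call records its usage.

```python
from typing import Iterable, Tuple


def run_with_budget(call_costs: Iterable[int], ceiling: int) -> Tuple[int, str]:
    """Simulate checkpointed enforcement; returns (credits spent, stop reason)."""
    spent = 0
    for cost in call_costs:
        # A call is never killed halfway, so its full cost is recorded.
        spent += cost
        # Checkpoint: halt before the next call once the ceiling is reached.
        if spent >= ceiling:
            return spent, "budget_exceeded"
    return spent, "completed"
```

With a 1000-credit ceiling and calls that cost 400 credits each, the third call starts at 800 credits and finishes at 1200. The fourth call never starts.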
Trial-plan runs get a default 100-credit budget when neither the loop nor the start request sets one. Paid-plan runs are unbounded unless you set a budget.
Set loop daily budgets
Loop daily budgets cap rolling-24-hour platform-billed spend across all runs of one loop:
Dollars:

    limits:
      daily_budget_usd: 25

Credits:

    limits:
      daily_credit_budget: 2500

Mobius checks the daily window before starting a run. If the window is already
exhausted, the start is refused with billing_cap_reached and kind
loop_daily_budget. For platform-funded calls inside a running run, Mobius also
checks the loop daily window at the funding gate and halts the run with
budget_exceeded if the ceiling is crossed.
Use a daily budget for loops that run from schedules or high-volume event triggers. The per-run budget bounds one execution; the daily budget bounds the fleet behavior of that loop.
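A rolling 24-hour window differs from a calendar-day reset: old spend ages out continuously. The sketch below shows the idea in Python. The ledger shape and the function names are hypothetical, and the real start check may differ in detail.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

WINDOW = timedelta(hours=24)

# Each ledger entry is (timestamp, platform-billed credits) for one loop.
Ledger = List[Tuple[datetime, int]]


def window_spend(ledger: Ledger, now: datetime) -> int:
    """Sum the credits recorded in the 24 hours before `now`."""
    cutoff = now - WINDOW
    return sum(credits for ts, credits in ledger if ts > cutoff)


def can_start(ledger: Ledger, now: datetime, daily_credit_budget: int) -> bool:
    """Refuse a new run once the rolling window is exhausted."""
    return window_spend(ledger, now) < daily_credit_budget
```

Spend recorded 25 hours ago no longer counts, so a loop that was refused in the morning can start again later without any manual reset.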
BYOK budget semantics
Bring-your-own-key (BYOK) usage follows two different budget rules on purpose:
| Budget | Counts BYOK spend? | Why |
|---|---|---|
| Per-run budget, budget_usd or credit_budget | Yes | A run budget measures consumption. It is the user's halting bound, regardless of who pays the provider. |
| Per-loop daily budget, daily_budget_usd or daily_credit_budget | No | A daily loop window measures platform-billed spend, like org burst caps. |
The practical result: a BYOK run can halt mid-run when its per-run budget is reached, because that budget measures all metered consumption. A BYOK start can also be refused when platform-billed spend has already exhausted the loop's daily window, but BYOK calls do not move that daily counter while the run is in flight.
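The two rules can be pictured as two counters that a usage record updates differently. This is a minimal Python sketch with hypothetical names (`Counters`, `record_usage`), not the Mobius metering code.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    run_spend: int = 0         # credits counted against the per-run budget
    loop_daily_spend: int = 0  # credits counted against the loop daily window


def record_usage(counters: Counters, credits: int, byok: bool) -> Counters:
    """Apply one metered call to both counters, following the table above."""
    # Every metered call counts toward the run budget, whoever pays the provider.
    counters.run_spend += credits
    # Only platform-billed calls move the loop's daily window.
    if not byok:
        counters.loop_daily_spend += credits
    return counters
```

A run that makes a 300-credit BYOK call and a 200-credit platform-billed call has consumed 500 credits of its run budget but only 200 credits of the loop's daily window.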
Bound time and retries
Use limits.wall_clock_timeout for the run-level clock:
    limits:
      wall_clock_timeout: 30m

When the deadline passes, the reaper fails the run with
stop_reason: wall_clock_exceeded. The bound protects runtime state even when a
step executor is still busy or a process misses a normal completion path.
Step retry policy bounds transient failures:
    retry:
      max_attempts: 3
      delay: 30s

max_attempts is the total number of attempts, so 1 means no retry. The
server caps retries at 10. Retries help with network and provider failures;
they do not make destructive side effects safe. Use idempotency keys or an
interaction step before destructive actions.
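The "total attempts" semantics are easy to get wrong by one, so here is a Python sketch of a retry loop with the same shape. `run_step` is a hypothetical helper, and treating the server cap as a ceiling of 10 on max_attempts is an assumption about how the cap is applied.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

SERVER_MAX_ATTEMPTS = 10  # assumption: the server cap applies to max_attempts


def run_step(step: Callable[[], T],
             max_attempts: int = 3,
             delay_seconds: float = 30.0,
             sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `step` up to max_attempts times in total, pausing between attempts."""
    attempts = max(1, min(max_attempts, SERVER_MAX_ATTEMPTS))
    # All attempts except the last swallow the error and wait.
    for _ in range(attempts - 1):
        try:
            return step()
        except Exception:
            sleep(delay_seconds)
    # Final attempt: any error propagates and the step fails.
    return step()
```

With `max_attempts=1` the loop body never runs, so the step is called once and a failure is final. Note that the sketch reruns `step` unchanged, which is exactly why a non-idempotent action is unsafe to retry.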
Waiting steps can set timeout.duration:
    timeout:
      duration: 2h
      on_timeout: fail

Today timeout.on_timeout supports fail. A wait_for_event timeout emits
wait.timed_out, fails the step, and fails the run. Interaction cancellation
also closes the wait so obsolete runs do not stay suspended.
Limit agent turns
limits.max_agent_turns caps agent turns across the whole run:
    limits:
      max_agent_turns: 20

Use this when a loop has multiple agent steps, retries, or judge checks. The cap
is separate from a step's max_turns, which bounds tool iterations inside one
agent turn. When the run-wide cap is reached, Mobius halts with
stop_reason: turn_limit_reached.
Add checks and evidence
A check step evaluates assertions over run input and saved step context. It
records a verdict and routes on failure:
    steps:
      - key: test
        kind: action
        config:
          action_name: ci.run_tests
          parameters:
            repo: acme/api
          save_as: test_run
      - key: verify
        kind: check
        config:
          checks:
            - name: tests_passed
              kind: expr
              expr: context.test_run.exit_code == 0
              evidence: [test]
            - name: review_summary
              kind: agent
              prompt: "Decide whether the test output supports shipping this change."
              evidence: [test]
          on_fail: gate
          gate:
            targets: ["user:lead@acme.example"]
            prompt: "Review the failed checks before this run continues."

kind: expr is deterministic. Use it for exact predicates over action outputs,
structured results, and run input. kind: agent runs a bounded judge turn that
returns a strict { pass, reason } verdict. Omit agent to use the built-in
platform reviewer, mobius-reviewer.
Evidence is data, not narrative. Prefer action outputs, artifacts, test logs, diffs, and structured results over an agent's self-description. A check that asserts "the test action exited 0" is stronger than a check that asks the agent that wrote the code whether tests passed.
on_fail controls the route:
| on_fail | Result |
|---|---|
| fail | Stop the run with stop_reason: check_failed. |
| continue | Continue the run and keep the red verdict on the timeline. |
| gate | Open a request_approval interaction carrying the failed assertions and evidence. Approval resumes the run; rejection stops it with gate_rejected. |
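The routing table can be read as a small decision function. The Python sketch below is illustrative only: `route_check` and the `waiting_for_approval` label are hypothetical, while check_failed and gate_rejected are the stop reasons named above.

```python
from typing import Optional


def route_check(passed: bool, on_fail: str,
                approved: Optional[bool] = None) -> str:
    """Return 'continue', 'waiting_for_approval', or a stop reason."""
    if passed:
        return "continue"
    if on_fail == "fail":
        return "check_failed"
    if on_fail == "continue":
        # The run goes on; the red verdict stays on the timeline.
        return "continue"
    if on_fail == "gate":
        if approved is None:
            # Hypothetical label for a run waiting on the approval.
            return "waiting_for_approval"
        return "continue" if approved else "gate_rejected"
    raise ValueError(f"unknown on_fail value: {on_fail!r}")
```

Only a failed check with `on_fail: gate` ever involves a person, which is why that route suits checks whose red verdicts deserve a second look.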
Use breakers for runaways
The duplicate-tool-call breaker stops an agent turn that repeats the same tool call arguments too many times:
    limits:
      max_duplicate_tool_calls: 3

The step retry policy still applies. If the run fails from the breaker, it stops
with progress_stalled and emits run.progress_stalled.
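One way to picture the duplicate-tool-call breaker is a counter keyed by tool name and canonicalized arguments. The Python sketch below is an assumption-heavy illustration: the class is hypothetical, it counts all identical calls within one turn rather than only consecutive ones, and it trips when the count exceeds the limit. The source does not state which of these rules Mobius uses.

```python
import json
from collections import Counter
from typing import Any, Dict


class DuplicateCallBreaker:
    """Counts identical tool calls within one agent turn (hypothetical model)."""

    def __init__(self, max_duplicate_tool_calls: int = 3) -> None:
        self.limit = max_duplicate_tool_calls
        self.seen: Counter = Counter()

    def record(self, tool: str, arguments: Dict[str, Any]) -> bool:
        """Record a call; return True when the breaker should trip."""
        # Sorting keys makes argument order irrelevant to the comparison.
        key = (tool, json.dumps(arguments, sort_keys=True))
        self.seen[key] += 1
        return self.seen[key] > self.limit
```

Under these assumptions, three identical calls pass and the fourth trips the breaker, while a call with different arguments starts its own count.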
The loop circuit breaker pauses a loop after consecutive failed runs:
    limits:
      pause_after_consecutive_failures: 3

Completed runs reset the streak. Cancelled runs are neutral. When the breaker
trips, Mobius auto-pauses the loop, emits loop.auto_paused, and refuses new
starts until an operator resumes the loop.
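The streak rules can be sketched as a small state machine. This Python example uses a hypothetical `LoopBreaker` class, and clearing the streak on resume is an assumption that the source does not confirm.

```python
class LoopBreaker:
    """Tracks consecutive failed runs for one loop (illustrative only)."""

    def __init__(self, pause_after_consecutive_failures: int = 3) -> None:
        self.threshold = pause_after_consecutive_failures
        self.streak = 0
        self.paused = False

    def record_run(self, status: str) -> None:
        if status == "completed":
            self.streak = 0           # completed runs reset the streak
        elif status == "failed":
            self.streak += 1
            if self.streak >= self.threshold:
                self.paused = True    # corresponds to loop.auto_paused
        # Any other status, such as "cancelled", leaves the streak unchanged.

    def can_start(self) -> bool:
        return not self.paused

    def resume(self) -> None:
        """Operator action; resetting the streak here is an assumption."""
        self.paused = False
        self.streak = 0
```

Because cancelled runs are neutral, the sequence failed, cancelled, failed, failed still trips a threshold of 3.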
From the app
In the loop editor, use Settings to set Budget and Turns for the common run-level guardrails. The run page shows the budget rail, stop reason, per-step cost rows, proof rows, and reason-specific stop banners as the run progresses.
Advanced fields such as daily loop budgets, duplicate-call breaker thresholds, and circuit-breaker thresholds are API/spec fields today.
From the CLI
Use the CLI to start and watch guarded runs:
    mobius runs start morning-brief --inputs '{"scope":"today"}'
    mobius runs stream run_01...

The launch CLI can inspect loops, start runs, stream events, and cancel runs.
Loop-authoring commands still use the legacy automations command group while
the SDK rename rolls through, so prefer the app or HTTP API when documenting or
sharing new guardrail configuration.
From the API
Send guardrails under limits in the authored spec:
    POST /v1/projects/{project}/loops/{id}/versions

You can also override the per-run budget at start:

    POST /v1/projects/{project}/loops/{id}/runs

    {
      "inputs": {
        "scope": "today"
      },
      "budget_usd": 2.5
    }

Start-request budgets affect only that run. They do not change the loop's published spec or its daily budget.
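If you script run starts, the request can be assembled with the Python standard library. This sketch only builds the request and does not send it. The base URL, the project and loop identifiers, and the bearer-token Authorization header are all assumptions for illustration; only the path and the JSON body come from the endpoint above.

```python
import json
import urllib.request
from typing import Any, Dict


def build_start_request(base_url: str, project: str, loop_id: str,
                        inputs: Dict[str, Any], budget_usd: float,
                        api_key: str) -> urllib.request.Request:
    """Build a POST request that starts a run with a per-run budget override."""
    url = f"{base_url}/v1/projects/{project}/loops/{loop_id}/runs"
    body = json.dumps({"inputs": inputs, "budget_usd": budget_usd}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the auth scheme is not described in this page.
            "Authorization": f"Bearer {api_key}",
        },
    )


# Hypothetical values throughout; replace them with your own.
request = build_start_request(
    "https://api.example.invalid", "proj_123", "loop_123",
    {"scope": "today"}, 2.5, "YOUR_API_KEY",
)
```

Passing the result to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the payload easy to inspect or log first.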
Stop reasons
stop_reason explains why a terminal run stopped:
| Stop reason | Meaning |
|---|---|
| completed | The run finished successfully. |
| step_failed | A step failed without remaining retries. |
| check_failed | A check failed and on_fail: fail stopped the run. |
| gate_rejected | A check gate or interaction was rejected. |
| cancelled | A user or system cancelled the run. |
| replaced | A concurrency policy replaced the run. |
| wall_clock_exceeded | The run exceeded limits.wall_clock_timeout. |
| budget_exceeded | A run budget or loop daily budget halted the run. |
| turn_limit_reached | The run exceeded limits.max_agent_turns. |
| progress_stalled | The duplicate-tool-call breaker halted the run. |
| step_limit_reached | The run exceeded the plan's per-run step cap. |
When a run surprises you, read the stop reason first, then open the timeline event that caused it.
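For dashboards or alert routing, it can help to separate guardrail stops from other outcomes. The Python sketch below uses a hypothetical `triage` helper and hypothetical category labels. The set of guardrail stops is the four reasons this page describes as a configured bound working.

```python
GUARDRAIL_STOPS = frozenset({
    "budget_exceeded",
    "turn_limit_reached",
    "progress_stalled",
    "wall_clock_exceeded",
})


def triage(stop_reason: str) -> str:
    """Classify a stop reason as 'success', 'guardrail', or 'other'."""
    if stop_reason == "completed":
        return "success"
    if stop_reason in GUARDRAIL_STOPS:
        # The configured bound worked; review the limit, not the platform.
        return "guardrail"
    # step_failed, check_failed, gate_rejected, cancelled, and so on.
    return "other"
```

A spike in the guardrail category usually points at a limit that is too tight or a loop that is doing more work than expected, rather than at an outage.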
What operators should monitor
- run.budget_warning and run.budget_exceeded by loop.
- check.failed grouped by check name.
- Runs in failed with guardrail stop reasons.
- Loops that auto-pause from loop.auto_paused.
- Runs that stay suspended longer than their expected wait window.
- usage.recorded grouped by API key, loop, and step.
- Artifact quota before enabling loops that produce large files.
Before production
- Set a per-run budget on every loop that can call a model or metered action.
- Set limits.wall_clock_timeout.
- Set timeout.duration on wait_for_event and interaction steps.
- Keep retry counts small until actions are proven idempotent.
- Add a check step before risky actions or approvals.
- Use a daily loop budget for schedules and high-volume event triggers.
- Watch the first runs from the Timeline tab before increasing scope.
FAQ
Why can a run spend slightly more than its budget?
Mobius enforces budgets at checkpoints. A checkpoint happens before the next model call, after a metered action records usage, at step boundaries, and between agent tool iterations. The call already in flight is allowed to finish, so the final spend can include one extra call or action.
Should I use a check or an interaction?
Use a check when Mobius can evaluate evidence and record a verdict. Use an
interaction when a human or agent must supply information, approval, or a
review. When in doubt, use a check with on_fail: gate: the run records a
verdict and opens a human approval only for red evidence.
Next
- Read the stop-state model in runs.
- Configure step retries and waits in steps.
- Follow every guardrail event in the event catalog.
- Attribute machine usage with API keys.