Guardrails

Guardrails are the limits and verification steps that make a run stop for a known reason instead of running until a person notices.

Use guardrails on every loop that can spend money, wait on another system, or change something outside Mobius. Start with a run budget and a wall-clock timeout, then add checks and breakers when the run can make risky decisions.

The guardrail model

Every guardrail answers one operator question:

| Question | Mechanism | What you inspect |
| --- | --- | --- |
| What can this run spend? | Run budgets and loop daily budgets | Budget rail, `usage.recorded`, `run.budget_exceeded` |
| How long can this run occupy runtime state? | `wall_clock_timeout`, wait timeouts, retries | Stop reason, step status, wait events |
| How much agent work is allowed? | `max_agent_turns`, step `max_turns` | Turn count and stop reason `turn_limit_reached` |
| Did the run prove its result? | `check` steps with verdicts and evidence | `check.passed`, `check.failed`, proof rows, gate interactions |
| Is this loop failing repeatedly? | Duplicate-tool-call and consecutive-failure breakers | `run.progress_stalled`, `loop.auto_paused`, loop status `paused` |

The run timeline is the source of truth. A guardrail stop is not a generic failure: the stop reasons budget_exceeded, turn_limit_reached, progress_stalled, and wall_clock_exceeded mean that a configured bound worked.

Configure run budgets

Run budgets live in spec.limits. Set the budget in dollars if you think in account spend, or in credits if you think in Mobius usage. One credit is $0.01. Set exactly one of the two fields, not both.

limits:
  budget_usd: 10

limits:
  credit_budget: 1000

Mobius shows both units in the run budget rail, and reports credits as decimals so a sub-credit operation reads as 0.4 rather than rounding to zero. The budget is a hard limit. When spend reaches the ceiling, Mobius halts at the next checkpoint with stop_reason: budget_exceeded and emits run.budget_exceeded.
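
The conversion between the two units can be checked by hand. A minimal sketch of the arithmetic, assuming only the stated rate of one credit per $0.01; the function names are illustrative and are not part of any Mobius SDK:

```python
from decimal import Decimal

# Stated rate: one credit is $0.01, so one dollar is 100 credits.
CREDITS_PER_USD = Decimal(100)


def usd_to_credits(usd):
    """Convert a dollar amount to credits."""
    return Decimal(str(usd)) * CREDITS_PER_USD


def credits_to_usd(credits):
    """Convert credits to dollars, keeping sub-credit amounts as decimals."""
    return Decimal(str(credits)) / CREDITS_PER_USD


print(usd_to_credits(10))       # budget_usd: 10 expressed in credits
print(credits_to_usd("0.4"))    # a 0.4-credit operation expressed in dollars
```

Decimal arithmetic is used here so that sub-credit amounts stay exact instead of picking up floating-point noise.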

Enforcement granularity is one model call or one metered action. Mobius does not corrupt in-flight work by killing a call halfway through. That means a run with a 1000-credit budget can finish above 1000 credits, by at most the cost of the one call or action that was in flight when the ceiling was crossed.
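
The overshoot is easier to see with numbers. The following is a toy model of checkpoint enforcement, not Mobius's implementation: it assumes a checkpoint before every call, fixed per-call costs, and a hypothetical `simulate_run` helper.

```python
def simulate_run(call_costs, credit_budget):
    """Model checkpoint enforcement: the budget is checked before each call,
    and a call that has started always runs to completion.

    Returns (credits_spent, calls_completed, stop_reason).
    """
    spent = 0
    completed = 0
    for cost in call_costs:
        # Checkpoint before the next call: halt once spend has reached the ceiling.
        if spent >= credit_budget:
            return spent, completed, "budget_exceeded"
        # The call is never killed halfway, even if it crosses the ceiling.
        spent += cost
        completed += 1
    # Simplification (assumption): a run with no work left is reported as completed.
    return spent, completed, "completed"


# Five planned calls of 400 credits each against a 1000-credit budget.
spent, completed, reason = simulate_run([400] * 5, credit_budget=1000)
print(spent, completed, reason)
```

In this model the third call starts at 800 credits, finishes at 1200, and the run halts before the fourth call. The overshoot is bounded by the cost of a single call.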

Trial-plan runs get a default 100-credit budget when neither the loop nor the start request sets one. Paid-plan runs are unbounded unless you set a budget.

Set loop daily budgets

Loop daily budgets cap rolling-24-hour platform-billed spend across all runs of one loop:

limits:
  daily_budget_usd: 25

limits:
  daily_credit_budget: 2500

Mobius checks the daily window before starting a run. If the window is already exhausted, the start is refused with billing_cap_reached and kind loop_daily_budget. For platform-funded calls inside a running run, Mobius also checks the loop daily window at the funding gate and halts the run with budget_exceeded if the ceiling is crossed.

Use a daily budget for loops that run from schedules or high-volume event triggers. The per-run budget bounds one execution; the daily budget bounds the fleet behavior of that loop.
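
A rolling window differs from a calendar-day reset, and a small model makes the difference concrete. This sketch assumes usage is available as (timestamp, credits) pairs for one loop; the helper names and the shape of the refusal are illustrative, with only `billing_cap_reached` and `loop_daily_budget` taken from the text above.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(hours=24)


def spend_in_window(usage, now):
    """Sum the credits recorded during the 24 hours before `now`."""
    cutoff = now - WINDOW
    return sum(credits for recorded_at, credits in usage if recorded_at > cutoff)


def check_start(usage, now, daily_credit_budget):
    """Refuse a new run once the rolling window is already exhausted."""
    if spend_in_window(usage, now) >= daily_credit_budget:
        return {"allowed": False, "error": "billing_cap_reached", "kind": "loop_daily_budget"}
    return {"allowed": True}


now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
usage = [
    (now - timedelta(hours=30), 2000),  # older than 24 hours, so it is ignored
    (now - timedelta(hours=5), 1500),
    (now - timedelta(hours=1), 1000),
]
print(check_start(usage, now, daily_credit_budget=2500))
print(check_start(usage, now + timedelta(hours=20), daily_credit_budget=2500))
```

The second call shows the window sliding: twenty hours later the 1500-credit entry has aged out, so the same loop can start again without any reset at midnight.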

BYOK budget semantics

Bring-your-own-key (BYOK) usage can create two bills: your provider bills upstream usage directly, and Mobius bills platform processing credits. Mobius budgets govern only the Mobius-side credit spend.

| Budget | Counts BYOK spend? | Why |
| --- | --- | --- |
| Per-run budget, `budget_usd` or `credit_budget` | Yes, Mobius processing credits | A run budget measures Mobius consumption inside the run. Provider-side invoices stay outside Mobius. |
| Per-loop daily budget, `daily_budget_usd` or `daily_credit_budget` | Yes, Mobius processing credits | A daily loop window measures platform-billed spend, including BYOK processing. |

The practical result: a BYOK run can halt mid-run when its per-run budget is reached, because that budget measures Mobius-billed processing. A BYOK start can also be refused when platform-billed spend has already exhausted the loop's daily window. Your provider's direct bill is not counted in either Mobius budget.

Bound time and retries

Use limits.wall_clock_timeout for the run-level clock:

limits:
  wall_clock_timeout: 30m

When the deadline passes, the reaper fails the run with stop_reason: wall_clock_exceeded. The bound protects runtime state even when a step executor is still busy or a process misses a normal completion path.

Step retry policy bounds transient failures:

retry:
  max_attempts: 3
  delay: 30s

max_attempts is the total number of attempts, so 1 means no retry. The server caps retries at 10. Retries help with network and provider failures; they do not make destructive side effects safe. Use idempotency keys or an interaction step before destructive actions.
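
The "total attempts" semantics are easy to get wrong by one. A minimal sketch of a retry wrapper with the same counting rule, assuming a fixed delay between attempts; `run_with_retry` is a hypothetical helper and says nothing about how the Mobius executor is built:

```python
import time


def run_with_retry(step, max_attempts=3, delay_seconds=30.0, sleep=time.sleep):
    """Call `step` up to `max_attempts` times, pausing between attempts.

    `max_attempts` counts every attempt, so 1 means the step runs once, no retry.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts:
                raise
            sleep(delay_seconds)


# Usage: a step that fails twice and then succeeds. The demo records the
# pauses in a list instead of actually sleeping.
calls = {"count": 0}
pauses = []


def flaky_step():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient network failure")
    return "ok"


result = run_with_retry(flaky_step, max_attempts=3, delay_seconds=30, sleep=pauses.append)
print(result, calls["count"], pauses)
```

With three attempts there are at most two pauses, so retries alone add at most 60 seconds of waiting at a 30s delay. The wrapper reruns the step as-is, which is why a step with destructive side effects needs its own idempotency protection.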

Waiting steps can set timeout.duration:

timeout:
  duration: 2h
  on_timeout: fail

Today, timeout.on_timeout supports only fail. A wait_for_event timeout emits wait.timed_out, fails the step, and fails the run. Interaction cancellation also closes the wait so obsolete runs do not stay suspended.

Limit agent turns

limits.max_agent_turns caps agent turns across the whole run:

limits:
  max_agent_turns: 20

Use this when a loop has multiple agent steps, retries, or judge checks. The cap is separate from a step's max_turns, which bounds tool iterations inside one agent turn. When the run-wide cap is reached, Mobius halts with stop_reason: turn_limit_reached.

Add checks and evidence

A check step evaluates assertions over the run's inputs and earlier step outputs. It records a verdict and routes on failure:

steps:
  - id: test
    kind: action
    config:
      action_name: ci.run_tests
      parameters:
        repo: acme/api
 
  - id: verify
    kind: check
    config:
      checks:
        - name: tests_passed
          kind: expr
          expr: steps.test.output.exit_code == 0
          evidence: [test]
        - name: review_summary
          kind: agent
          prompt: "Decide whether the test output supports shipping this change."
          evidence: [test]
      on_fail: gate
      gate:
        targets: ["user:lead@acme.example"]
        prompt: "Review the failed checks before this run continues."

kind: expr is deterministic. Use it for exact predicates over action outputs, structured results, and run input. kind: agent runs a bounded judge turn that returns a strict { pass, reason } verdict. Omit the agent field on the check to use the built-in platform reviewer, mobius-reviewer.

Evidence is data, not narrative. Prefer action outputs, artifacts, test logs, diffs, and structured results over an agent's self-description. A check that asserts "the test action exited 0" is stronger than a check that asks the agent that wrote the code whether tests passed.

on_fail controls the route:

| `on_fail` | Result |
| --- | --- |
| `fail` | Stop the run with `stop_reason: check_failed`. |
| `continue` | Continue the run and keep the red verdict on the timeline. |
| `gate` | Open a `request_approval` interaction carrying the failed assertions and evidence. Approval resumes the run; rejection stops it with `gate_rejected`. |
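
The three routes can be read as a small decision function. This sketch mirrors the table above. The verdict dictionaries and the returned shapes are assumptions made for illustration, not the Mobius event schema.

```python
def route_check(verdicts, on_fail):
    """Map check verdicts and an on_fail policy to the run's next state.

    `verdicts` is a list of {"name", "pass", "reason"} dicts, one per assertion.
    """
    failed = [v for v in verdicts if not v["pass"]]
    if not failed:
        return {"event": "check.passed", "next": "continue"}
    if on_fail == "fail":
        return {"event": "check.failed", "next": "stop", "stop_reason": "check_failed"}
    if on_fail == "continue":
        # The red verdict stays on the timeline, but the run keeps going.
        return {"event": "check.failed", "next": "continue"}
    if on_fail == "gate":
        return {
            "event": "check.failed",
            "next": "request_approval",
            "failed": [v["name"] for v in failed],
        }
    raise ValueError(f"unknown on_fail value: {on_fail!r}")


def resolve_gate(approved):
    """Approval resumes the run; rejection stops it with gate_rejected."""
    if approved:
        return {"next": "continue"}
    return {"next": "stop", "stop_reason": "gate_rejected"}


verdicts = [
    {"name": "tests_passed", "pass": False, "reason": "exit_code was 1"},
    {"name": "review_summary", "pass": True, "reason": "output supports shipping"},
]
print(route_check(verdicts, "gate"))
print(resolve_gate(approved=False))
```

One failed assertion is enough to make the whole check red in this model, which matches the conservative reading of a verification step.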

Use breakers for runaways

The duplicate-tool-call breaker stops an agent turn that repeats the same tool call with the same arguments too many times:

limits:
  max_duplicate_tool_calls: 10

The step retry policy still applies. If the run fails from the breaker, it stops with progress_stalled and emits run.progress_stalled.
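
To show what counting duplicates can look like, here is a minimal sketch of such a breaker. Two things are assumptions: that calls are compared by tool name plus canonicalized arguments, and that the breaker trips when the count exceeds the threshold rather than when it equals it. The class is hypothetical.

```python
import json
from collections import Counter


class DuplicateCallBreaker:
    """Count identical tool calls within one agent turn."""

    def __init__(self, max_duplicate_tool_calls=10):
        self.limit = max_duplicate_tool_calls
        self.seen = Counter()

    def record(self, tool_name, arguments):
        """Record one call and return True when the breaker should trip."""
        # Canonical JSON so that key order does not hide a duplicate.
        key = (tool_name, json.dumps(arguments, sort_keys=True))
        self.seen[key] += 1
        return self.seen[key] > self.limit


breaker = DuplicateCallBreaker(max_duplicate_tool_calls=2)
for _ in range(3):
    tripped = breaker.record("search", {"q": "status", "page": 1})
print(tripped)
```

Calls with different arguments are counted separately, so an agent paging through results does not trip the breaker while an agent stuck on one request does.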

The loop circuit breaker pauses a loop after consecutive failed runs:

limits:
  pause_after_consecutive_failures: 3

Completed runs reset the streak. Cancelled runs are neutral. When the breaker trips, Mobius auto-pauses the loop, emits loop.auto_paused, and refuses new starts until an operator resumes the loop.
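
The streak rules above fit in a few lines. A minimal sketch of the counting logic, assuming three run outcomes and assuming that resuming a loop also clears the streak (the text does not say either way); the class name is illustrative:

```python
class LoopCircuitBreaker:
    """Track consecutive failed runs for one loop."""

    def __init__(self, pause_after_consecutive_failures=3):
        self.threshold = pause_after_consecutive_failures
        self.streak = 0
        self.paused = False

    def record_run(self, outcome):
        """Record a finished run; return True if this run trips the breaker."""
        if outcome == "completed":
            self.streak = 0          # a completed run resets the streak
        elif outcome == "failed":
            self.streak += 1
        elif outcome != "cancelled":  # cancelled runs are neutral
            raise ValueError(f"unknown outcome: {outcome!r}")
        if not self.paused and self.streak >= self.threshold:
            self.paused = True       # corresponds to loop.auto_paused
            return True
        return False

    def can_start(self):
        return not self.paused

    def resume(self):
        # Assumption: an operator resume starts a fresh streak.
        self.paused = False
        self.streak = 0


breaker = LoopCircuitBreaker(pause_after_consecutive_failures=3)
history = ["failed", "cancelled", "failed", "completed", "failed", "failed", "failed"]
trips = [breaker.record_run(outcome) for outcome in history]
print(trips, breaker.can_start())
```

In this history the cancelled run does not break the first streak, the completed run does, and the breaker trips only on the third failure in a row at the end.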

From the app

In the loop editor, use Settings to set Budget and Turns for the common run-level guardrails. The run page shows the budget rail, stop reason, per-step cost rows, proof rows, and reason-specific stop banners as the run progresses.

Advanced fields such as daily loop budgets, duplicate-call breaker thresholds, and circuit-breaker thresholds are API/spec fields today.

From the CLI

Use the CLI to start and watch guarded runs:

mobius runs start morning-brief --inputs '{"scope":"today"}'
mobius runs stream run_01...

The launch CLI can inspect loops, start runs, stream events, and cancel runs. Loop-authoring commands still use the legacy automations command group while the SDK rename rolls through, so prefer the app or HTTP API when documenting or sharing new guardrail configuration.

From the API

Send guardrails under limits in the authored spec:

POST /v1/projects/{project}/loops/{id}/versions

You can also override the per-run budget at start:

POST /v1/projects/{project}/loops/{id}/runs

{
  "inputs": {
    "scope": "today"
  },
  "budget_usd": 2.5
}

Start-request budgets affect only that run. They do not change the loop's published spec or its daily budget.
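
For scripted starts, the same request can be assembled in a few lines. This sketch only builds the request and does not send it. The base URL is a placeholder rather than a real Mobius host, the project and loop identifiers are made up, and authentication headers are left out because this page does not describe them.

```python
import json
import urllib.request

# Placeholder host (assumption); substitute the API base URL for your account.
BASE_URL = "https://api.example.invalid"


def build_start_request(project, loop_id, inputs, budget_usd=None):
    """Build a POST request that starts a run, optionally with a per-run budget."""
    body = {"inputs": inputs}
    if budget_usd is not None:
        # Applies to this run only; the loop's published spec is unchanged.
        body["budget_usd"] = budget_usd
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/projects/{project}/loops/{loop_id}/runs",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_start_request("my-project", "my-loop-id", {"scope": "today"}, budget_usd=2.5)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Leaving `budget_usd` out of the body lets the run fall back to whatever budget the loop's spec or plan default provides.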

Stop reasons

stop_reason explains why a terminal run stopped:

| Stop reason | Meaning |
| --- | --- |
| `completed` | The run finished successfully. |
| `step_failed` | A step failed without remaining retries. |
| `check_failed` | A check failed and `on_fail: fail` stopped the run. |
| `gate_rejected` | A check gate or interaction was rejected. |
| `cancelled` | A user or system cancelled the run. |
| `replaced` | A concurrency policy replaced the run. |
| `wall_clock_exceeded` | The run exceeded `limits.wall_clock_timeout`. |
| `budget_exceeded` | A run budget or loop daily budget halted the run. |
| `turn_limit_reached` | The run exceeded `limits.max_agent_turns`. |
| `progress_stalled` | The duplicate-tool-call breaker halted the run. |
| `step_limit_reached` | The run exceeded the plan's per-run step cap. |

When a run surprises you, read the stop reason first, then open the timeline event that caused it.
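
When triaging many runs at once, it can help to separate bound-worked stops from everything else. A small sketch, assuming run records are dictionaries with hypothetical `loop` and `stop_reason` keys. The guardrail set contains only the four reasons this page names as a configured bound working.

```python
from collections import Counter

# The four stop reasons described earlier as "the configured bound worked".
GUARDRAIL_STOPS = {
    "budget_exceeded",
    "turn_limit_reached",
    "progress_stalled",
    "wall_clock_exceeded",
}


def classify(stop_reason):
    """Sort a stop reason into completed, guardrail_stop, or other_stop."""
    if stop_reason == "completed":
        return "completed"
    if stop_reason in GUARDRAIL_STOPS:
        return "guardrail_stop"
    return "other_stop"


def guardrail_stops_by_loop(runs):
    """Count guardrail stops per (loop, stop_reason) pair."""
    counts = Counter()
    for run in runs:
        if classify(run["stop_reason"]) == "guardrail_stop":
            counts[(run["loop"], run["stop_reason"])] += 1
    return counts


runs = [
    {"loop": "morning-brief", "stop_reason": "completed"},
    {"loop": "morning-brief", "stop_reason": "budget_exceeded"},
    {"loop": "morning-brief", "stop_reason": "budget_exceeded"},
    {"loop": "nightly-sync", "stop_reason": "step_failed"},
]
print(guardrail_stops_by_loop(runs))
```

A loop that keeps appearing in this count is usually a sign that the bound is too tight or the loop is doing more work than intended, and the timeline event shows which.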

What operators should monitor

run.budget_exceeded by loop.
check.failed grouped by check name.
Runs in the failed state with guardrail stop reasons.
Loops that auto-pause from loop.auto_paused.
Runs that stay suspended longer than their expected wait window.
usage.recorded grouped by API key, loop, and step.
Artifact quota before enabling loops that produce large files.

Before production

Set a per-run budget on every loop that can call a model or metered action.
Set limits.wall_clock_timeout.
Set timeout.duration on wait_for_event and interaction steps.
Keep retry counts small until actions are proven idempotent.
Add a check step before risky actions or approvals.
Use a daily loop budget for schedules and high-volume event triggers.
Watch the first runs from the Timeline tab before increasing scope.

FAQ

Why can a run spend slightly more than its budget?

Mobius enforces budgets at checkpoints. A checkpoint happens before the next model call, after a metered action records usage, at step boundaries, and between agent tool iterations. The call already in flight is allowed to finish, so the final spend can include one extra call or action.

Should I use a check or an interaction?

Use a check when Mobius can evaluate evidence and record a verdict. Use an interaction when a human or agent must supply information, approval, or a review. When in doubt, use a check with on_fail: gate. The run then records a verdict and opens a human approval only when the evidence is red.

Read the stop-state model in runs.
Configure step retries and waits in steps.
Follow every guardrail event in the event catalog.
Attribute machine usage with API keys.