Operations

Guardrails

Guardrails are the limits and verification steps that make a run stop for a known reason instead of running until a person notices.

Use guardrails on every loop that can spend money, wait on another system, or change something outside Mobius. Start with a run budget and a wall-clock timeout, then add checks and breakers when the run can make risky decisions.

The guardrail model

Every guardrail answers one operator question:

| Question | Mechanism | What you inspect |
|---|---|---|
| What can this run spend? | Run budgets and loop daily budgets | Budget rail, usage.recorded, run.budget_warning, run.budget_exceeded |
| How long can this run occupy runtime state? | wall_clock_timeout, wait timeouts, retries | Stop reason, step status, wait events |
| How much agent work is allowed? | max_agent_turns, step max_turns | Turn count and stop reason turn_limit_reached |
| Did the run prove its result? | check steps with verdicts and evidence | check.passed, check.failed, proof rows, gate interactions |
| Is this loop failing repeatedly? | Duplicate-tool-call and consecutive-failure breakers | run.progress_stalled, loop.auto_paused, loop status paused |

The run timeline is the source of truth. A guardrail stop is not a generic failure: the stop reasons budget_exceeded, turn_limit_reached, progress_stalled, and wall_clock_exceeded mean the configured bound worked.

Configure run budgets

Run budgets live in spec.limits. Use dollars when you think in account spend, or credits when you think in Mobius usage. One credit is $0.01. Set exactly one of the two fields, not both.

# Option 1: dollars
limits:
  budget_usd: 10

# Option 2: credits (the same ceiling, at $0.01 per credit)
limits:
  credit_budget: 1000

Mobius stores the ceiling as milli-credits and shows both units in the run budget rail. A run emits run.budget_warning once when it crosses 80 percent of the run budget. When spend reaches the ceiling, Mobius halts at the next checkpoint with stop_reason: budget_exceeded and emits run.budget_exceeded.
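The unit arithmetic above can be sketched in a few lines. This is an illustration, not Mobius code: the function names are hypothetical, and it assumes a milli-credit is one thousandth of a credit, which the text implies but does not state.

```python
from decimal import Decimal

CREDIT_USD = Decimal("0.01")       # from the text: one credit is $0.01
MILLI_PER_CREDIT = 1000            # assumption: "milli" means 1/1000 of a credit
WARNING_FRACTION = Decimal("0.8")  # from the text: warning at 80 percent


def ceiling_milli_credits(budget_usd=None, credit_budget=None):
    """Return the run ceiling in milli-credits from exactly one unit."""
    if (budget_usd is None) == (credit_budget is None):
        raise ValueError("set exactly one of budget_usd or credit_budget")
    if budget_usd is not None:
        credits = Decimal(str(budget_usd)) / CREDIT_USD
    else:
        credits = Decimal(str(credit_budget))
    return int(credits * MILLI_PER_CREDIT)


def warning_threshold(ceiling):
    """Spend level (milli-credits) at which the 80 percent warning would fire."""
    return int(ceiling * WARNING_FRACTION)
```

Under these assumptions, budget_usd: 10 and credit_budget: 1000 resolve to the same ceiling of 1,000,000 milli-credits, and the warning threshold sits at 800,000.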

Enforcement granularity is one model call or one metered action. Mobius does not corrupt in-flight work by killing a call halfway through. That means a run with a 1000-credit budget can finish at 1000 credits plus the one call or action that was already in flight.
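A small simulation shows why the final spend can overshoot. It is a simplified sketch: it models only the checkpoint before each call, and the function name and cost list are made up for illustration.

```python
def run_with_budget(call_costs, ceiling):
    """Simulate checkpoint enforcement: check before each call, never mid-call."""
    spent = 0
    for cost in call_costs:
        # Checkpoint: halt only if the ceiling was already reached.
        if spent >= ceiling:
            return spent, "budget_exceeded"
        # The call runs to completion even if it crosses the ceiling.
        spent += cost
    return spent, "completed"
```

With four 400-credit calls and a 1000-credit ceiling, the third call starts at 800 credits, finishes at 1200, and the run halts at the next checkpoint: run_with_budget([400, 400, 400, 400], 1000) returns (1200, "budget_exceeded").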

Trial-plan runs get a default 100-credit budget when neither the loop nor the start request sets one. Paid-plan runs are unbounded unless you set a budget.

Set loop daily budgets

Loop daily budgets cap rolling-24-hour platform-billed spend across all runs of one loop:

# Option 1: dollars
limits:
  daily_budget_usd: 25

# Option 2: credits (the same ceiling, at $0.01 per credit)
limits:
  daily_credit_budget: 2500

Mobius checks the daily window before starting a run. If the window is already exhausted, the start is refused with billing_cap_reached and kind loop_daily_budget. For platform-funded calls inside a running run, Mobius also checks the loop daily window at the funding gate and halts the run with budget_exceeded if the ceiling is crossed.

Use a daily budget for loops that run from schedules or high-volume event triggers. The per-run budget bounds one execution; the daily budget bounds the fleet behavior of that loop.
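The start-time check on the rolling window can be pictured as follows. This is a sketch under assumptions: the ledger is a plain list of (timestamp, credits) pairs, and the shape of the refusal dictionary is hypothetical; only the billing_cap_reached and loop_daily_budget names come from the text.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(hours=24)  # from the text: rolling 24-hour window


def window_spend(entries, now):
    """Sum platform-billed credits recorded in the 24 hours before `now`."""
    cutoff = now - WINDOW
    return sum(credits for ts, credits in entries if ts > cutoff)


def can_start(entries, now, daily_credit_budget):
    """Return (allowed, refusal); refusal is None when the start is allowed."""
    if window_spend(entries, now) >= daily_credit_budget:
        # Hypothetical payload shape; the two names are the ones in the text.
        return False, {"error": "billing_cap_reached", "kind": "loop_daily_budget"}
    return True, None


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
ledger = [
    (now - timedelta(hours=25), 2000),  # outside the window, ignored
    (now - timedelta(hours=2), 1500),
    (now - timedelta(hours=1), 900),
]
```

Here the window holds 2400 credits, so a loop with daily_credit_budget: 2500 can still start, while one with a 2400-credit ceiling is refused.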

BYOK budget semantics

Bring-your-own-key (BYOK) usage follows two different budget rules on purpose:

| Budget | Counts BYOK spend? | Why |
|---|---|---|
| Per-run budget, budget_usd or credit_budget | Yes | A run budget measures consumption. It is the user's halting bound, regardless of who pays the provider. |
| Per-loop daily budget, daily_budget_usd or daily_credit_budget | No | A daily loop window measures platform-billed spend, like org burst caps. |

The practical result: a BYOK run can halt mid-run when its per-run budget is reached, because that budget measures all metered consumption. A BYOK start can also be refused when platform-billed spend has already exhausted the loop's daily window, but BYOK calls do not move that daily counter while the run is in flight.
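The two rules amount to two counters that are updated differently. The sketch below is illustrative only; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    run_spend: int = 0         # per-run budget: all metered consumption
    loop_daily_spend: int = 0  # loop daily window: platform-billed spend only


def record_usage(counters, credits, byok):
    """Apply one metered call to both counters according to the BYOK rules."""
    counters.run_spend += credits
    if not byok:
        # BYOK calls never move the loop's daily window.
        counters.loop_daily_spend += credits
    return counters
```

After a 300-credit BYOK call and a 200-credit platform-funded call, run_spend is 500 while loop_daily_spend is 200.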

Bound time and retries

Use limits.wall_clock_timeout for the run-level clock:

limits:
  wall_clock_timeout: 30m

When the deadline passes, the reaper fails the run with stop_reason: wall_clock_exceeded. The bound protects runtime state even when a step executor is still busy or a process misses a normal completion path.

Step retry policy bounds transient failures:

retry:
  max_attempts: 3
  delay: 30s

max_attempts is the total number of attempts, so 1 means no retry. The server caps retries at 10. Retries help with network and provider failures; they do not make destructive side effects safe. Use idempotency keys or an interaction step before destructive actions.
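The "total attempts" semantics can be sketched as a small helper. It assumes a fixed delay between attempts, as the delay field suggests; whether Mobius applies backoff or jitter is not stated here, and the helper name is hypothetical.

```python
import time


def run_with_retry(step, max_attempts=3, delay_seconds=30.0, sleep=time.sleep):
    """Call `step` up to max_attempts times in total; re-raise the last error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == max_attempts:
                raise  # no attempts left: the step fails
            sleep(delay_seconds)
```

With max_attempts=1 the loop body runs once and any error propagates immediately, which matches "1 means no retry". The helper retries blindly, so it carries the same caveat as the policy: it is only safe around idempotent work.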

Waiting steps can set timeout.duration:

timeout:
  duration: 2h
  on_timeout: fail

Today timeout.on_timeout supports only fail. A wait_for_event timeout emits wait.timed_out, fails the step, and fails the run. Interaction cancellation also closes the wait so obsolete runs do not stay suspended.

Limit agent turns

limits.max_agent_turns caps agent turns across the whole run:

limits:
  max_agent_turns: 20

Use this when a loop has multiple agent steps, retries, or judge checks. The cap is separate from a step's max_turns, which bounds tool iterations inside one agent turn. When the run-wide cap is reached, Mobius halts with stop_reason: turn_limit_reached.

Add checks and evidence

A check step evaluates assertions over run input and saved step context. It records a verdict and routes on failure:

steps:
  - key: test
    kind: action
    config:
      action_name: ci.run_tests
      parameters:
        repo: acme/api
    save_as: test_run
 
  - key: verify
    kind: check
    config:
      checks:
        - name: tests_passed
          kind: expr
          expr: context.test_run.exit_code == 0
          evidence: [test]
        - name: review_summary
          kind: agent
          prompt: "Decide whether the test output supports shipping this change."
          evidence: [test]
      on_fail: gate
      gate:
        targets: ["user:lead@acme.example"]
        prompt: "Review the failed checks before this run continues."

kind: expr is deterministic. Use it for exact predicates over action outputs, structured results, and run input. kind: agent runs a bounded judge turn that returns a strict { pass, reason } verdict. Omit the agent field to use the built-in platform reviewer, mobius-reviewer.

Evidence is data, not narrative. Prefer action outputs, artifacts, test logs, diffs, and structured results over an agent's self-description. A check that asserts "the test action exited 0" is stronger than a check that asks the agent that wrote the code whether tests passed.

on_fail controls the route:

| on_fail | Result |
|---|---|
| fail | Stop the run with stop_reason: check_failed. |
| continue | Continue the run and keep the red verdict on the timeline. |
| gate | Open a request_approval interaction carrying the failed assertions and evidence. Approval resumes the run; rejection stops it with gate_rejected. |
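The three on_fail routes can be written as one small decision function. This is a sketch of the routing logic only: the function name and the shape of the returned dictionaries are hypothetical, and "suspended" is used for a run waiting on the approval.

```python
def route_failed_check(on_fail, approved=None):
    """Map a failed check to its outcome; `approved` is the gate decision, if any."""
    if on_fail == "fail":
        return {"run": "stopped", "stop_reason": "check_failed"}
    if on_fail == "continue":
        return {"run": "continues", "verdict": "red"}
    if on_fail == "gate":
        if approved is None:
            # No decision yet: the run waits on the approval interaction.
            return {"run": "suspended", "interaction": "request_approval"}
        if approved:
            return {"run": "continues"}
        return {"run": "stopped", "stop_reason": "gate_rejected"}
    raise ValueError(f"unknown on_fail value: {on_fail!r}")
```

For example, route_failed_check("gate", approved=False) yields a stopped run with stop_reason gate_rejected.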

Use breakers for runaways

The duplicate-tool-call breaker stops an agent turn that repeats the same tool call with the same arguments too many times:

limits:
  max_duplicate_tool_calls: 3

The step retry policy still applies. If the run fails from the breaker, it stops with progress_stalled and emits run.progress_stalled.
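One way to picture the duplicate-call breaker is a counter keyed by tool name and canonicalized arguments. The sketch below is an assumption about the mechanism, not the Mobius implementation: it assumes the breaker trips when an identical call is seen more than max_duplicate_tool_calls times, and the class name is hypothetical.

```python
import json
from collections import Counter


class DuplicateCallBreaker:
    def __init__(self, max_duplicate_tool_calls=3):
        self.limit = max_duplicate_tool_calls
        self.seen = Counter()

    def record(self, tool, arguments):
        """Return True when this call should trip the breaker."""
        # Canonical JSON so key order does not hide a repeated call.
        key = (tool, json.dumps(arguments, sort_keys=True))
        self.seen[key] += 1
        # Assumption: trip once the count exceeds the configured limit.
        return self.seen[key] > self.limit
```

With the default limit of 3, the first three identical calls pass and the fourth trips the breaker; a call with different arguments starts its own count.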

The loop circuit breaker pauses a loop after consecutive failed runs:

limits:
  pause_after_consecutive_failures: 3

Completed runs reset the streak. Cancelled runs are neutral. When the breaker trips, Mobius auto-pauses the loop, emits loop.auto_paused, and refuses new starts until an operator resumes the loop.
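The streak rules reduce to a three-way update. This sketch assumes the run status names completed, cancelled, and failed, and that the breaker trips when the streak reaches the configured threshold; the function name is hypothetical.

```python
def update_streak(streak, run_status, threshold=3):
    """Return (new_streak, should_pause) after one terminal run."""
    if run_status == "completed":
        return 0, False       # a completed run resets the streak
    if run_status == "cancelled":
        return streak, False  # cancelled runs are neutral
    if run_status == "failed":
        streak += 1
        return streak, streak >= threshold
    raise ValueError(f"unknown run status: {run_status!r}")


def first_pause_index(statuses, threshold=3):
    """Index of the run that trips the breaker, or None if it never trips."""
    streak = 0
    for i, status in enumerate(statuses):
        streak, pause = update_streak(streak, status, threshold)
        if pause:
            return i
    return None
```

For the sequence failed, cancelled, failed, completed, failed, failed, failed, the streak goes 1, 1, 2, 0, 1, 2, 3, so the breaker trips on the last run (index 6).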

From the app

In the loop editor, use Settings to set Budget and Turns for the common run-level guardrails. The run page shows the budget rail, stop reason, per-step cost rows, proof rows, and reason-specific stop banners as the run progresses.

Advanced fields such as daily loop budgets, duplicate-call breaker thresholds, and circuit-breaker thresholds are API/spec fields today.

From the CLI

Use the CLI to start and watch guarded runs:

mobius runs start morning-brief --inputs '{"scope":"today"}'
mobius runs stream run_01...

The launch CLI can inspect loops, start runs, stream events, and cancel runs. Loop-authoring commands still use the legacy automations command group while the SDK rename rolls through, so prefer the app or HTTP API when documenting or sharing new guardrail configuration.

From the API

Send guardrails under limits in the authored spec:

POST /v1/projects/{project}/loops/{id}/versions

You can also override the per-run budget at start:

POST /v1/projects/{project}/loops/{id}/runs
{
  "inputs": {
    "scope": "today"
  },
  "budget_usd": 2.5
}

Start-request budgets affect only that run. They do not change the loop's published spec or its daily budget.
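A start request with a budget override could be built from Python's standard library as follows. The base URL, project name, and helper name are placeholders, and authentication is left out because this page does not describe it. The sketch only constructs the request; sending it is left to the caller.

```python
import json
import urllib.parse
import urllib.request


def build_start_request(base_url, project, loop_id, inputs, budget_usd=None):
    """Build (but do not send) a POST to the run-start endpoint shown above."""
    path = "/v1/projects/{}/loops/{}/runs".format(
        urllib.parse.quote(project, safe=""),
        urllib.parse.quote(loop_id, safe=""),
    )
    body = {"inputs": inputs}
    if budget_usd is not None:
        body["budget_usd"] = budget_usd  # per-run override, this run only
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host and identifiers, for illustration only.
req = build_start_request(
    "https://mobius.example", "demo", "morning-brief", {"scope": "today"}, 2.5
)
```

Passing the returned object to urllib.request.urlopen would send it, once the real host and credentials are in place.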

Stop reasons

stop_reason explains why a terminal run stopped:

| Stop reason | Meaning |
|---|---|
| completed | The run finished successfully. |
| step_failed | A step failed without remaining retries. |
| check_failed | A check failed and on_fail: fail stopped the run. |
| gate_rejected | A check gate or interaction was rejected. |
| cancelled | A user or system cancelled the run. |
| replaced | A concurrency policy replaced the run. |
| wall_clock_exceeded | The run exceeded limits.wall_clock_timeout. |
| budget_exceeded | A run budget or loop daily budget halted the run. |
| turn_limit_reached | The run exceeded limits.max_agent_turns. |
| progress_stalled | The duplicate-tool-call breaker halted the run. |
| step_limit_reached | The run exceeded the plan's per-run step cap. |

When a run surprises you, read the stop reason first, then open the timeline event that caused it.
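For dashboards or alert routing, it can help to separate guardrail stops from other terminal reasons. The sketch below uses the four stop reasons this page names as evidence that a configured bound worked; the function name and the three bucket labels are hypothetical.

```python
# The four reasons this page describes as "the configured bound worked".
GUARDRAIL_STOPS = frozenset({
    "budget_exceeded",
    "turn_limit_reached",
    "progress_stalled",
    "wall_clock_exceeded",
})


def classify_stop(stop_reason):
    """Bucket a stop_reason as 'success', 'guardrail', or 'other'."""
    if stop_reason == "completed":
        return "success"
    if stop_reason in GUARDRAIL_STOPS:
        return "guardrail"
    return "other"
```

A team might page on "other" and only chart "guardrail", since a guardrail stop usually calls for tuning a limit rather than debugging a failure.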

What operators should monitor

  • run.budget_warning and run.budget_exceeded by loop.
  • check.failed grouped by check name.
  • Runs in the failed state with guardrail stop reasons.
  • Loops that auto-pause from loop.auto_paused.
  • Runs that stay suspended longer than their expected wait window.
  • usage.recorded grouped by API key, loop, and step.
  • Artifact quota before enabling loops that produce large files.
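As one example of the groupings in this list, check.failed events can be counted by check name. The event shape here is an assumption: each event is taken to be a dictionary with hypothetical type and check_name keys, which may not match the real event payload.

```python
from collections import Counter


def failed_checks_by_name(events):
    """Count check.failed events per check name (hypothetical event shape)."""
    return Counter(
        e["check_name"] for e in events if e.get("type") == "check.failed"
    )


sample_events = [
    {"type": "check.failed", "check_name": "tests_passed"},
    {"type": "check.passed", "check_name": "tests_passed"},
    {"type": "check.failed", "check_name": "review_summary"},
    {"type": "check.failed", "check_name": "tests_passed"},
]
```

On the sample above, tests_passed is counted twice and review_summary once; the check.passed event is ignored.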

Before production

  1. Set a per-run budget on every loop that can call a model or metered action.
  2. Set limits.wall_clock_timeout.
  3. Set timeout.duration on wait_for_event and interaction steps.
  4. Keep retry counts small until actions are proven idempotent.
  5. Add a check step before risky actions or approvals.
  6. Use a daily loop budget for schedules and high-volume event triggers.
  7. Watch the first runs from the Timeline tab before increasing scope.
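Parts of this checklist can be automated with a small pre-flight check over a spec that has been loaded into a dictionary. The sketch makes several assumptions: retry and timeout sit directly on each step, waiting steps use the kinds wait_for_event and interaction, and a retry count above 3 counts as "not small". Adjust these to the real spec schema.

```python
WAITING_KINDS = {"wait_for_event", "interaction"}  # assumed step kind names


def preflight(spec, max_retry_attempts=3):
    """Return a list of warnings for guardrails missing from a loop spec dict."""
    warnings = []
    limits = spec.get("limits", {})
    if "budget_usd" not in limits and "credit_budget" not in limits:
        warnings.append("no per-run budget")
    if "wall_clock_timeout" not in limits:
        warnings.append("no wall_clock_timeout")
    for step in spec.get("steps", []):
        key = step.get("key", "?")
        waits = step.get("kind") in WAITING_KINDS
        if waits and "duration" not in step.get("timeout", {}):
            warnings.append(f"step {key}: no timeout.duration")
        attempts = step.get("retry", {}).get("max_attempts", 1)
        if attempts > max_retry_attempts:
            warnings.append(f"step {key}: max_attempts {attempts} is high")
    return warnings
```

An empty list means the spec passes these checks only; items such as adding a check step before risky actions or watching the first runs still need a human.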

FAQ

Why can a run spend slightly more than its budget?

Mobius enforces budgets at checkpoints. A checkpoint happens before the next model call, after a metered action records usage, at step boundaries, and between agent tool iterations. The call already in flight is allowed to finish, so the final spend can include one extra call or action.

Should I use a check or an interaction?

Use a check when Mobius can evaluate evidence and record a verdict. Use an interaction when a human or agent must supply information, approval, or a review. When in doubt, use a check with on_fail: gate: the run records a verdict and opens a human approval only for red evidence.

Next

  • Read the stop-state model in runs.
  • Configure step retries and waits in steps.
  • Follow every guardrail event in the event catalog.
  • Attribute machine usage with API keys.