Runs
A run is one execution of an automation. You'll spend most of your operator time here: watching which step is suspended, replaying the event log, resuming a waiting step, or cancelling.
Runs are durable. If the Mobius runtime restarts mid-execution, the run resumes from the last persisted step. Step boundaries are checkpoints; some steps (agent turns especially) checkpoint inside the step too.
Status machine
| Status | When |
|---|---|
| queued | Waiting for capacity or a concurrency slot. |
| running | At least one step is actively progressing. |
| suspended | Waiting on a sleep, an event, or a worker job. |
| completed | All required steps finished successfully. |
| failed | A step or cleanup path failed without recovery. |
| cancelled | A user or system cancellation stopped the run. |
queued, running, and suspended are the live states. The terminal states are completed, failed, and cancelled. A run can be suspended while a single step is busy. An agent step is suspended while its worker job runs the LLM call, a wait_for_event step is suspended until a matching source event arrives, and a sleep step is suspended until the wake time.
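When scripting around runs, the status table can be captured as a small enum. This is a minimal sketch: the RunStatus name and the is_live helper are illustrative and are not part of any Mobius SDK. Only the six status strings come from the table above.

```python
from enum import Enum


class RunStatus(str, Enum):
    """Run statuses as listed in the status table (class name is illustrative)."""
    QUEUED = "queued"
    RUNNING = "running"
    SUSPENDED = "suspended"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Live states can still make progress; terminal states are final.
LIVE = frozenset({RunStatus.QUEUED, RunStatus.RUNNING, RunStatus.SUSPENDED})
TERMINAL = frozenset({RunStatus.COMPLETED, RunStatus.FAILED, RunStatus.CANCELLED})


def is_live(status: str) -> bool:
    """Return True if a status string names a live (non-terminal) state."""
    return RunStatus(status) in LIVE
```

A polling script could loop while is_live(status) holds and stop as soon as a terminal status appears.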
Watching a run
mobius runs list --status running
mobius runs list --status suspended
mobius runs get run_01...
mobius runs list-steps run_01...
mobius runs stream run_01...
The stream prints events as they happen:
event: step.started { step_key: "classify", kind: "agent" }
event: turn.started { agent_id: "agt_triager" }
event: tool.called { name: "github.get_issue" }
event: turn.completed { token_usage: {...} }
event: step.completed { step_key: "classify" }
event: step.started { step_key: "label", kind: "action" }
event: action.started { action: "github.add_label" }
event: action.completed { status: "completed" }
event: step.completed { step_key: "label" }
event: run.completed
The full vocabulary covers run.*, step.*, turn.*, tool.*, action.*, artifact.*, environment.*, and cleanup.*. If you reconnect mid-run, the CLI replays from the latest sequence number it saw so you don't lose events.
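The reconnect behavior amounts to remembering the highest sequence number seen and asking for everything after it. The sketch below illustrates that idea only. The event dictionaries and their "seq" field are hypothetical shapes, and this is not the CLI's actual implementation.

```python
from typing import Iterable, Iterator


def resume_from(events: Iterable[dict], last_seen: int) -> Iterator[dict]:
    """Yield only events newer than last_seen, in sequence order.

    Assumes each event carries an integer "seq" field (a hypothetical name).
    """
    for event in sorted(events, key=lambda e: e["seq"]):
        if event["seq"] > last_seen:
            yield event


# A made-up event log for illustration.
log = [
    {"seq": 1, "type": "step.started"},
    {"seq": 2, "type": "turn.started"},
    {"seq": 3, "type": "turn.completed"},
    {"seq": 4, "type": "step.completed"},
]

# A client that disconnected after seq 2 picks up again at seq 3.
replayed = [e["type"] for e in resume_from(log, last_seen=2)]
```

Because the log is sequence-numbered, replaying with the same last_seen value always yields the same events, so a reconnect neither skips nor duplicates anything.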
Resuming a waiting step
A suspended run step can be resumed by key from a CI job, a webhook handler, or by hand. The payload becomes that step's output:
mobius runs signal run_01... \
--step-key deploy_complete \
--result '{"commit":"abc1234","ok":true}'
The step receives the payload and the run continues.
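When the signal comes from a CI job or webhook handler, building the argument list in code avoids shell-quoting mistakes in the JSON payload. The sketch below uses only the subcommand and flags shown above. The signal_command helper and the run_01EXAMPLE ID are placeholders invented for illustration.

```python
import json
import shlex


def signal_command(run_id: str, step_key: str, result: dict) -> list[str]:
    """Build the argv for `mobius runs signal`, serializing the payload as JSON."""
    return [
        "mobius", "runs", "signal", run_id,
        "--step-key", step_key,
        "--result", json.dumps(result, separators=(",", ":")),
    ]


# "run_01EXAMPLE" is a placeholder; substitute a real run ID.
argv = signal_command(
    "run_01EXAMPLE", "deploy_complete", {"commit": "abc1234", "ok": True}
)

# Print a shell-safe rendering of the command for logs or dry runs.
print(shlex.join(argv))
```

Passing argv as a list to subprocess.run(argv, check=True) would invoke the CLI without going through a shell, so the JSON needs no extra escaping.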
Cancelling
mobius runs cancel run_01... --reason "no longer needed"
Cancellation is cooperative. In-flight agent turns finish their current tool call before unwinding. Cleanup always runs, whether the run completed, failed, or was cancelled.
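Cooperative cancellation generally means the worker checks a flag at safe points instead of being killed mid-call. The sketch below shows that pattern with a threading.Event checked between tool calls and cleanup in a finally block. It is an illustration of the general technique, not Mobius's implementation, and every name in it is hypothetical.

```python
import threading


def run_turn(tool_calls, cancel: threading.Event, log: list) -> str:
    """Run tool calls in order, checking for cancellation between calls."""
    try:
        for call in tool_calls:
            if cancel.is_set():
                # Stop before starting another call; never interrupt one in flight.
                return "cancelled"
            log.append(f"tool:{call()}")
        return "completed"
    finally:
        # Cleanup runs on every exit path: completed, cancelled, or an exception.
        log.append("cleanup")


cancel = threading.Event()
log = []


def fetch_issue() -> str:
    cancel.set()  # simulate a cancellation arriving while this call is in flight
    return "fetch_issue"


def add_label() -> str:
    return "add_label"


status = run_turn([fetch_issue, add_label], cancel, log)
```

Here the first call finishes even though cancellation arrived during it, the second call never starts, and cleanup is still recorded.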
What "durable" means in practice
Mobius persists the run and its inputs, each step's status and result, the full event log (sequence-numbered), worker jobs for action and LLM-generation work, sleeps, and event subscriptions. If the runtime dies, a replica picks up suspended runs by their wake time or pending wait records. A step lease/heartbeat mechanism stops two replicas from running the same step at once.
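A lease with heartbeats is a common way to keep two replicas off the same step: the owner must keep renewing before the lease expires, and another replica can take over only after expiry. The following is a generic in-memory sketch of that idea. The LeaseTable class, its methods, and the explicit now argument are all hypothetical, and Mobius's actual mechanism may differ.

```python
from dataclasses import dataclass, field


@dataclass
class LeaseTable:
    """In-memory step leases: step_key -> (owner, expires_at)."""
    ttl: float
    _leases: dict = field(default_factory=dict)

    def acquire(self, step_key: str, owner: str, now: float) -> bool:
        """Take the lease unless another owner holds an unexpired one."""
        holder = self._leases.get(step_key)
        if holder and holder[0] != owner and holder[1] > now:
            return False
        self._leases[step_key] = (owner, now + self.ttl)
        return True

    def heartbeat(self, step_key: str, owner: str, now: float) -> bool:
        """Extend the lease; fails if it expired or belongs to someone else."""
        holder = self._leases.get(step_key)
        if not holder or holder[0] != owner or holder[1] <= now:
            return False
        self._leases[step_key] = (owner, now + self.ttl)
        return True


table = LeaseTable(ttl=10)
first = table.acquire("classify", "replica-a", now=0)    # replica-a owns the step
blocked = table.acquire("classify", "replica-b", now=5)  # lease still held
renewed = table.heartbeat("classify", "replica-a", now=8)
```

Time is passed in explicitly to keep the example deterministic. A real system would keep the lease in shared durable storage and update it atomically.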
What this does NOT do: roll back external side effects. Slack messages already sent, GitHub labels already applied, and database rows already written all stay as they are. Idempotency for those belongs at the action level, not the run level.
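One common way to get action-level idempotency is to derive a key from the action's inputs and skip the side effect when that key has already been processed. The sketch below is a hypothetical in-memory example. LabelAction is not a Mobius or GitHub API, and a real action would store keys durably.

```python
import hashlib
import json


class LabelAction:
    """Hypothetical action that applies a label at most once per (issue, label)."""

    def __init__(self) -> None:
        self._done: dict[str, str] = {}  # idempotency key -> recorded result
        self.calls = 0                   # number of real side effects performed

    def apply(self, issue: int, label: str) -> str:
        # Derive a stable key from the inputs that define "the same action".
        key = hashlib.sha256(json.dumps([issue, label]).encode()).hexdigest()
        if key in self._done:
            # Already applied: return the recorded result, do nothing else.
            return self._done[key]
        self.calls += 1  # stands in for the external side effect
        self._done[key] = f"labelled #{issue} with {label}"
        return self._done[key]


action = LabelAction()
first = action.apply(7, "bug")
retry = action.apply(7, "bug")  # e.g. the step re-executes after a restart
```

With this shape, a step that re-executes after a runtime restart repeats the call but not the side effect.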
Cleanup
Every run runs cleanup at the end, regardless of outcome. Cleanup releases Mobius-allocated resources: managed environments, temporary credentials, event subscriptions, ephemeral artifacts. External side effects keep their state.
A practical operator loop
If a step looks stuck, the diagnostic sequence is usually:
- mobius runs get to confirm the run is suspended.
- mobius runs list-steps to find which step.
- If it's a wait_for_event step, check whether the upstream system produced the expected event.
- If it's an action step with a worker job, check mobius worker-sessions list to confirm a worker is online and heartbeating.
- Resume by signal, by restarting the worker, or by cancelling and starting a fresh run.
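The checklist above can be written down as a small decision function, for example inside a runbook script. This sketch only encodes the branching described here. The next_check name is hypothetical, and the status and step kind are assumed to have been read by hand from the CLI output.

```python
def next_check(run_status: str, step_kind: str) -> str:
    """Suggest the next diagnostic action for a run that looks stuck."""
    if run_status != "suspended":
        return "not suspended: re-check with `mobius runs get`"
    if step_kind == "wait_for_event":
        return "check whether the upstream system produced the expected event"
    if step_kind == "action":
        return "run `mobius worker-sessions list` and confirm a worker is heartbeating"
    # Other step kinds: fall through to the final step of the loop.
    return "resume by signal, or cancel and start a fresh run"


advice = next_check("suspended", "action")
```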
Related
- Automations define what runs do.
- Jobs are the worker-transport records created during action and generation steps.
- Webhooks can deliver run.completed, run.failed, and other run events to your systems.