Concepts

Runs

A run is one execution of an automation. You'll spend most of your operator time here: watching which step is suspended, replaying the event log, resuming a waiting step, or cancelling.

Runs are durable. If the Mobius runtime restarts mid-execution, the run resumes from the last persisted step. Step boundaries are checkpoints; some steps (agent turns especially) checkpoint inside the step too.
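The resume-from-checkpoint behaviour can be pictured with a small sketch. This is not Mobius's implementation: `execute_run` and the in-memory `store` dictionary are hypothetical stand-ins for the runtime loop and its persisted step results, and the sketch only models checkpoints at step boundaries.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory stand-in for persisted step results.
Checkpoint = Dict[str, object]


def execute_run(
    steps: List[Tuple[str, Callable[[], object]]],
    checkpoint: Checkpoint,
) -> List[str]:
    """Run steps in order, skipping any whose result is already persisted.

    Returns the keys of the steps that actually executed this time.
    """
    executed = []
    for key, fn in steps:
        if key in checkpoint:
            continue  # finished before the restart; do not repeat it
        checkpoint[key] = fn()  # persist the result at the step boundary
        executed.append(key)
    return executed


# Simulate a restart: "classify" was persisted before the runtime died.
store: Checkpoint = {"classify": "bug"}
ran = execute_run(
    [("classify", lambda: "bug"), ("label", lambda: "labelled")],
    store,
)
print(ran)  # ['label']
```

Only `label` runs after the simulated restart, because `classify` already has a persisted result.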

Status machine

Status	When
`queued`	Waiting for capacity or a concurrency slot.
`running`	At least one step is actively progressing.
`suspended`	Waiting on a sleep, an event, or a worker job.
`completed`	All required steps finished successfully.
`failed`	A step or cleanup path failed without recovery.
`cancelled`	A user or system cancellation stopped the run.

`queued`, `running`, and `suspended` are the live states; `completed`, `failed`, and `cancelled` are terminal. A run can be suspended even though one of its steps is still in progress: an agent step is suspended while its worker job runs the LLM call, a wait_for_event step is suspended until a matching source event arrives, and a sleep step is suspended until its wake time.
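If you script against run status, it can help to encode the live/terminal split once. The following is a minimal sketch using only the six status strings from the table; the `RunStatus` enum and `is_terminal` helper are illustrative names, not part of any Mobius SDK.

```python
from enum import Enum


class RunStatus(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUSPENDED = "suspended"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Live states can still change; terminal states are final.
LIVE = frozenset({RunStatus.QUEUED, RunStatus.RUNNING, RunStatus.SUSPENDED})
TERMINAL = frozenset({RunStatus.COMPLETED, RunStatus.FAILED, RunStatus.CANCELLED})


def is_terminal(status: str) -> bool:
    """Return True when a status string names a terminal state."""
    return RunStatus(status) in TERMINAL


print(is_terminal("suspended"))  # False
print(is_terminal("failed"))     # True
```

A polling script can loop until `is_terminal` returns true rather than comparing against individual strings.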

Watching a run

mobius runs list --status running
mobius runs list --status suspended
mobius runs get run_01...
mobius runs list-steps run_01...
mobius runs stream run_01...

The stream prints events as they happen:

event: step.started   { step_key: "classify", kind: "agent" }
event: turn.started   { agent_id: "agt_triager" }
event: tool.called    { name: "github.get_issue" }
event: turn.completed { token_usage: {...} }
event: step.completed { step_key: "classify" }
event: step.started   { step_key: "label", kind: "action" }
event: action.started { action: "github.add_label" }
event: action.completed { status: "completed" }
event: step.completed { step_key: "label" }
event: run.completed

The full vocabulary covers run.*, step.*, turn.*, tool.*, action.*, artifact.*, environment.*, and cleanup.*. If you reconnect mid-run, the CLI replays from the latest sequence number it saw so you don't lose events.
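The reconnect behaviour relies on the event log being sequence-numbered. As a rough illustration of the idea (not the CLI's actual code), a client can remember the highest sequence number it has processed and ask only for later events. `StreamCursor`, `replay_after`, and the `seq`/`type` field names are assumptions made for this sketch.

```python
from typing import Dict, Iterable, List

Event = Dict[str, object]


class StreamCursor:
    """Tracks the highest sequence number processed so far."""

    def __init__(self) -> None:
        self.last_seq = 0
        self.seen: List[str] = []

    def consume(self, events: Iterable[Event]) -> None:
        for ev in events:
            seq = int(ev["seq"])
            if seq <= self.last_seq:
                continue  # duplicate delivered during replay; skip it
            self.seen.append(str(ev["type"]))
            self.last_seq = seq


def replay_after(log: List[Event], seq: int) -> List[Event]:
    """Hypothetical server side: return events newer than the given sequence."""
    return [ev for ev in log if int(ev["seq"]) > seq]


log = [
    {"seq": 1, "type": "step.started"},
    {"seq": 2, "type": "turn.started"},
    {"seq": 3, "type": "turn.completed"},
    {"seq": 4, "type": "step.completed"},
]

cursor = StreamCursor()
cursor.consume(log[:2])                            # connection drops after seq 2
cursor.consume(replay_after(log, cursor.last_seq))  # reconnect and catch up
print(cursor.seen)
```

After the reconnect, `cursor.seen` holds all four event types in order, with nothing lost or repeated.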

Resuming a waiting step

A suspended run step can be resumed by key from a CI job, a webhook handler, or by hand. The payload becomes that step's output:

mobius runs signal run_01... \
  --step-key deploy_complete \
  --result '{"commit":"abc1234","ok":true}'

The step receives the payload and the run continues.
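From a CI job or webhook handler, the same command can be assembled programmatically. The sketch below only builds the argument list, using the flags shown above; `build_signal_command` is a hypothetical helper, and the run ID is the same elided placeholder used in the example.

```python
import json
import shlex
from typing import List


def build_signal_command(run_id: str, step_key: str, result: dict) -> List[str]:
    """Build the argv for `mobius runs signal` with a JSON-encoded result."""
    return [
        "mobius", "runs", "signal", run_id,
        "--step-key", step_key,
        # Compact JSON keeps the payload on one shell-safe argument.
        "--result", json.dumps(result, separators=(",", ":")),
    ]


cmd = build_signal_command(
    "run_01...",  # placeholder run ID, elided as in the example above
    "deploy_complete",
    {"commit": "abc1234", "ok": True},
)
print(shlex.join(cmd))
# To execute it for real: subprocess.run(cmd, check=True)
```

Passing the list straight to `subprocess.run` avoids shell quoting problems with the JSON payload.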

Cancelling

mobius runs cancel run_01... --reason "no longer needed"

Cancellation is cooperative. In-flight agent turns finish their current tool call before unwinding. Cleanup always runs, whether the run completed, failed, or was cancelled.
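Cooperative cancellation of this kind is commonly modelled as a flag that is checked between units of work, with cleanup in a `finally` block. The sketch below illustrates that pattern under those assumptions; `run_turn` and the tool names are hypothetical, and this is not how Mobius is implemented internally.

```python
import threading
from typing import Callable, List


def run_turn(
    tool_calls: List[Callable[[], str]],
    cancel: threading.Event,
    log: List[str],
) -> str:
    """Run tool calls in order, checking for cancellation between calls."""
    try:
        for call in tool_calls:
            if cancel.is_set():
                return "cancelled"  # unwind before starting another call
            log.append(call())      # an in-flight call always finishes
        return "completed"
    finally:
        log.append("cleanup")       # runs on every outcome


cancel = threading.Event()
log: List[str] = []


def first() -> str:
    return "tool_a"


def second() -> str:
    cancel.set()  # the cancellation request arrives mid-call
    return "tool_b"


def third() -> str:
    return "tool_c"


status = run_turn([first, second, third], cancel, log)
print(status, log)
```

The second call completes even though cancellation arrived while it was running, the third never starts, and cleanup still runs.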

What "durable" means in practice

Mobius persists the run and its inputs, each step's status and result, the full event log (sequence-numbered), worker jobs for action and LLM-generation work, sleeps, and event subscriptions. If the runtime dies, a replica picks up suspended runs by their wake time or pending wait records. A step lease/heartbeat mechanism stops two replicas from running the same step at once.
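To make the lease/heartbeat idea concrete, here is a minimal sketch of one common design: a lease has an owner and an expiry, heartbeats extend it, and another replica can take over only after it expires. `LeaseTable`, the TTL, and the explicit `now` arguments are assumptions for illustration; the source does not describe Mobius's actual mechanism in this detail.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Lease:
    owner: str
    expires_at: float


class LeaseTable:
    """Hypothetical in-memory lease store keyed by step."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self._leases: Dict[str, Lease] = {}

    def acquire(self, step_key: str, owner: str, now: float) -> bool:
        current = self._leases.get(step_key)
        if current and current.owner != owner and current.expires_at > now:
            return False  # another replica holds a live lease
        self._leases[step_key] = Lease(owner, now + self.ttl)
        return True

    def heartbeat(self, step_key: str, owner: str, now: float) -> bool:
        current = self._leases.get(step_key)
        if current is None or current.owner != owner or current.expires_at <= now:
            return False  # lease lost or expired; stop working on the step
        current.expires_at = now + self.ttl
        return True


table = LeaseTable(ttl=10.0)
results = [
    table.acquire("classify", "replica_a", now=0.0),    # A takes the step
    table.acquire("classify", "replica_b", now=5.0),    # B is refused
    table.heartbeat("classify", "replica_a", now=8.0),  # A extends to 18.0
    table.acquire("classify", "replica_b", now=15.0),   # still refused
    table.acquire("classify", "replica_b", now=20.0),   # A went silent; B takes over
    table.heartbeat("classify", "replica_a", now=21.0), # A no longer owns it
]
print(results)
```

A real implementation would need the acquire and heartbeat checks to be atomic in the backing store; the in-memory dictionary here sidesteps that.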

What this does NOT do is roll back external side effects. Slack messages already sent, GitHub labels already applied, and database rows already written all stay as they are. Idempotency for those belongs at the action level, not the run level.
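One common way to get action-level idempotency is to derive a stable key from the run, the step, and the action parameters, and to skip the external call when that key has already been handled. The sketch below assumes that approach; `idempotency_key`, `LabelAction`, and the in-memory store are hypothetical and stand in for whatever your action and its target system provide.

```python
import hashlib
import json
from typing import Dict


def idempotency_key(run_id: str, step_key: str, params: dict) -> str:
    """Derive a stable key from the run, the step, and the action parameters."""
    payload = json.dumps([run_id, step_key, params], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class LabelAction:
    """Hypothetical action that applies an external label at most once per key."""

    def __init__(self) -> None:
        self.side_effects = 0
        self._done: Dict[str, str] = {}

    def add_label(self, run_id: str, step_key: str, params: dict) -> str:
        key = idempotency_key(run_id, step_key, params)
        if key in self._done:
            return self._done[key]  # retry: reuse the earlier result
        self.side_effects += 1      # stands in for the real external call
        self._done[key] = "applied"
        return "applied"


action = LabelAction()
params = {"issue": 42, "label": "bug"}
action.add_label("run_a", "label", params)
action.add_label("run_a", "label", params)  # retried after a restart
print(action.side_effects)  # 1
```

In practice the key store would need to be durable, or the key would be passed to an external API that deduplicates on its side.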

Cleanup

Every run runs cleanup at the end, regardless of outcome. Cleanup releases Mobius-allocated resources: managed environments, temporary credentials, event subscriptions, ephemeral artifacts. External side effects keep their state.

A practical operator loop

If a step looks stuck, the diagnostic sequence is usually:

1. Run `mobius runs get` to confirm the run is suspended.
2. Run `mobius runs list-steps` to find which step.
3. If it's a wait_for_event step, check whether the upstream system produced the expected event.
4. If it's an action step with a worker job, run `mobius worker-sessions list` to confirm a worker is online and heartbeating.
5. Resume by signal, by restarting the worker, or by cancelling and starting a fresh run.

Automations define what runs do.
Jobs are the worker-transport records created during action and generation steps.
Webhooks can deliver run.completed, run.failed, and other run events to your systems.