Stages, jobs, and steps
Loom workflows are organized into three levels: stages, jobs, and steps. Together, they control execution order, environment isolation, and the structure of runtime diagnostics — so when something fails, you can pinpoint the exact command without scanning aggregated logs.
Why this hierarchy matters
When a CI pipeline fails, the first question is always "where did it break?" Loom's three-level hierarchy gives you a direct answer:
- Stages define execution phases — build before test, test before deploy.
- Jobs are the unit of scheduling, isolation, and provider selection.
- Steps are individual commands with their own event streams, so you can jump straight to the failing command's output.
This hierarchy also maps directly to runtime log paths, which means you can follow a structured pointer from a failure notification to the exact step's event stream — no grep required.
The three levels
| Level | What it is | YAML location | Required |
|---|---|---|---|
| Stage | An execution phase that groups jobs | stages: [build, test, deploy] at root | Yes (at least one) |
| Job | A scheduled unit of work with its own environment | Top-level key matching ^[a-z][a-z0-9_-]{0,63}$ | Yes (at least one) |
| Step | A single command inside a job's script: | Each entry in a job's script: list | Yes (at least one per job) |
How they nest
```
workflow.yml
├── stages: [build, test]          ← execution phases
│
├── compile:                       ← job (build stage)
│   ├── stage: build
│   ├── target: linux
│   └── script:
│       ├── step 0: make deps      ← step
│       └── step 1: make build     ← step
│
├── unit:                          ← job (test stage)
│   ├── stage: test
│   ├── target: linux
│   └── script:
│       └── step 0: go test ./...  ← step
│
└── lint:                          ← job (test stage)
    ├── stage: test
    ├── target: linux
    └── script:
        └── step 0: golangci-lint run  ← step
```
Corresponding YAML
```yaml
version: v1
stages: [build, test]

compile:
  stage: build
  target: linux
  script:
    - make deps
    - make build

unit:
  stage: test
  target: linux
  script:
    - go test ./...

lint:
  stage: test
  target: linux
  script:
    - golangci-lint run
```
Execution ordering
Loom determines job execution order using a topological sort that respects stage order and explicit needs dependencies.
Stage ordering
Stages execute in the order they appear in the stages: list. All jobs in an earlier stage complete before any job in a later stage begins.
In the example above: compile (build stage) finishes before both unit and lint (test stage) start.
Job ordering within a stage
Within a stage, jobs without mutual needs dependencies are ordered alphabetically by job name. In the current executor, jobs run sequentially — parallelism within a stage is planned but not yet implemented.
In the example above: lint runs before unit within the test stage (alphabetical order).
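The stage-then-alphabetical rule can be sketched as follows. This is a simplified model of the described behavior, not Loom's actual implementation; the job and stage names come from the example workflow above:

```python
def default_job_order(stages, jobs):
    """Compute the default execution order: stages in declaration order,
    jobs within each stage sorted alphabetically by name.

    stages: list of stage names in declaration order
    jobs:   dict mapping job name -> stage name
    """
    order = []
    for stage in stages:
        in_stage = [name for name, s in jobs.items() if s == stage]
        order.extend(sorted(in_stage))  # alphabetical within the stage
    return order

# The example workflow from above:
jobs = {"compile": "build", "unit": "test", "lint": "test"}
print(default_job_order(["build", "test"], jobs))
# → ['compile', 'lint', 'unit']
```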
Explicit dependencies with needs
Use needs to declare that a job depends on specific other jobs:
```yaml
integration:
  stage: test
  needs: [compile]
  target: linux
  script:
    - go test -tags=integration ./...
```
The integration job waits for compile to finish regardless of stage boundaries. This is useful for:
- Dependencies between jobs in the same stage.
- Making implicit stage-order dependencies explicit for readability.
- Cross-stage dependencies that the default ordering wouldn't capture.
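Putting the pieces together, the overall ordering can be modeled as a per-stage topological sort: stages act as barriers, needs edges are honored inside a stage, and ties break alphabetically. This is an illustrative sketch of the documented semantics, not Loom's code, and it does not validate cycles or forward references:

```python
def execution_order(stages, jobs):
    """jobs: dict mapping job name -> {"stage": str, "needs": [job names]}
    ("needs" is optional). Stages are barriers; within a stage, a job runs
    only after all of its needs have finished, alphabetical tie-break."""
    order = []
    done = set()
    for stage in stages:
        remaining = sorted(n for n, j in jobs.items() if j["stage"] == stage)
        while remaining:
            # pick the alphabetically-first job whose needs are all finished
            # (a cycle or a need on a later stage would raise StopIteration;
            # in practice that is a validation error, not handled here)
            ready = next(n for n in remaining
                         if all(d in done for d in jobs[n].get("needs", [])))
            order.append(ready)
            done.add(ready)
            remaining.remove(ready)
    return order

jobs = {
    "compile":     {"stage": "build"},
    "unit":        {"stage": "test"},
    "lint":        {"stage": "test"},
    "integration": {"stage": "test", "needs": ["compile"]},
}
print(execution_order(["build", "test"], jobs))
# → ['compile', 'integration', 'lint', 'unit']
```

Note that integration's need on compile is already satisfied by the stage barrier; the explicit `needs` simply makes the dependency visible.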
Step ordering
Steps within a job run sequentially in the order listed. If any step exits with a non-zero code, the job fails immediately and remaining steps are skipped.
Each step must be a non-empty string. Multi-line commands within a single step entry are not allowed — use one command per script: entry.
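The fail-fast rule can be sketched with a small runner. This is an illustration of the semantics described above (assuming a POSIX shell), not Loom's executor:

```python
import subprocess

def run_steps(script):
    """Run each step in list order; stop at the first non-zero exit.
    Returns (failed_step_index or None, number_of_steps_executed)."""
    for i, command in enumerate(script):
        result = subprocess.run(command, shell=True)
        if result.returncode != 0:
            return i, i + 1  # job fails; remaining steps are skipped
    return None, len(script)

# "false" exits non-zero, so the third step never runs:
print(run_steps(["echo deps", "false", "echo never-runs"]))
# → (1, 2)
```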
Ordering summary
| Rule | Guarantee |
|---|---|
| Stages | Run in declaration order — all jobs in stage N complete before stage N+1 starts |
| Jobs within a stage | Alphabetical by name; sequential execution |
| Jobs with needs | Wait for all named dependencies to finish before starting |
| Steps within a job | Sequential, in list order; fail-fast on non-zero exit |
Required job keys
Every executable (non-template) job must have these keys:
| Key | Type | Description |
|---|---|---|
| stage | string | Must match a name declared in root stages: |
| target | string | Must be "linux" (the only supported target) |
| script | sequence of strings | At least one non-empty command string |
Template jobs (names starting with .) require at least one of script or extends.
Optional job keys
| Key | Purpose |
|---|---|
image | Container image — triggers Docker provider (see Providers) |
variables | Job-scoped environment variables (see Variables) |
secrets | Job-scoped secret references (see Secrets) |
needs | Explicit dependency on other jobs |
extends | Inherit configuration from a template job |
services | Sidecar containers (requires image to be set) |
cache | Cache configuration for the job |
runner_pool | Runner pool assignment |
invariant | Invariant metadata |
The default block
The default block sets inherited defaults for all jobs. Job-level values override these defaults through a deep merge — nested mappings (like variables) are merged key-by-key, not replaced wholesale.
```yaml
default:
  target: linux
  variables:
    LANG: "en_US.UTF-8"
```
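The key-by-key merge rule can be illustrated with a short sketch (a model of the described semantics, not Loom's code; the CGO_ENABLED variable is a hypothetical job-level addition):

```python
def deep_merge(defaults, job):
    """Merge job config over defaults: job-level values win, but nested
    mappings (like variables) are merged key-by-key, not replaced."""
    merged = dict(defaults)
    for key, value in job.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

defaults = {"target": "linux", "variables": {"LANG": "en_US.UTF-8"}}
job = {"stage": "build", "variables": {"CGO_ENABLED": "0"}}
print(deep_merge(defaults, job))
# → {'target': 'linux',
#    'variables': {'LANG': 'en_US.UTF-8', 'CGO_ENABLED': '0'},
#    'stage': 'build'}
```

The job keeps the default LANG variable even though it declares its own variables block, because the nested mapping is merged rather than overwritten.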
Allowed keys in default:
| Key | Purpose |
|---|---|
target | Default target for all jobs |
image | Default container image |
runner_pool | Default runner pool |
variables | Default variables (merged with job variables) |
invariant | Default invariant metadata |
cache | Default cache configuration |
services | Default sidecar services |
secrets is not allowed in default — secrets must be declared per job because they are job-scoped.
Diagnostics: from failure to root cause
The three-level hierarchy maps directly to runtime log paths. When a job fails, Loom writes structured pointers that let you jump straight to the failing step without reading aggregated logs.
Log path structure
```
.loom/.runtime/logs/<run_id>/
├── pipeline/
│   ├── summary.json        ← overall run status
│   └── manifest.json       ← pointers to failing job(s)
└── jobs/
    └── <job_id>/
        ├── summary.json    ← job status + exit code
        ├── manifest.json   ← pointer to failing step
        ├── user/
        │   └── script/
        │       ├── 00/
        │       │   └── events.jsonl  ← step 0 event stream
        │       └── 01/
        │           └── events.jsonl  ← step 1 event stream
        └── system/
            └── provider/
                └── events.jsonl      ← provider-level events
```
Following a failure pointer
- Read the job manifest at `.loom/.runtime/logs/<run_id>/jobs/<job_id>/manifest.json`.
- Find the `failing_step_events_path` field — it points directly to the failing step's event stream.
- Read the step events file — it contains the exact command, exit code, and stderr excerpt.
Example manifest:
```json
{
  "failing_section": "script",
  "failing_step_index": 2,
  "failing_step_events_path": "jobs/check-pnpm/user/script/02/events.jsonl",
  "system_sections": [
    {
      "system_section": "provider",
      "events_path": "jobs/check-pnpm/system/provider/events.jsonl"
    }
  ]
}
```
This structured pointer path replaces searching through full stdout/stderr dumps and gives you failure evidence in seconds.
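Resolving the pointer is a simple path join, assuming (as the layout above suggests) that event-stream paths in the manifest are relative to the run directory. The run id `run-123` below is a hypothetical example:

```python
import json
from pathlib import Path

# The relevant fields of the example manifest above, as parsed JSON:
manifest = json.loads("""{
  "failing_section": "script",
  "failing_step_index": 2,
  "failing_step_events_path": "jobs/check-pnpm/user/script/02/events.jsonl"
}""")

def resolve_events_path(run_dir, manifest):
    """Join a manifest's relative events path onto the run directory."""
    return Path(run_dir) / manifest["failing_step_events_path"]

print(resolve_events_path(".loom/.runtime/logs/run-123", manifest))
# on POSIX: .loom/.runtime/logs/run-123/jobs/check-pnpm/user/script/02/events.jsonl
```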
Common pitfalls
Missing required keys
- Symptom: `loom check` reports errors like `add required key stage for this job` or `add required key target with value "linux"`.
- Fix: Ensure every executable job has `stage`, `target`, and `script`. Template jobs need at least `script` or `extends`.
Can't find the failure in logs
- Symptom: You're reading full stdout/stderr logs but can't identify what failed.
- Fix: Follow the structured pointer path: pipeline manifest → job manifest → `failing_step_events_path` → step `events.jsonl`. See the Diagnostics ladder for the full triage flow.
Unexpected execution order
- Symptom: Jobs run in a different order than expected.
- Fix: Stage order comes from the `stages:` list (declaration order). Within a stage, jobs sort alphabetically by name. Use `needs` for explicit ordering. Run `loom compile` to inspect the resolved execution graph.
What to read next
- Workflow structure: Workflow, Workflow schema v1
- Execution environments: Providers
- Parameterize jobs: Variables
- Failure triage: Diagnostics ladder, Runtime logs contract
- Sharing debug info: What to share