diff --git a/adr/20260514-benchmark-process-directive.md b/adr/20260514-benchmark-process-directive.md new file mode 100644 index 0000000000..1fc8fe70e9 --- /dev/null +++ b/adr/20260514-benchmark-process-directive.md @@ -0,0 +1,272 @@ +# `benchmark` process directive for per-task performance metrics + +- Authors: Edmund Miller +- Status: draft +- Deciders: +- Date: 2026-05-14 +- Tags: directive, trace, metrics, benchmark + +## Summary + +Introduce a `benchmark` process directive that emits per-task runtime metrics +(wall time, peak memory, CPU usage, I/O) to a user-specified file, with an +optional `repeats` option for noise-reduction via repeated execution. + +## Problem Statement + +Nextflow already collects rich per-task runtime metrics through `TraceRecord` — +`realtime`, `peak_rss`, `peak_vmem`, `%cpu`, `rchar`, `wchar`, context switches, +and roughly 40 other fields — and surfaces them through the workflow-wide +`trace.txt`, the HTML `report.html`, and `timeline.html`. These artifacts are +useful for whole-pipeline post-mortem analysis but are not designed for +per-process benchmarking workflows: + +- The trace file mixes all tasks across the whole run; isolating measurements + for a single process requires post-processing. + +- There is no first-class way to capture a per-task metrics artifact at a + stable, predictable path that downstream processes can consume as input. + +- There is no built-in support for repeated execution of a task to obtain + multiple samples and reduce measurement noise — a common requirement when + benchmarking algorithms or hardware. + +- Users who want to benchmark a third-party pipeline (e.g. nf-core workflows) + currently have no way to enable metrics capture for a specific process + without forking the pipeline source. + +Snakemake addresses the same need with a `benchmark` rule keyword that records +wall time, memory, I/O and CPU usage to a per-job TSV (or JSONL) file, with an +optional `repeat(file, N)` wrapper for multiple samples. Nextflow has no +equivalent. + +## Goals + +- Provide a declarative, per-process way to capture performance metrics as a + named pipeline artifact. + +- Make the directive settable from `nextflow.config` (via `withName` and + `withLabel` selectors) so users can benchmark an existing pipeline without + modifying its source. + +- Reuse the existing `TraceRecord` collection — no new probes, no changes to + the bash wrapper. + +- Support repeated execution by spawning N real task executions and aggregating + their metrics, not by looping inside a single task. + +- Make the benchmark file a regular workflow artifact with a deterministic + schema, usable as input to downstream processes. + +- Multi-format output driven by file extension (`.tsv` by default, + `.jsonl` opt-in). + +## Non-goals + +- Aggregating benchmarks across tasks — the existing `trace.txt` and + `report.html` already cover whole-workflow summaries. + +- Collecting metrics beyond what `TraceRecord` already records. Snakemake's + `max_uss` and `max_pss` are not exposed by Nextflow's `nxf_stat` probe and + are documented as gaps rather than added in this ADR. + +- Reliable benchmarking inside `exec:` blocks. As with Snakemake's `run:` + body, the Groovy-side execution time is intertwined with Nextflow's process + supervision and cannot be measured cleanly. + +- Changes to the trace, report, or timeline subsystems. + +## Decision + +Introduce a `benchmark` process directive that accepts either a path string or +a map of options. Each task writes its own metrics file at the resolved path. +When `repeats: N` is set, Nextflow submits N independent task executions and +aggregates their `TraceRecord`s into a single output file. + +## Core Capabilities + +### Syntax + +The `benchmark` directive accepts a string shorthand or a map of options, +mirroring the `publishDir` convention: + +```groovy +// process definition — shorthand +process align { + cpus 8 + memory '16 GB' + benchmark "benchmarks/align/${sample}.tsv" + + input: + tuple val(sample), path(reads) + + script: + """ + bwa mem -t ${task.cpus} ref.fa ${reads} > out.bam + """ +} +``` + +```groovy +// process definition — map form +process align { + benchmark file: "benchmarks/align/${sample}.jsonl", repeats: 3 + // ... +} +``` + +The path is templated with task variables (`task.process`, `task.tag`, +`task.hash`, plus any input variables in scope) and resolved relative to +`workflow.launchDir`, the same as `publishDir`. + +### Config-level usage + +Because `benchmark` is registered in the standard process directive list, it +is settable from `nextflow.config` via the usual `withName` and `withLabel` +selectors. This is a primary use case: benchmarking an existing pipeline (for +example, an nf-core workflow) without editing its source. + +```groovy +// nextflow.config — benchmark one named process, 5 repeats +process { + withName: 'ALIGN' { + benchmark = [file: "benchmarks/align/${task.tag}.tsv", repeats: 5] + } +} +``` + +```groovy +// nextflow.config — benchmark every process matching a label +process { + withLabel: 'heavy' { + benchmark = "benchmarks/${task.process}/${task.tag}.tsv" + } +} +``` + +```groovy +// nextflow.config — benchmark all processes (high cost; documented caveat) +process { + benchmark = [file: "benchmarks/${task.process}/${task.tag}.jsonl", repeats: 3] +} +``` + +### Options + +| Option | Type | Default | Description | +|--|--|--|--| +| `file` | String (templated) | _required_ | Destination path. Templated with task variables; resolved against `workflow.launchDir`. | +| `repeats` | Integer | `1` | Number of independent task executions. See _Repeat semantics_ below. | +| `fields` | List\ | sensible default subset of `TraceRecord.FIELDS` | Which columns to emit. Unknown names are rejected at parse time. | +| `format` | `'tsv'` \| `'jsonl'` | inferred from `file` extension | Explicit override when the extension is ambiguous. | + +### Schema + +Column names match `TraceRecord.FIELDS` directly. This keeps benchmark files +consistent with `trace.txt` and `report.html` so users can correlate them +without translation. + +Default columns: + +`task_id`, `hash`, `process`, `tag`, `attempt`, `realtime`, `%cpu`, +`peak_rss`, `peak_vmem`, `rchar`, `wchar`, `vol_ctxt`, `inv_ctxt` + +Users can override via `fields:`. Any name in `TraceRecord.FIELDS` is +accepted. + +TSV files have a header row followed by one data row per repeat. JSONL files +contain one JSON object per repeat, with keys matching the column names. + +### Repeat semantics + +`repeats: N` does not loop inside the wrapper script. Nextflow submits N +independent task executions (each with its own workdir, its own +`.command.trace`, its own `TraceRecord`), and aggregates them into a single +benchmark file. This stays within Nextflow's existing execution model and +avoids introducing new wrapper-script behavior. + +Implications: + +- Each repeat is a full, isolated task run — independent scheduling, retry, + and failure handling under the configured `errorStrategy`. + +- The benchmark file is written exactly once per logical task invocation, + with N rows (TSV) or N JSON records (JSONL), one per repeat. Each row's + `task_id` and `hash` identifies the underlying execution so users can + cross-reference with `trace.txt`. + +- For output channels, the first successful repeat's outputs are emitted + downstream; other repeats' workdirs are retained on disk for inspection but + their outputs are not emitted. This avoids inventing new semantics around + duplicate outputs and matches Snakemake's behavior where repeated runs do + not change the rule's output graph. + +- All N repeats must reach a terminal state (success or terminal failure) + before the benchmark file is written. + +### Field coverage compared to Snakemake + +Snakemake's columns map cleanly onto Nextflow's native fields. The ADR +chooses to expose Nextflow names directly rather than aliasing, but the +mapping is recorded here for users porting workflows: + +| Snakemake | Nextflow `TraceRecord` | +|--|--| +| `s` | `realtime` | +| `h:m:s` | derived from `realtime` | +| `max_rss` | `peak_rss` | +| `max_vms` | `peak_vmem` | +| `max_uss`, `max_pss` | not collected — gap, documented | +| `io_in` | `rchar` | +| `io_out` | `wchar` | +| `mean_load`, `cpu_time` | derived from `%cpu` and `realtime` | + +### Cloud and remote executors + +`TraceRecord` collection already works for AWS Batch, Google Batch, and +Kubernetes through the existing bash wrapper, so no executor-specific work +is required. The benchmark file is materialized to the resolved path using +the same path-handling code that `publishDir` uses, which supports remote +launch directories (S3, GCS, Azure Blob) without special-casing. + +## Implementation surface + +The change is small and self-contained: + +- `modules/nextflow/src/main/groovy/nextflow/script/dsl/ProcessBuilder.groovy` — + add `'benchmark'` to `DIRECTIVES`. + +- `modules/nextflow/src/main/groovy/nextflow/script/dsl/ProcessConfigBuilder.groovy` — + parse the string-shorthand and map forms. + +- New `modules/nextflow/src/main/groovy/nextflow/processor/Benchmark.groovy` — + mirrors `PublishDir`; fields: `file`, `repeats`, `fields`, `format`. + +- `modules/nextflow/src/main/groovy/nextflow/processor/TaskProcessor.groovy` — + after all repeats complete, serialize the collected `TraceRecord`s to + TSV/JSONL at the resolved path. + +- `modules/nextflow/src/main/groovy/nextflow/trace/TraceRecord.groovy` — no + schema changes; `FIELDS` is reused. + +- `docs/reference/process.md` — new `benchmark` section in the directive + reference. + +No edits to `command-trace.txt` or any wrapper template. + +## Open questions + +- Exact mechanism for fanning out N repeats: reuse the existing + attempt/retry path (treating repeats as forced retries with success) or a + dedicated fan-out scheduler hook. The ADR commits to the user-visible + contract; the mechanism is left to follow-up implementation design. + +- Whether to warn when a workflow-level `benchmark` directive is combined + with caching (`cache true`) — repeated runs against a cached task will + return cached metrics rather than fresh measurements. Tentative default: + emit a warning the first time a benchmarked task hits a cache. + +## Links + +- Snakemake benchmark rules: +- Related directive ADR: [hints process directive](20260323-hints-process-directive.md)