ADR: benchmark process directive for per-task performance metrics #7146

Open
edmundmiller wants to merge 1 commit into master from claude/nextflow-benchmark-adr-JTtRv

@edmundmiller

Summary

This ADR proposes a new benchmark process directive that collects per-task performance metrics and can repeat each task's execution to reduce measurement noise. The directive reuses Nextflow's existing TraceRecord infrastructure and can be set from nextflow.config like other process directives.

Overview

The benchmark directive addresses a gap in Nextflow's metrics capabilities by providing:

  • Per-process metrics capture: A declarative way to emit task runtime metrics (wall time, peak memory, CPU usage, I/O) to a user-specified file
  • Configuration-driven benchmarking: Settable via nextflow.config using withName and withLabel selectors, enabling benchmarking of existing pipelines without source modification
  • Repeated execution support: Optional repeats: N parameter that spawns N independent task executions and aggregates their metrics
  • Multiple output formats: TSV (default) and JSONL output driven by file extension
  • Deterministic schema: Column names match TraceRecord.FIELDS for consistency with existing trace artifacts
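As a rough illustration of the extension-driven output format and the fixed column schema, here is a minimal Python sketch (not the proposed Groovy implementation). The names `COLUMNS`, `detect_format`, and `serialize` are hypothetical, the column list is only an illustrative subset of trace field names rather than the full TraceRecord.FIELDS, and writing one row per repeat is an assumption about what "aggregates" means.

```python
import csv
import io
import json
from pathlib import PurePosixPath

# Illustrative subset of trace field names (assumption); the real
# column list would be taken from TraceRecord.FIELDS.
COLUMNS = ["task_id", "name", "realtime", "%cpu", "peak_rss", "rchar", "wchar"]


def detect_format(path: str) -> str:
    """Pick the output format from the file extension; TSV is the default."""
    suffix = PurePosixPath(path).suffix.lower()
    return "jsonl" if suffix == ".jsonl" else "tsv"


def serialize(records, path: str) -> str:
    """Render records (one dict per task execution) in the format implied by path."""
    if detect_format(path) == "jsonl":
        # One JSON object per line, keys in schema order.
        return "".join(
            json.dumps({col: rec.get(col) for col in COLUMNS}) + "\n"
            for rec in records
        )
    # TSV: header row first, then one row per record; missing values stay empty.
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for rec in records:
        writer.writerow(["" if rec.get(col) is None else rec.get(col) for col in COLUMNS])
    return buf.getvalue()
```

Because both formats are driven by the same column list, a TSV file and a JSONL file produced for the same task would carry identical field names, which is the consistency property the schema bullet describes.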

Key Design Decisions

  • Reuses existing infrastructure: Leverages TraceRecord collection through the existing bash wrapper, so no new probes or wrapper changes are required
  • Independent repeats: Each repeat is a full, isolated task execution with its own workdir and scheduling, not a loop inside the wrapper script
  • First-repeat outputs: For repeated tasks, only the first successful repeat's outputs are emitted downstream; other repeats' workdirs are retained for inspection
  • Cloud-native: Works with AWS Batch, Google Batch, Kubernetes, and remote launch directories (S3, GCS, Azure Blob) through existing path-handling code
  • Consistent naming: Uses Nextflow's native TraceRecord field names rather than aliasing to Snakemake conventions
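The first-repeat output rule can be sketched as follows, in Python rather than the implementation's Groovy. `RepeatResult` and `select_downstream` are hypothetical names, and the sketch assumes that "first" means the lowest repeat index and that success means exit code 0; the ADR may define either differently (for example, first by completion time).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RepeatResult:
    """Outcome of one independent repeat (hypothetical structure)."""
    index: int       # position of the repeat, starting at 0
    workdir: str     # each repeat has its own work directory
    exit_code: int   # 0 is treated as success in this sketch


def select_downstream(repeats: Sequence[RepeatResult]) -> Optional[RepeatResult]:
    """Return the repeat whose outputs go downstream, or None if all failed."""
    for repeat in sorted(repeats, key=lambda r: r.index):
        if repeat.exit_code == 0:
            return repeat
    return None
```

Nothing is deleted in this sketch: the repeats that are not selected keep their work directories, matching the statement that they are retained for inspection.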

Directive Syntax

// Shorthand string form
process align {
    benchmark "benchmarks/align/${sample}.tsv"
    // ...
}

// Map form with options
process align {
    benchmark file: "benchmarks/align/${sample}.jsonl", repeats: 3
    // ...
}

// Config-level usage
process {
    withName: 'ALIGN' {
        benchmark = [file: "benchmarks/align/${task.tag}.tsv", repeats: 5]
    }
}

Implementation Scope

The implementation is self-contained and minimal:

  • Add benchmark to the standard process directive list
  • Parse string shorthand and map forms in ProcessConfigBuilder
  • Create new Benchmark class mirroring PublishDir structure
  • Serialize aggregated TraceRecords to TSV/JSONL after all repeats complete
  • No changes to trace, report, timeline, or wrapper subsystems
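To make the parsing step concrete, the following Python sketch normalizes the string shorthand and the map form into one structure. It is only an illustration of the idea, not the ProcessConfigBuilder code: `BenchmarkSpec` and `parse_benchmark` are hypothetical names, and the default of one repeat and the rejection of unknown keys are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkSpec:
    """Normalized form of the directive (hypothetical)."""
    file: str
    repeats: int = 1  # assumed default when repeats is not given


def parse_benchmark(value) -> BenchmarkSpec:
    """Accept either a path string or a map with 'file' and optional 'repeats'."""
    if isinstance(value, str):
        # Shorthand form: benchmark "benchmarks/align/sample.tsv"
        return BenchmarkSpec(file=value)
    if isinstance(value, dict):
        unknown = set(value) - {"file", "repeats"}
        if unknown:
            raise ValueError(f"unknown benchmark options: {sorted(unknown)}")
        if "file" not in value:
            raise ValueError("benchmark map form requires a 'file' entry")
        repeats = int(value.get("repeats", 1))
        if repeats < 1:
            raise ValueError("repeats must be at least 1")
        return BenchmarkSpec(file=str(value["file"]), repeats=repeats)
    raise TypeError(f"unsupported benchmark value: {type(value).__name__}")
```

Normalizing both forms up front means the later steps (fan-out into repeats, serialization) only ever deal with a single shape, whether the directive came from the process body or from nextflow.config.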

Non-Goals

  • Aggregating benchmarks across tasks (existing trace.txt and report.html cover this)
  • Collecting metrics beyond TraceRecord (e.g., max_uss, max_pss are documented gaps)
  • Reliable benchmarking inside exec: blocks
  • Changes to existing trace/report/timeline infrastructure

https://claude.ai/code/session_01RW1UD9eQWkDLfR97WdegXk

Proposes a benchmark directive that emits per-task TraceRecord metrics
to a user-specified TSV or JSONL file, with an optional repeats option
that fans out into N independent task executions. Settable from
nextflow.config so existing pipelines can be benchmarked without source
changes.

Signed-off-by: Claude <noreply@anthropic.com>
@netlify

netlify Bot commented May 14, 2026

Deploy Preview for nextflow-docs-staging canceled.

Latest commit: 347f8db
Latest deploy log: https://app.netlify.com/projects/nextflow-docs-staging/deploys/6a062f62f2250000081c4767
