Skip to content

Surface allocated resources to self-sizing task commands (nf-seqera)#7199

Draft
pditommaso wants to merge 1 commit into
masterfrom
resource-allocation-placeholders
Draft

Surface allocated resources to self-sizing task commands (nf-seqera)#7199
pditommaso wants to merge 1 commit into
masterfrom
resource-allocation-placeholders

Conversation

@pditommaso
Copy link
Copy Markdown
Member

What & why

Companion client change to seqeralabs/sched#493 — together they fix the qr/v2 request-vs-allocation OOM (seqeralabs/sched#492).

When the Seqera scheduler's optimizer reduces a task's memory allocation below its request, tools that self-size from the requested memory over-commit and get OOM-killed. The motivating case: nf-core BCFTOOLS_SORT renders --max-mem 33177.6M (= task.memory × 0.90) from a 36,864 MiB request, but the container cgroup enforced only 8,704 MiB — bcftools tried to reserve ~29.8 GB in ~8.5 GB and died with exit 255. Affects any self-sizing tool (bcftools/samtools --max-mem, JVM -Xmx, STAR --limitBAMsortRAM, skesa --memory, …).

The command is rendered before the scheduler decides the allocation, so it can't be patched afterward. This change captures the resource-derived command values and turns them into shell placeholders; the scheduler (sched#493) fills them with the allocated value at launch — keeping the memory cost savings while eliminating the OOM.

How it works (Seqera executor only; no re-execution of the user script)

At task materialization:

  1. TaskRun retains the live command GString (before it's flattened to a string), on the plain scriptlet path only — a minimal hook so the executor can inspect which command values were interpolated.
  2. ResourceInterpolator (new) locates resource-derived values from three signals — the live command GString (exact positions, recursing into nested GStrings), the raw body.source (the interpolation expressions), and the AST valRefs (a task-referenced gate). A value is memory/cpus-derived when its expression references task.memory/task.cpus directly or through one level of local-variable indirection. Each such numeric value is replaced with ${SEQERA_TASK_MEM_n} / ${SEQERA_TASK_CPUS_n}.
  3. SeqeraTaskHandler rewrites task.script before the launcher builds .command.sh, ships the bindings on the task, and seeds each placeholder with its as-requested value as an environment default.

The placeholder is a plain shell variable expanded by bash at runtime — .command.sh is never rewritten by the scheduler.

Safety (never worse than today)

The parsed command template is validated against the live GString (literal text must match, ignoring whitespace/escaping) before any attribution, so a mis-identified template cannot produce wrong bindings. The original command is returned unchanged on: a placeholder that would land inside shell single-quotes (where it wouldn't expand), a joint memory+cpus expression, a non-numeric value, or any parse/validation failure. And because the client seeds the placeholders with the request value, backends that don't resolve bindings (or an older scheduler) still produce a valid command identical to today's behavior.

Dependency / merge order

  • nf-seqera is bumped to io.seqera:sched-client:0.60.0-SNAPSHOT, which provides the ResourceBinding API added in seqeralabs/sched#493. This PR builds once that artifact is published, so it depends on sched#493 landing first. Draft until then.

Tests

  • ResourceInterpolatorTest — 21 cases: direct self-sizing (bcftools, -Xmx giga, STAR bytes), cpus arithmetic, multi-line commands with continuations, two memory refs, double-quote expansion, local-variable indirection (skesa ternary), joint mem+cpus left untouched, and the safety fallbacks (single-quote, template mismatch, unrecognized form, anchored task.memoryUsage).
  • SeqeraTaskHandlerTestprepareLauncher rewrites task.script and captures bindings; no-op when nothing is resource-derived.

🤖 Generated with Claude Code

…(sched#492)

Companion client change to seqeralabs/sched#493. Fixes the qr/v2
request-vs-allocation OOM: when the scheduler reduces a task's memory allocation
below its request, tools that self-size from the requested memory (bcftools
--max-mem, JVM -Xmx, STAR, etc.) over-commit and get OOM-killed, because the
command was rendered from the request before the allocation was decided.

At materialization (no re-execution of the user script) the executor now:
- retains the live command GString on TaskRun (scriptlet path only);
- ResourceInterpolator captures command values derived from task.memory/task.cpus
  using the live GString + the raw body.source + the AST `task` reference, validates
  the parsed template against the GString, and replaces each resource-derived numeric
  value with a ${SEQERA_TASK_MEM_n}/${SEQERA_TASK_CPUS_n} shell placeholder;
- SeqeraTaskHandler rewrites task.script, ships the bindings on the task, and seeds
  each placeholder with its as-requested value as an env default (so backends that
  don't resolve bindings, or an older scheduler, still produce a valid command).

The scheduler (sched#493) fills the placeholders with value*allocated/requested at
launch. Safety: a mis-identified template, a placeholder that would land inside shell
single-quotes, a joint memory+cpus expression, or any parse failure all fall back to
the original command — never worse than today.

Note: the nf-seqera dependency is bumped to sched-client:0.60.0-SNAPSHOT; this PR
builds once that artifact (the API in sched#493) is published.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
@netlify
Copy link
Copy Markdown

netlify Bot commented Jun 2, 2026

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit 61715b2
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/6a1f3d56d566dd00080c85e1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant