ADR: TaskReadinessGate plugin extension point #7151
Adds an architecture decision record proposing a new TaskReadinessGate plugin extension point that defers task submission until external preconditions are met. Replaces the current pattern of subclassing an executor and its task handler purely to override TaskHandler.isReady(), enabling drop-in plugin behavior and removing executor coupling. Status: draft. Signed-off-by: Rob Syme <rob.syme@gmail.com>
|
Proposal: collapse the SPI to a single blocking
|
| | Current ADR | This proposal |
|---|---|---|
| Methods on SPI | boolean isReady(TaskRun) | void prepare(TaskHandler) |
| Plugin must be non-blocking | Yes (contract) | No (runs on managed executor) |
| Call sites in TaskPollingMonitor | Two (schedule fire-once + canSubmit poll) | One (schedule submit + canSubmit future-poll) |
| Plugin-author cognitive load | "Must return promptly; first call kicks off, subsequent calls report; must be idempotent" | "Write blocking code; throw if hopeless; honor interrupts" |
| Async runtime ownership | Plugin's problem | Core's problem |
| Helper class needed | Yes (deferred to "later") | No |
| Foot-guns | Multiple (idempotency, blocking, polling cost) | One (gate must honor interrupts) |
Net: smaller SPI, fewer call sites, no foot-guns about scheduler-thread blocking, and the resulting plugin code is ~5 lines for the Glacier case instead of ~40. Worth the bounded extra work in core (the gate executor + future cache, ~50 lines).
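To make the "write blocking code; throw if hopeless; honor interrupts" column concrete, here is a minimal Java sketch of what a gate could look like under the blocking shape. Only the `void prepare(TaskHandler)` signature comes from the proposal; `TaskHandler` is reduced to a hypothetical stand-in, and `FakeRestoreService`, `RestoreGate`, and `BlockingGateDemo` are invented for illustration and are not part of Nextflow or any plugin.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for Nextflow's TaskHandler; only a task name is modelled.
final class TaskHandler {
    final String name;
    TaskHandler(String name) { this.name = name; }
}

// SPI shape taken from the proposal: a single method that is allowed to block.
interface TaskReadinessGate {
    void prepare(TaskHandler handler) throws InterruptedException;
}

// Hypothetical external service: reports "restored" once it has been asked
// more than `readyAfter` times. Stands in for a restore status check.
final class FakeRestoreService {
    private final AtomicInteger polls = new AtomicInteger();
    private final int readyAfter;
    FakeRestoreService(int readyAfter) { this.readyAfter = readyAfter; }
    boolean isRestored(String taskName) { return polls.incrementAndGet() > readyAfter; }
    int polls() { return polls.get(); }
}

// A blocking gate: poll, sleep, repeat; give up by throwing.
final class RestoreGate implements TaskReadinessGate {
    private final FakeRestoreService service;
    private final long pollMillis;
    private final int maxAttempts;

    RestoreGate(FakeRestoreService service, long pollMillis, int maxAttempts) {
        this.service = service;
        this.pollMillis = pollMillis;
        this.maxAttempts = maxAttempts;
    }

    @Override
    public void prepare(TaskHandler handler) throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (service.isRestored(handler.name)) return;
            // Thread.sleep throws InterruptedException, so the gate honors interrupts.
            Thread.sleep(pollMillis);
        }
        throw new IllegalStateException("Restore did not complete for task " + handler.name);
    }
}

final class BlockingGateDemo {
    // Runs the gate on the calling thread and returns how many polls it needed.
    static int pollsUntilReady(int readyAfter, int maxAttempts) {
        FakeRestoreService service = new FakeRestoreService(readyAfter);
        try {
            new RestoreGate(service, 1L, maxAttempts).prepare(new TaskHandler("task-1"));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return service.polls();
    }

    public static void main(String[] args) {
        System.out.println("polls needed: " + pollsUntilReady(3, 10));
    }
}
```

The gate body itself is the short loop in `prepare`; everything else is scaffolding so the sketch runs on its own. In the proposal, the loop would run on a core-managed executor rather than the scheduler thread.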
Happy to draft the actual diff against TaskPollingMonitor.groovy if there's interest in moving this direction.
|
Yeah, I like this suggestion. A simpler interface is much nicer: the complexity lives in core, with much less effort for plugin authors. @pditommaso - when we come to the actual plugin, are you OK with a plugin-specific process directive as suggested in the comment?
|
Think this should go through via the new |
|
FYI, I updated the implementation PR to merge into this branch. Please try to keep the ADR up to date with the latest decisions, as it looks like the impl has diverged a good bit.
The ADR originally described a polling boolean isReady(TaskRun) contract. Review on the implementation PR (#7158) led to substantive changes that need to be reflected here:
- SPI is blocking void prepare(TaskHandler) throws InterruptedException, not polling.
- Orchestration lives in a new TaskGateManager class; TaskPollingMonitor delegates via three one-liners.
- No executor.gateMaxWait config option; plugins own timeout policy via the existing hints directive (e.g. glacier/maxWait).
- Per-process opt-out via hints (no new directive).
- Exception unwrap preserves ProcessException identity and wraps everything else, so ProcessRetryableException markers reach resumeOrDie via cause.
- AbstractAsyncReadinessGate helper is permanently a non-goal (not just deferred), since the blocking design needs no async wrapper.
Status moved from draft to accepted.
Signed-off-by: Rob Syme <rob.syme@gmail.com>
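The orchestration and exception-unwrap points above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code that landed in #7158: `GateManagerSketch`, `SketchGate`, and `GateManagerDemo` are hypothetical names, tasks are keyed by a plain string id, and `ProcessException` is a simplified stand-in whose real hierarchy in Nextflow may differ. The idea shown is a managed executor plus a future cache, with a non-blocking poll that rethrows a `ProcessException` unchanged and wraps anything else with the original as cause.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Simplified stand-in for Nextflow's ProcessException; the real hierarchy may differ.
class ProcessException extends RuntimeException {
    ProcessException(String message, Throwable cause) { super(message, cause); }
}

// Hypothetical blocking gate, keyed by a task id instead of a TaskHandler.
interface SketchGate {
    void prepare(String taskId) throws Exception;
}

final class GateManagerSketch {
    private final ExecutorService gateExecutor = Executors.newCachedThreadPool();
    private final Map<String, Future<Void>> pending = new ConcurrentHashMap<>();
    private final List<SketchGate> gates;

    GateManagerSketch(List<SketchGate> gates) { this.gates = gates; }

    // Called when a task is scheduled: run the gates off the scheduler thread.
    void schedule(String taskId) {
        Callable<Void> work = () -> {
            for (SketchGate gate : gates) gate.prepare(taskId);
            return null;
        };
        pending.computeIfAbsent(taskId, id -> gateExecutor.submit(work));
    }

    // Called on every poll: never waits, only inspects the cached future.
    boolean canSubmit(String taskId) {
        Future<Void> future = pending.get(taskId);
        if (future == null) return true;     // no gate work registered for this task
        if (!future.isDone()) return false;  // gates still running
        pending.remove(taskId);
        try {
            future.get();                    // returns at once because the future is done
            return true;
        } catch (ExecutionException e) {
            throw unwrap(e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new ProcessException("Interrupted while reading gate result", e);
        }
    }

    // Keep a ProcessException as the same object; wrap anything else, keeping it as cause.
    static ProcessException unwrap(Throwable cause) {
        if (cause instanceof ProcessException) return (ProcessException) cause;
        return new ProcessException("Readiness gate failed", cause);
    }

    void shutdown() { gateExecutor.shutdownNow(); }
}

final class GateManagerDemo {
    // Schedules one task and polls until a decision is available.
    // Returns Boolean.TRUE on success, or the ProcessException thrown by canSubmit.
    static Object outcome(SketchGate gate) {
        GateManagerSketch manager = new GateManagerSketch(List.of(gate));
        try {
            manager.schedule("task-1");
            while (true) {
                try {
                    if (manager.canSubmit("task-1")) return Boolean.TRUE;
                } catch (ProcessException e) {
                    return e;
                }
                Thread.sleep(5);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            manager.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(taskId -> Thread.sleep(20)));
    }
}
```

Because the wrapper keeps the original throwable as its cause, a caller that inspects the cause chain can still find a marker type on the underlying exception, which is the behavior the commit message describes for ProcessRetryableException.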
|
Updated the ADR to match the design that landed in #7158:
Status moved from draft to accepted.
Summary
Adds an architecture decision record proposing a new TaskReadinessGate plugin extension point that defers task submission until external preconditions are met (e.g. restoring S3 objects from Glacier before an AWS Batch job tries to stage them).
The motivation is that plugins needing this capability today must subclass an executor and its task handler purely to override TaskHandler.isReady(), which: process.executor = '<plugin-specific-name>' instead of a drop-in plugins { id '...' }; @CompileStatic is incompatible with AwsBatchTaskHandler subclassing due to its proxy dispatch);
The proposed SPI is consulted by TaskPollingMonitor before submitting any task, works uniformly across every executor, and is a small additive change to core (one interface + a handful of lines at the call site). Behavior is bit-identical when no plugin registers a gate.
The ADR documents the contract (polling-based, must return promptly, exceptions signal permanent failure), the call-site integration, considered alternatives (channel operator, S3 NIO interception; both rejected with reasoning), explicit non-goals, and follow-ups left for later (AbstractAsyncReadinessGate helper, per-process scoping, gate ordering).
Status in the ADR is draft; opening for review and discussion before implementation.
Test plan
nextflow.processor rather than nf-commons. TaskPollingMonitor, Spock specs, and developer-docs section.