Add nextflow logfile command#7177
Conversation
A new CLI command that prints the contents of a Nextflow log file, resolved either by run name (via HistoryFile), session id, or direct file path. Features: - Log-level filtering (`-level TRACE|DEBUG|INFO|WARN|ERROR`); continuation lines (e.g. stack traces) inherit their parent's level - `-n N` last-N entries (multi-line entries kept intact, not raw lines) - `-f` follow mode with rotation detection - ANSI stripping from log content by default; `-keep-ansi` to preserve - Coloured output by default on a TTY (`-no-ansi` / `NO_COLOR` to disable): per-level indicator block + syntax highlighting via a port of the nextflow-log TextMate grammar from vscode-language-nextflow - Pager integration by default on a TTY (`-no-pager` to disable): \$NXF_PAGER then \$PAGER then `less -FR` fallback; bare `less` augmented with `-FR` so mouse-wheel scrolling works in alt-screen mode Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Phil Ewels <phil.ewels@seqera.io>
✅ Deploy Preview for nextflow-docs-staging canceled.
|
|
Oh this awesome @ewels, nice work! would def help reduce context with agents. |
|
@bentsherman I'd love to try and get this in so that I can finish up the blog post + video including this together with the VS Code equivalent. Any first-impression thoughts on how much work you think it needs? 😅 Hopefully it's fairly isolated from the rest of the codebase.. |
|
The feature is isolated which is good, but we must also consider the cost of maintaining the feature in perpetuity. From the video it seems pretty useful, more than just a wrapper around Can you explain the "rotation detection" and "pager integration" ? If @pditommaso approves then we'll merge it |
|
The rotation is just about working with both Pager integration - if the log file is more lines than the visible terminal, it uses |
pditommaso
left a comment
There was a problem hiding this comment.
Thanks for this — nicely structured and well-commented work. I ran a deep review of CmdLogFile.groovy and LogFileFormatter.groovy. Most findings are edge cases and reuse opportunities, but there are two real robustness bugs in the formatter and a behavior gap vs the PR description. Details below.
🔴 Correctness / robustness
1. LogFileFormatter.groovy:220 — uncaught StackOverflowError on long path-like content (reproduced)
The path rule ~/(?<![\w.\/@-])\/(?:[\w.-]+\/)*[\w.-]+\/?/ recurses in the JDK regex engine. I reproduced a java.lang.StackOverflowError on a contiguous slash-path of a few thousand segments. format() has no surrounding try/catch in printAll/printLastN/followFile, so a single pathological line (a dumped PATH/module path, a deeply nested staging path) aborts the whole command. Suggest making the inner group atomic/possessive ((?:[\w.-]+/)*+) or bounding the match length.
2. LogFileFormatter.groovy:320 — tokenizeBody is O(rules × n²) on long lines (~12s on 40 KB)
The loop re-creates a Matcher for all ~18 BODY_RULES and re-runs find(pos) on every position advance. A single long log line (inline serialized map, captured command output) freezes the formatter for seconds before the pager even sees it. Consider caching matchers + region(), or capping per-line highlighting length.
3. CmdLogFile.groovy:360 — follow mode does not detect rename+recreate rotation
The PR advertises "follow mode with rotation detection," but the only check is raf.length() < raf.getFilePointer() (truncation). On Unix the standard rotate (rename .nextflow.log→.1, create a fresh .nextflow.log) leaves the RandomAccessFile bound to the old inode; length never shrinks, so the reopen never fires and follow silently stalls. Either detect inode change (re-stat the path) or soften the claim.
4. CmdLogFile.groovy:119 — -f -n N can drop lines (TOCTOU)
printLastN reads its own BufferedReader to EOF, then followFile(path, …, Files.size(path)) seeks by a byte length sampled afterward. Lines appended in that window are printed by neither path. Capture the byte offset once and pass it to both, or have printLastN return where it stopped.
5. CmdLogFile.groovy:250 — fileContainsSessionId 200-line cap yields spurious "cannot find log"
At -trace/-debug, verbose plugin/classpath logging can push the session UUID past line 200, so resolution-by-run-name fails on the correct file and tells the user to pass the path explicitly. The cap is a reasonable perf guard but the failure mode is confusing.
6. CmdLogFile.groovy:232 — null sessionId stringifies to "null" and matches anything
If a history record carries a null sessionId, record.sessionId.toString() is "null", and fileContainsSessionId then matches the literal substring null (common in logs, e.g. value=null), resolving to the wrong run's log. Guard for null before scanning.
🟡 Reuse / cleanup
7. CmdLogFile.groovy:136 — resolveUseColor reinvents ColorUtil.isAnsiEnabled() and ignores NXF_ANSI_LOG
ColorUtil.isAnsiEnabled() is the canonical color gate (NO_COLOR → NXF_ANSI_LOG → Ansi.isEnabled()). The bespoke logic honors NO_COLOR but not NXF_ANSI_LOG, so a user who disables color globally still gets colored logfile output. Suggest reusing the helper.
8. CmdLogFile.groovy:212 — run-name/session resolution hand-rolled instead of the CacheBase trait
CacheBase (used by CmdLog/CmdClean) already centralizes history init from HistoryFile.DEFAULT, the empty-history abort, and findByIdOrName. CmdLogFile redeclares its own history field, a diverging empty-history message (line 225), and DEFAULT_QUERY='last'. Adopting the trait removes the duplication and the redundant exists()/empty() pre-check (line 222 — findByIdOrName('last') already returns empty for empty history).
9. CmdLogFile.groovy:64 — ANSI_CSI overlaps AnsiLogObserver.stripAnsi and misses OSC-8 hyperlinks
(The pattern does include the ESC prefix, so ordinary [...] text is safe.) The existing AnsiLogObserver.ANSI_ESCAPE also strips OSC-8 hyperlink sequences (ESC ] … BEL); the CSI-only pattern leaves those in. Minor, but worth consolidating to one shared stripper.
Happy to help with the high-confidence items (#1 atomic regex, #2 matcher caching, #7 ColorUtil reuse) if useful.
🤖 Generated with Claude Code
Address PR review findings on CmdLogFile/LogFileFormatter: - Fix StackOverflowError on long path-like content by making the path body rule fully iterative with possessive quantifiers (matching is unchanged for real paths). - Make tokenizeBody ~O(rules·n): allocate one Matcher per rule per line and re-scan a rule only when the cursor overtakes its cached match, instead of re-allocating all matchers at every position. - Close a TOCTOU gap in `-f -n N`: snapshot the byte offset once and bound the tail read to it so no appended lines are dropped between the tail read and the follow seek. - Guard against a null session id when resolving a run name. - Honor NXF_ANSI_LOG (not just NO_COLOR) when resolving color output. - Raise the session-id scan cap to 1000 lines to avoid spurious "cannot find log" resolutions under verbose -trace/-debug logging. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Phil Ewels <phil.ewels@seqera.io>
|
Addressed most of the review in Three items I deliberately left out, with reasoning: #3 — follow-mode rename+recreate rotation detectionValid gap: only truncation ( #8 — adopt the
|
pditommaso
left a comment
There was a problem hiding this comment.
Thanks for the thorough turnaround — ad42c89 addresses everything substantive. I re-checked the fixes:
- #1
StackOverflowError— verified the possessive-quantifier path rule ((?:[\w.-]++/?)++) no longer recurses on a pathological multi-thousand-segment path, and produces byte-identical matches to the old pattern across real paths (/a/b/c,/a/b/c/,/work/..., embedded and non-path cases). 👍 - #2
tokenizeBody— the per-rule cached-Matcherrewrite is amortized O(rules·n); tie-breaking and overlap handling preserve the original output semantics. - #4
-f -n N— snapshotting the size once and bounding the tail read viaBoundedInputStreamcleanly closes the TOCTOU gap. - #5 / #6 / #7 — scan-cap bump, null-
sessionIdguard, andNXF_ANSI_LOGhandling all look good; theNXF_ANSI_LOGparsing matchesColorUtil.isAnsiEnabled().
The reasoning for declining #8 (CacheBase) and #9 (OSC-8) is sound — agreed on both.
Approving. One nice-to-have, not blocking: for #3 the comment at CmdLogFile.groovy:417 (// file was truncated or rotated — reopen from the start) and the PR description still mention rotation, while the code only handles truncation. Worth aligning the wording to "reopens on truncation" when convenient.
🤖 Generated with Claude Code
|
Great, I will merge this after I merge a few bug fixes for 26.04 |
|
(once we merge this PR we will have to actually start using a |
A new CLI command that prints the contents of a Nextflow log file, resolved either by run name (via HistoryFile), session id, or direct file path.
Mirrors nextflow-io/vscode-language-nextflow#197 in functionality.
Main motivation is to save tokens and context for LLM usage by deterministically filtering logs:
Log files can be thousands of lines long, printing from this command can return just a few lines about warnings and errors. Coincidentally, the same behaviour is also useful for Humans :)
All the fancy behaviour is turned off automatically if there's no tty (eg. if called by an agent, or if being piped into another tool).
Features:
-level TRACE|DEBUG|INFO|WARN|ERROR); continuation lines (e.g. stack traces) inherit their parent's level-n Nlast-N entries (multi-line entries kept intact, not raw lines)-ffollow mode with rotation detection-keep-ansito preserve-no-ansi/NO_COLORto disable): per-level indicator block + syntax highlighting via a port of the nextflow-log TextMate grammar from vscode-language-nextflow-no-pagerto disable): $NXF_PAGER then $PAGER thenless -FRfallback; barelessaugmented with-FRso mouse-wheel scrolling works in alt-screen modeCleanShot.2026-05-27.at.00.42.11-converted.mp4