Skip to content

Add nextflow logfile command#7177

Open
ewels wants to merge 4 commits into
nextflow-io:masterfrom
ewels:nextflow-logfile
Open

Add nextflow logfile command#7177
ewels wants to merge 4 commits into
nextflow-io:masterfrom
ewels:nextflow-logfile

Conversation

@ewels
Copy link
Copy Markdown
Member

@ewels ewels commented May 26, 2026

A new CLI command that prints the contents of a Nextflow log file, resolved either by run name (via HistoryFile), session id, or direct file path.

Mirrors nextflow-io/vscode-language-nextflow#197 in functionality.

Main motivation is to save tokens and context for LLM usage by deterministically filtering logs:

nextflow logfile .nextflow.log.4 -level ERROR

Log files can be thousands of lines long, printing from this command can return just a few lines about warnings and errors. Coincidentally, the same behaviour is also useful for Humans :)

All the fancy behaviour is turned off automatically if there's no tty (eg. if called by an agent, or if being piped into another tool).

Features:

  • Log-level filtering (-level TRACE|DEBUG|INFO|WARN|ERROR); continuation lines (e.g. stack traces) inherit their parent's level
  • -n N last-N entries (multi-line entries kept intact, not raw lines)
  • -f follow mode with rotation detection
  • ANSI stripping from log content by default; -keep-ansi to preserve
  • Coloured output by default on a TTY (-no-ansi / NO_COLOR to disable): per-level indicator block + syntax highlighting via a port of the nextflow-log TextMate grammar from vscode-language-nextflow
  • Pager integration by default on a TTY (-no-pager to disable): $NXF_PAGER then $PAGER then less -FR fallback; bare less augmented with -FR so mouse-wheel scrolling works in alt-screen mode
CleanShot.2026-05-27.at.00.42.11-converted.mp4

A new CLI command that prints the contents of a Nextflow log file,
resolved either by run name (via HistoryFile), session id, or direct
file path. Features:

- Log-level filtering (`-level TRACE|DEBUG|INFO|WARN|ERROR`);
  continuation lines (e.g. stack traces) inherit their parent's level
- `-n N` last-N entries (multi-line entries kept intact, not raw lines)
- `-f` follow mode with rotation detection
- ANSI stripping from log content by default; `-keep-ansi` to preserve
- Coloured output by default on a TTY (`-no-ansi` / `NO_COLOR` to
  disable): per-level indicator block + syntax highlighting via a port
  of the nextflow-log TextMate grammar from vscode-language-nextflow
- Pager integration by default on a TTY (`-no-pager` to disable):
  \$NXF_PAGER then \$PAGER then `less -FR` fallback; bare `less`
  augmented with `-FR` so mouse-wheel scrolling works in alt-screen mode

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Phil Ewels <phil.ewels@seqera.io>
@ewels ewels requested a review from a team as a code owner May 26, 2026 22:45
@netlify
Copy link
Copy Markdown

netlify Bot commented May 26, 2026

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit ad42c89
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/6a1d3fd20842110008aececa

@vpatel85
Copy link
Copy Markdown

Oh this awesome @ewels, nice work! would def help reduce context with agents.

@ewels
Copy link
Copy Markdown
Member Author

ewels commented May 30, 2026

@bentsherman I'd love to try and get this in so that I can finish up the blog post + video including this together with the VS Code equivalent.

Any first-impression thoughts on how much work you think it needs? 😅 Hopefully it's fairly isolated from the rest of the codebase..

@bentsherman bentsherman requested a review from pditommaso May 30, 2026 14:58
@bentsherman
Copy link
Copy Markdown
Member

The feature is isolated which is good, but we must also consider the cost of maintaining the feature in perpetuity.

From the video it seems pretty useful, more than just a wrapper around grep and tail. I'm thinking of the ANSI foreground / background

Can you explain the "rotation detection" and "pager integration" ?

If @pditommaso approves then we'll merge it

@ewels
Copy link
Copy Markdown
Member Author

ewels commented May 31, 2026

The rotation is just about working with both .nextflow.log and .nextflow.log.1, .nextflow.log.2 etc.

Pager integration - if the log file is more lines than the visible terminal, it uses less (or the configured $PAGER to render the logs instead of just spitting thousands of lines to the terminal. It's modelled on how git log and other git commands work.

Copy link
Copy Markdown
Member

@pditommaso pditommaso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this — nicely structured and well-commented work. I ran a deep review of CmdLogFile.groovy and LogFileFormatter.groovy. Most findings are edge cases and reuse opportunities, but there are two real robustness bugs in the formatter and a behavior gap vs the PR description. Details below.

🔴 Correctness / robustness

1. LogFileFormatter.groovy:220 — uncaught StackOverflowError on long path-like content (reproduced)
The path rule ~/(?<![\w.\/@-])\/(?:[\w.-]+\/)*[\w.-]+\/?/ recurses in the JDK regex engine. I reproduced a java.lang.StackOverflowError on a contiguous slash-path of a few thousand segments. format() has no surrounding try/catch in printAll/printLastN/followFile, so a single pathological line (a dumped PATH/module path, a deeply nested staging path) aborts the whole command. Suggest making the inner group atomic/possessive ((?:[\w.-]+/)*+) or bounding the match length.

2. LogFileFormatter.groovy:320tokenizeBody is O(rules × n²) on long lines (~12s on 40 KB)
The loop re-creates a Matcher for all ~18 BODY_RULES and re-runs find(pos) on every position advance. A single long log line (inline serialized map, captured command output) freezes the formatter for seconds before the pager even sees it. Consider caching matchers + region(), or capping per-line highlighting length.

3. CmdLogFile.groovy:360 — follow mode does not detect rename+recreate rotation
The PR advertises "follow mode with rotation detection," but the only check is raf.length() < raf.getFilePointer() (truncation). On Unix the standard rotate (rename .nextflow.log.1, create a fresh .nextflow.log) leaves the RandomAccessFile bound to the old inode; length never shrinks, so the reopen never fires and follow silently stalls. Either detect inode change (re-stat the path) or soften the claim.

4. CmdLogFile.groovy:119-f -n N can drop lines (TOCTOU)
printLastN reads its own BufferedReader to EOF, then followFile(path, …, Files.size(path)) seeks by a byte length sampled afterward. Lines appended in that window are printed by neither path. Capture the byte offset once and pass it to both, or have printLastN return where it stopped.

5. CmdLogFile.groovy:250fileContainsSessionId 200-line cap yields spurious "cannot find log"
At -trace/-debug, verbose plugin/classpath logging can push the session UUID past line 200, so resolution-by-run-name fails on the correct file and tells the user to pass the path explicitly. The cap is a reasonable perf guard but the failure mode is confusing.

6. CmdLogFile.groovy:232 — null sessionId stringifies to "null" and matches anything
If a history record carries a null sessionId, record.sessionId.toString() is "null", and fileContainsSessionId then matches the literal substring null (common in logs, e.g. value=null), resolving to the wrong run's log. Guard for null before scanning.

🟡 Reuse / cleanup

7. CmdLogFile.groovy:136resolveUseColor reinvents ColorUtil.isAnsiEnabled() and ignores NXF_ANSI_LOG
ColorUtil.isAnsiEnabled() is the canonical color gate (NO_COLOR → NXF_ANSI_LOGAnsi.isEnabled()). The bespoke logic honors NO_COLOR but not NXF_ANSI_LOG, so a user who disables color globally still gets colored logfile output. Suggest reusing the helper.

8. CmdLogFile.groovy:212 — run-name/session resolution hand-rolled instead of the CacheBase trait
CacheBase (used by CmdLog/CmdClean) already centralizes history init from HistoryFile.DEFAULT, the empty-history abort, and findByIdOrName. CmdLogFile redeclares its own history field, a diverging empty-history message (line 225), and DEFAULT_QUERY='last'. Adopting the trait removes the duplication and the redundant exists()/empty() pre-check (line 222 — findByIdOrName('last') already returns empty for empty history).

9. CmdLogFile.groovy:64ANSI_CSI overlaps AnsiLogObserver.stripAnsi and misses OSC-8 hyperlinks
(The pattern does include the ESC prefix, so ordinary [...] text is safe.) The existing AnsiLogObserver.ANSI_ESCAPE also strips OSC-8 hyperlink sequences (ESC ] … BEL); the CSI-only pattern leaves those in. Minor, but worth consolidating to one shared stripper.


Happy to help with the high-confidence items (#1 atomic regex, #2 matcher caching, #7 ColorUtil reuse) if useful.

🤖 Generated with Claude Code

Address PR review findings on CmdLogFile/LogFileFormatter:

- Fix StackOverflowError on long path-like content by making the path
  body rule fully iterative with possessive quantifiers (matching is
  unchanged for real paths).
- Make tokenizeBody ~O(rules·n): allocate one Matcher per rule per line
  and re-scan a rule only when the cursor overtakes its cached match,
  instead of re-allocating all matchers at every position.
- Close a TOCTOU gap in `-f -n N`: snapshot the byte offset once and
  bound the tail read to it so no appended lines are dropped between the
  tail read and the follow seek.
- Guard against a null session id when resolving a run name.
- Honor NXF_ANSI_LOG (not just NO_COLOR) when resolving color output.
- Raise the session-id scan cap to 1000 lines to avoid spurious
  "cannot find log" resolutions under verbose -trace/-debug logging.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Phil Ewels <phil.ewels@seqera.io>
@ewels
Copy link
Copy Markdown
Member Author

ewels commented Jun 1, 2026

Addressed most of the review in ad42c89 (#1 StackOverflowError, #2 tokenizeBody perf, #4 follow-mode TOCTOU, #5 scan cap, #6 null sessionId, #7 NXF_ANSI_LOG).

Three items I deliberately left out, with reasoning:

#3 — follow-mode rename+recreate rotation detection

Valid gap: only truncation (raf.length() < getFilePointer()) is detected, not the Unix rename+recreate rotate (rename .nextflow.log.1, create a fresh inode). In practice this only triggers if a new Nextflow run starts in the same directory mid-follow, since Nextflow rotates the log at launch, not during a run. Rather than add inode (fileKey) re-stat polling on every cycle — extra cost and complexity for an edge case — I'd prefer to soften the wording to "reopens on truncation." Happy to add inode detection instead if the rename case is worth covering.

#8 — adopt the CacheBase trait

Declined. CacheBase is built around -before/-after/-but date filtering and a listIds() that returns a list of records across those filters; logfile has none of that and wants a single resolved record. Adopting it would mean implementing getBut/getBefore/getAfter as no-ops and bending listIds() — added coupling without removing meaningful duplication. The explicit exists()/empty() pre-check is also deliberate: it yields "no pipeline was executed" rather than the generic "Unknown run name or session id: last". I can align the empty-history message wording with CacheBase's if consistency is preferred, but the trait itself isn't a clean fit.

#9 — OSC-8 hyperlink stripping

Left as-is. OSC-8 (ESC ] … BEL) sequences essentially never appear in Nextflow log content (which is plain CSI color codes at most), and AnsiLogObserver's stripper isn't reusable from here (private pattern, protected method in a different package). The extension would be a one-line alternation (…|ESC\][^\007]*\007), easy to add later if a real case surfaces. Low value for now.

Copy link
Copy Markdown
Member

@pditommaso pditommaso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough turnaround — ad42c89 addresses everything substantive. I re-checked the fixes:

  • #1 StackOverflowError — verified the possessive-quantifier path rule ((?:[\w.-]++/?)++) no longer recurses on a pathological multi-thousand-segment path, and produces byte-identical matches to the old pattern across real paths (/a/b/c, /a/b/c/, /work/..., embedded and non-path cases). 👍
  • #2 tokenizeBody — the per-rule cached-Matcher rewrite is amortized O(rules·n); tie-breaking and overlap handling preserve the original output semantics.
  • #4 -f -n N — snapshotting the size once and bounding the tail read via BoundedInputStream cleanly closes the TOCTOU gap.
  • #5 / #6 / #7 — scan-cap bump, null-sessionId guard, and NXF_ANSI_LOG handling all look good; the NXF_ANSI_LOG parsing matches ColorUtil.isAnsiEnabled().

The reasoning for declining #8 (CacheBase) and #9 (OSC-8) is sound — agreed on both.

Approving. One nice-to-have, not blocking: for #3 the comment at CmdLogFile.groovy:417 (// file was truncated or rotated — reopen from the start) and the PR description still mention rotation, while the code only handles truncation. Worth aligning the wording to "reopens on truncation" when convenient.

🤖 Generated with Claude Code

@bentsherman
Copy link
Copy Markdown
Member

Great, I will merge this after I merge a few bug fixes for 26.04

@bentsherman
Copy link
Copy Markdown
Member

(once we merge this PR we will have to actually start using a STABLE-26.04.x branch)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants