
Add nextflow lineage validate command (first pass)#7168

Draft
edmundmiller wants to merge 10 commits into lineage-validate-adr from lineage-validate

@edmundmiller (Member) commented May 21, 2026

Summary

First-pass implementation of nextflow lineage validate per the design in #7167 (stacked on that PR). Ships the v1 surface so we can use it in practice and feed back into the design.

What's in:

  • LinNormalizer: encodes lineage records, strips ephemeral fields, supports -output-base relativization and -ignore-fields; provides a recursive structural compare that reports diff paths
  • nextflow lineage validate <lid> -against <lid>: CLI command with -against, -ignore-fields, and -output-base options; emits a JGit-style unified diff and exits non-zero on divergence
  • LineageSnapshotter: Spock helper for pipeline tests; supports save-or-compare with the UPDATE_SNAPSHOTS env var, plus direct assertEquivalent(lidA, lidB)
  • Integration tests: seven scenarios covering equivalent runs, output checksum drift, parameter drift, output count mismatch, nested directories, -ignore-fields, and diff rendering
  • Regression test for JCommander wiring: catches the option-parsing bug fixed later in this PR
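
To make the normalize-then-compare idea concrete, here is a minimal sketch in plain Java. It is not the LinNormalizer code from this PR: the class name `NormalizeSketch`, the `strip` and `diff` methods, and the `$.a.b[0]` path notation are all hypothetical, and it assumes a lineage record has already been decoded into nested maps and lists with string keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative only: not the LinNormalizer class from this PR. */
final class NormalizeSketch {

    /** Returns a copy of the record with the ignored keys removed at every nesting level. */
    static Object strip(Object node, Set<String> ignored) {
        if (node instanceof Map) {
            Map<String, Object> out = new TreeMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                String key = String.valueOf(e.getKey());
                if (!ignored.contains(key)) {
                    out.put(key, strip(e.getValue(), ignored));
                }
            }
            return out;
        }
        if (node instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) node) {
                out.add(strip(item, ignored));
            }
            return out;
        }
        return node; // scalars are kept unchanged
    }

    /** Collects the paths at which two records differ; an empty list means equivalent. */
    static List<String> diff(Object a, Object b) {
        List<String> paths = new ArrayList<>();
        walk("$", a, b, paths);
        return paths;
    }

    private static void walk(String path, Object a, Object b, List<String> paths) {
        if (a instanceof Map && b instanceof Map) {
            Map<?, ?> ma = (Map<?, ?>) a;
            Map<?, ?> mb = (Map<?, ?>) b;
            // Visit the union of keys in sorted order so the report is deterministic.
            Set<String> keys = new TreeSet<>();
            for (Object k : ma.keySet()) keys.add(String.valueOf(k));
            for (Object k : mb.keySet()) keys.add(String.valueOf(k));
            for (String k : keys) {
                walk(path + "." + k, ma.get(k), mb.get(k), paths);
            }
        } else if (a instanceof List && b instanceof List) {
            List<?> la = (List<?>) a;
            List<?> lb = (List<?>) b;
            int n = Math.max(la.size(), lb.size());
            for (int i = 0; i < n; i++) {
                // A missing element is treated like null in this sketch.
                Object ea = i < la.size() ? la.get(i) : null;
                Object eb = i < lb.size() ? lb.get(i) : null;
                walk(path + "[" + i + "]", ea, eb, paths);
            }
        } else if (!Objects.equals(a, b)) {
            paths.add(path);
        }
    }
}
```

With two records that differ in `sessionId` and in `params.depth`, stripping `sessionId` first leaves `$.params.depth` as the only reported path, which is the behaviour the ephemeral-field stripping is meant to give.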

Known gaps vs the ADR (deferred to follow-ups):

  • No shared LineageValidator core yet; CLI and LineageSnapshotter still have parallel logic
  • No snapshot-file baseline (--against <file.json>) or --save-snapshot
  • No --json / --summary / CI auto-detection
  • No diff categorisation, runtime-fingerprint handling, or one-hop causality enrichment
  • No LineageResolver SPI; only lid:// works today

Treat this as a working spike to validate the design against real lineage data — feedback expected to inform the v1.1 refactor.

Test plan

  • Run ./gradlew :nf-lineage:test --tests LinValidateIntegrationTest
  • Run ./gradlew :nextflow:test --tests LauncherTest (regression for JCommander wiring)
  • Manual: launch two pipeline runs against the local lineage store, nextflow lineage validate lid://<runA> -against lid://<runB>, confirm divergence on params/outputs and equivalence on identical runs
  • Try -ignore-fields commitId to verify field exclusion

🤖 Generated with Claude Code

Stack

  1. ADR: lineage validate semantic equivalence check #7167
  2. Add nextflow lineage validate command (first pass) #7168 👈 current

@edmundmiller edmundmiller marked this pull request as draft May 21, 2026 14:54
Normalizes lineage data by stripping ephemeral fields (sessionId,
timestamps, paths, LID refs) that vary between runs but don't
affect semantic equivalence. Used for comparing workflow runs.

Signed-off-by: Edmund Miller <edmund.miller@seqera.io>
Adds 'nextflow lineage validate <lid> --against <baseline>' command
to compare two workflow runs semantically. Ignores ephemeral fields
and shows diff on mismatch. Exits non-zero if runs differ.

Signed-off-by: Edmund Miller <edmund.miller@seqera.io>
Tests for CLI validate: equivalent runs, different checksums,
different params, missing --against arg, --ignore-fields option.

Signed-off-by: Edmund Miller <edmund.miller@seqera.io>
Provides snapshot testing for pipeline validation in Spock tests:
- assertMatchesSnapshot(lid, snapshotId) - compare against baseline
- assertEquivalent(lid1, lid2) - compare two runs directly
- UPDATE_SNAPSHOTS=true env var to update baselines

Signed-off-by: Edmund Miller <edmund.miller@seqera.io>
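
For readers unfamiliar with the save-or-compare pattern, a minimal sketch in plain Java follows. It is not the LineageSnapshotter helper itself: `SnapshotSketch` and its methods are hypothetical, the snapshot is treated as an opaque string, and writing the baseline on first run (when no file exists yet) is an assumption about what "save-or-compare" means here. Only the `UPDATE_SNAPSHOTS=true` switch comes from the commit message.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative only: not the LineageSnapshotter helper from this PR. */
final class SnapshotSketch {

    /** Reads the update switch from the UPDATE_SNAPSHOTS environment variable. */
    static boolean updateRequested() {
        return "true".equalsIgnoreCase(System.getenv("UPDATE_SNAPSHOTS"));
    }

    /**
     * Saves the baseline when it is missing (assumed behaviour) or when an
     * update is requested; otherwise compares against the stored baseline.
     */
    static void assertMatchesSnapshot(Path snapshot, String actual, boolean update) {
        if (update || !Files.exists(snapshot)) {
            write(snapshot, actual);
            return;
        }
        String expected = read(snapshot);
        if (!expected.equals(actual)) {
            throw new AssertionError("Snapshot mismatch for " + snapshot.getFileName());
        }
    }

    /** Reads a snapshot file as UTF-8 text. */
    static String read(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a snapshot file, creating parent directories as needed. */
    private static void write(Path file, String text) {
        try {
            Path parent = file.toAbsolutePath().getParent();
            if (parent != null) {
                Files.createDirectories(parent);
            }
            Files.write(file, text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A test would call `assertMatchesSnapshot(path, normalizedJson, SnapshotSketch.updateRequested())`, so that a normal run fails on drift while a run with `UPDATE_SNAPSHOTS=true` rewrites the baseline.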
End-to-end tests exercising validate command with workflow runs:
equivalent runs with outputs, different checksums, different params,
different output counts, nested outputs, ignore-fields option.

Signed-off-by: Edmund Miller <edmund.miller@seqera.io>
Tests for 'nextflow lineage validate' command:
- lineage-validate-basic: identical runs should be equivalent
- lineage-validate-changes: detect param/output differences
- lineage-validate-multi: multi-process pipeline validation
- lineage-validate-resume: resume mode produces equivalent lineage

Tests use temp lineage stores, are conditional on lineage being
available, and follow existing tests/ structure with .checks scripts.

Signed-off-by: Edmund Miller <edmund.miller@seqera.io>
Asserts that `nextflow lineage validate <lid> -against <lid>` (and the
related -ignore-fields / -output-base) parse through Launcher's JCommander
wiring. Marked @PendingFeature because CmdLineage does not yet declare
these options, so JCommander rejects them with "Unknown option".

The existing CmdLineageTest and LinValidateIntegrationTest both bypass
JCommander by constructing commands directly, so they cannot catch this.

Signed-off-by: Edmund Miller <edmund.miller@seqera.io>
JCommander 1.35 (used by Nextflow) rejected `-against`, `-ignore-fields`
and `-output-base` because the options were not declared on CmdLineage.
Declare them with single-dash names (matching the rest of Nextflow's CLI)
and forward them into the sub-command args list so LinCommandImpl.validate
keeps its existing `--`-prefixed parser unchanged.

Removes the @PendingFeature markers added in the previous commit.

Signed-off-by: Edmund Miller <edmund.miller@seqera.io>
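
The forwarding step described in that commit can be pictured with the following plain-Java sketch. It does not use JCommander and is not the CmdLineage code: `ForwardSketch.forward` is a hypothetical helper, and the shape of the argument list (positional arguments first, then `--name value` pairs) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative only: not the CmdLineage code from this PR. */
final class ForwardSketch {

    /**
     * Rebuilds the sub-command argument list from options that the outer
     * parser has already consumed. Option names are given without dashes;
     * a null value means the option was not supplied and is skipped.
     */
    static List<String> forward(List<String> positional, Map<String, String> options) {
        List<String> args = new ArrayList<>(positional);
        for (Map.Entry<String, String> e : options.entrySet()) {
            if (e.getValue() != null) {
                args.add("--" + e.getKey()); // re-emit with the prefix the inner parser expects
                args.add(e.getValue());
            }
        }
        return args;
    }
}
```

The design point is that the outer CLI can accept `-against lid://runB` in the single-dash style while the inner parser still receives `--against lid://runB`, so neither side has to change how it reads options.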
Add @Title, @Narrative, @Subject, and @See annotations to LinNormalizerTest,
LinValidateIntegrationTest, LineageSnapshotterTest, and the validate-specific
specs in LinCommandImplTest. Each method-level @See points at the ADR
decision section the spec exercises so future readers can trace any change
in behaviour back to the decision that motivates it.

Signed-off-by: Edmund Miller <edmund.miller@seqera.io>
Replace six hand-rolled three-pass log-filter chains with the existing
test.TestHelper.filterLogNoise helper (already imported in
LinCommandImplTest for the legacy tests). Cuts ~30 lines.

Leave TODO/FIXME markers on three pre-existing issues surfaced during
review but out of scope for this change: unused @TempDir in
LinNormalizerTest, @Shared fields reassigned per-test in
LinValidateIntegrationTest, and the Workflow/WorkflowRun/FileOutput
fixture-construction pattern that repeats ~25 times.

Signed-off-by: Edmund Miller <edmund.miller@seqera.io>