OpenLineage: add Persistence Layer SPI #4826

Open
iting0321 wants to merge 15 commits into apache:main from iting0321:ol-persistence-spi

@iting0321 iting0321 commented Jun 18, 2026

Contributor

Description

This PR adds the persistence lineage contract and runtime delegation needed for future OpenLineage storage implementations.
See the linked Persistence Contract.

It defines the dataset, dataset-edge, and column-edge payloads used by persistence backends, adds a LineagePersistence SPI, and wires lineage ingest/query calls through the runtime service while keeping lineage disabled unless explicitly configured.

Changes

  • Add persistence-oriented lineage records:

    • LineageDataset
    • LineageEdge
    • LineageColumnEdge
    • LineageFieldReference
    • LineageIngestRequest
  • Add LineagePersistence with methods for:

    • upsertDatasets
    • replaceDatasetEdges
    • upsertColumnEdges
    • loadLineage
  • Add DisabledLineagePersistence as the default placeholder implementation.

  • Wire DefaultLineageService to delegate ingest/query operations to LineagePersistence.

  • Add unit tests for lineage persistence and service delegation.
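As a rough illustration of the SPI shape listed above, here is a minimal, self-contained sketch. Only the record header `LineageColumnEdge(LineageFieldReference source, LineageFieldReference target)` and the four method names come from this PR; every other field, parameter type, return type, and the choice of `UnsupportedOperationException` are assumptions made for the example, and the actual signatures in the PR may differ.

```java
import java.util.List;
import java.util.Objects;

public class LineageSpiSketch {

  // Hypothetical field layout; the PR does not show this record's components.
  record LineageDataset(String namespace, String name) {
    LineageDataset {
      Objects.requireNonNull(namespace, "namespace");
      Objects.requireNonNull(name, "name");
    }
  }

  // Hypothetical: a dataset-level edge from one dataset to another.
  record LineageEdge(LineageDataset source, LineageDataset target) {}

  // Hypothetical: a dataset plus one of its fields.
  record LineageFieldReference(LineageDataset dataset, String field) {}

  // Component names match the record header quoted in the review thread.
  record LineageColumnEdge(LineageFieldReference source, LineageFieldReference target) {}

  // Method names come from the PR description; parameters and return types are assumed.
  interface LineagePersistence {
    void upsertDatasets(List<LineageDataset> datasets);

    void replaceDatasetEdges(LineageDataset target, List<LineageEdge> edges);

    void upsertColumnEdges(List<LineageColumnEdge> edges);

    List<LineageEdge> loadLineage(LineageDataset root);
  }

  // Placeholder backend: every call fails because no real store is configured.
  static final class DisabledLineagePersistence implements LineagePersistence {
    private static UnsupportedOperationException disabled() {
      return new UnsupportedOperationException("Lineage persistence is not configured");
    }

    @Override
    public void upsertDatasets(List<LineageDataset> datasets) {
      throw disabled();
    }

    @Override
    public void replaceDatasetEdges(LineageDataset target, List<LineageEdge> edges) {
      throw disabled();
    }

    @Override
    public void upsertColumnEdges(List<LineageColumnEdge> edges) {
      throw disabled();
    }

    @Override
    public List<LineageEdge> loadLineage(LineageDataset root) {
      throw disabled();
    }
  }

  public static void main(String[] args) {
    LineagePersistence persistence = new DisabledLineagePersistence();
    try {
      persistence.upsertDatasets(List.of(new LineageDataset("warehouse", "orders")));
    } catch (UnsupportedOperationException e) {
      System.out.println("disabled: " + e.getMessage());
    }
  }
}
```

A real backend would implement the same interface and be selected in place of the placeholder; how that selection happens (for example via CDI) is outside this sketch.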

Checklist

  • 🛡️ Don't disclose security issues! (contact security@apache.org)
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

@github-project-automation github-project-automation Bot moved this to PRs In Progress in Basic Kanban Board Jun 18, 2026
@iting0321 iting0321 changed the title OpenLineage: Persistence Layer SPI OpenLineage: add Persistence Layer SPI Jun 20, 2026
@iting0321 iting0321 marked this pull request as ready for review June 22, 2026 04:26
Copilot AI review requested due to automatic review settings June 22, 2026 04:26

Copilot AI left a comment

Contributor

Pull request overview

This PR introduces a new lineage persistence SPI in polaris-core and adds a runtime-service implementation that gates lineage operations behind config + a realm feature flag, with a default “disabled” persistence implementation and unit tests.

Changes:

  • Added core lineage request/response models plus a LineagePersistence SPI and LineageService boundary.
  • Added runtime wiring (DefaultLineageService) and a default placeholder persistence (DisabledLineagePersistence) with tests.
  • Added a new realm feature flag (ENABLE_LINEAGE) and documentation/config reference entries for lineage-related properties.
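The gating described in this overview can be sketched roughly as follows. This is a simplified stand-in, not the PR's `DefaultLineageService`: the class and interface names, the two boolean switches (standing in for a `polaris.lineage.*` property and the `ENABLE_LINEAGE` realm flag), and the choice to reject with `IllegalStateException` when disabled are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class LineageGatingSketch {

  // Hypothetical, trimmed-down store interface; the real SPI has more methods.
  interface LineageStore {
    void ingest(String payload);
  }

  // Delegates to the store only when both switches are on.
  static final class GatedLineageHandler {
    private final boolean configEnabled; // stands in for a polaris.lineage.* property
    private final boolean realmFlagEnabled; // stands in for the ENABLE_LINEAGE realm flag
    private final LineageStore store;

    GatedLineageHandler(boolean configEnabled, boolean realmFlagEnabled, LineageStore store) {
      this.configEnabled = configEnabled;
      this.realmFlagEnabled = realmFlagEnabled;
      this.store = store;
    }

    void ingest(String payload) {
      // Assumed behavior: reject the call outright when lineage is not enabled.
      if (!(configEnabled && realmFlagEnabled)) {
        throw new IllegalStateException("Lineage is disabled");
      }
      store.ingest(payload);
    }
  }

  // Test double that records what the handler forwarded.
  static final class RecordingStore implements LineageStore {
    final List<String> received = new ArrayList<>();

    @Override
    public void ingest(String payload) {
      received.add(payload);
    }
  }

  public static void main(String[] args) {
    RecordingStore store = new RecordingStore();
    new GatedLineageHandler(true, true, store).ingest("event-1");
    System.out.println("forwarded: " + store.received);

    try {
      new GatedLineageHandler(true, false, store).ingest("event-2");
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Keeping the enablement check in the handler rather than in each backend means a store implementation never has to know about configuration or realm flags.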

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| site/content/in-dev/unreleased/configuration/config-sections/smallrye-polaris_lineage.md | Adds a new config-section page for polaris.lineage.* properties. |
| site/content/in-dev/unreleased/configuration/config-sections/flags-polaris_features.md | Documents the new ENABLE_LINEAGE realm feature flag. |
| runtime/service/src/test/java/org/apache/polaris/service/lineage/DisabledLineagePersistenceTest.java | Verifies the default disabled persistence throws as expected. |
| runtime/service/src/test/java/org/apache/polaris/service/lineage/DefaultLineageServiceTest.java | Tests enablement gating and delegation behavior in DefaultLineageService. |
| runtime/service/src/main/java/org/apache/polaris/service/lineage/LineageConfiguration.java | Introduces SmallRye config mapping for polaris.lineage.*. |
| runtime/service/src/main/java/org/apache/polaris/service/lineage/DisabledLineagePersistence.java | Provides a default CDI bean implementing LineagePersistence that throws when used. |
| runtime/service/src/main/java/org/apache/polaris/service/lineage/DefaultLineageService.java | Implements LineageService and delegates ingest/query to LineagePersistence when enabled. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageService.java | Defines a core service boundary for lineage ingest/query operations. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageQueryRequest.java | Adds a normalized lineage query request model. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineagePersistence.java | Adds the persistence SPI contract for lineage backends. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageNodeType.java | Adds enum for node kinds in lineage graphs. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageNode.java | Adds lineage graph node model with optional field mappings. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageIngestRequest.java | Adds extracted ingest payload model for persistence. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageGraph.java | Adds normalized lineage graph response model. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageGranularity.java | Adds enum for dataset vs. column granularity. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageFieldReference.java | Adds dataset+field reference used for column-level edges. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageFieldMapping.java | Adds source→target field mapping model for responses. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageEdge.java | Adds dataset-level edge model. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageDirection.java | Adds enum for upstream/downstream/both queries. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageDataset.java | Adds dataset identity model (with optional Polaris entity id linkage). |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageData.java | Adds dataset metadata model returned in lineage responses. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageColumnEdge.java | Adds column-level edge model. |
| polaris-core/src/main/java/org/apache/polaris/core/config/FeatureConfiguration.java | Adds new FeatureConfiguration<Boolean> ENABLE_LINEAGE flag. |

@flyingImer flyingImer left a comment

Collaborator

IMHO, the layering is close to the correct architecture. See my inline comments below.

import java.util.Objects;

/** A field-level lineage relationship between two dataset columns. */
public record LineageColumnEdge(LineageFieldReference source, LineageFieldReference target) {

Collaborator

I feel these data models belong under extensions/ for SPI placement. I am fine with leaving them under core/, but I am currently slightly inclined toward extensions/, the rationale being that lineage is a somewhat "optional" capability for Polaris as an Iceberg REST Catalog (IRC) spec implementation. To keep core/ lean and tight toward that vision, I suggest placing them under extensions/.

Contributor Author

Thanks for your feedback!
I agree with your point. I just moved these data models to extensions/lineage/src/main/java/org/apache/polaris/extensions/lineage/.

import org.apache.polaris.core.lineage.LineageService;

@RequestScoped
public class DefaultLineageService implements LineageService {

Collaborator

Why is there a default service? IMHO, it should be a default SPI impl under extensions/.

Contributor

I agree. Keeping new features in distinct modules allows for more flexibility and control in downstream projects and does not limit what Polaris can include by default.

Contributor Author

Moved runtime/service/src/main/java/org/apache/polaris/service/lineage/DefaultLineageService.java to extensions/lineage/src/main/java/org/apache/polaris/extensions/lineage/ and renamed it to DefaultPolarisLineageHandler.java.

package org.apache.polaris.core.lineage;

/** Service boundary for lineage operations used by transport-layer adapters. */
public interface LineageService {

Collaborator

I see, the "Service" naming is confusing. At a glance, this is the right SPI placement.

My 2 cents on naming: PolarisLineage, PolarisLineageHandler, or whatever fits the convention. My rationale is that xxxService usually implies runtime, which should be under runtime/.

Contributor Author

Changed to PolarisLineageHandler.

* dataset resolution. Persistence backends persist dataset nodes, dataset edges, and column edges,
* and load normalized lineage graphs.
*/
public interface LineagePersistence {

Collaborator

This seems to be related to durable-storage logic, so I would suggest a name similar to MetastoreManager, e.g., LineageStoreManager.

Given that the default lineage behavior should be a no-op per the last discussion, it definitely does not belong in core/ but in extensions/.

Contributor Author

Changed to LineageStoreManager and moved all the data models from polaris-core/src/main/java/org/apache/polaris/core/lineage/ to extensions/lineage/src/main/java/org/apache/polaris/extensions/lineage/.


/** Service boundary for lineage operations used by transport-layer adapters. */
public interface LineageService {
  void ingest(LineageIngestRequest request);

Contributor

This class is not used in polaris-core ATM. Will this class be used by other polaris-core classes in the future? What is the envisioned use case / call path? If not, I do not see a reason for it to be in polaris-core. I believe it should be in a new SPI module dedicated to the OL feature.

Contributor Author

Thanks for your feedback! I moved all the data models to extensions/lineage/src/main/java/org/apache/polaris/extensions/lineage/.
