OpenLineage: add Persistence Layer SPI #4826
Pull request overview
This PR introduces a new lineage persistence SPI in polaris-core and adds a runtime-service implementation that gates lineage operations behind config + a realm feature flag, with a default “disabled” persistence implementation and unit tests.
Changes:
- Added core lineage request/response models plus a LineagePersistence SPI and LineageService boundary.
- Added runtime wiring (DefaultLineageService) and a default placeholder persistence (DisabledLineagePersistence) with tests.
- Added a new realm feature flag (ENABLE_LINEAGE) and documentation/config reference entries for lineage-related properties.
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| site/content/in-dev/unreleased/configuration/config-sections/smallrye-polaris_lineage.md | Adds a new config-section page for polaris.lineage.* properties. |
| site/content/in-dev/unreleased/configuration/config-sections/flags-polaris_features.md | Documents the new ENABLE_LINEAGE realm feature flag. |
| runtime/service/src/test/java/org/apache/polaris/service/lineage/DisabledLineagePersistenceTest.java | Verifies the default disabled persistence throws as expected. |
| runtime/service/src/test/java/org/apache/polaris/service/lineage/DefaultLineageServiceTest.java | Tests enablement gating and delegation behavior in DefaultLineageService. |
| runtime/service/src/main/java/org/apache/polaris/service/lineage/LineageConfiguration.java | Introduces SmallRye config mapping for polaris.lineage.*. |
| runtime/service/src/main/java/org/apache/polaris/service/lineage/DisabledLineagePersistence.java | Provides a default CDI bean implementing LineagePersistence that throws when used. |
| runtime/service/src/main/java/org/apache/polaris/service/lineage/DefaultLineageService.java | Implements LineageService and delegates ingest/query to LineagePersistence when enabled. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageService.java | Defines a core service boundary for lineage ingest/query operations. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageQueryRequest.java | Adds a normalized lineage query request model. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineagePersistence.java | Adds the persistence SPI contract for lineage backends. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageNodeType.java | Adds enum for node kinds in lineage graphs. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageNode.java | Adds lineage graph node model with optional field mappings. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageIngestRequest.java | Adds extracted ingest payload model for persistence. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageGraph.java | Adds normalized lineage graph response model. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageGranularity.java | Adds enum for dataset vs. column granularity. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageFieldReference.java | Adds dataset+field reference used for column-level edges. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageFieldMapping.java | Adds source→target field mapping model for responses. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageEdge.java | Adds dataset-level edge model. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageDirection.java | Adds enum for upstream/downstream/both queries. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageDataset.java | Adds dataset identity model (with optional Polaris entity id linkage). |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageData.java | Adds dataset metadata model returned in lineage responses. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageColumnEdge.java | Adds column-level edge model. |
| polaris-core/src/main/java/org/apache/polaris/core/config/FeatureConfiguration.java | Adds new FeatureConfiguration<Boolean> ENABLE_LINEAGE flag. |
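
For orientation, the column-level models in the table can be sketched as plain Java records. This is a minimal sketch, not the PR's code: the LineageColumnEdge signature matches the one visible in the diff, but the fields of LineageFieldReference (a dataset name plus a field name), the null checks, and the wrapper class LineageModelSketch are assumptions made for illustration.

```java
import java.util.Objects;

public class LineageModelSketch {

  /** Assumed shape: a dataset name plus a field (column) name within it. */
  public record LineageFieldReference(String dataset, String field) {
    public LineageFieldReference {
      Objects.requireNonNull(dataset, "dataset");
      Objects.requireNonNull(field, "field");
    }
  }

  /** Same components as the record in the diff; the null checks are an assumption. */
  public record LineageColumnEdge(LineageFieldReference source, LineageFieldReference target) {
    public LineageColumnEdge {
      Objects.requireNonNull(source, "source");
      Objects.requireNonNull(target, "target");
    }
  }

  public static void main(String[] args) {
    // Hypothetical dataset and column names, used only to show construction.
    var edge =
        new LineageColumnEdge(
            new LineageFieldReference("sales.orders", "order_id"),
            new LineageFieldReference("analytics.daily_orders", "order_id"));
    System.out.println(edge.source().field() + " -> " + edge.target().dataset());
  }
}
```

Records keep these payloads immutable and give value-based equals/hashCode for free, which is convenient when edges are deduplicated or compared in tests.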
flyingImer left a comment
IMHO, the layering is close to the right architecture. See my inline comments below.
    import java.util.Objects;

    /** A field-level lineage relationship between two dataset columns. */
    public record LineageColumnEdge(LineageFieldReference source, LineageFieldReference target) {
I feel these data models belong under extensions/ as far as SPI placement goes. I am fine with leaving them under core/, but I lean slightly towards extensions/ because lineage is an "optional" capability for Polaris as an Iceberg IRC spec implementation. To keep core/ lean and focused on that vision, I suggest placing them under extensions/.
Thanks for your feedback!
I agree with your point and have moved these data models to extensions/lineage/src/main/java/org/apache/polaris/extensions/lineage/.
    import org.apache.polaris.core.lineage.LineageService;

    @RequestScoped
    public class DefaultLineageService implements LineageService {
Why is there a default service? IMHO, it should be a default SPI impl under extensions/.
I agree. Keeping new features in distinct modules allows for more flexibility and control in downstream projects and does not limit what Polaris can include by default.
Moved runtime/service/src/main/java/org/apache/polaris/service/lineage/DefaultLineageService.java to extensions/lineage/src/main/java/org/apache/polaris/extensions/lineage/ and renamed it to DefaultPolarisLineageHandler.java.
    package org.apache.polaris.core.lineage;

    /** Service boundary for lineage operations used by transport-layer adapters. */
    public interface LineageService {
I see, the "Service" naming is confusing. At a glance, this is the right SPI placement.
My 2 cents on naming: PolarisLineage, PolarisLineageHandler, or whatever the convention is. My rationale is that xxxService usually implies runtime, which belongs under runtime/.
Changed to PolarisLineageHandler.
     * dataset resolution. Persistence backends persist dataset nodes, dataset edges, and column edges,
     * and load normalized lineage graphs.
     */
    public interface LineagePersistence {
This seems to be related to durable storage logic, so I would suggest a name similar to MetastoreManager, e.g., LineageStoreManager.
Given that the default lineage behavior should be a no-op per the last discussion, it definitely does not belong in core/ but in extensions/.
Renamed to LineageStoreManager and moved all the data models from polaris-core/src/main/java/org/apache/polaris/core/lineage/ to extensions/lineage/src/main/java/org/apache/polaris/extensions/lineage/.
    /** Service boundary for lineage operations used by transport-layer adapters. */
    public interface LineageService {
      void ingest(LineageIngestRequest request);
This class is not used in polaris-core ATM. Will it be used by other polaris-core classes in the future? What is the envisioned use case / call path? If not, I do not see a reason for it to be in polaris-core. I believe it should be in a new SPI module dedicated to the OL feature.
Thanks for your feedback! I moved all the data models to extensions/lineage/src/main/java/org/apache/polaris/extensions/lineage/.
Description
This PR adds the lineage persistence contract and the runtime delegation needed for future OpenLineage storage implementations.
The link for Persistence Contract.
It defines the dataset, dataset-edge, and column-edge payloads used by persistence backends, adds a LineagePersistence SPI, and wires lineage ingest/query calls through the runtime service while keeping lineage disabled unless explicitly configured.
Changes
Add persistence-oriented lineage records:
Add LineagePersistence with methods for:
Add DisabledLineagePersistence as the default placeholder implementation.
Wire DefaultLineageService to delegate ingest/query operations to LineagePersistence.
Add unit tests for lineage persistence and service delegation.
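The gating and delegation described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the real DefaultLineageService is a CDI bean that reads config and a realm feature flag, which is collapsed here into a single BooleanSupplier; the exception types, the single-field LineageIngestRequest, and InMemoryLineagePersistence are hypothetical stand-ins. Only the ingest path is shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

public class LineageGatingSketch {

  /** Hypothetical stand-in for the ingest payload; the real record has more structure. */
  public record LineageIngestRequest(String payload) {}

  /** Persistence SPI; only an ingest method is sketched here. */
  public interface LineagePersistence {
    void ingest(LineageIngestRequest request);
  }

  /** Default placeholder backend that rejects any use (exception type is an assumption). */
  public static final class DisabledLineagePersistence implements LineagePersistence {
    @Override
    public void ingest(LineageIngestRequest request) {
      throw new UnsupportedOperationException("Lineage persistence is not configured");
    }
  }

  /** Hypothetical backend used only to make delegation observable. */
  public static final class InMemoryLineagePersistence implements LineagePersistence {
    public final List<LineageIngestRequest> stored = new ArrayList<>();

    @Override
    public void ingest(LineageIngestRequest request) {
      stored.add(request);
    }
  }

  /** Checks enablement first, then delegates to whichever backend was supplied. */
  public static final class DefaultLineageService {
    private final BooleanSupplier enabled; // stands in for config + realm feature flag
    private final LineagePersistence persistence;

    public DefaultLineageService(BooleanSupplier enabled, LineagePersistence persistence) {
      this.enabled = enabled;
      this.persistence = persistence;
    }

    public void ingest(LineageIngestRequest request) {
      if (!enabled.getAsBoolean()) {
        throw new IllegalStateException("Lineage is disabled");
      }
      persistence.ingest(request);
    }
  }

  public static void main(String[] args) {
    var backend = new InMemoryLineagePersistence();
    var service = new DefaultLineageService(() -> true, backend);
    service.ingest(new LineageIngestRequest("run-event"));
    System.out.println("stored requests: " + backend.stored.size());
  }
}
```

With this shape, a deployment that enables the flag but supplies no real backend fails loudly at the DisabledLineagePersistence call, while a deployment that leaves the flag off never reaches the backend at all.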
Checklist
- CHANGELOG.md (if needed)
- site/content/in-dev/unreleased (if needed)