Feat: Add OSI semantic-model API scaffolding#4816
Conversation
d0f7067 to
f9b8394
Compare
There was a problem hiding this comment.
Pull request overview
Wires the Open Semantic Interchange (OSI) semantic-model REST surface into Polaris behind a new ENABLE_SEMANTIC_MODELS feature flag (default false), including OpenAPI spec additions, endpoint advertisement in getConfig(), and a stub runtime adapter.
Changes:
- Add OpenAPI paths + component schemas/responses for semantic-model create/list/load/update/drop under namespaces.
- Introduce
ENABLE_SEMANTIC_MODELSfeature flag and advertise semantic-model endpoints in the IceberggetConfig()endpoints list when enabled. - Enable OpenAPI generation of
SemanticModelApiplus required models, and add a stubSemanticModelCatalogAdapterreturning501 Not Implemented.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| spec/polaris-catalog-service.yaml | Adds top-level path refs for semantic-model endpoints into the bundled service spec. |
| spec/polaris-catalog-apis/semantic-models-api.yaml | Defines the semantic-model API paths, schemas, responses, and examples. |
| runtime/service/src/main/java/org/apache/polaris/service/catalog/semanticmodel/SemanticModelCatalogAdapter.java | Adds a request-scoped stub adapter gated by ENABLE_SEMANTIC_MODELS that currently returns 501s. |
| runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java | Includes semantic-model endpoints in getConfig() when enabled. |
| polaris-core/src/main/java/org/apache/polaris/core/rest/PolarisResourcePaths.java | Adds resource path constants for semantic-model endpoints. |
| polaris-core/src/main/java/org/apache/polaris/core/rest/PolarisEndpoints.java | Adds endpoint constants + a helper to expose semantic-model endpoints only when enabled. |
| polaris-core/src/main/java/org/apache/polaris/core/config/FeatureConfiguration.java | Introduces ENABLE_SEMANTIC_MODELS feature flag with default false. |
| api/polaris-catalog-service/build.gradle.kts | Updates OpenAPI generator configuration to generate SemanticModelApi + semantic-model models. |
| .addAll(VIEW_ENDPOINTS) | ||
| .addAll(PolarisEndpoints.getSupportedGenericTableEndpoints(realmConfig())) | ||
| .addAll(PolarisEndpoints.getSupportedPolicyEndpoints(realmConfig())) | ||
| .addAll(PolarisEndpoints.getSupportedSemanticModelEndpoints(realmConfig())) |
| PolarisConfiguration.<Boolean>builder() | ||
| .key("ENABLE_SEMANTIC_MODELS") | ||
| .description("If true, the semantic-model (OSI) endpoints are enabled") | ||
| .defaultValue(false) // keep it to false until the implementation is done | ||
| .buildFeatureConfiguration(); |
f9b8394 to
8a7f9cc
Compare
| --- | ||
| paths: | ||
|
|
||
| /polaris/v1/{prefix}/namespaces/{namespace}/semantic-models: |
There was a problem hiding this comment.
Who owns this API definition? Polaris?
There was a problem hiding this comment.
Polaris owns it. The OSI doesn't specify them.
There was a problem hiding this comment.
OSI/Ossie defines the "content/payload", not the API.
There was a problem hiding this comment.
Well, this means that OSI/Ossie effectively defines the shape of JSON payloads in this API.
What's is the vision for evolving this Polaris API definition in conjunction with Ossie's evolution?
Are we going to add a new vN here for every Ossie spec version?
There was a problem hiding this comment.
So, the Detailed Design says:
OSI is an evolving specification. Polaris stores the document's declared OSI version (spec_version) and validates the document against the corresponding bundled schema version.
Polaris may support multiple OSI versions simultaneously, allowing users to upgrade Polaris independently of their semantic models. A model written as OSI 0.1.1 remains a 0.1.1 model until explicitly updated by the user.
Documents declaring an unsupported OSI version are rejected with 400 Bad Request. Polaris does not automatically migrate or rewrite semantic models during server upgrades. This follows the same principle as Iceberg format evolution: version upgrades are explicit user actions rather than side effects of infrastructure upgrades.
and
Validation
Schema validation: Writes validate against the bundled OSI JSON Schema. A new OsiDocumentValidator produces BadRequestException with JSON-Pointer field paths on failure. The validator is strict: unknown top-level fields cause a 400. Forward compatibility with newer OSI versions is handled by upgrading Polaris's bundled schema as an explicit, coordinated action, not by silently accepting unvalidated content. Vendor-specific or experimental fields belong in custom_extensions, which the schema does allow.
Source table validation: Validate that every dataset.source resolves to a Polaris TABLE_LIKE entity at write time; writes with unresolved sources fail with 400.
Source table column validation: Not supported in v1.
So, there won't be any change to the Polaris REST APIs when the OSI Spec changes since it's just passing back-and-forth JSON, but there will be some business logic changes whenever OSI is upgraded in Polaris.
| .withEndpoints( | ||
| ImmutableList.<Endpoint>builder() | ||
| .addAll(DEFAULT_ENDPOINTS) | ||
| .addAll(VIEW_ENDPOINTS) | ||
| .addAll(PolarisEndpoints.getSupportedGenericTableEndpoints(realmConfig())) | ||
| .addAll(PolarisEndpoints.getSupportedPolicyEndpoints(realmConfig())) | ||
| .addAll(PolarisEndpoints.getSupportedSemanticModelEndpoints(realmConfig())) |
There was a problem hiding this comment.
Why would Polaris return "semantic model" endpoints in response to an Iceberg REST Catalog config request?
There was a problem hiding this comment.
This follows the conventions of other Polaris specific endpoints like generic table and policy ones.
There was a problem hiding this comment.
If this is an issue to send these endpoints, then we should discuss on the mailing list to keep all or remove all IMO.
There was a problem hiding this comment.
Following an existing pattern does not automatically make the change correct 🤷
What is the rationale for exposing these new endpoints in the IRC config response?
There was a problem hiding this comment.
dev discussion: https://lists.apache.org/thread/bov6nhryqjmtrxw5nxhss0o0c4ntn6o0
There was a problem hiding this comment.
Replied to the thread, a Polaris client is also an IRC client, with more capabilities, which relies on these, e.g., PolarisEndpoints.getSupportedGenericTableEndpoints() to work correctly.
There was a problem hiding this comment.
Yeah, I wouldn't block the PR on figuring this out.
While I agree that advertising the other APIs in the IcebergCatalogHandler is unnecessary coupling and I do think that we should improve this by decoupling it, I do see the pattern here and we should take this as a separate discussion.
There was a problem hiding this comment.
IRC clients should not need to call non-IRC endpoints.
I'd prefer to avoid overloading the IRC config response. Dealing with existing non-IRC endpoints there can be deferred, of course, but I do not see a reason for adding new non-IRC endpoints there.
There was a problem hiding this comment.
There was a problem hiding this comment.
moved to a new module, please a look
| * subsequent phases. | ||
| */ | ||
| @RequestScoped | ||
| public class SemanticModelCatalogAdapter |
There was a problem hiding this comment.
Do we have consensus that the Semantic Model API is included into the Polaris service by default (which is the case in this PR)?
There was a problem hiding this comment.
This PR and dev mailing list thread(https://lists.apache.org/thread/f30wyywz0gtt72troct3849vbc4s4xyt) are part of consensus building. Feel free to chime in.
There was a problem hiding this comment.
I sent a dev email about this: https://lists.apache.org/thread/hfwt1vt7505w5ysqs7sm293fqzoob7xm
I believe OSI code should follow the modularization pattern established by #4115
There was a problem hiding this comment.
So, @dimas-b , just getting completely clear about your feedback. You are asking:
- Move the SemanticModelCatalogAdapter class to a separate Gradle module. I assume you are advocating for extensions/semanticmodels.
- Register this new module in runtime/server/build.gradle.kts with something like runtimeOnly(project(":polaris-extensions-semantic-models"))
There was a problem hiding this comment.
1: From my POV, all current changes in runtime/service should be in a new gradle module.
Additionally, changes in IcebergCatalogHandler (endpoints) should be deferred and the rationale for them discussed on dev.
2: yes, runtime/server is to have a runtime dep on the new module.
I'll post a dev email about this soon.
There was a problem hiding this comment.
Cool. @flyrain - You good with moving the adapter to the new module and having the runtime/server have a runtime dep on the new module?
For the IcebergCatalogHandler, I think we have that discussion in the separate thread.
There was a problem hiding this comment.
made the change per suggestion, please take a look
| .withEndpoints( | ||
| ImmutableList.<Endpoint>builder() | ||
| .addAll(DEFAULT_ENDPOINTS) | ||
| .addAll(VIEW_ENDPOINTS) | ||
| .addAll(PolarisEndpoints.getSupportedGenericTableEndpoints(realmConfig())) | ||
| .addAll(PolarisEndpoints.getSupportedPolicyEndpoints(realmConfig())) | ||
| .addAll(PolarisEndpoints.getSupportedSemanticModelEndpoints(realmConfig())) |
There was a problem hiding this comment.
If this is an issue to send these endpoints, then we should discuss on the mailing list to keep all or remove all IMO.
| schema: | ||
| $ref: '#/components/schemas/CreateSemanticModelRequest' | ||
| responses: | ||
| 200: |
There was a problem hiding this comment.
Should this potentially be 201 rather than 200?
There was a problem hiding this comment.
It's a good point. I was using 200 here to be consistent with the Policy/Iceberg convention in Polaris. Let me know if u strongly feel about it. We can discuss more.
| content: | ||
| application/json: | ||
| schema: | ||
| $ref: '#/components/schemas/LoadSemanticModelResponse' |
There was a problem hiding this comment.
Should this be a required property?
There was a problem hiding this comment.
Checked the IRC and other response. Looks like we never make it required, e.g.,
polaris/spec/iceberg-rest-catalog-open-api.yaml
Line 5046 in d654309
| version: | ||
| type: string | ||
| description: The OSI spec version. | ||
| example: "0.1.1" |
There was a problem hiding this comment.
Maybe we should go directly to 0.2.0.dev as we are close to release 0.2.0.
There was a problem hiding this comment.
That's a good point, I think using 0.2.0 makes sense. This is important when we introduce the validator. Here is just an example for the version string. Do you prefer a string like 0.2.0.dev? I can make the change if that's so.
| type: string | ||
| description: The OSI semantic model serialized as a JSON string. | ||
| example: | ||
| version: "0.1.1" |
There was a problem hiding this comment.
Maybe we should go directly to 0.2.0.dev as we are close to release 0.2.0.
| example: "0.1.1" | ||
| semantic_model: | ||
| type: string | ||
| description: The OSI semantic model serialized as a JSON string. |
There was a problem hiding this comment.
What is the value of including JSON in JSON as a string? Why not use a free-form JSON object?
There was a problem hiding this comment.
The service validator has to parse it anyways, I don't think a string is inappropriate. Although, I don't have any preference. Should I change it to object?
| RealmContext realmContext, | ||
| SecurityContext securityContext) { | ||
| ensureEnabled(); | ||
| return notImplemented(); |
There was a problem hiding this comment.
I appreciate that you are showing us how you are going to be moving forward with building this feature, however I would appreciate TODOs here. While the code does a fantastic job at documenting what the code does, it does not do a good enough job (in this case) for why it is doing that.
Could you add TODOs to mark that you will be following up here?
There was a problem hiding this comment.
Good call, added per-operation TODOs marking the follow-up work (auth, OSI-schema validation, and persistence) and why each returns 501 for now. Done in the new commit.
| document: | ||
| $ref: '#/components/schemas/SemanticModelDocument' | ||
|
|
||
| UpdateSemanticModelRequest: |
There was a problem hiding this comment.
How are we planning to handle to concurrent updates ?
what if we expose a entityVersion in this response to achieve OCC ? in this way polaris client would just expect to sent this entityVersion and CAS only succeeds when its eq to version sent ?
much like what Glue did for supporting non irc spec impl
There was a problem hiding this comment.
That's a good point. Exposing the entity version makes sense, I was trying to introduce versioning a bit later. Given it's critical for OCC, let's add it now.
Wire the Open Semantic Interchange (OSI) semantic-model REST endpoints behind the ENABLE_SEMANTIC_MODELS feature flag (default false until the implementation lands). - spec: semantic-models-api.yaml defining create/list/load/update/drop under namespaces, plus catalog-service wiring - generated models + API enabled in polaris-catalog-service build - feature flag, REST resource paths, and endpoint set - endpoints advertised in catalog config only when the flag is enabled - SemanticModelCatalogAdapter stub: feature-gated, returns 501 for every operation pending persistence/validation The document body carries an OSI `version` and the `semantic_model` serialized as a JSON string, stored verbatim. Update is last-writer-wins.
8a7f9cc to
67d53ea
Compare
Expose entity-version on create/load/update responses and accept an optional current-version on update for compare-and-swap, mirroring the Policy API's OCC pattern. - LoadSemanticModelResponse gains a required entity-version (0 after create, +1 per successful update) - UpdateSemanticModelRequest gains an optional current-version; when set, the update succeeds only if it matches the stored version, else 409 - restore the 409 SemanticModelVersionMismatch response/example - update adapter TODOs to reflect CAS semantics Addresses PR review feedback (entityVersion-based OCC).
- use a single consistent name `entity-version` on both the update request and the load/create/update responses (was current-version vs entity-version) - make entity-version required on update: updates always use optimistic concurrency (compare-and-swap), no last-writer-wins fallback - restore the 409 SemanticModelVersionMismatch example, matching the create 409 and the Policy API convention - drop the repeated "returns 501 until persistence lands" clause from the per-operation TODOs; the class javadoc already states it Addresses PR review feedback on update concurrency semantics.
Change entity-version from integer to an opaque string token on both the update request and the load/create/update responses. Clients treat it as a token to echo back for optimistic concurrency, not a number to interpret or order. This hides the versioning scheme so the server can back it with an entity counter, content hash, or storage-native revision id without changing the API contract. Drops the integer-specific "0 after create / +1 per update" prose from the operation and field descriptions.
Relocate SemanticModelCatalogAdapter from runtime/service into a new opt-in Gradle module, extensions/semantic-models/impl (:polaris-extensions-semantic-models), modeled on the existing auth/federation extensions. runtime/server depends on it via runtimeOnly, so the OSI adapter is discovered by CDI at assembly time without runtime/service taking a hard dependency on the implementation. Addresses PR review feedback (keep the OSI REST API impl out of runtime/service so downstreams aren't forced to bundle it).
# Conflicts: # gradle/projects.main.properties
singhpk234
left a comment
There was a problem hiding this comment.
LGTM, thanks @flyrain !
| entity-version: | ||
| type: string | ||
| description: | | ||
| An opaque string identifying the version the client last read for this semantic model. The update applies |
There was a problem hiding this comment.
minor : do we need to say its UTF8 and restrict the size ?
| type: array | ||
| uniqueItems: true | ||
| items: | ||
| $ref: '#/components/schemas/SemanticModelIdentifier' |
There was a problem hiding this comment.
wonder if we should send entity-version too ? we would be having that too ?
Wire the Open Semantic Interchange (OSI) semantic-model REST endpoints behind the ENABLE_SEMANTIC_MODELS feature flag (default false until the implementation lands).
Checklist
CHANGELOG.md(if needed)site/content/in-dev/unreleased(if needed)