Feat: Add OSI semantic-model API scaffolding#4816

Open
flyrain wants to merge 6 commits into apache:main from flyrain:feat/osi-semantic-models

flyrain commented Jun 17, 2026

Contributor

Wire the Open Semantic Interchange (OSI) semantic-model REST endpoints behind the ENABLE_SEMANTIC_MODELS feature flag (default false until the implementation lands).

  • spec: semantic-models-api.yaml defining create/list/load/update/drop under namespaces, plus catalog-service wiring
  • generated models + API enabled in polaris-catalog-service build
  • feature flag, REST resource paths, and endpoint set

Checklist

  • 🛡️ Don't disclose security issues! (contact security@apache.org)
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

Copilot AI review requested due to automatic review settings June 17, 2026 22:21
github-project-automation (bot) moved this to PRs In Progress in Basic Kanban Board Jun 17, 2026
flyrain force-pushed the feat/osi-semantic-models branch from d0f7067 to f9b8394 June 17, 2026 22:23
flyrain changed the title from "Feat/osi semantic models" to "Feat: Add OSI semantic-model API scaffolding" Jun 17, 2026

Copilot AI left a comment

Contributor

Pull request overview

Wires the Open Semantic Interchange (OSI) semantic-model REST surface into Polaris behind a new ENABLE_SEMANTIC_MODELS feature flag (default false), including OpenAPI spec additions, endpoint advertisement in getConfig(), and a stub runtime adapter.

Changes:

  • Add OpenAPI paths + component schemas/responses for semantic-model create/list/load/update/drop under namespaces.
  • Introduce ENABLE_SEMANTIC_MODELS feature flag and advertise semantic-model endpoints in the Iceberg getConfig() endpoints list when enabled.
  • Enable OpenAPI generation of SemanticModelApi plus required models, and add a stub SemanticModelCatalogAdapter returning 501 Not Implemented.
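
The feature-gated stub described in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the adapter from the PR: the class name, the map-based config lookup, `StubResponse`, and `FeatureDisabledException` are all hypothetical stand-ins chosen so the example runs without JAX-RS or Polaris on the classpath. Only the flag name, the `ensureEnabled()` / `notImplemented()` call order, and the 501 status come from the PR.

```java
import java.util.Map;

final class FeatureGatedStubSketch {
  static final String FLAG = "ENABLE_SEMANTIC_MODELS";

  /** Minimal response holder so the sketch needs no JAX-RS dependency (hypothetical). */
  record StubResponse(int status, String message) {}

  /** Thrown when the flag is off; how the real adapter signals this is an assumption. */
  static final class FeatureDisabledException extends RuntimeException {
    FeatureDisabledException(String message) {
      super(message);
    }
  }

  // Stand-in for the realm configuration; the real code reads a FeatureConfiguration.
  private final Map<String, Boolean> realmConfig;

  FeatureGatedStubSketch(Map<String, Boolean> realmConfig) {
    this.realmConfig = realmConfig;
  }

  private void ensureEnabled() {
    // A missing entry means "false", matching the flag's default in the PR.
    if (!realmConfig.getOrDefault(FLAG, false)) {
      throw new FeatureDisabledException("Semantic models are not enabled");
    }
  }

  private static StubResponse notImplemented() {
    return new StubResponse(501, "Not Implemented");
  }

  /** Every operation follows the same shape: check the flag first, then answer 501. */
  StubResponse loadSemanticModel(String namespace, String name) {
    ensureEnabled();
    return notImplemented();
  }
}
```

Checking the flag before returning 501 means a deployment with the flag off never reveals that a stub exists behind the route.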

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| spec/polaris-catalog-service.yaml | Adds top-level path refs for semantic-model endpoints into the bundled service spec. |
| spec/polaris-catalog-apis/semantic-models-api.yaml | Defines the semantic-model API paths, schemas, responses, and examples. |
| runtime/service/src/main/java/org/apache/polaris/service/catalog/semanticmodel/SemanticModelCatalogAdapter.java | Adds a request-scoped stub adapter gated by ENABLE_SEMANTIC_MODELS that currently returns 501s. |
| runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java | Includes semantic-model endpoints in getConfig() when enabled. |
| polaris-core/src/main/java/org/apache/polaris/core/rest/PolarisResourcePaths.java | Adds resource path constants for semantic-model endpoints. |
| polaris-core/src/main/java/org/apache/polaris/core/rest/PolarisEndpoints.java | Adds endpoint constants + a helper to expose semantic-model endpoints only when enabled. |
| polaris-core/src/main/java/org/apache/polaris/core/config/FeatureConfiguration.java | Introduces ENABLE_SEMANTIC_MODELS feature flag with default false. |
| api/polaris-catalog-service/build.gradle.kts | Updates OpenAPI generator configuration to generate SemanticModelApi + semantic-model models. |

.addAll(VIEW_ENDPOINTS)
.addAll(PolarisEndpoints.getSupportedGenericTableEndpoints(realmConfig()))
.addAll(PolarisEndpoints.getSupportedPolicyEndpoints(realmConfig()))
.addAll(PolarisEndpoints.getSupportedSemanticModelEndpoints(realmConfig()))
Comment on lines +376 to +380
PolarisConfiguration.<Boolean>builder()
.key("ENABLE_SEMANTIC_MODELS")
.description("If true, the semantic-model (OSI) endpoints are enabled")
.defaultValue(false) // keep it to false until the implementation is done
.buildFeatureConfiguration();
flyrain force-pushed the feat/osi-semantic-models branch from f9b8394 to 8a7f9cc June 17, 2026 23:00
---
paths:

/polaris/v1/{prefix}/namespaces/{namespace}/semantic-models:

Contributor

Who owns this API definition? Polaris?

Contributor Author

Polaris owns it. OSI doesn't specify the API.

Member

OSI/Ossie defines the "content/payload", not the API.

Contributor

Well, this means that OSI/Ossie effectively defines the shape of JSON payloads in this API.

What is the vision for evolving this Polaris API definition in conjunction with Ossie's evolution?

Are we going to add a new vN here for every Ossie spec version?

Contributor

So, the Detailed Design says:

OSI is an evolving specification. Polaris stores the document's declared OSI version (spec_version) and validates the document against the corresponding bundled schema version.
Polaris may support multiple OSI versions simultaneously, allowing users to upgrade Polaris independently of their semantic models. A model written as OSI 0.1.1 remains a 0.1.1 model until explicitly updated by the user.
Documents declaring an unsupported OSI version are rejected with 400 Bad Request. Polaris does not automatically migrate or rewrite semantic models during server upgrades. This follows the same principle as Iceberg format evolution: version upgrades are explicit user actions rather than side effects of infrastructure upgrades.

and

Validation
Schema validation: Writes validate against the bundled OSI JSON Schema. A new OsiDocumentValidator produces BadRequestException with JSON-Pointer field paths on failure. The validator is strict: unknown top-level fields cause a 400. Forward compatibility with newer OSI versions is handled by upgrading Polaris's bundled schema as an explicit, coordinated action, not by silently accepting unvalidated content. Vendor-specific or experimental fields belong in custom_extensions, which the schema does allow.
Source table validation: Validate that every dataset.source resolves to a Polaris TABLE_LIKE entity at write time; writes with unresolved sources fail with 400.
Source table column validation: Not supported in v1.
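
The validation rules quoted above could look roughly like the sketch below. It is an illustration under stated assumptions, not the planned OsiDocumentValidator: the allowed top-level field set and the `datasets` array name are hypothetical (the real set would come from the bundled OSI JSON Schema), the document is assumed to be already parsed into maps and lists, and `tableExists` stands in for resolving a source to a Polaris TABLE_LIKE entity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

final class OsiStrictValidationSketch {
  // Hypothetical field set for illustration; only "version" and "custom_extensions"
  // are named in the discussion.
  static final Set<String> ALLOWED_TOP_LEVEL =
      Set.of("version", "name", "datasets", "custom_extensions");

  /**
   * Returns JSON-Pointer paths of offending fields. An empty list means the document
   * passes; a caller would turn a non-empty list into a 400 Bad Request.
   */
  static List<String> validate(Map<String, Object> document, Predicate<String> tableExists) {
    List<String> errors = new ArrayList<>();

    // Strict mode: any unknown top-level field is an error.
    for (String field : document.keySet()) {
      if (!ALLOWED_TOP_LEVEL.contains(field)) {
        errors.add("/" + escape(field));
      }
    }

    // Source table validation: every dataset source must resolve at write time.
    if (document.get("datasets") instanceof List<?> datasets) {
      for (int i = 0; i < datasets.size(); i++) {
        if (datasets.get(i) instanceof Map<?, ?> dataset) {
          Object source = dataset.get("source");
          if (!(source instanceof String s) || !tableExists.test(s)) {
            errors.add("/datasets/" + i + "/source");
          }
        }
      }
    }
    return errors;
  }

  // RFC 6901 escaping: "~" becomes "~0" and "/" becomes "~1".
  private static String escape(String token) {
    return token.replace("~", "~0").replace("/", "~1");
  }
}
```

Collecting every failing path before rejecting lets a client fix a document in one round trip instead of one error at a time.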

So, there won't be any change to the Polaris REST APIs when the OSI spec changes, since the API just passes JSON back and forth, but there will be some business logic changes whenever OSI is upgraded in Polaris.
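
One way to picture that business logic is a version gate in front of the validator. The sketch below assumes the server keeps a set of OSI versions for which it bundles a schema; the class name, the `BadRequest` exception, and the concrete version list passed in by the caller are hypothetical. It shows the two rules from the design text: an unsupported version is rejected, and a supported one is stored exactly as declared, with no migration.

```java
import java.util.Set;

final class OsiVersionCheckSketch {
  /** Maps to a 400 Bad Request; the class name is hypothetical. */
  static final class BadRequest extends RuntimeException {
    BadRequest(String message) {
      super(message);
    }
  }

  // Versions that have a bundled schema. Upgrading Polaris would extend this set.
  private final Set<String> supportedVersions;

  OsiVersionCheckSketch(Set<String> supportedVersions) {
    this.supportedVersions = Set.copyOf(supportedVersions);
  }

  /**
   * Returns the version to validate against and store under. The declared version is
   * returned unchanged: the server never rewrites a model to a newer OSI version.
   */
  String resolveSchemaVersion(String declaredVersion) {
    if (declaredVersion == null || !supportedVersions.contains(declaredVersion)) {
      throw new BadRequest("Unsupported OSI version: " + declaredVersion);
    }
    return declaredVersion;
  }
}
```

Because the REST payload shape does not change, adding a new OSI version under this scheme only touches the supported set and the bundled schemas.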

.withEndpoints(
ImmutableList.<Endpoint>builder()
.addAll(DEFAULT_ENDPOINTS)
.addAll(VIEW_ENDPOINTS)
.addAll(PolarisEndpoints.getSupportedGenericTableEndpoints(realmConfig()))
.addAll(PolarisEndpoints.getSupportedPolicyEndpoints(realmConfig()))
.addAll(PolarisEndpoints.getSupportedSemanticModelEndpoints(realmConfig()))

Contributor

Why would Polaris return "semantic model" endpoints in response to an Iceberg REST Catalog config request?

Contributor Author

This follows the convention of other Polaris-specific endpoints, such as the generic table and policy ones.

Contributor

If sending these endpoints is a problem, then IMO we should discuss on the mailing list whether to keep all of them or remove all of them.

@dimas-b dimas-b Jun 19, 2026

Contributor

Following an existing pattern does not automatically make the change correct 🤷

What is the rationale for exposing these new endpoints in the IRC config response?

Contributor Author

Replied on the thread. A Polaris client is also an IRC client with additional capabilities, and it relies on these advertised endpoints, e.g., those from PolarisEndpoints.getSupportedGenericTableEndpoints(), to work correctly.
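
The mechanism under discussion can be sketched as below: the server adds the optional endpoints to the config response only when the flag is on, and a client checks that list before calling them. The class and method names are hypothetical, the endpoints are modeled as plain strings rather than Iceberg `Endpoint` objects, and the GET/POST verbs are assumptions; only the semantic-models path comes from the spec in this PR.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class EndpointAdvertisementSketch {
  // Illustrative strings; the real constants live in PolarisEndpoints.
  static final List<String> DEFAULT_ENDPOINTS = List.of("GET /v1/{prefix}/namespaces");
  static final List<String> SEMANTIC_MODEL_ENDPOINTS =
      List.of(
          "GET /polaris/v1/{prefix}/namespaces/{namespace}/semantic-models",
          "POST /polaris/v1/{prefix}/namespaces/{namespace}/semantic-models");

  /** Server side: advertise semantic-model endpoints only when the flag is on. */
  static List<String> advertisedEndpoints(boolean semanticModelsEnabled) {
    List<String> optional = semanticModelsEnabled ? SEMANTIC_MODEL_ENDPOINTS : List.of();
    return Stream.concat(DEFAULT_ENDPOINTS.stream(), optional.stream())
        .collect(Collectors.toList());
  }

  /** Client side: consult the config response before calling an optional endpoint. */
  static boolean supports(List<String> advertised, String endpoint) {
    return advertised.contains(endpoint);
  }
}
```

A plain IRC client that ignores unknown entries is unaffected by the extra strings, which is the crux of the coupling question raised in this thread.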

Contributor

Yeah, I wouldn't block the PR on figuring this out.

While I agree that advertising the other APIs in the IcebergCatalogHandler is unnecessary coupling and I do think that we should improve this by decoupling it, I do see the pattern here and we should take this as a separate discussion.

Contributor

IRC clients should not need to call non-IRC endpoints.

I'd prefer to avoid overloading the IRC config response. Dealing with existing non-IRC endpoints there can be deferred, of course, but I do not see a reason for adding new non-IRC endpoints there.

Contributor

@flyrain :

Replied to the thread [....]

Please see the latest proposal from @adutra in that thread.

From my POV, following the dev discussion, exposing OSI endpoints in the IRC config is acceptable.

However, runtime/service should not have hard dependencies on the OSI REST API implementation.

Contributor Author

Moved to a new module, please take a look.

* subsequent phases.
*/
@RequestScoped
public class SemanticModelCatalogAdapter

Contributor

Do we have consensus that the Semantic Model API is included into the Polaris service by default (which is the case in this PR)?

Contributor Author

This PR and the dev mailing list thread (https://lists.apache.org/thread/f30wyywz0gtt72troct3849vbc4s4xyt) are part of the consensus building. Feel free to chime in.

Contributor

I sent a dev email about this: https://lists.apache.org/thread/hfwt1vt7505w5ysqs7sm293fqzoob7xm

I believe OSI code should follow the modularization pattern established by #4115

Contributor

So, @dimas-b, just to be completely clear about your feedback, you are asking to:

  1. Move the SemanticModelCatalogAdapter class to a separate Gradle module. I assume you are advocating for extensions/semanticmodels.
  2. Register this new module in runtime/server/build.gradle.kts with something like runtimeOnly(project(":polaris-extensions-semantic-models"))

@dimas-b dimas-b Jun 23, 2026

Contributor

1: From my POV, all current changes in runtime/service should be in a new gradle module.

Additionally, changes in IcebergCatalogHandler (endpoints) should be deferred and the rationale for them discussed on dev.

2: yes, runtime/server is to have a runtime dep on the new module.

I'll post a dev email about this soon.

Contributor

Cool. @flyrain - You good with moving the adapter to the new module and having the runtime/server have a runtime dep on the new module?

For the IcebergCatalogHandler, I think we have that discussion in the separate thread.

Contributor Author

Made the change per the suggestion, please take a look.

schema:
$ref: '#/components/schemas/CreateSemanticModelRequest'
responses:
200:

Contributor

Should this potentially be 201 rather than 200?

Contributor Author

It's a good point. I was using 200 here to be consistent with the Policy/Iceberg convention in Polaris. Let me know if you feel strongly about it. We can discuss more.

content:
application/json:
schema:
$ref: '#/components/schemas/LoadSemanticModelResponse'

Contributor

Should this be a required property?

@flyrain flyrain Jun 24, 2026

Contributor Author

Checked the IRC and other responses. It looks like we never make it required, e.g.,

@jbonofre jbonofre self-requested a review June 19, 2026 09:02
version:
type: string
description: The OSI spec version.
example: "0.1.1"

Member

Maybe we should go directly to 0.2.0.dev as we are close to release 0.2.0.

Contributor Author

That's a good point, I think using 0.2.0 makes sense. This is important when we introduce the validator. Here is just an example for the version string. Do you prefer a string like 0.2.0.dev? I can make the change if that's so.

type: string
description: The OSI semantic model serialized as a JSON string.
example:
version: "0.1.1"

Member

Maybe we should go directly to 0.2.0.dev as we are close to release 0.2.0.

Contributor Author

Same as #4816 (comment)

example: "0.1.1"
semantic_model:
type: string
description: The OSI semantic model serialized as a JSON string.

Contributor

What is the value of including JSON in JSON as a string? Why not use a free-form JSON object?

Contributor Author

The service validator has to parse it anyway, so I don't think a string is inappropriate, although I don't have a strong preference. Should I change it to an object?
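
To make the trade-off concrete, the sketch below builds the same document fragment both ways using plain string handling. The class and method names are hypothetical, and the escaper only handles quotes, backslashes, and newlines (a real JSON serializer covers all control characters). The `version` and `semantic_model` field names come from the spec excerpt under review.

```java
final class EmbeddedJsonSketch {
  /** Escapes a raw JSON document so it can be carried as a JSON string value. */
  static String asJsonString(String rawJson) {
    StringBuilder out = new StringBuilder("\"");
    for (char c : rawJson.toCharArray()) {
      switch (c) {
        case '"' -> out.append("\\\"");
        case '\\' -> out.append("\\\\");
        case '\n' -> out.append("\\n");
        default -> out.append(c);
      }
    }
    return out.append('"').toString();
  }

  /** Model carried as an escaped string, the shape in this revision of the spec. */
  static String documentWithString(String version, String modelJson) {
    return "{\"version\":\"" + version + "\",\"semantic_model\":" + asJsonString(modelJson) + "}";
  }

  /** Model embedded directly as a JSON object, the alternative raised in review. */
  static String documentWithObject(String version, String modelJson) {
    return "{\"version\":\"" + version + "\",\"semantic_model\":" + modelJson + "}";
  }
}
```

With the string form, every quote in the model is escaped and the server parses twice (envelope, then model); with the object form, the model is parsed together with the envelope and is readable in logs and API examples without unescaping.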

RealmContext realmContext,
SecurityContext securityContext) {
ensureEnabled();
return notImplemented();

Contributor

I appreciate that you are showing us how you are going to be moving forward with building this feature, however I would appreciate TODOs here. While the code does a fantastic job at documenting what the code does, it does not do a good enough job (in this case) for why it is doing that.

Could you add TODOs to mark that you will be following up here?

Contributor Author

Good call, added per-operation TODOs marking the follow-up work (auth, OSI-schema validation, and persistence) and why each returns 501 for now. Done in the new commit.

document:
$ref: '#/components/schemas/SemanticModelDocument'

UpdateSemanticModelRequest:

Contributor

How are we planning to handle concurrent updates?
What if we expose an entityVersion in this response to achieve OCC? The Polaris client would then send this entityVersion back on update, and the CAS would succeed only when it equals the stored version.

This is much like what Glue did to support a non-IRC spec implementation.

Contributor Author

That's a good point. Exposing the entity version makes sense, I was trying to introduce versioning a bit later. Given it's critical for OCC, let's add it now.

Wire the Open Semantic Interchange (OSI) semantic-model REST endpoints
behind the ENABLE_SEMANTIC_MODELS feature flag (default false until the
implementation lands).

- spec: semantic-models-api.yaml defining create/list/load/update/drop
  under namespaces, plus catalog-service wiring
- generated models + API enabled in polaris-catalog-service build
- feature flag, REST resource paths, and endpoint set
- endpoints advertised in catalog config only when the flag is enabled
- SemanticModelCatalogAdapter stub: feature-gated, returns 501 for every
  operation pending persistence/validation

The document body carries an OSI `version` and the `semantic_model`
serialized as a JSON string, stored verbatim. Update is last-writer-wins.
flyrain force-pushed the feat/osi-semantic-models branch from 8a7f9cc to 67d53ea June 24, 2026 20:59
flyrain added 5 commits June 24, 2026 14:04
Expose entity-version on create/load/update responses and accept an
optional current-version on update for compare-and-swap, mirroring the
Policy API's OCC pattern.

- LoadSemanticModelResponse gains a required entity-version (0 after
  create, +1 per successful update)
- UpdateSemanticModelRequest gains an optional current-version; when set,
  the update succeeds only if it matches the stored version, else 409
- restore the 409 SemanticModelVersionMismatch response/example
- update adapter TODOs to reflect CAS semantics

Addresses PR review feedback (entityVersion-based OCC).

- use a single consistent name `entity-version` on both the update request
  and the load/create/update responses (was current-version vs entity-version)
- make entity-version required on update: updates always use optimistic
  concurrency (compare-and-swap), no last-writer-wins fallback
- restore the 409 SemanticModelVersionMismatch example, matching the create
  409 and the Policy API convention
- drop the repeated "returns 501 until persistence lands" clause from the
  per-operation TODOs; the class javadoc already states it

Addresses PR review feedback on update concurrency semantics.

Change entity-version from integer to an opaque string token on both the
update request and the load/create/update responses. Clients treat it as a
token to echo back for optimistic concurrency, not a number to interpret or
order. This hides the versioning scheme so the server can back it with an
entity counter, content hash, or storage-native revision id without changing
the API contract.
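
A minimal in-memory sketch of this compare-and-swap contract is shown below. It is not the adapter's implementation (which still returns 501): the store, the `Stored` record, and the `VersionMismatch` exception are hypothetical, and a random UUID merely stands in for whatever the server uses to mint the opaque token.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

final class EntityVersionCasSketch {
  /** What a load/create/update response would carry: the document plus its token. */
  record Stored(String document, String entityVersion) {}

  /** Maps to the 409 SemanticModelVersionMismatch response; the class name is hypothetical. */
  static final class VersionMismatch extends RuntimeException {
    VersionMismatch(String message) {
      super(message);
    }
  }

  private final Map<String, Stored> models = new ConcurrentHashMap<>();

  // Clients only echo the token back, so any scheme that changes on every write works.
  private static String newToken() {
    return UUID.randomUUID().toString();
  }

  Stored create(String name, String document) {
    Stored created = new Stored(document, newToken());
    if (models.putIfAbsent(name, created) != null) {
      throw new IllegalStateException("Semantic model already exists: " + name);
    }
    return created;
  }

  /** Applies the update only if the caller's token matches the stored one. */
  Stored update(String name, String document, String entityVersion) {
    Stored next = new Stored(document, newToken());
    // computeIfPresent runs atomically per key; throwing leaves the mapping unchanged.
    Stored result =
        models.computeIfPresent(
            name,
            (key, current) -> {
              if (!current.entityVersion().equals(entityVersion)) {
                throw new VersionMismatch("Stale entity-version for " + name);
              }
              return next;
            });
    if (result == null) {
      throw new IllegalArgumentException("No such semantic model: " + name);
    }
    return result;
  }
}
```

A client that loses the race reloads the model, reapplies its change, and retries with the fresh token. The comparison uses equality only, never ordering, which is what keeps the token opaque.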

Drops the integer-specific "0 after create / +1 per update" prose from the
operation and field descriptions.

Relocate SemanticModelCatalogAdapter from runtime/service into a new opt-in
Gradle module, extensions/semantic-models/impl
(:polaris-extensions-semantic-models), modeled on the existing auth/federation
extensions. runtime/server depends on it via runtimeOnly, so the OSI adapter is
discovered by CDI at assembly time without runtime/service taking a hard
dependency on the implementation.

Addresses PR review feedback (keep the OSI REST API impl out of runtime/service
so downstreams aren't forced to bundle it).

@singhpk234 singhpk234 left a comment

Contributor

LGTM, thanks @flyrain !

entity-version:
type: string
description: |
An opaque string identifying the version the client last read for this semantic model. The update applies

Contributor

Minor: do we need to say it's UTF-8 and restrict the size?

type: array
uniqueItems: true
items:
$ref: '#/components/schemas/SemanticModelIdentifier'

Contributor

I wonder if we should send entity-version here too, since we would have it available as well?

7 participants