(feat) Implement metrics rest api #4115

Open
obelix74 wants to merge 40 commits into apache:main from obelix74:implement_metrics_rest_api

Conversation

@obelix74 obelix74 commented Apr 2, 2026

Contributor

This is an implementation of the proposal in #4010. This uses the stable envelope design for the REST API instead of a flattened structure.

Checklist

  • 🛡️ Don't disclose security issues! (contact security@apache.org)
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

@obelix74 obelix74 force-pushed the implement_metrics_rest_api branch from 1d37c50 to ac66c27 Compare April 10, 2026 14:26

@dimas-b dimas-b left a comment

Contributor

Thanks for pushing this feature forward, @obelix74 !

The PR LGTM in general. Posting some comments about code organization, subject to discussion, of course.

Comment thread spec/metrics-reports-service.yml Outdated
Comment thread spec/metrics-reports-service.yml Outdated
Comment thread spec/metrics-reports-service.yml Outdated
Comment thread spec/metrics-reports-service.yml Outdated
Comment thread runtime/service/build.gradle.kts Outdated

@dimas-b dimas-b left a comment

Contributor

LGTM with one minor remaining comment 😅

@sungwy , @sneethiraj : FYI about the new AuthZ operation.

Comment thread gradle/projects.main.properties Outdated
Comment thread CHANGELOG.md Outdated
Comment thread spec/metrics-reports-service.yml Outdated
@obelix74 obelix74 requested a review from dimas-b April 16, 2026 16:08

@flyingImer flyingImer left a comment

Collaborator

The direction looks right to me

Two structural observations:

  • With this PR, MetricsPersistence grows from 2 write methods to 4 (read + write). It's marked @Beta and the javadoc calls it a "Service Provider Interface." But it lives on BasePersistence, which only local DB backends implement. NoSqlMetaStoreManager and RemotePolarisMetaStoreManager go through empty BasePersistence implementations, so these methods are permanently no-op for them. Meanwhile, the actual SPI interfaces (PolarisMetricsReporter, PolarisMetricsManager) have no annotation at all. The @Beta signal is on the wrong layer, IIUC.

  • The write path enters through PolarisMetricsManager on MetaStoreManager, but this read path bypasses that layer and goes straight to BasePersistence via callContext.getMetaStore(). If we want the metrics read API to work for non-JDBC backends, it would need a MetaStoreManager-level entry point, same as writes.

Not blocking on this. I think the question of where metrics persistence should sit architecturally is worth a discussion on dev@.

@obelix74

Contributor Author

Thank you. I have added @Beta annotation to PolarisMetricsManager and PolarisMetricsReporter.

About the second point, thank you. Would this mean a read method to PolarisMetricsManager and MetaStoreManager mirroring the write path? Should I do it in this PR or can this wait?

@flyingImer

Collaborator

Thanks for adding @Beta.

Reads should go through MetaStoreManager too, same as writes. If reads stay on BasePersistence, non-JDBC backends can't implement the read API at all. I'd prefer fixing that in this PR so the read path ships with the same layering as writes.

Separately, the persistence schema discussion on dev@ is still open. A follow-up issue linking to that thread would help track it.

@obelix74

Contributor Author

Pushed a commit (and rebased against updated main). listScanMetrics and listCommitMetrics are now on PolarisMetricsManager (and therefore MetaStoreManager), following the same pattern as the write methods. MetricsReportsService now injects PolarisMetaStoreManager and routes reads through it rather than calling callContext.getMetaStore() directly.
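A minimal sketch of the layering described above, where the service depends only on the manager-level interface. Only the names PolarisMetricsManager, listScanMetrics, and listCommitMetrics come from this thread; the signatures, the String payloads, the write method names, and the InMemoryMetricsManager and MetricsReadService classes are assumptions made for illustration and do not reflect the actual Polaris code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified manager-level SPI; the real signatures differ.
interface PolarisMetricsManager {
  void writeScanMetrics(String tableId, String report);   // hypothetical name
  void writeCommitMetrics(String tableId, String report); // hypothetical name
  List<String> listScanMetrics(String tableId);
  List<String> listCommitMetrics(String tableId);
}

// Hypothetical backend: any manager implementation can serve reads, not only JDBC.
final class InMemoryMetricsManager implements PolarisMetricsManager {
  private final Map<String, List<String>> scans = new HashMap<>();
  private final Map<String, List<String>> commits = new HashMap<>();

  @Override
  public void writeScanMetrics(String tableId, String report) {
    scans.computeIfAbsent(tableId, k -> new ArrayList<>()).add(report);
  }

  @Override
  public void writeCommitMetrics(String tableId, String report) {
    commits.computeIfAbsent(tableId, k -> new ArrayList<>()).add(report);
  }

  @Override
  public List<String> listScanMetrics(String tableId) {
    return new ArrayList<>(scans.getOrDefault(tableId, new ArrayList<>()));
  }

  @Override
  public List<String> listCommitMetrics(String tableId) {
    return new ArrayList<>(commits.getOrDefault(tableId, new ArrayList<>()));
  }
}

// Hypothetical service: reads enter through the manager, never the persistence layer.
final class MetricsReadService {
  private final PolarisMetricsManager manager;

  MetricsReadService(PolarisMetricsManager manager) {
    this.manager = manager;
  }

  List<String> scanMetricsFor(String tableId) {
    return manager.listScanMetrics(tableId);
  }

  List<String> commitMetricsFor(String tableId) {
    return manager.listCommitMetrics(tableId);
  }
}
```

Because MetricsReadService sees only the interface, swapping in a different backend requires no change to the service itself, which is the property the review asked for.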

For the persistence schema discussion — I'll open a follow-up issue linking to the dev@ thread once there's a message to reference. Happy to do that now if you can share the thread link (I don't have it handy). Please let me know.

@obelix74 obelix74 requested a review from flyingImer April 28, 2026 18:00

@dimas-b dimas-b left a comment

Contributor

The metrics API story LGTM 👍

I still have more general concerns related to SPI design and service wiring, but they are not specific to this feature.

Thanks for working on this @obelix74 !

@github-project-automation github-project-automation Bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board May 15, 2026

@flyingImer flyingImer left a comment

Collaborator

Are you planning to merge as-is now that Dmitri approved, or is there another round? Asking because the May 7 metrics sync landed on a few directional items that touch the schema and SPI shape here. Left some questions inline.

@obelix74 obelix74 requested a review from dimas-b May 19, 2026 15:26
dimas-b previously approved these changes May 19, 2026
@dimas-b

dimas-b commented May 26, 2026

Contributor

@obelix74 : it looks like this PR got a lot of conflicts 🤷

@obelix74

Contributor Author

Resolved all conflicts and pushed. Rebased against main.

dimas-b previously approved these changes May 26, 2026

@flyingImer flyingImer left a comment

Collaborator

Thanks for continuing to push this forward. The direction still looks good to me: exposing persisted metrics through a beta read API, using a stable response envelope, keeping the API in an extension module, and routing reads through table-scoped authz all make sense.

One thing I would still like to clarify before this merges is sequencing with the metrics SPI/schema work we discussed after the May 7 sync.

From the previous review thread, I think we already converged on a few points:

  • the current MetricsPersistence / PolarisMetricsManager layering is transitional, and #4397 is expected to move metrics persistence out of the old aggregated BasePersistence shape;
  • the current scan_metrics_report / commit_metrics_report split is also transitional, with the follow-up direction being a single metrics report model/table with a metric type discriminator;
  • listScanMetrics / listCommitMetrics are therefore likely interim API/SPI shapes, and may collapse or be rerouted when the schema/SPI consolidation happens.

I don't think this PR has to solve all of that before it can make progress. But I do think we should avoid accidentally standardizing the transitional shape just because this PR is ready first.

Could we make the sequencing explicit before merge? For example, either rebase on #4397 if that lands first, or link a concrete follow-up that tracks:

  1. moving metrics persistence to the standalone SPI shape,
  2. consolidating the metrics schema/model,
  3. deciding whether the per-type list methods remain public SPI surface or collapse behind a typed query API.
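To make the consolidation direction concrete, here is a rough sketch of a single report model with a metric type discriminator and one typed list method in place of per-type methods. All names here (MetricType, MetricsReport, MetricsStore, and their fields) are hypothetical; neither #4397 nor the dev@ discussion is quoted here, so this is only one possible shape.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical discriminator replacing the separate scan/commit tables.
enum MetricType {
  SCAN,
  COMMIT
}

// Hypothetical unified report row; field names are illustrative only.
final class MetricsReport {
  final MetricType type;
  final String tableId;
  final String payload;

  MetricsReport(MetricType type, String tableId, String payload) {
    this.type = type;
    this.tableId = tableId;
    this.payload = payload;
  }
}

// Hypothetical typed query API: one list method instead of
// listScanMetrics / listCommitMetrics.
final class MetricsStore {
  private final List<MetricsReport> rows = new ArrayList<>();

  void write(MetricsReport report) {
    rows.add(report);
  }

  List<MetricsReport> list(String tableId, MetricType type) {
    List<MetricsReport> out = new ArrayList<>();
    for (MetricsReport r : rows) {
      if (r.tableId.equals(tableId) && r.type == type) {
        out.add(r);
      }
    }
    return out;
  }
}
```

With this shape, adding a new metric kind means adding an enum constant rather than a new pair of SPI methods, which is why the per-type list methods may not need to survive as public surface.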

Comment thread spec/metrics-reports-service.yml
dimas-b previously approved these changes May 28, 2026
Anand Kumar Sankaran added 24 commits June 26, 2026 12:28
…polaris-core, remove MetricsPersistence from BasePersistence

Addresses architecture review feedback that the metrics reporting SPI was
defined at the wrong layer (CDI-coupled in runtime/service instead of
polaris-core), and that durable metrics persistence was bleeding through
BasePersistence into every metastore backend.

Scope 1 changes (this commit):
- Add stable CDI-agnostic IcebergMetricsReporter SPI to polaris-core
- Remove MetricsPersistence from BasePersistence extends clause
- Remove PolarisMetricsManager from PolarisMetaStoreManager extends clause
- Delete all durable-path code: JDBC models, converters, PersistingMetricsReporter
- Add no-op (default) and log-only reporters to extensions/metrics-reports/impl
- Stub REST read path in MetricsReportsService to return empty results
- Fix namespace decoding to use NamespaceUtils.splitNamespace (canonical)
- Add multi-level namespace test; fix CHANGELOG accuracy

Durable JDBC metrics persistence is deferred to a follow-up extension
module (extensions/metrics-reports/persistence/relational-jdbc) that
will back the read API without touching BasePersistence.

Main-branch merge brought in references to MetricsPersistence
(PolarisCallContext.getMetricsPersistence, MetaStoreManagerFactory
.getOrCreateMetricsPersistence, etc.) that Scope 1 had deleted.
Restore the six SPI types as standalone interfaces decoupled from
BasePersistence so all existing callers compile correctly.
… fix factory

JdbcBasePersistenceImpl still declared implements MetricsPersistence after
the Scope 1 refactor removed the import and all metric method bodies,
causing a cannot-find-symbol compile error at the class declaration line.

JdbcMetaStoreManagerFactory still passed JdbcBasePersistenceImpl to the
two-arg PolarisCallContext constructor (which requires P extends both
BasePersistence and MetricsPersistence) and returned it from
getOrCreateMetricsPersistence(), causing no-suitable-constructor and
incompatible-types errors.

Fixes:
- Remove MetricsPersistence from JdbcBasePersistenceImpl implements clause
- Use PolarisCallContext(realmContext, metaStore, new MetricsPersistence(){})
  in bootstrap, purge, and bootstrap-check code paths
- Return a no-op MetricsPersistence from getOrCreateMetricsPersistence();
  the real JDBC implementation is provided by the extension module in Scope 2
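The `new MetricsPersistence(){}` idiom used in the fixes above only compiles when every interface method has a default body. A minimal sketch of that pattern follows; the interface here is a trimmed-down, hypothetical stand-in (NoOpCapableMetricsPersistence), and its method signatures are assumptions rather than the real Polaris ones.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for MetricsPersistence: every method has a no-op default.
interface NoOpCapableMetricsPersistence {
  default void writeScanReport(String report) {
    // intentionally does nothing
  }

  default void writeCommitReport(String report) {
    // intentionally does nothing
  }

  default List<String> listScanReports() {
    return Collections.emptyList();
  }
}

final class NoOpPersistenceDemo {
  // An empty anonymous class inherits all defaults, yielding a no-op instance.
  static NoOpCapableMetricsPersistence noOp() {
    return new NoOpCapableMetricsPersistence() {};
  }
}
```

A backend that does persist metrics would override only the methods it supports, while callers such as bootstrap or purge paths can pass the empty instance.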
… catalogName

Java prohibits lambda parameters from shadowing enclosing-method parameters
of the same name. The no-op metricsReporter lambda introduced in the Scope 1
refactor used catalogName as a parameter name inside a createHandler override
whose method parameter is also catalogName, causing a compile error.
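
The shadowing rule can be reproduced in isolation. In this sketch the class name LambdaShadowingDemo and the String-based handler are hypothetical; only the createHandler and catalogName names are taken from the commit message.

```java
import java.util.function.Function;

final class LambdaShadowingDemo {
  static Function<String, String> createHandler(String catalogName) {
    // Does not compile: "variable catalogName is already defined in method createHandler"
    // Function<String, String> reporter = catalogName -> "report for " + catalogName;

    // Compiles: the lambda parameter uses a distinct name, and the enclosing
    // catalogName is captured because it is effectively final.
    Function<String, String> reporter =
        tableName -> catalogName + " received report for " + tableName;
    return reporter;
  }
}
```
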
1. Remove stale MetricsReportToken$MetricsReportTokenType entry from
   persistence/relational-jdbc service file — the class was deleted in
   Scope 1 but ServiceLoader still found the registration, causing
   ServiceConfigurationError during pagination in LocalIcebergCatalog tests.

2. Fix PolarisCallContext constructor calls in JDBC test classes — both
   AtomicMetastoreManagerWithJdbcBasePersistenceImplTest and
   JdbcGrantRecordsIdempotencyTest used the two-arg constructor, which
   requires the second arg to implement MetricsPersistence. Now that
   JdbcBasePersistenceImpl no longer implements MetricsPersistence, the
   calls must explicitly pass a no-op MetricsPersistence.

3. Add polaris-extensions-metrics-reports as a runtimeOnly dependency to
   runtime/service — IcebergCatalogHandlerFactory injects IcebergMetricsReporter
   via CDI; without this module the NoOpMetricsReporter and LoggingMetricsReporter
   beans are absent, causing CDI injection failures in reportMetrics tests and
   integration tests.
…match application.properties

application.properties sets polaris.iceberg-metrics.reporting.type=default, but
LoggingMetricsReporter was annotated @Identifier("log") causing
UnsatisfiedResolutionException at runtime. Rename identifier to "default" and
update MetricsReportingConfiguration.type() default value to match.

Also regenerate config doc for MetricsReportingConfiguration.
…ics-reports/spi

IcebergMetricsReporter was never used inside polaris-core itself — it is
called from IcebergCatalogHandler in runtime/service, so polaris-core was
not the right home. Extract it into a new, minimal
extensions/metrics-reports/spi module so that:

* polaris-core has no dependency on (or knowledge of) the Iceberg metrics
  SPI
* downstream servers can opt out of the metrics extension without
  dragging in polaris-core machinery they do not need
* the SPI boundary is clear: runtime/service and the impl/jdbc extension
  modules all declare an explicit dep on polaris-extensions-metrics-reports-spi

MetricsRecordConverter (only used by the persisting JDBC reporter) is also
removed from polaris-core; it will live in the jdbc extension module on the
Scope 2 branch.

runtime/service: change runtimeOnly -> testRuntimeOnly for
polaris-extensions-metrics-reports so the impl is not silently pushed onto
the runtime classpath of every downstream server that depends on
runtime/service.
…/service

The @QuarkusIntegrationTest tests start a fully packaged application built
from runtimeClasspath. testRuntimeOnly excludes the dependency from that
packaged app, causing IcebergMetricsReporter CDI injection failures at
startup and testSendMetricsReport() to fail with a non-204 response.

runtimeOnly is required here so that LoggingMetricsReporter and
NoOpMetricsReporter are available when the Quarkus app is packaged and
started for integration tests.

The runtimeOnly dep on polaris-extensions-metrics-reports was leaking
into downstream consumers' runtime classpaths. Instead, make the
IcebergMetricsReporter CDI producer resilient: if no reporter is found
for the configured type, log a warning and fall back to a no-op.

For deployments, runtime/server declares its own runtimeOnly dep on the
impl module so LoggingMetricsReporter is always present there. For
runtime/service integration tests (@QuarkusIntegrationTest), the no-op
fallback satisfies CDI injection and the metrics endpoint returns 204.
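
The resilient lookup described above can be sketched without CDI. In this sketch a plain map stands in for identifier-based bean lookup; the Reporter interface, ReporterResolver class, and NO_OP constant are hypothetical, and the real producer in ServiceProducers is not shown in this thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical reporter contract; returns a string so the behavior is observable.
interface Reporter {
  String report(String tableId);
}

final class ReporterResolver {
  private static final Logger LOG = Logger.getLogger(ReporterResolver.class.getName());

  // Fallback used when no reporter matches the configured type.
  static final Reporter NO_OP = tableId -> "no-op";

  private final Map<String, Reporter> byIdentifier = new HashMap<>();

  void register(String identifier, Reporter reporter) {
    byIdentifier.put(identifier, reporter);
  }

  Reporter resolve(String configuredType) {
    Reporter found = byIdentifier.get(configuredType);
    if (found == null) {
      LOG.warning(
          "No metrics reporter registered for type '" + configuredType
              + "'; falling back to no-op");
      return NO_OP;
    }
    return found;
  }
}
```

The same sketch shows why a mismatch between the configured type and the registered identifier matters: a strict lookup fails at startup, while the fallback degrades to a no-op with a warning.
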
…mments

- Move NoOpMetricsReporter from extensions/metrics-reports/impl to the
  SPI module so any project with polaris-extensions-metrics-reports-spi
  on its classpath gets a built-in no-op without reinventing one
- Add jandex + CDI/SmallRye annotation deps to the SPI module so Quarkus
  discovers the bean via classpath scanning
- Change MetricsReportingConfiguration code default to 'no-op' (was
  'default') so it always resolves to the SPI bean when no explicit type
  is set; production deployments continue to use type=default via
  application.properties (LoggingMetricsReporter)
- Override type=no-op in application-test.properties and
  application-it.properties so unit and integration tests in
  runtime/service (which do not have the impl module on runtimeClasspath)
  always get the SPI no-op without needing the impl module
- Revert ServiceProducers.metricsReporter() to simple .get() — the SPI
  no-op guarantees the lookup always succeeds for the configured defaults
- Remove redundant testImplementation(jakarta.ws.rs.api) from impl build
  (already covered by implementation() at line 35)
- Delete comments-only META-INF/services registration left after
  MetricsReportToken was removed in Scope 1

- Move MetricsReportsService from impl to runtime/service (better layering)
- Return HTTP 501 Not Implemented until durable extension is installed
- Move NoOpMetricsReporter from SPI to impl (CDI annotations don't belong in SPI)
- Strip CDI/SmallRye/jandex from SPI build.gradle.kts
- Change @Identifier("default") -> @Identifier("log") on LoggingMetricsReporter
- Change @WithDefault("no-op") -> @WithDefault("log") in MetricsReportingConfiguration
- Remove listScanReports/listCommitReports from MetricsPersistence (durable query SPI belongs in extension, not core)
- Update telemetry.md and spec description to document 501 behavior and extension requirement
- Regenerate config reference for updated @WithDefault
…9, r3424437119

- Rename IcebergMetricsReporter package to org.apache.polaris.extension.metrics
  (was org.apache.polaris.core.metrics — now matches the extensions/metrics-reports/spi module)
  Move physical file to org/apache/polaris/extension/metrics/ and update all imports
- Restore fallback no-op lambda in ServiceProducers.metricsReporter() with comment
  explaining that LoggingMetricsReporter/NoOpMetricsReporter live in the impl module
  (not SPI), so the lambda guards against missing impl on the classpath
- Scope polaris-api-metrics-reports-service as compileOnly in runtime/service to avoid
  leaking the generated metrics API into transitive consumers; add runtimeOnly to
  runtime/server so Quarkus can discover the JAX-RS endpoint at startup
…the extension module

Renames extensions/metrics-reports/impl → extensions/metrics-reports/base and moves
MetricsReportsService (and its unit test) from runtime/service into base, so the module
owns all three baseline pieces: REST service class + LoggingMetricsReporter + NoOpMetricsReporter.
Removes the compileOnly hack from runtime/service/build.gradle.kts that was needed to
keep polaris-api-metrics-reports-service off the transitive dep graph.

Addresses: apache#4115 (comment)
…r3431429840, r3431490149

- Move IcebergMetricsReporter to org.apache.polaris.extension.metrics.spi package for
  proper jar isolation; update all import sites (r3431409615)
- Add DefaultMetricsReporter inner class @Identifier("default") extending LoggingMetricsReporter
  to preserve backward compat with the old "default" config value (r3431490149)
- Remove polaris.iceberg-metrics.reporting.type=no-op from application-it.properties;
  already covered by application-test.properties (r3431419210)
- Remove explicit runtimeOnly(polaris-api-metrics-reports-service) from runtime/server;
  it flows transitively via polaris-extensions-metrics-reports (r3431429840)
…6067891 r3476154815 r3476177318 r3476188544

- Move testRuntimeOnly resteasy-reactive next to testImplementation quarkus.bom block
- Fix alphabetical ordering of polaris-extensions-metrics-reports* in projects.main.properties
- Restore javadocs stripped from ScanMetricsRecord and CommitMetricsRecord
- Add LIST_TABLE_METRICS to RangerPolarisOperationSemantics (mirrors RbacOperationSemantics)
- Use getOrCreateMetricsPersistence(realmContext) in JdbcMetaStoreManagerFactory instead of inline no-op

The @Beta annotation alone is not universally interpreted as implying
breaking changes. Explicitly state that:
- The API is experimental and subject to breaking changes in any release
- It should not be used in production until declared stable
- The @Beta label means early-access / POC, not stability

Updated: CHANGELOG, OpenAPI spec description, telemetry.md, and
the API spec page title.
…th inline impl

The MetricsModelUtils class was removed by an upstream commit during rebase,
but toRecord() in both model interfaces still called it. Replace with a
private static parseMetadata helper using the tools.jackson API already present.

The Java default (MetricsReportingConfiguration) is log, not no-op.
Revert the doc table to match reality; changing the default belongs
in a separate PR per reviewer feedback.
@obelix74 obelix74 force-pushed the implement_metrics_rest_api branch from f684980 to d32520b Compare June 26, 2026 19:32
Align metrics API docs, Ranger privileges, and runtime defaults with the PR1 SPI behavior, and fix the rebase-related JDBC test compilation and formatting failures.

Co-authored-by: Cursor <cursoragent@cursor.com>
@obelix74

Contributor Author

@dimas-b

This PR removes the existing durable JDBC metrics persistence path in commit:

#0385f125d refactor(metrics): fix SPI layering — move IcebergMetricsReporter to polaris-core, remove MetricsPersistence from BasePersistence

That commit removes metrics persistence methods from:

persistence/relational-jdbc/src/main/java/org/apache/polaris/persistence/relational/jdbc/JdbcBasePersistenceImpl.java

Specifically, it removes:

• writeScanReport
• writeCommitReport
• listScanReports
• listCommitReports
• JDBC metrics insert/query helpers
• metrics pagination/query code

It also removes related durable-path code from:

• runtime/service/src/main/java/org/apache/polaris/service/reporting/PersistingMetricsReporter.java
• polaris-core/src/main/java/org/apache/polaris/core/persistence/metrics/MetricsPersistence.java
• polaris-core/src/main/java/org/apache/polaris/core/metrics/iceberg/MetricsRecordConverter.java
• JDBC metrics model/test files under persistence/relational-jdbc

The reason given in the commit is architecture review feedback: durable metrics persistence was leaking through BasePersistence into every metastore backend. PR1 was narrowed to the API/SPI and non-durable reporters only, while durable JDBC persistence was deferred to PR2 as a separate extension module.

Warning: if PR1 is merged without PR2, this is a temporary regression. polaris.iceberg-metrics.reporting.type=persisting will no longer work, metrics will not be written to JDBC storage, and the Metrics Reports read API will return HTTP 501 until PR2 adds extensions/metrics-reports/persistence/relational-jdbc.
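
A rough sketch of the interim read behavior described in the warning: the endpoint answers 501 while no durable store is installed and 200 once one is present. The MetricsQueryStore, ReadResult, and MetricsReadEndpoint types are hypothetical and use plain Java rather than JAX-RS; the actual MetricsReportsService wiring may differ.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical query SPI that a durable extension module would provide.
interface MetricsQueryStore {
  List<String> listScanReports(String tableId);
}

// Hypothetical response holder: HTTP status plus body.
final class ReadResult {
  final int status;
  final List<String> body;

  ReadResult(int status, List<String> body) {
    this.status = status;
    this.body = body;
  }
}

// Hypothetical endpoint: answers 501 until a durable store is installed.
final class MetricsReadEndpoint {
  private final MetricsQueryStore storeOrNull; // null means no extension is present

  MetricsReadEndpoint(MetricsQueryStore storeOrNull) {
    this.storeOrNull = storeOrNull;
  }

  ReadResult listScanReports(String tableId) {
    if (storeOrNull == null) {
      return new ReadResult(501, Collections.emptyList()); // Not Implemented
    }
    return new ReadResult(200, storeOrNull.listScanReports(tableId));
  }
}
```

Clients probing the beta API can treat 501 as "no durable metrics extension installed" rather than as a transient failure.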

5 participants