(feat) Implement metrics REST API #4115
1d37c50 to ac66c27
dimas-b
left a comment
LGTM with one minor remaining comment 😅
@sungwy , @sneethiraj : FYI about the new AuthZ operation.
flyingImer
left a comment
The direction looks right to me
Two structural observations:
- With this PR, MetricsPersistence grows from 2 write methods to 4 (read + write). It's marked @beta and the javadoc calls it a "Service Provider Interface." But it lives on BasePersistence, which only local DB backends implement. NoSqlMetaStoreManager and RemotePolarisMetaStoreManager go through empty BasePersistence implementations, so these methods are permanently no-op for them. Meanwhile, the actual SPI interfaces (PolarisMetricsReporter, PolarisMetricsManager) have no annotation at all. The @beta signal is on the wrong layer, IIUC.
- The write path enters through PolarisMetricsManager on MetaStoreManager, but this read path bypasses that layer and goes straight to BasePersistence via callContext.getMetaStore(). If we want the metrics read API to work for non-JDBC backends, it would need a MetaStoreManager-level entry point, same as writes.
Not blocking on this. I think the question of where metrics persistence should sit architecturally is worth a discussion on dev@.
Thank you. I have added About the second point, thank you. Would this mean a read method to
Thanks for adding @beta. Reads should go through MetaStoreManager too, same as writes. If reads stay on BasePersistence, non-JDBC backends can't implement the read API at all. I'd prefer fixing that in this PR so the read path ships with the same layering as writes. Separately, the persistence schema discussion on dev@ is still open. A follow-up issue linking to that thread would help track it.
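To make the layering point concrete, here is a minimal sketch of a read path that enters at the manager level. Every type and method name below (MetricsReadManager, LocalManager, RemoteManager, handleList) is a hypothetical stand-in, not the real Polaris API; the only idea taken from the thread is that the REST handler should depend on a manager-level interface so that non-JDBC backends can answer reads too.

```java
import java.util.List;

// Hypothetical, simplified stand-ins: none of these are the real Polaris types.
final class ReadPathSketch {

  /** Read operations exposed at the manager level (the role MetaStoreManager plays). */
  interface MetricsReadManager {
    List<String> listScanMetrics(String tableId);
  }

  /** A local backend that answers reads from its own store (the JDBC-like case). */
  static final class LocalManager implements MetricsReadManager {
    private final List<String> storedReports;

    LocalManager(List<String> storedReports) {
      this.storedReports = storedReports;
    }

    @Override
    public List<String> listScanMetrics(String tableId) {
      // Reports are encoded as "<tableId>:<name>" purely for this illustration.
      return storedReports.stream().filter(r -> r.startsWith(tableId + ":")).toList();
    }
  }

  /** A remote backend with no local store that can still implement the read API. */
  static final class RemoteManager implements MetricsReadManager {
    @Override
    public List<String> listScanMetrics(String tableId) {
      return List.of(tableId + ":remote-report");
    }
  }

  /** The handler sees only the manager-level interface, never a backend store. */
  static List<String> handleList(MetricsReadManager manager, String tableId) {
    return manager.listScanMetrics(tableId);
  }

  public static void main(String[] args) {
    MetricsReadManager local = new LocalManager(List.of("t1:scan-a", "t2:scan-b"));
    System.out.println(handleList(local, "t1"));
    System.out.println(handleList(new RemoteManager(), "t1"));
  }
}
```

With this shape, swapping the backend does not change the handler, which is the property the review asks for.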
Pushed a commit (and rebased against updated main). For the persistence schema discussion, I'll open a follow-up issue linking to the dev@ thread once there's a message to reference. Happy to do that now if you can share the thread link (I don't have it handy). Please let me know.
flyingImer
left a comment
Are you planning to merge as-is now that Dmitri approved, or is there another round? Asking because the May 7 metrics sync landed on a few directional items that touch the schema and SPI shape here. Left some questions inline.
@obelix74 : it looks like this PR got a lot of conflicts 🤷
Resolved all conflicts and pushed. Rebased against main.
flyingImer
left a comment
Thanks for continuing to push this forward. The direction still looks good to me: exposing persisted metrics through a beta read API, using a stable response envelope, keeping the API in an extension module, and routing reads through table-scoped authz all make sense.
One thing I would still like to clarify before this merges is sequencing with the metrics SPI/schema work we discussed after the May 7 sync.
From the previous review thread, I think we already converged on a few points:
- the current MetricsPersistence/PolarisMetricsManager layering is transitional, and #4397 is expected to move metrics persistence out of the old aggregated BasePersistence shape;
- the current scan_metrics_report/commit_metrics_report split is also transitional, with the follow-up direction being a single metrics report model/table with a metric type discriminator;
- listScanMetrics/listCommitMetrics are therefore likely interim API/SPI shapes, and may collapse or be rerouted when the schema/SPI consolidation happens.
I don't think this PR has to solve all of that before it can make progress. But I do think we should avoid accidentally standardizing the transitional shape just because this PR is ready first.
Could we make the sequencing explicit before merge? For example, either rebase on #4397 if that lands first, or link a concrete follow-up that tracks:
- moving metrics persistence to the standalone SPI shape,
- consolidating the metrics schema/model,
- deciding whether the per-type list methods remain public SPI surface or collapse behind a typed query API.
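For illustration only, a single report model with a type discriminator and one typed query could look roughly like the sketch below. MetricType, MetricsReport, its fields, and query are all assumed names; nothing here reflects the schema that #4397 or the dev@ discussion will actually settle on.

```java
import java.util.List;

// Hypothetical sketch of a unified model; names and fields are assumptions.
final class UnifiedMetricsSketch {

  /** Discriminator replacing the scan/commit split into separate tables. */
  enum MetricType { SCAN, COMMIT }

  /** One row shape for every report; type-specific data stays in the payload. */
  record MetricsReport(MetricType type, String tableId, long timestampMs, String payloadJson) {}

  /** A single typed query in place of per-type list methods. */
  static List<MetricsReport> query(List<MetricsReport> rows, String tableId, MetricType type) {
    return rows.stream()
        .filter(r -> r.tableId().equals(tableId) && r.type() == type)
        .toList();
  }

  public static void main(String[] args) {
    List<MetricsReport> rows = List.of(
        new MetricsReport(MetricType.SCAN, "t1", 1L, "{}"),
        new MetricsReport(MetricType.COMMIT, "t1", 2L, "{}"),
        new MetricsReport(MetricType.SCAN, "t2", 3L, "{}"));
    System.out.println(query(rows, "t1", MetricType.SCAN));
  }
}
```

Under this shape, listScanMetrics and listCommitMetrics would become thin wrappers over query, or disappear, which is why it matters whether they ship as public SPI surface now.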
…polaris-core, remove MetricsPersistence from BasePersistence
Addresses architecture review feedback that the metrics reporting SPI was defined at the wrong layer (CDI-coupled in runtime/service instead of polaris-core), and that durable metrics persistence was bleeding through BasePersistence into every metastore backend.
Scope 1 changes (this commit):
- Add stable CDI-agnostic IcebergMetricsReporter SPI to polaris-core
- Remove MetricsPersistence from BasePersistence extends clause
- Remove PolarisMetricsManager from PolarisMetaStoreManager extends clause
- Delete all durable-path code: JDBC models, converters, PersistingMetricsReporter
- Add no-op (default) and log-only reporters to extensions/metrics-reports/impl
- Stub REST read path in MetricsReportsService to return empty results
- Fix namespace decoding to use NamespaceUtils.splitNamespace (canonical)
- Add multi-level namespace test; fix CHANGELOG accuracy
Durable JDBC metrics persistence is deferred to a follow-up extension module (extensions/metrics-reports/persistence/relational-jdbc) that will back the read API without touching BasePersistence.
Main-branch merge brought in references to MetricsPersistence (PolarisCallContext.getMetricsPersistence, MetaStoreManagerFactory.getOrCreateMetricsPersistence, etc.) that Scope 1 had deleted. Restore the six SPI types as standalone interfaces decoupled from BasePersistence so all existing callers compile correctly.
… fix factory
JdbcBasePersistenceImpl still declared implements MetricsPersistence after
the Scope 1 refactor removed the import and all metric method bodies,
causing a cannot-find-symbol compile error at the class declaration line.
JdbcMetaStoreManagerFactory still passed JdbcBasePersistenceImpl to the
two-arg PolarisCallContext constructor (which requires P extends both
BasePersistence and MetricsPersistence) and returned it from
getOrCreateMetricsPersistence(), causing no-suitable-constructor and
incompatible-types errors.
Fixes:
- Remove MetricsPersistence from JdbcBasePersistenceImpl implements clause
- Use PolarisCallContext(realmContext, metaStore, new MetricsPersistence(){})
in bootstrap, purge, and bootstrap-check code paths
- Return a no-op MetricsPersistence from getOrCreateMetricsPersistence();
the real JDBC implementation is provided by the extension module in Scope 2
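The new MetricsPersistence(){} expression in the fix above only compiles if every method on the interface has a default body. A minimal sketch of that pattern follows, using a hypothetical MetricsStore interface; writeScanReport is named in this thread, while writeCommitReport, RecordingStore, and bootstrap are assumptions made for the example.

```java
// Hypothetical stand-in for an SPI whose methods all have default no-op bodies.
final class NoOpPersistenceSketch {

  interface MetricsStore {
    default void writeScanReport(String report) {
      // Intentionally empty: the default behaviour is to drop the report.
    }

    default void writeCommitReport(String report) {
      // Intentionally empty.
    }
  }

  /** A "real" implementation that overrides the defaults and counts writes. */
  static final class RecordingStore implements MetricsStore {
    int writes;

    @Override
    public void writeScanReport(String report) {
      writes++;
    }

    @Override
    public void writeCommitReport(String report) {
      writes++;
    }
  }

  /** An empty anonymous class is enough because nothing is abstract. */
  static MetricsStore noOp() {
    return new MetricsStore() {};
  }

  /** Stand-in for a bootstrap or purge path that needs some MetricsStore instance. */
  static void bootstrap(MetricsStore store) {
    store.writeScanReport("bootstrap-scan");
  }

  public static void main(String[] args) {
    RecordingStore recording = new RecordingStore();
    bootstrap(recording);
    bootstrap(noOp()); // Silently discards the report.
    System.out.println(recording.writes);
  }
}
```

The trade-off is that a backend which forgets to override a method fails silently instead of at compile time, which is part of why the thread later moves the real implementation into a dedicated extension module.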
… catalogName
Java prohibits lambda parameters from shadowing enclosing-method parameters of the same name. The no-op metricsReporter lambda introduced in the Scope 1 refactor used catalogName as a parameter name inside a createHandler override whose method parameter is also catalogName, causing a compile error.
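A small reproduction of the shadowing rule, assuming nothing about the real createHandler signature: describe and the reporter lambda below are made up for the example, and the rejected form is left in a comment so the block still compiles.

```java
import java.util.function.BiConsumer;

final class LambdaShadowingSketch {

  static String describe(String catalogName) {
    // javac rejects the next line because catalogName is already defined
    // as a parameter of the enclosing method:
    //
    //   BiConsumer<String, String> reporter = (catalogName, report) -> {};

    StringBuilder seen = new StringBuilder();

    // Renaming the lambda parameter resolves the conflict.
    BiConsumer<String, String> reporter =
        (reportedCatalog, report) -> seen.append(reportedCatalog).append(':').append(report);

    reporter.accept(catalogName, "scan");
    return seen.toString();
  }

  public static void main(String[] args) {
    System.out.println(describe("prod"));
  }
}
```

Unlike a field, a local variable or method parameter cannot be shadowed by a lambda parameter, so the rename is the whole fix.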
1. Remove stale MetricsReportToken$MetricsReportTokenType entry from persistence/relational-jdbc service file: the class was deleted in Scope 1 but ServiceLoader still found the registration, causing ServiceConfigurationError during pagination in LocalIcebergCatalog tests.
2. Fix PolarisCallContext constructor calls in JDBC test classes: both AtomicMetastoreManagerWithJdbcBasePersistenceImplTest and JdbcGrantRecordsIdempotencyTest used the two-arg constructor, which requires the second arg to implement MetricsPersistence. Now that JdbcBasePersistenceImpl no longer implements MetricsPersistence, the calls must explicitly pass a no-op MetricsPersistence.
3. Add polaris-extensions-metrics-reports as a runtimeOnly dependency to runtime/service: IcebergCatalogHandlerFactory injects IcebergMetricsReporter via CDI; without this module the NoOpMetricsReporter and LoggingMetricsReporter beans are absent, causing CDI injection failures in reportMetrics tests and integration tests.
…match application.properties
application.properties sets polaris.iceberg-metrics.reporting.type=default, but LoggingMetricsReporter was annotated @Identifier("log"), causing UnsatisfiedResolutionException at runtime. Rename identifier to "default" and update MetricsReportingConfiguration.type() default value to match. Also regenerate config doc for MetricsReportingConfiguration.
…ics-reports/spi
IcebergMetricsReporter was never used inside polaris-core itself; it is called from IcebergCatalogHandler in runtime/service, so polaris-core was not the right home. Extract it into a new, minimal extensions/metrics-reports/spi module so that:
* polaris-core has no dependency on (or knowledge of) the Iceberg metrics SPI
* downstream servers can opt out of the metrics extension without dragging in polaris-core machinery they do not need
* the SPI boundary is clear: runtime/service and the impl/jdbc extension modules all declare an explicit dep on polaris-extensions-metrics-reports-spi
MetricsRecordConverter (only used by the persisting JDBC reporter) is also removed from polaris-core; it will live in the jdbc extension module on the Scope 2 branch.
runtime/service: change runtimeOnly -> testRuntimeOnly for polaris-extensions-metrics-reports so the impl is not silently pushed onto the runtime classpath of every downstream server that depends on runtime/service.
…/service
The @QuarkusIntegrationTest tests start a fully packaged application built from runtimeClasspath. testRuntimeOnly excludes the dependency from that packaged app, causing IcebergMetricsReporter CDI injection failures at startup and testSendMetricsReport() to fail with a non-204 response. runtimeOnly is required here so that LoggingMetricsReporter and NoOpMetricsReporter are available when the Quarkus app is packaged and started for integration tests.
The runtimeOnly dep on polaris-extensions-metrics-reports was leaking into downstream consumers' runtime classpaths. Instead, make the IcebergMetricsReporter CDI producer resilient: if no reporter is found for the configured type, log a warning and fall back to a no-op. For deployments, runtime/server declares its own runtimeOnly dep on the impl module so LoggingMetricsReporter is always present there. For runtime/service integration tests (@QuarkusIntegrationTest), the no-op fallback satisfies CDI injection and the metrics endpoint returns 204.
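The resilient-producer idea can be sketched without CDI as a plain lookup with a fallback. Reporter, NO_OP, and select are hypothetical names, and the Map stands in for whatever CDI would discover on the classpath; the actual ServiceProducers.metricsReporter() code is not shown in this thread and may differ.

```java
import java.util.Map;

// Hypothetical sketch: a Map stands in for the CDI beans found on the classpath.
final class ReporterFallbackSketch {

  interface Reporter {
    String report(String catalogName, String payload);
  }

  /** Built-in fallback that accepts and discards every report. */
  static final Reporter NO_OP = (catalogName, payload) -> "ignored";

  /** Returns the reporter for the configured type, or the no-op if none is registered. */
  static Reporter select(Map<String, Reporter> available, String configuredType) {
    Reporter found = available.get(configuredType);
    if (found == null) {
      System.err.println(
          "WARN: no metrics reporter for type '" + configuredType + "'; falling back to no-op");
      return NO_OP;
    }
    return found;
  }

  public static void main(String[] args) {
    Reporter logging = (catalogName, payload) -> "logged " + catalogName;

    // Impl module present: the configured reporter is used.
    System.out.println(select(Map.of("log", logging), "log").report("prod", "{}"));

    // Impl module absent: startup still succeeds and reports are dropped.
    System.out.println(select(Map.of(), "log").report("prod", "{}"));
  }
}
```

The cost of this design is that a misconfigured type degrades to silence plus a warning instead of failing fast at startup.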
…mments
- Move NoOpMetricsReporter from extensions/metrics-reports/impl to the SPI module so any project with polaris-extensions-metrics-reports-spi on its classpath gets a built-in no-op without reinventing one
- Add jandex + CDI/SmallRye annotation deps to the SPI module so Quarkus discovers the bean via classpath scanning
- Change MetricsReportingConfiguration code default to 'no-op' (was 'default') so it always resolves to the SPI bean when no explicit type is set; production deployments continue to use type=default via application.properties (LoggingMetricsReporter)
- Override type=no-op in application-test.properties and application-it.properties so unit and integration tests in runtime/service (which do not have the impl module on runtimeClasspath) always get the SPI no-op without needing the impl module
- Revert ServiceProducers.metricsReporter() to simple .get(): the SPI no-op guarantees the lookup always succeeds for the configured defaults
- Remove redundant testImplementation(jakarta.ws.rs.api) from impl build (already covered by implementation() at line 35)
- Delete comments-only META-INF/services registration left after MetricsReportToken was removed in Scope 1
- Move MetricsReportsService from impl to runtime/service (better layering)
- Return HTTP 501 Not Implemented until durable extension is installed
- Move NoOpMetricsReporter from SPI to impl (CDI annotations don't belong in SPI)
- Strip CDI/SmallRye/jandex from SPI build.gradle.kts
- Change @Identifier("default") -> @Identifier("log") on LoggingMetricsReporter
- Change @WithDefault("no-op") -> @WithDefault("log") in MetricsReportingConfiguration
- Remove listScanReports/listCommitReports from MetricsPersistence (durable query SPI belongs in extension, not core)
- Update telemetry.md and spec description to document 501 behavior and extension requirement
- Regenerate config reference for updated @WithDefault
…9, r3424437119
- Rename IcebergMetricsReporter package to org.apache.polaris.extension.metrics (was org.apache.polaris.core.metrics; now matches the extensions/metrics-reports/spi module); move physical file to org/apache/polaris/extension/metrics/ and update all imports
- Restore fallback no-op lambda in ServiceProducers.metricsReporter() with comment explaining that LoggingMetricsReporter/NoOpMetricsReporter live in the impl module (not SPI), so the lambda guards against missing impl on the classpath
- Scope polaris-api-metrics-reports-service as compileOnly in runtime/service to avoid leaking the generated metrics API into transitive consumers; add runtimeOnly to runtime/server so Quarkus can discover the JAX-RS endpoint at startup
…the extension module
Renames extensions/metrics-reports/impl → extensions/metrics-reports/base and moves MetricsReportsService (and its unit test) from runtime/service into base, so the module owns all three baseline pieces: REST service class + LoggingMetricsReporter + NoOpMetricsReporter.
Removes the compileOnly hack from runtime/service/build.gradle.kts that was needed to keep polaris-api-metrics-reports-service off the transitive dep graph.
Addresses: apache#4115 (comment)
…r3431429840, r3431490149
- Move IcebergMetricsReporter to org.apache.polaris.extension.metrics.spi package for proper jar isolation; update all import sites (r3431409615)
- Add DefaultMetricsReporter inner class @Identifier("default") extending LoggingMetricsReporter to preserve backward compat with the old "default" config value (r3431490149)
- Remove polaris.iceberg-metrics.reporting.type=no-op from application-it.properties; already covered by application-test.properties (r3431419210)
- Remove explicit runtimeOnly(polaris-api-metrics-reports-service) from runtime/server; it flows transitively via polaris-extensions-metrics-reports (r3431429840)
…6067891 r3476154815 r3476177318 r3476188544
- Move testRuntimeOnly resteasy-reactive next to testImplementation quarkus.bom block
- Fix alphabetical ordering of polaris-extensions-metrics-reports* in projects.main.properties
- Restore javadocs stripped from ScanMetricsRecord and CommitMetricsRecord
- Add LIST_TABLE_METRICS to RangerPolarisOperationSemantics (mirrors RbacOperationSemantics)
- Use getOrCreateMetricsPersistence(realmContext) in JdbcMetaStoreManagerFactory instead of inline no-op
The @beta annotation alone is not universally interpreted as implying breaking changes. Explicitly state that:
- The API is experimental and subject to breaking changes in any release
- It should not be used in production until declared stable
- The @beta label means early-access / POC, not stability
Updated: CHANGELOG, OpenAPI spec description, telemetry.md, and the API spec page title.
…th inline impl
The MetricsModelUtils class was removed by an upstream commit during rebase, but toRecord() in both model interfaces still called it. Replace with a private static parseMetadata helper using the tools.jackson API already present.
The Java default (MetricsReportingConfiguration) is log, not no-op. Revert the doc table to match reality; changing the default belongs in a separate PR per reviewer feedback.
…sCallContext.getMetricsPersistence)
f684980 to d32520b
Align metrics API docs, Ranger privileges, and runtime defaults with the PR1 SPI behavior, and fix the rebase-related JDBC test compilation and formatting failures.
Co-authored-by: Cursor <cursoragent@cursor.com>
This PR removes the existing durable JDBC metrics persistence path in commit:
#0385f125d refactor(metrics): fix SPI layering - move IcebergMetricsReporter to polaris-core, remove MetricsPersistence from BasePersistence
That commit removes metrics persistence methods from:
persistence/relational-jdbc/src/main/java/org/apache/polaris/persistence/relational/jdbc/JdbcBasePersistenceImpl.java
Specifically, it removes:
• writeScanReport
It also removes related durable-path code from:
• runtime/service/src/main/java/org/apache/polaris/service/reporting/PersistingMetricsReporter.java
The reason given in the commit is architecture review feedback: durable metrics persistence was leaking through BasePersistence into every metastore backend. PR1 was narrowed to the API/SPI and non-durable reporters only, while durable JDBC persistence was deferred to PR2 as a separate extension module.
Warning: if PR1 is merged without PR2, this is a temporary regression. polaris.iceberg-metrics.reporting.type=persisting will no longer work, metrics will not be written to JDBC storage, and the Metrics Reports read API will return HTTP 501 until PR2 adds extensions/metrics-reports/persistence/relational-jdbc.
This is an implementation of the proposal in #4010. This uses the stable envelope design for the REST API instead of a flattened structure.
Checklist
CHANGELOG.md (if needed)
site/content/in-dev/unreleased (if needed)