Skip to content

Introduce DataSourceResolver for multi-datasource support in JDBC per…#3960

Closed
Subham-KRLX wants to merge 17 commits into
apache:mainfrom
Subham-KRLX:feat/multi-datasource-support
Closed

Introduce DataSourceResolver for multi-datasource support in JDBC per…#3960
Subham-KRLX wants to merge 17 commits into
apache:mainfrom
Subham-KRLX:feat/multi-datasource-support

Conversation

@Subham-KRLX

@Subham-KRLX Subham-KRLX commented Mar 9, 2026

Copy link
Copy Markdown
Contributor

Introduces a DataSourceResolver interface to decouple DataSource selection from JdbcMetaStoreManagerFactory, enabling per-realm and per-store-type routing (main, metrics, events).

Changes:

  • New DataSourceResolver interface with resolve(realmId, storeType)
  • DefaultDataSourceResolver — routes everything to the default DS (no behavior change)
  • JdbcMetaStoreManagerFactory now uses DataSourceResolver instead of direct Instance<DataSource>

This is the foundation for #3890. Named DS resolution and config properties will follow.

Closes #3890

Checklist

@github-project-automation github-project-automation Bot moved this to PRs In Progress in Basic Kanban Board Mar 9, 2026
@Subham-KRLX Subham-KRLX marked this pull request as draft March 9, 2026 18:31
@dimas-b

dimas-b commented Mar 9, 2026

Copy link
Copy Markdown
Contributor

@Subham-KRLX : thanks for the proposal... I'll review presently, but please fix conflicts to allow CI to run.

…sistence

This commit introduces the DataSourceResolver interface and a
DefaultDataSourceResolver implementation to enable flexible routing
of DataSources based on realm and store type (e.g., main, metrics,
events).

Key changes:
- New DataSourceResolver interface with resolve(realmId, storeType)
  and getAllUniqueDataSources() methods
- DefaultDataSourceResolver that routes all requests to the default
  DataSource (backward-compatible no-op)
- Refactored JdbcMetaStoreManagerFactory to use DataSourceResolver
  instead of direct DataSource injection

This lays the groundwork for isolating different persistence workloads
(entity metadata vs metrics vs events) into separate connection pools
or databases to mitigate noisy neighbor effects.

Closes apache#3890
@Subham-KRLX Subham-KRLX force-pushed the feat/multi-datasource-support branch from c082f09 to 136e381 Compare March 9, 2026 18:38

@dimas-b dimas-b left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice idea, @Subham-KRLX ! +1 to this approach in general.

Posting quite a few comments, but they are just to "streamline" the code (I hope).

- Refactored StoreType to an enum in DataSourceResolver
- Removed getAllUniqueDataSources() from DataSourceResolver (Admin use case)
- Moved DefaultDataSourceResolver to runtime/common/config/jdbc
- Inject plain DataSource instead of Instance<DataSource> in DefaultDataSourceResolver
- Updated JdbcMetaStoreManagerFactory to use the new StoreType enum
@Subham-KRLX

Copy link
Copy Markdown
Contributor Author

@dimas-b thanks for the review I have streamlined the implementation based on your feedback by refactoring StoreType into an enum within DataSourceResolver and removing the getAllUniqueDataSources() method to keep the interface focused strictly on runtime resolution. Additionally I moved DefaultDataSourceResolver
to the runtime-common module to better decouple the persistence layer from resolution logic and simplified the implementation by injecting a plain DataSource instead of an Instance.

@dimas-b dimas-b left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple more comments, but LGTM overall 👍

@Subham-KRLX : Do you want to remove the "draft" status and open this for general review?

I'd suggest to also send an email about this to dev for visibility.

@Subham-KRLX Subham-KRLX marked this pull request as ready for review March 9, 2026 19:07
- Renamed StoreType.MAIN to METASTORE
- Use RealmContext instead of String realmId in DataSourceResolver.resolve
- Added missing polaris-core dependency to runtime-common
- Fixed formatting of @Inject fields for Spotless
* @return the resolved DataSource
*/
DataSource resolve(
org.apache.polaris.core.context.RealmContext realmContext, StoreType storeType);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: import

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dimas-b I have pushed an update to resolve the CI failures. It turned out to be a CDI injection issue with the @Identifier annotation that I have now resolved.

dimas-b
dimas-b previously approved these changes Mar 9, 2026
@github-project-automation github-project-automation Bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Mar 9, 2026
@Subham-KRLX

Copy link
Copy Markdown
Contributor Author

Thanks for the approval @dimas-b. I have fixed that last import nit and have sent the proposal email to the dev list [1] for broader visibility as you suggested. Appreciate all the guidance on streamlining this.

@dimas-b

dimas-b commented Mar 10, 2026

Copy link
Copy Markdown
Contributor

@Subham-KRLX : the PR LGTM, but please check CI failures out 😅

@Subham-KRLX

Copy link
Copy Markdown
Contributor Author

@dimas-b @flyrain I’ve pushed an update to fix the CI failures in the non-JDBC test suites. The root cause was DefaultDataSourceResolver unconditionally injecting a JDBC DataSource bean, which caused an io.quarkus.arc.InactiveBeanException when booting profiles where JDBC is disabled (like NoSQL/Mongo). I’ve updated the implementation to use Instance for lazy resolution, allowing the service to boot safely across all persistence modes. All CI checks should now pass cleanly.

dimas-b
dimas-b previously approved these changes Mar 11, 2026
@dimas-b

dimas-b commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

[...] The root cause was DefaultDataSourceResolver unconditionally injecting a JDBC DataSource bean, which caused an io.quarkus.arc.InactiveBeanException when booting profiles where JDBC is disabled (like NoSQL/Mongo).

Current code in this PR LGTM, however, I still wonder why would DefaultDataSourceResolver be "engaged" in non-JDBC contexts (and subsequently get the InactiveBeanException) 🤔

@Subham-KRLX : Could you explore this a bit more?

Perhaps DefaultDataSourceResolver may need to be a @Dependent bean rather than @ApplicationScoped 🤔

@flyrain

flyrain commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

As I said in the mailing thread https://lists.apache.org/thread/nokyythvrdzsfwz26hx0w5rpxrw5wjtd, I think this effort deserve more thoughts before we start coding. cc @singhpk234

@Subham-KRLX

Copy link
Copy Markdown
Contributor Author

[...] The root cause was DefaultDataSourceResolver unconditionally injecting a JDBC DataSource bean, which caused an io.quarkus.arc.InactiveBeanException when booting profiles where JDBC is disabled (like NoSQL/Mongo).

Current code in this PR LGTM, however, I still wonder why would DefaultDataSourceResolver be "engaged" in non-JDBC contexts (and subsequently get the InactiveBeanException) 🤔

@Subham-KRLX : Could you explore this a bit more?

Perhaps DefaultDataSourceResolver may need to be a @Dependent bean rather than @ApplicationScoped 🤔

Hi @dimas-b the root issue is Quarkus ArC's eager boot-time validation.Since DefaultDataSourceResolver
is in runtime-common, it’s always discovered during boot (even in NoSQL profiles). The Quarkus JDBC extension disables the DataSource bean when JDBC is off.
A direct @Inject DataSource causes Quarkus to throw an InactiveBeanException immediately at boot because the required injection point cannot be satisfied. By using @Inject Instance, we make the injection point dynamic/lazy, bypassing the strict upfront validation. (Even if the bean were @dependent, Quarkus ArC still eagerly validates all discovered injection points by default).

@flyingImer flyingImer left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I’m aligned with keeping this extensible, and I agree workload isolation is a useful direction.

My main concern is that the current PR may already be standardizing a broader routing contract than the implementation actually justifies today.

IIUC the current behavior is still quite narrow:

• the default resolver does not yet differentiate meaningfully by RealmContext / StoreType,
• the tests mostly validate the refactoring shape rather than the SPI as a real extension contract,
• and the current split still looks closer to logical separation than a clearly-defined built-in routing model.

So I wonder if we should be more explicit about the narrower initial story this PR is trying to support.

If the immediate goal is workload isolation, I’d rather scope this PR around that foundation first, and leave broader realm-aware routing as a later / separately justified step.

My concern is mostly about not letting “SPI can support this later” implicitly turn into “Polaris v1.x now owns this broader built-in routing story” before that boundary is explicit.

…sistence (apache#3960)

Drafted design doc, fixed purge for isolated stores, and resolved build failures.
@Subham-KRLX

Copy link
Copy Markdown
Contributor Author

@flyrain, @flyingImer, @dimas-b Just wanted to let you know that I have updated the PR with the fixes we discussed and have officially sent the design proposal for Multi-DataSource support to the dev@polaris.apache.org mailing list for broader community feedback You can find the design document in the repository here: jdbc-multi-datasource.md

@dimas-b dimas-b left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concentrating on the proposal doc for now.

~ specific language governing permissions and limitations
~ under the License.
-->
# Design: Multi-DataSource support in JDBC persistence

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this perhaps be under unreleased/proposals?.. e.g. like in this proposal using GH PR and .md files: #3924

I personally do not mind having the proposal doc in the same PR as the implementation, but we need to be careful with reviews so as to settle on the general proposal first and proceed to code changes later (to avoid unnecessary work).


### Bootstrap
When a realm is bootstrapped via `JdbcMetaStoreManagerFactory.bootstrapRealms()`:
1. **Resolution**: The `DataSourceResolver` is called for each `StoreType` (METASTORE, METRICS, EVENTS).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd think the bootstrap process should indicate which of the possible store types to initialize. Ultimately this is a the Admin user's decision, I think (defaulting to all).

The rationale is to not create tables that the Admin user knows are not going to be used (reducing database maintenance burden by not creating "dead" tables).

Upgrading a Polaris deployment from version N to N+1 involves:
1. **Detection**: The service detects that the `metastore`'s `polaris_schema_version` is below the requested level.
2. **Execution**: The migration logic resolves each `StoreType` and applies the relevant "vN-to-vN+1" upgrade script.
3. **Consistency**: Because the process is unified within the `JdbcMetaStoreManagerFactory`, it provides a single point of failure and ensures that all data sources are either upgraded or rolled back (where possible).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Different store types do not have to be on the same schema version, IMHO. Their tables are not transactionally related.

Moreover, storage for METRICS does not have to be JDBC even if METASTORE is JDBC (same for events).

## Schema Upgrades
Upgrading a Polaris deployment from version N to N+1 involves:
1. **Detection**: The service detects that the `metastore`'s `polaris_schema_version` is below the requested level.
2. **Execution**: The migration logic resolves each `StoreType` and applies the relevant "vN-to-vN+1" upgrade script.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe schema upgrades should be part of the bootstrap flow and require an explicit user action... WDYT?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have refactored the Multi-DataSource JDBC implementation to fully address the feedback from @dimas-b. This update introduces selective bootstrap control via new CLI flags (--metastore, --metrics, --events), independent schema versioning for each store type, and moves schema upgrades to the explicit administrative bootstrap flow Additionally I have added null-safety to JdbcBasePersistenceImpl to support heterogeneous storage for metrics and events, and the design proposal has been finalized and moved to site/content/in-dev/unreleased/proposals/jdbc-multi-datasource.md.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @Subham-KRLX !

@flyrain , @singhpk234 : What is your take on the proposed design?

@Subham-KRLX Subham-KRLX requested a review from dimas-b April 7, 2026 08:23
@dimas-b

dimas-b commented Apr 7, 2026

Copy link
Copy Markdown
Contributor

@flyrain @singhpk234 : please review the latest changes proposed by @Subham-KRLX .

From my POV, the PR is moving in the right direction, but I'd appreciate reviews from both of you before we proceed. Thx!

@dimas-b dimas-b requested review from flyrain and singhpk234 April 7, 2026 19:28
@github-actions

Copy link
Copy Markdown

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions Bot added the stale label May 17, 2026
@github-actions github-actions Bot closed this May 24, 2026
@github-project-automation github-project-automation Bot moved this from Ready to merge to Done in Basic Kanban Board May 24, 2026
@Subham-KRLX Subham-KRLX deleted the feat/multi-datasource-support branch May 29, 2026 08:57
adutra added a commit to adutra/polaris that referenced this pull request Jun 17, 2026
… of **runtime** activation of the desired datasource.

This is achieved by:

1) Introducing a new enabler property: `polaris.persistence.relational.jdbc.datasource`, which points to a named datasource to activate.

2) Declaring, **at build time**, one named datasource for each JDBC driver shipped with the application.

3) Introducing a new `DataSourceActivator` config interceptor that will, at runtime, change the `active` setting of the target datasource to `true`, and all others to `false` (`active` is a runtime setting, contrary to `db-kind`).

This change also makes the server ship with the H2 driver by default, since that's the pre-condition for an H2 datasource to be selected at runtime.

This change will further enable **as future improvements**:

- Using to JDBC + H2 by default for server images, while still allowing users to switch to PostgreSQL *without having to rebuild Polaris*.

- Enabling the persistence to "plug" into not one, but many datasources, e.g. a specific datasource for main persistence, and a separate datasource for metrics. This was outlined in apache#3960 – but the machinery to make that happen was missing.
@adutra adutra mentioned this pull request Jun 17, 2026
6 tasks
adutra added a commit to adutra/polaris that referenced this pull request Jun 17, 2026
This change enables multiple datasources in Polaris, with possibility of **runtime** activation of the desired datasource.

This is achieved by:

1) Introducing a new enabler property: `polaris.persistence.relational.jdbc.datasource`, which points to a named datasource to activate.

2) Declaring, **at build time**, one named datasource for each JDBC driver shipped with the application.

3) Introducing a new `DataSourceActivator` config interceptor that will, at runtime, change the `active` setting of the target datasource to `true`, and all others to `false` (`active` is a runtime setting, contrary to `db-kind`).

This change also makes the server ship with the H2 driver by default, since that's the pre-condition for an H2 datasource to be selected at runtime.

This change will further enable **as future improvements**:

- Using to JDBC + H2 by default for server images, while still allowing users to switch to PostgreSQL *without having to rebuild Polaris*.

- Enabling the persistence to "plug" into not one, but many datasources, e.g. a specific datasource for main persistence, and a separate datasource for metrics. This was outlined in apache#3960 – but the machinery to make that happen was missing.
adutra added a commit to adutra/polaris that referenced this pull request Jun 17, 2026
This change enables multiple datasources in Polaris, with possibility of **runtime** activation of the desired datasource.

This is achieved by:

1) Introducing a new enabler property: `polaris.persistence.relational.jdbc.datasource`, which points to a named datasource to activate.

2) Declaring, **at build time**, one named datasource for each JDBC driver shipped with the application.

3) Introducing a new `DataSourceActivator` config interceptor that will, at runtime, change the `active` setting of the target datasource to `true`, and all others to `false` (`active` is a runtime setting, contrary to `db-kind`).

This change also makes the server ship with the H2 driver by default, since that's the pre-condition for an H2 datasource to be selected at runtime.

This change will further enable **as future improvements**:

- Using to JDBC + H2 by default for server images, while still allowing users to switch to PostgreSQL *without having to rebuild Polaris*.

- Enabling the persistence to "plug" into not one, but many datasources, e.g. a specific datasource for main persistence, and a separate datasource for metrics. This was outlined in apache#3960 – but the machinery to make that happen was missing.
adutra added a commit to adutra/polaris that referenced this pull request Jun 17, 2026
This change enables multiple datasources in Polaris, with possibility of **runtime** activation of the desired datasource.

This is achieved by:

1) Introducing a new enabler property: `polaris.persistence.relational.jdbc.datasource`, which points to a named datasource to activate.

2) Declaring, **at build time**, one named datasource for each JDBC driver shipped with the application.

3) Introducing a new `DataSourceActivator` config interceptor that will, at runtime, change the `active` setting of the target datasource to `true`, and all others to `false` (`active` is a runtime setting, contrary to `db-kind`).

This change also makes the server ship with the H2 driver by default, since that's the pre-condition for an H2 datasource to be selected at runtime.

This change will further enable **as future improvements**:

- Using to JDBC + H2 by default for server images, while still allowing users to switch to PostgreSQL *without having to rebuild Polaris*.

- Enabling the persistence to "plug" into not one, but many datasources, e.g. a specific datasource for main persistence, and a separate datasource for metrics. This was outlined in apache#3960 – but the machinery to make that happen was missing.
adutra added a commit to adutra/polaris that referenced this pull request Jun 17, 2026
This change enables multiple datasources in Polaris, with possibility of **runtime** activation of the desired datasource.

This is achieved by:

1) Introducing a new enabler property: `polaris.persistence.relational.jdbc.datasource`, which points to a named datasource to activate.

2) Declaring, **at build time**, one named datasource for each JDBC driver shipped with the application.

3) Introducing a new `DataSourceActivator` config interceptor that will, at runtime, change the `active` setting of the target datasource to `true`, and all others to `false` (`active` is a runtime setting, contrary to `db-kind`).

This change also makes the server ship with the H2 driver by default, since that's the pre-condition for an H2 datasource to be selected at runtime.

This change will further enable **as future improvements**:

- Using to JDBC + H2 by default for server images, while still allowing users to switch to PostgreSQL *without having to rebuild Polaris*.

- Enabling the persistence to "plug" into not one, but many datasources, e.g. a specific datasource for main persistence, and a separate datasource for metrics. This was outlined in apache#3960 – but the machinery to make that happen was missing.
adutra added a commit to adutra/polaris that referenced this pull request Jun 17, 2026
This change enables multiple datasources in Polaris, with possibility of **runtime** activation of the desired datasource.

This is achieved by:

1) Introducing a new enabler property: `polaris.persistence.relational.jdbc.datasource`, which points to a named datasource to activate.

2) Declaring, **at build time**, one named datasource for each JDBC driver shipped with the application.

3) Introducing a new `DataSourceActivator` config interceptor that will, at runtime, change the `active` setting of the target datasource to `true`, and all others to `false` (`active` is a runtime setting, contrary to `db-kind`).

This change also makes the server ship with the H2 driver by default, since that's the pre-condition for an H2 datasource to be selected at runtime.

This change will further enable **as future improvements**:

- Using to JDBC + H2 by default for server images, while still allowing users to switch to PostgreSQL *without having to rebuild Polaris*.

- Enabling the persistence to "plug" into not one, but many datasources, e.g. a specific datasource for main persistence, and a separate datasource for metrics. This was outlined in apache#3960 – but the machinery to make that happen was missing.
@adutra

adutra commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

FYI relevant ML discussion:

https://lists.apache.org/thread/jy6wb186h94n9q86kv01shbn68ppr6gv

Relevant work:

#4812

@Subham-KRLX

Copy link
Copy Markdown
Contributor Author

Thanks a lot @adutra for picking this up and continuing the multi-datasource work #3960 was closed by the stale bot but the design direction was still being actively discussed. I am happy to help carry this forward and align the earlier notes with #4812.

@adutra

adutra commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Thank you @Subham-KRLX ! Unfortunately the branch has been deleted and this PR cannot be reopened.

Let's wait to see if #4812 reaches consensus. If so, then I have some ideas about how to adapt your work on top of it. It shouldn't be hard.

@Subham-KRLX Subham-KRLX restored the feat/multi-datasource-support branch June 18, 2026 17:41
adutra added a commit to adutra/polaris that referenced this pull request Jun 26, 2026
This change enables multiple datasources in Polaris, with possibility of **runtime** activation of the desired datasource.

This is achieved by:

1) Introducing a new enabler property: `polaris.persistence.relational.jdbc.datasource`, which points to a named datasource to activate.

2) Declaring, **at build time**, one named datasource for each JDBC driver shipped with the application.

3) Introducing a new `DataSourceActivator` config interceptor that will, at runtime, change the `active` setting of the target datasource to `true`, and all others to `false` (`active` is a runtime setting, contrary to `db-kind`).

This change also makes the server ship with the H2 driver by default, since that's the pre-condition for an H2 datasource to be selected at runtime.

This change will further enable **as future improvements**:

- Using to JDBC + H2 by default for server images, while still allowing users to switch to PostgreSQL *without having to rebuild Polaris*.

- Enabling the persistence to "plug" into not one, but many datasources, e.g. a specific datasource for main persistence, and a separate datasource for metrics. This was outlined in apache#3960 – but the machinery to make that happen was missing.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support multiple data sources in JDBC persistence

7 participants