Skip to content

optimize data branch merge apply execution#24860

Open
gouhongshen wants to merge 2 commits into
matrixorigin:mainfrom
gouhongshen:codex/data-branch-merge-exec-route-main
Open

optimize data branch merge apply execution#24860
gouhongshen wants to merge 2 commits into
matrixorigin:mainfrom
gouhongshen:codex/data-branch-merge-exec-route-main

Conversation

@gouhongshen
Copy link
Copy Markdown
Contributor

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

Related issue #24099

What this PR does / why we need it:

This is the main cherry-pick of #24858.

DATA BRANCH MERGE materializes row differences into __mo_diff_* helper tables, then applies them to the base table with base-table DML that reads those helper tables.

runSql previously routed every statement mentioning __mo_diff_* through BackgroundExec. That was needed for temp-table DDL and temp-table-targeting DML, but it was too broad for the base-table delete/insert apply statements. On a 100M-row table with only 1000 updated rows, the apply phase spent seconds on the slower path.

This change keeps BackgroundExec for data-branch temp table DDL and DML that targets __mo_diff_* tables, while allowing base-table DML that only reads those helper tables to use the internal SQL executor.

Validation on main:

  • make cgo
  • CGO_CFLAGS="-I$(pwd)/thirdparties/install/include" go test ./pkg/frontend

@matrix-meow matrix-meow added the size/M Denotes a PR that changes [100,499] lines label Jun 5, 2026
@mergify mergify Bot added kind/bug Something isn't working kind/enhancement kind/test-ci labels Jun 5, 2026
@gouhongshen gouhongshen changed the title [codex] optimize data branch merge apply execution optimize data branch merge apply execution Jun 5, 2026
@gouhongshen gouhongshen marked this pull request as ready for review June 5, 2026 07:35
@gouhongshen gouhongshen requested a review from XuPeng-SH as a code owner June 5, 2026 07:35
@qodo-code-review
Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@qodo-code-review
Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

Copy link
Copy Markdown
Contributor

@XuPeng-SH XuPeng-SH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after reviewing from three angles:

  • correctness: the execution routing now only forces back-exec when the temp diff table itself is the DDL/DML target, while main-table DML that reads temp tables keeps the internal path
  • regression risk: the change stays conservative for unknown temp-table statements and separately fixes subquery SQL formatting to preserve string literals
  • tests: unit tests and distributed SQL coverage were both extended for the main-table DML path and the quoted-string KEYS subquery case

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug Something isn't working kind/enhancement kind/test-ci size/M Denotes a PR that changes [100,499] lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants