compile: allow local notify streams for shuffle dispatch (#24159) #24854

Merged
mergify[bot] merged 3 commits into matrixorigin:3.0-dev from LeftHandCold:send-notify-self-connection
Jun 7, 2026

@LeftHandCold
Contributor

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #24158

What this PR does / why we need it:

This fixes the self-connection bug in the compile/remoterun notification path.

`Scope.RemoteRun` already handles the local-scope case with `ipAddrMatch`, but `Scope.sendNotifyMessage` still creates a pipeline RPC stream for every `RemoteReceivRegInfo.FromAddr`. When a dispatch operator is scheduled on the same CN as the coordinator, `fromAddr` equals the local service address. The old `pipelineClient.NewStream` rejected that loopback connection with the error `remote run pipeline in local`, so the prepare-done notification path could not be established for the local dispatch operator.

That can break remote dispatch/shuffle coordination for queries such as secondary-index UPDATE plans that use a shuffle join with `serial_full(...)`, and it surfaces as a query failure or a hang.

This PR keeps the existing `RemoteRun` local-execution guard intact, but allows the notification stream itself to connect to the local CN address. That is the smallest fix that restores the existing dispatch notification protocol without refactoring local/remote receiver handling.

Also included:

  • a unit test that pins the real regression point in `pipelineClient.NewStream`

Approved by: @XuPeng-SH
Copilot AI review requested due to automatic review settings June 4, 2026 14:51
@LeftHandCold LeftHandCold requested a review from XuPeng-SH as a code owner June 4, 2026 14:51

Copilot AI left a comment

Pull request overview

This PR fixes a self-connection failure in the compile/remote-run notification path by allowing pipeline RPC streams to connect to the local CN address (loopback), which is needed when a shuffle dispatch operator is scheduled on the same CN as the coordinator.

Changes:

  • Removed the “reject local backend” guard in pipelineClient.NewStream, allowing local CN notify streams to be established.
  • Added a unit test intended to pin the regression behavior around pipelineClient.NewStream accepting local backends.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| pkg/cnservice/cnclient/client.go | Removes the local-backend rejection in pipelineClient.NewStream. |
| pkg/cnservice/cnclient/client_test.go | Adds a regression unit test to ensure local backends are allowed for streams. |

Comment thread pkg/cnservice/cnclient/client_test.go Outdated
Comment thread pkg/cnservice/cnclient/client_test.go Outdated

@XuPeng-SH XuPeng-SH left a comment

LGTM. Allowing local notify streams here matches the existing higher-level guard in Scope.RemoteRun, fixes the self-connection regression in the prepare-done notify path, and keeps the change narrowly scoped. The added cnclient regression test is also a good pin for the underlying behavior.
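
The split between the two levels can be sketched as follows. This is an illustrative assumption, not the project's implementation: `sameEndpoint`, `plan`, and `planScope` are hypothetical names, and the real `ipAddrMatch` may compare addresses differently. The sketch only shows the idea that the local-execution decision stays at the scope level, while the notify targets include every sender address, local or not.

```go
package main

import "net"

// sameEndpoint reports whether two "host:port" strings name the same
// endpoint. It is a hypothetical stand-in for an address-match helper.
func sameEndpoint(a, b string) bool {
	hostA, portA, errA := net.SplitHostPort(a)
	hostB, portB, errB := net.SplitHostPort(b)
	if errA != nil || errB != nil {
		return false // malformed addresses never match
	}
	return hostA == hostB && portA == portB
}

// plan records the two independent decisions made for one scope.
type plan struct {
	RunLocal      bool     // scope-level guard: execute in-process
	NotifyTargets []string // every sender address gets a notify stream
}

// planScope keeps the local-execution guard at the scope level and does
// not filter the local address out of the notification targets.
func planScope(scopeAddr, localAddr string, fromAddrs []string) plan {
	p := plan{RunLocal: sameEndpoint(scopeAddr, localAddr)}
	// Copy the slice so the caller's data is not aliased.
	p.NotifyTargets = append(p.NotifyTargets, fromAddrs...)
	return p
}
```

In this sketch a scope addressed to the local CN is still marked for local execution, yet the local address remains in `NotifyTargets`, which is the combination the review comment describes.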

@mergify mergify Bot added the queued label Jun 7, 2026

mergify Bot commented Jun 7, 2026

Merge Queue Status

  • Entered queue 2026-06-07 16:09 UTC · Rule: release-3.0
  • Checks skipped · PR is already up-to-date
  • Merged 2026-06-07 16:09 UTC · at c76fb2338bd0eeaca0a9516660f660075939f15d · squash

This pull request spent 15 seconds in the queue, including 3 seconds running CI.

Required conditions to merge
  • #approved-reviews-by >= 1 [🛡 GitHub branch protection]
  • github-review-decision = APPROVED [🛡 GitHub branch protection]
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Utils CI (3.0) / Coverage
    • check-neutral = Matrixone Utils CI (3.0) / Coverage
    • check-skipped = Matrixone Utils CI (3.0) / Coverage
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI (3.0) / SCA Test on Ubuntu/x86
    • check-neutral = Matrixone CI (3.0) / SCA Test on Ubuntu/x86
    • check-skipped = Matrixone CI (3.0) / SCA Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI (3.0) / UT Test on Ubuntu/x86
    • check-neutral = Matrixone CI (3.0) / UT Test on Ubuntu/x86
    • check-skipped = Matrixone CI (3.0) / UT Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI (3.0) / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-neutral = Matrixone Compose CI (3.0) / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-skipped = Matrixone Compose CI (3.0) / multi cn e2e bvt test docker compose(Optimistic/PUSH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI (3.0) / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-neutral = Matrixone Compose CI (3.0) / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-skipped = Matrixone Compose CI (3.0) / multi cn e2e bvt test docker compose(PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI (3.0) / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-neutral = Matrixone Standlone CI (3.0) / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-skipped = Matrixone Standlone CI (3.0) / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI (3.0) / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-neutral = Matrixone Standlone CI (3.0) / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-skipped = Matrixone Standlone CI (3.0) / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI (3.0) / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-neutral = Matrixone Standlone CI (3.0) / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-skipped = Matrixone Standlone CI (3.0) / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Upgrade CI (3.0) / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-neutral = Matrixone Upgrade CI (3.0) / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-skipped = Matrixone Upgrade CI (3.0) / Compatibility Test With Target on Linux/x64(LAUNCH)

@mergify mergify Bot merged commit 1b4aa0a into matrixorigin:3.0-dev Jun 7, 2026
42 of 44 checks passed
@mergify mergify Bot removed the queued label Jun 7, 2026
Labels

  • kind/bug: Something isn't working
  • size/S: Denotes a PR that changes [10,99] lines

4 participants