fix: avoid closing running flush object batches (cp to 4.0-dev) #24843
aptend wants to merge 2 commits into
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR backports a fix to prevent `containers.ToCNBatch` panics caused by a flush-object subtask’s batch being closed while an IO worker is still using it. It does this by transferring batch ownership to the IO task at execution start and ensuring the parent only releases batches that have not been picked up.
Changes:
- Ensure scheduled flush-object subtasks are registered in the `subtasks` slice before scheduling, so error paths can reliably release them.
- Move batch ownership from the parent task to the IO task at the beginning of `flushObjTask.Execute`, preventing concurrent closes from the parent release path.
- Release (close) scanned batch data immediately after `WriteBatch` succeeds (before `Sync`) to reduce memory retention while syncing.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| pkg/vm/engine/tae/tables/jobs/flushTableTail.go | Makes subtask tracking/release more reliable and removes the now-obsolete done bookkeeping. |
| pkg/vm/engine/tae/tables/jobs/flushobj.go | Transfers batch ownership safely to the IO task and adjusts release semantics to avoid closing in-use batches. |
What type of PR is this?
Which issue(s) this PR fixes:
issue #24319
What this PR does / why we need it:
When a flush object subtask fails during object sync, the parent task can abort and release sibling subtasks. Previously, that release path could close a batch while an IO worker was still using it, leaving nil vectors and causing `ToCNBatch` to panic.

This PR moves batch ownership to the IO task when it starts, lets the parent release only batches that have not yet been picked up by an IO task, and releases scanned batch data immediately after `WriteBatch` succeeds, before the object `Sync`.

Tests:
- `go test ./pkg/vm/engine/tae/tables/jobs`
- `go test ./pkg/vm/engine/tae/containers`
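The IO-side ordering described in this PR (take the batch, `WriteBatch`, release the batch, then `Sync`) might look roughly like the sketch below. It is hypothetical: `objectWriter`, `flushObj`, and `fakeWriter` are illustrative names, not the TAE API. Two behaviors are assumptions rather than details confirmed by the PR: that a nil batch means the parent already released it, and that the IO task also closes the batch when `WriteBatch` fails (which follows from the IO task owning it).

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// batch is a hypothetical stand-in for scanned batch data.
type batch struct{ closed atomic.Bool }

func (b *batch) Close() { b.closed.Store(true) }

// objectWriter is a hypothetical writer: WriteBatch copies the batch into
// the object being built, and Sync persists the object (the slow IO step).
type objectWriter interface {
	WriteBatch(b *batch) error
	Sync() error
}

// flushObj sketches the IO task's work once it has taken ownership of the
// batch: it closes the batch as soon as WriteBatch succeeds, so the scanned
// data is not held in memory for the duration of Sync.
func flushObj(w objectWriter, b *batch) error {
	if b == nil {
		return nil // assumed: the parent already released it; nothing to flush
	}
	if err := w.WriteBatch(b); err != nil {
		b.Close() // assumed: the IO task owns the batch, so it releases it on failure too
		return err
	}
	b.Close() // data now lives in the writer; free the batch before Sync
	return w.Sync()
}

// fakeWriter records whether the batch was already closed when Sync ran.
type fakeWriter struct {
	cur              *batch
	closedBeforeSync bool
	syncErr          error
}

func (f *fakeWriter) WriteBatch(b *batch) error { f.cur = b; return nil }

func (f *fakeWriter) Sync() error {
	f.closedBeforeSync = f.cur.closed.Load()
	return f.syncErr
}

func main() {
	w := &fakeWriter{syncErr: errors.New("sync failed")}
	b := &batch{}
	err := flushObj(w, b)
	fmt.Println(err, w.closedBeforeSync, b.closed.Load()) // sync failed true true
}
```

Even when `Sync` fails, the batch has already been released, so an abort triggered by that failure has nothing left to close for this subtask.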