[Dataflow Streaming] Activate SourceState Finalizers before submitting workitem to harness threads#38921
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request improves the robustness of Dataflow streaming work scheduling by ensuring that finalizers associated with work items are processed immediately upon scheduling. By moving the finalization logic into the scheduling flow, the system guarantees that finalizers for failed items are executed even if the corresponding work item is subsequently cancelled, preventing potential resource leaks or missed cleanup tasks. Highlights
New Features🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
|
R: @scwhittle |
There was a problem hiding this comment.
Code Review
This pull request refactors the handling of applied finalize IDs in the streaming worker. Instead of queueing them separately via queueAppliedFinalizeIds, they are now passed directly to StreamingWorkScheduler.scheduleWork and finalized alongside the work item's source state finalize IDs before work is scheduled. A review comment suggests adding empty checks before calling commitFinalizer.finalizeCommits to avoid unnecessary overhead and potential lock contention on the critical path when there are no finalize IDs.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
|
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment |
There was a problem hiding this comment.
Code Review
This pull request moves the execution of pending OnCommit callbacks (commitFinalizer.finalizeCommits) from the processWork method to the scheduleWork method in StreamingWorkScheduler.java. The reviewer raised a high-severity concern that executing these finalizers synchronously on the scheduling/dispatching thread can block the main dispatching loop, potentially leading to severe performance bottlenecks or thread starvation if the finalizers perform blocking I/O or run arbitrary user code. It is recommended to offload this execution asynchronously or handle it within the work item's lifecycle.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
Adding a filter for failed items in #38920. Making this change so finalizers from failed items get scheduled even if the work item gets cancelled.