Skip to content

test: handle expected OCC conflict in concurrent Java writer test#19124

Open
yihua wants to merge 1 commit into
apache:masterfrom
yihua:test-occ-multiwriter-conflict
Open

test: handle expected OCC conflict in concurrent Java writer test#19124
yihua wants to merge 1 commit into
apache:masterfrom
yihua:test-occ-multiwriter-conflict

Conversation

@yihua

@yihua yihua commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Describe the issue this Pull Request addresses

TestMultipleHoodieJavaWriteClient.testOccWithMultipleWriters is flaky. It runs three concurrent Java write clients under OPTIMISTIC_CONCURRENCY_CONTROL and deliberately creates contention (20 records over keys repeated twice). When two concurrent commits touch overlapping file groups, OCC correctly aborts one of them by throwing HoodieWriteConflictException (from SimpleConcurrentFileWritesConflictResolutionStrategy). The test never handles that exception, so it propagates out of the parallel commit loop and fails the test whenever the race lands. The conflict is correct behavior, not a bug, so the flakiness lives in the test itself.

Summary and Changelog

This test was added under HUDI-5583 to validate that concurrent Java writers do not deadlock, not that every commit succeeds. This change makes it tolerate the conflict OCC is designed to throw:

  • Catch HoodieWriteConflictException around writer.commit(...). An aborted commit is an acceptable outcome under OCC contention, so it is logged and skipped rather than failing the test.
  • Move the writer return into a finally block so the writer pool is never starved when a commit is aborted (previously a thrown commit also leaked the writer).

The conflict-resolution path already releases the transaction lock in a finally block, so the writer is safe to reuse after an aborted commit.

Impact

Test-only change. No production code or public API is affected.

Risk Level

low

Verified locally on the Java client. Forcing maximum contention (collapsing all records onto a single file group) reproduces the failure on every run with the exact HoodieWriteConflictException (wrapping ConcurrentModificationException, "overlapping file groups") observed in CI. With this change the same workload completes: conflicts are caught and tolerated. The unmodified test (both COPY_ON_WRITE and MERGE_ON_READ) passes with no checkstyle violations.

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

testOccWithMultipleWriters runs concurrent Java write clients under
optimistic concurrency control and intentionally contends on the same
file groups. OCC aborts one of two conflicting commits by throwing
HoodieWriteConflictException, which the test did not handle, so it failed
intermittently. Catch the expected conflict and always return the writer
to the pool so the test validates deadlock freedom without flaking.

@hudi-agent hudi-agent left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ 🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for working on this! This PR makes testOccWithMultipleWriters tolerate the HoodieWriteConflictException that OCC throws by design under contention, and moves the writer return into a finally block so an aborted commit no longer starves the writer pool. I traced the commit path and confirmed the conflict propagates as an unchecked exception (it's not re-wrapped by the catch (IOException) in commitStats), while the transaction lock and instant heartbeat are both released in finally (endStateChange + releaseResources), so the write client is safe to reuse after an aborted commit. The narrow catch is appropriate — it tolerates the expected conflict without masking other failures. No issues flagged from this automated pass — a Hudi committer or PMC member can take it from here for a final review.

cc @yihua

@github-actions github-actions Bot added the size:S PR with lines of changes in (10, 100] label Jun 30, 2026
@hudi-bot

Copy link
Copy Markdown
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:S PR with lines of changes in (10, 100]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants