chore(glue-sync): Ignore EntityNotFoundException when dropping Glue partitions#19142
Conversation
hudi-agent
left a comment
There was a problem hiding this comment.
Thanks for the contribution! This PR makes AWSGlueCatalogSyncClient#dropPartitionsInternal tolerant of EntityNotFoundException errors returned by Glue's BatchDeletePartition, treating already-gone partitions as a no-op while still failing on other errors. I verified the error-code string against the AWS Glue API docs (per-partition ErrorDetail.ErrorCode is "EntityNotFoundException"), traced the partitioning/throw logic and the idempotency semantics, and confirmed batching and imports are unaffected. No correctness issues found. A few style/readability suggestions in the inline comments. Please take a look, and this should be ready for a Hudi committer or PMC member to take it from here. A couple of small naming and readability suggestions below.
cc @yihua
6c4c12a to
52b0e23
Compare
hudi-agent
left a comment
There was a problem hiding this comment.
Thanks for working on this! This PR makes Glue partition drops idempotent by filtering out per-partition EntityNotFoundException errors from BatchDeletePartition (logging them at INFO) while still raising HoodieGlueSyncException for other errors. The change mirrors the existing AlreadyExistsException handling in addPartitionsToTableInternal, the partitioningBy split is null-safe, and the round-1 nits (constant extraction and ignorableErrors naming) look addressed. No correctness issues found. A few style/readability suggestions in the inline comments. Please take a look, and this should be ready for a Hudi committer or PMC member to take it from here. One small naming suggestion on the error-code constant; the rest of the change is clean and well-commented.
cc @yihua
hudi-agent
left a comment
There was a problem hiding this comment.
Thanks for the contribution! This PR makes Glue partition drops tolerant of EntityNotFoundException so idempotent cleanup no longer fails the entire sync when a partition is already gone, while still surfacing other errors such as permission and throttling failures. The partitioning logic is null-safe on errorCode(), treats unexpected error shapes as real errors (fail-closed), and the semantic interpretation of a per-partition EntityNotFoundException is correct. No issues flagged from this automated pass - a Hudi committer or PMC member can take it from here for a final review.
cc @yihua
Describe the issue this Pull Request addresses
AWSGlueCatalogSyncClient#dropPartitionsInternal previously failed the whole sync whenever BatchDeletePartition returned any error, including EntityNotFoundException for partitions that no longer exist. Dropping a non-existent partition is a no-op for this idempotent cleanup and is easily triggered by sync retries, concurrent writers, or externally-removed partitions.
Summary and Changelog
AWSGlueCatalogSyncClient#dropPartitionsInternalpreviously failed the entire sync wheneverBatchDeletePartitionreturned any error in the response, includingEntityNotFoundExceptionfor partitions that no longer exist.Dropping a non-existent partition is a no-op for this idempotent cleanup and is easily triggered in practice by sync retries, concurrent writers dropping the same partitions, or partitions removed externally.
Changes:
EntityNotFoundExceptionerrors returned byBatchDeletePartition;HoodieGlueSyncException.testDropPartitions_IgnoresEntityNotFoundandtestDropPartitions_MixedErrorsStillThrow.Impact
none
Risk Level
none
Documentation Update
none
Contributor's checklist