feat(writer): Switch default to native log format for table version >…#19118
Draft
cshuo wants to merge 13 commits into
danny0405
reviewed
Jul 1, 2026
   *
   * @param writeConfig the writer configuration.
   */
  public static boolean shouldWriteNativeLogFormat(HoodieWriteConfig writeConfig) {
Contributor
Rename this to shouldWriteNativeLogs.
danny0405
reviewed
Jul 1, 2026
    return getBlockType().isDataOrDeleteBlock();
  }

  protected HoodieSchema getSchemaFromHeader() {
Contributor
We have HoodieAvroSchemaCache now; use it here instead.
danny0405
reviewed
Jul 1, 2026
  public <T> List<BufferedRecord<T>> getRecordsToDelete(HoodieReaderContext<T> readerContext) {
    return Arrays.stream(getRecordsToDelete())
        .map(deleteRecord -> BufferedRecords.fromDeleteRecord(deleteRecord, recordContext))
        .map(deleteRecord -> BufferedRecords.fromDeleteRecord(deleteRecord, readerContext.getRecordContext()))
danny0405
reviewed
Jul 1, 2026
      throw new HoodieNotSupportedException("Native delete log files do not support the legacy DeleteRecord[] API. "
          + "Use getRecordsToDelete(RecordContext) instead. Log file: " + logFile);
    if (recordsToDelete == null) {
      recordsToDelete = readRecordsToDelete();
Contributor
Why are these read as DeleteRecord from native delete logs?
…= 10
Describe the issue this Pull Request addresses
Table version 10 should use the native v2 log format by default for Parquet MOR writes, but the writer selection logic was still tied to the LSM-tree storage layout. That made native log writing depend on storage layout instead of the effective write version and base file format.
This PR switches the write-path decision to the effective table write version, adds the missing version-10 write-config support, and hardens native log read/delete handling so that the Spark, Flink, and Hive realtime read paths work consistently with native data and delete logs.
Summary and Changelog
Added HoodieTableVersion.TEN as a supported write version in HoodieWriteConfig.
Added CommonClientUtils.shouldWriteNativeLogFormat, which enables native v2 log writes for Parquet tables when the effective write version is >= 10.
Impact
Native log format selection now goes through CommonClientUtils.shouldWriteNativeLogFormat, removing duplicated storage-layout checks from engine-specific writer paths.
Risk Level
Medium. This changes the default log format for table-version-10 Parquet MOR writes and touches shared Spark, Flink, compaction, and common log reader code. The PR mitigates the risk with new Spark, Flink, Hive realtime, common log-format, and utility tests covering native data/delete log write and read behavior.
Documentation Update
Document that Parquet MOR writes targeting table version 10 use the native v2 log format by default, while lower write versions and non-Parquet base file formats continue using the legacy inline log format.
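The rule stated above can be sketched as a small decision function. This is an illustration only, not the code in this PR: `NativeLogFormatSketch`, `BaseFileFormat`, `LogFormat`, and `chooseLogFormat` are hypothetical stand-ins, and the real check lives in CommonClientUtils.shouldWriteNativeLogFormat, which takes a HoodieWriteConfig rather than the two plain arguments assumed here.

```java
// Hypothetical sketch: none of these types are Hudi classes.
final class NativeLogFormatSketch {

  // Stand-in for the table's base file format (assumed values).
  enum BaseFileFormat { PARQUET, ORC, HFILE }

  // Stand-in for the two log formats named in the PR description.
  enum LogFormat { NATIVE_V2, LEGACY_INLINE }

  // Lowest effective write version that defaults to native logs, per the description.
  static final int NATIVE_LOG_MIN_WRITE_VERSION = 10;

  // Decide from the effective write version and base file format only;
  // the storage layout is deliberately not an input.
  static LogFormat chooseLogFormat(int effectiveWriteVersion, BaseFileFormat baseFileFormat) {
    boolean nativeLogs = effectiveWriteVersion >= NATIVE_LOG_MIN_WRITE_VERSION
        && baseFileFormat == BaseFileFormat.PARQUET;
    return nativeLogs ? LogFormat.NATIVE_V2 : LogFormat.LEGACY_INLINE;
  }

  public static void main(String[] args) {
    // Version 10 with Parquet, a lower write version, and a non-Parquet base format.
    System.out.println(chooseLogFormat(10, BaseFileFormat.PARQUET));
    System.out.println(chooseLogFormat(9, BaseFileFormat.PARQUET));
    System.out.println(chooseLogFormat(10, BaseFileFormat.HFILE));
  }
}
```

Keeping the decision in one function that depends only on these two inputs mirrors the intent described in this PR: engine-specific writers ask a shared helper instead of repeating their own storage-layout checks.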
Contributor's checklist