[MINOR] Add MDT-partition-shape parameterized HBase-read compat test #19080
shangxinli wants to merge 2 commits into
The existing TestHFileCompatibility#testHbaseReaderSucceedsWhenKeyValueVersionIsSetTo1 covers the single-data-block happy path with 5 records. That path does not stress the load-on-open data structures (data-block index, meta-block index) that HFileWriterImpl's trailer points at.

Add a parameterized cross-version compat test that writes 5,000 records across 5 key/value shapes representative of the Metadata Table (MDT) partitions (FILES, COLUMN_STATS, RECORD_INDEX, SECONDARY_INDEX, BLOOM_FILTERS). The test reopens each file with HBase 2.4.x HFile.createReader and asserts:

- trailer parses (no CorruptHFileException),
- getEntries() matches the count written,
- a full forward scan returns every key in order.

A future trailer-layout drift in HFileWriterImpl (field reorder, missed field, width change) would fail at least one assertion across these shapes before the change reaches MDT files in production.

No production code changes; test-only.

Signed-off-by: Xinli Shang <shangxinli@apache.org>
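The forward-scan assertion above can be sketched as a small count-and-order check. The sketch makes two assumptions. First, the scanned keys have already been copied out of the HBase scanner into an Iterator<byte[]>. Second, keys sort in unsigned lexicographic byte order. ForwardScanCheck and verify are hypothetical names, and the PR's actual test may structure this check differently.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper (not from the PR): compares a forward scan against the keys written. */
final class ForwardScanCheck {

  private ForwardScanCheck() {
  }

  /**
   * Returns null when the scan yields exactly the written keys, in order;
   * otherwise returns a message describing the first problem found.
   * Assumes keys sort in unsigned lexicographic byte order.
   */
  static String verify(List<byte[]> written, Iterator<byte[]> scanned) {
    byte[] previous = null;
    int count = 0;
    while (scanned.hasNext()) {
      byte[] key = scanned.next();
      if (count >= written.size()) {
        return "scan returned more than " + written.size() + " keys";
      }
      if (!Arrays.equals(key, written.get(count))) {
        return "key mismatch at position " + count;
      }
      // Keys must be strictly increasing; a duplicate or a regression is an error.
      if (previous != null && Arrays.compareUnsigned(previous, key) >= 0) {
        return "keys out of order at position " + count;
      }
      previous = key;
      count++;
    }
    return count == written.size()
        ? null
        : "scan returned " + count + " of " + written.size() + " keys";
  }
}
```

Returning a message instead of throwing keeps the helper framework-neutral. In a JUnit 5 test, it would typically be wrapped as assertNull(ForwardScanCheck.verify(...)).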
Three issues found by a second-pass review of the previous commit:

1. The BLOOM_FILTERS shape had keyLen=24 with a prefix of "BLOOM_FILTERS::" (15 bytes), leaving only 9 bytes for the 10-digit idx after the prefix. The earlier truncation path silently dropped the last digit, producing 10 identical keys per group of consecutive indices. Bump to keyLen=32 and harden the generator: it now throws IllegalArgumentException when keyLen is too small for prefix + idx instead of truncating.
2. assertNotNull(scanner.seekTo()) was a no-op guard. seekTo() returns a primitive (boolean in HBase 2.4.13), which is autoboxed to a non-null Boolean in all cases. Replace it with assertTrue(scanner.seekTo(), ...).
3. The redundant assertDoesNotThrow(() -> HFile.createReader(...)) block opened a Reader that was never closed, and the test then immediately reopened one in the try-with-resources below. Remove it; the try-with-resources already proves that the trailer parses.

Signed-off-by: Xinli Shang <shangxinli@apache.org>
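The hardened generator from item 1 can be sketched as follows. The key layout is an assumption for illustration: the prefix, then a zero-padded 10-digit index, then 'x' padding up to keyLen. MdtKeyGenerator and key are hypothetical names, not the generator in the PR.

```java
/** Hypothetical fixed-width key generator illustrating the "throw, don't truncate" fix. */
final class MdtKeyGenerator {

  /** Width of the zero-padded index; 10 digits covers every non-negative int. */
  static final int IDX_DIGITS = 10;

  private MdtKeyGenerator() {
  }

  /** Builds prefix + zero-padded idx, padded with 'x' to exactly keyLen characters. */
  static String key(String prefix, int idx, int keyLen) {
    if (idx < 0) {
      throw new IllegalArgumentException("idx must be non-negative: " + idx);
    }
    int minLen = prefix.length() + IDX_DIGITS;
    if (keyLen < minLen) {
      // Fail loudly: truncating here would drop trailing index digits and
      // silently map several indices to the same key.
      throw new IllegalArgumentException(
          "keyLen " + keyLen + " < prefix (" + prefix.length() + ") + idx (" + IDX_DIGITS + ")");
    }
    StringBuilder sb = new StringBuilder(keyLen);
    sb.append(prefix).append(String.format("%0" + IDX_DIGITS + "d", idx));
    while (sb.length() < keyLen) {
      sb.append('x');
    }
    return sb.toString();
  }
}
```

All keys for a shape share the same prefix, index width, and padding, so string order matches numeric index order. Under the old truncation behavior, keyLen=24 left 9 characters after "BLOOM_FILTERS::". The index 123 ("0000000123") would therefore become "000000012", which means indices 120 through 129 would all collide.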
FYI: cherry-picked internally at Uber to gate our 0.14 → 1.2 cutover ahead of the OSS review cycle (uber-code/data-hoodie_oss#273). Any feedback here will be back-ported.
yihua
left a comment
@shangxinli Thanks for putting this up! I enhanced the HFile writer and tests in #19071 and #19083. Could you check whether those already cover your changes?
Thanks @yihua: #19071 and #19083 cover the generic HBase-reader → native multi-block path (point lookup,

The piece not yet covered there: those tests exercise a single generic multi-block file. MDT writes files with five distinct key/value sizing profiles (

I've rescoped this PR to just that parameterized MDT-shape sweep, stacked on #19083. Happy to fold it into #19083 directly if you'd rather; just say the word and I'll close this and send a patch there.
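The sizing argument can be made concrete with a back-of-the-envelope sketch of how key and value sizes feed into the root data-block index. The sketch assumes HBase's root-index entry layout: an 8-byte block offset, a 4-byte on-disk block size, then the block's first key with a Hadoop vint length prefix. It ignores per-cell framing, compression, checksums, multi-level indexes, and key shortening. RootIndexEstimate and its methods are hypothetical, and the sizes in the note afterward are illustrative rather than the shapes used in the test.

```java
/**
 * Rough estimate only (assumptions, not Hudi or HBase code):
 * blocks are cut at about blockSize bytes of cell payload, and every data
 * block contributes one root-index entry holding its full first key.
 */
final class RootIndexEstimate {

  private RootIndexEstimate() {
  }

  /** Approximate data-block count for n cells of fixed key/value size (ceiling division). */
  static long dataBlocks(long n, int keyBytes, int valueBytes, int blockSize) {
    long total = n * (long) (keyBytes + valueBytes);
    return (total + blockSize - 1) / blockSize;
  }

  /** Encoded size of a Hadoop-style vint, mirroring WritableUtils.getVIntSize. */
  static int vintSize(long i) {
    if (i >= -112 && i <= 127) {
      return 1;
    }
    if (i < 0) {
      i ^= -1L;
    }
    int dataBits = Long.SIZE - Long.numberOfLeadingZeros(i);
    return (dataBits + 7) / 8 + 1;
  }

  /** Root-index bytes: per block, offset (8) + on-disk size (4) + vint key length + key. */
  static long rootIndexBytes(long blocks, int keyBytes) {
    return blocks * (8 + 4 + vintSize(keyBytes) + keyBytes);
  }
}
```

For example, 5,000 cells with a 32-byte key and a 40-byte value come out at about 6 blocks and a 270-byte root index. Growing the key to 200 bytes raises each root entry from 45 to 214 bytes. A 64 KB value pushes the block count toward roughly one block per cell. Each shape therefore gives the trailer a differently sized index to point at.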
Describe the issue this Pull Request addresses
Stacked on top of #19071 and #19083, which already cover the generic HBase 2.4.x reader → native multi-block HFile path (point lookup, full scan, seekBefore, byte-parity, NONE + GZIP). Please land those first.

What is still uncovered after #19071 / #19083: those tests exercise one generic multi-block file shape. The Metadata Table writes files with five distinct key/value sizing profiles (small key + medium value, record-key-shaped, long composite key, small key + 64 KB value, etc.). A future change to HFileWriterImpl that only perturbs the trailer index sizing for, say, the RECORD_INDEX or BLOOM_FILTERS shape would still pass the generic tests. The data-block and meta-block indexes that the trailer points at are laid out per file, based on the actual key and value sizes.

Summary and Changelog
Test-only. Adds one parameterized test, TestHudiHFileMdtHbaseReadCompatibility. For each of the five MDT-realistic shapes, the test writes ~5,000 records (≥ 5 data blocks at the default 64 KB block size) with the native writer, reopens the file with HBase 2.4.x HFile.createReader, and asserts:

- trailer parses (no CorruptHFileException),
- getEntries() matches the count written,
- a full forward scan returns every key in order.

Shapes covered:

- FILES
- COLUMN_STATS
- RECORD_INDEX
- SECONDARY_INDEX
- BLOOM_FILTERS

No production code changes. New test file: hudi-io/src/test/java/org/apache/hudi/io/hfile/TestHudiHFileMdtHbaseReadCompatibility.java.

Impact
None on runtime behavior. CI runs ~2 seconds longer in hudi-io.

Risk Level
none; test-only.
Documentation Update
none
Contributor's checklist