perf(common): replace BitSet with a fixed word array in the ported bloom filter #19140
hudi-agent
left a comment
Thanks for working on this! This PR swaps the BitSet backing InternalBloomFilter for a fixed long[] word array to cut per-probe and per-serialization overhead on the bloom filter hot paths, while keeping the serialized byte layout unchanged. I traced the byte↔word mapping in write/readFields, the bounds of add/membershipTest, and the unused-bit clearing across the boundary vector sizes, and the format-compatibility invariant holds. The differential tests against a BitSet oracle and the golden-string checks give good coverage. No issues flagged from this automated pass — a Hudi committer or PMC member can take it from here for a final review.
cc @yihua
hudi-agent
left a comment
Thanks for working on this! This PR swaps the BitSet in InternalBloomFilter for a fixed-size long[] word array to eliminate per-probe BitSet bookkeeping and bit-by-bit serialization on the bloom filter hot paths (HFile writes, parquet footer reads, MDT bloom partition). I traced the serialized byte layout (it's the little-endian word view, so write/readFields stay byte-for-byte identical), the h[i] ∈ [0, vectorSize) hash bound that keeps add/membershipTest in-bounds and preserves the unused-bits-zero invariant, the clearUnusedBits handling on the readFields/not paths, and the word-wise and/or/xor ops — the serde format looks preserved and the oracle/golden-string tests guard it well. No issues flagged from this automated pass — a Hudi committer or PMC member can take it from here for a final review.
cc @yihua
Describe the issue this Pull Request addresses
Fixes #19139 (part of #14367).
InternalBloomFilter, the port of Hadoop's BloomFilter whose serialized bytes must stay compatible in both directions, kept its bit vector in a java.util.BitSet and converted it to and from bytes one bit at a time. Every add paid per-probe BitSet bookkeeping (expandTo/ensureCapacity/wordsInUse), and every serialization or deserialization looped bit-by-bit over vectorSize (millions of iterations per filter). These paths run for every record key added during HFile base file writing in metadata table compaction, every parquet footer read by the bloom index, and every metadata table bloom filter partition record. CPU profiling of metadata table compaction attributed about 10% of executor CPU to BitSet.ensureCapacity under BloomFilter.add.
Summary and Changelog
- InternalBloomFilter: store the bit vector in a fixed-size long[] word array. Bit i lives at words[i >> 6] under mask 1L << (i & 63); the serialized layout (bit i at byte i >> 3 under mask 1 << (i & 7)) is the little-endian byte view of the array, so write/readFields translate by byte position alone and the serialized bytes are unchanged.
  - add/membershipTest become one mask operation per probe with no growth or bookkeeping logic; and/or/xor/not become word-wise loops; write/readFields pack and unpack one byte per iteration instead of one bit.
  - Unused bits past vectorSize are kept zero (clearUnusedBits), preserving the previous reader's behavior of ignoring unused trailing bits in the last serialized byte.
- SimpleBloomFilter and HoodieDynamicBoundedBloomFilter compose InternalBloomFilter and inherit the change without modification.
- TestInternalBloomFilter:
  - differential tests against a java.util.BitSet oracle replicating the previous layout (serialized bits, membership on present and absent keys, and/or/xor/not) across word and byte boundary sizes (63/64/65/127/128/1000/43133 bits, up to 30 hash functions);
  - write/readFields round trips at boundary sizes;
  - golden serializeToString fixtures for SIMPLE and DYNAMIC_V0 filters captured from the previous implementation, asserted byte-identical and re-deserializable through BloomFilterFactory.fromString.
- InternalBloomFilterBenchmark: manual microbenchmark for adds, membership tests, and serde. The class name does not match the surefire patterns, so it never runs in CI; run it explicitly with mvn test -pl hudi-common -Dtest=InternalBloomFilterBenchmark -Dsurefire.failIfNoSpecifiedTests=false.
Impact
Serialized bloom filter bytes are unchanged in both directions: golden fixtures captured before the change pass unchanged, and every false-positive count in the benchmark below is identical before and after (identical membership semantics).
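The byte-level equivalence rests on the layout given in the changelog: bit i sits at words[i >> 6] under mask 1L << (i & 63), and the serialized form puts bit i at byte i >> 3 under mask 1 << (i & 7). The following is a minimal Java sketch of that mapping, with a differential check against a java.util.BitSet oracle. The names WordBitsSketch, oracleBytes, and matchesOracle are hypothetical and chosen for illustration; this is not the PR's code, and only the index and mask expressions are taken from the description.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Illustrative sketch only; names are hypothetical, not Hudi's InternalBloomFilter. */
final class WordBitsSketch {
    final int vectorSize;
    final long[] words;

    WordBitsSketch(int vectorSize) {
        this.vectorSize = vectorSize;
        this.words = new long[(vectorSize + 63) >>> 6]; // fixed size, never grows
    }

    // One mask operation per probe; callers must keep 0 <= i < vectorSize.
    void set(int i) { words[i >> 6] |= 1L << (i & 63); }

    boolean get(int i) { return (words[i >> 6] & (1L << (i & 63))) != 0; }

    /** Little-endian byte view: byte b is byte lane (b & 7) of words[b >> 3]. */
    byte[] toBytes() {
        byte[] out = new byte[(vectorSize + 7) >>> 3];
        for (int b = 0; b < out.length; b++) {
            out[b] = (byte) (words[b >> 3] >>> ((b & 7) << 3));
        }
        return out;
    }

    /** Inverse of toBytes; expects (vectorSize + 7) / 8 input bytes. */
    static WordBitsSketch fromBytes(int vectorSize, byte[] in) {
        WordBitsSketch s = new WordBitsSketch(vectorSize);
        for (int b = 0; b < in.length; b++) {
            s.words[b >> 3] |= (in[b] & 0xFFL) << ((b & 7) << 3);
        }
        s.clearUnusedBits(); // ignore stray trailing bits in the last byte
        return s;
    }

    /** Zero every bit at index vectorSize and above in the last word. */
    void clearUnusedBits() {
        int tail = vectorSize & 63;
        if (tail != 0) {
            words[words.length - 1] &= (1L << tail) - 1;
        }
    }

    /** Oracle: bit-at-a-time packing from a BitSet, bit i to byte i >> 3, mask 1 << (i & 7). */
    static byte[] oracleBytes(BitSet bits, int vectorSize) {
        byte[] out = new byte[(vectorSize + 7) >>> 3];
        for (int i = 0; i < vectorSize; i++) {
            if (bits.get(i)) {
                out[i >> 3] |= (byte) (1 << (i & 7));
            }
        }
        return out;
    }

    /** Sets the same positions in both structures, then compares bytes and a round trip. */
    static boolean matchesOracle(int vectorSize, int[] positions) {
        WordBitsSketch s = new WordBitsSketch(vectorSize);
        BitSet oracle = new BitSet(vectorSize);
        for (int p : positions) {
            s.set(p);
            oracle.set(p);
        }
        byte[] bytes = s.toBytes();
        return Arrays.equals(bytes, oracleBytes(oracle, vectorSize))
            && Arrays.equals(fromBytes(vectorSize, bytes).words, s.words);
    }

    public static void main(String[] args) {
        // Boundary sizes taken from the PR description.
        int[] sizes = {63, 64, 65, 127, 128, 1000, 43133};
        for (int size : sizes) {
            int[] positions = {0, 7, 8, size / 2, size - 1};
            System.out.println(size + " bits: " + matchesOracle(size, positions));
        }
    }
}
```

The sketch packs one byte per loop iteration rather than one bit, which is the shape of the write/readFields change described above; the real class also carries hash functions and header fields that are omitted here.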
InternalBloomFilterBenchmark results on JDK 11 / Apple Silicon, medians of 3 measured rounds after warmup, this branch vs its base commit on master:
Adds improve most where the bit vector is large and the probe count is high (fpp 1e-9 means about 30 probes per key). The third scenario mirrors the current metadata table HFile writer defaults; there the bounded dynamic filter saturates (99.6% false positives on absent keys in the benchmark), so adds are dominated by hashing into tiny cache-resident rows. Right-sizing that filter is the follow-up tracked in #19139.
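As a rough check on the "about 30 probes" figure, assume the filter is sized with the textbook optimal hash count for a target false-positive probability p. This is an assumption; the sizing code itself is not shown in this PR description. With p = 1e-9:

$$
k = \left\lceil \log_2 \frac{1}{p} \right\rceil = \left\lceil \log_2 10^{9} \right\rceil = \left\lceil 9 \times 3.3219 \right\rceil = \left\lceil 29.897 \right\rceil = 30
$$

Each add therefore touches about 30 bit positions, which is why per-probe overhead dominates in that scenario.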
Risk Level
low
Byte-level equivalence is pinned by the oracle differential tests and the pre-change golden fixtures.
TestInternalBloomFilter, TestInternalDynamicBloomFilter, and the HFile suites that write and read blooms through real files (TestHoodieHFileReaderWriter in hudi-hadoop-common plus the hudi-io and hudi-common HFile tests) all pass.
Documentation Update
none
Contributor's checklist