Use the doc-values skip index to skip per-doc value lookups in LongRangeFacetCutter#16268

Open
slow-J wants to merge 5 commits into apache:main from slow-J:lucene-16249-skipper-range-facets

@slow-J slow-J commented Jun 17, 2026

Contributor

Resolves #16249

Implementation heavily inspired by HistogramCollector.java.

Range faceting (LongRangeFacetCutter, in the sandbox module) currently reads the doc-values value for every matching document and binary-searches it into an elementary interval. When the faceted field is single-valued, we can use the doc-values skip index instead. For a dense skip block whose min and max values fall into the same elementary interval, every document in that block maps to that interval, so we can skip the per-doc value lookup and binary search.

Limitation: this applies to single-valued long fields only.
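
To make the block-level shortcut concrete, here is a minimal, self-contained sketch of the idea. It is not the code in this PR and does not use Lucene's API: `Block`, `intervalOf`, `count`, and the inclusive `upperBounds` array are hypothetical stand-ins, and the sketch assumes every document matches the query.

```java
import java.util.Arrays;

/** Toy model of skip-block range counting. Hypothetical names; not Lucene's API. */
public class SkipBlockFacetSketch {

  /** Stand-in for one skip-index block covering doc IDs minDoc..maxDoc (inclusive). */
  public record Block(int minDoc, int maxDoc, long minValue, long maxValue, int docCount) {
    /** Dense: every doc ID in the block's range has a value. */
    public boolean isDense() {
      return docCount == maxDoc - minDoc + 1;
    }
  }

  /** Binary-searches value into an elementary interval; upperBounds are sorted and inclusive. */
  public static int intervalOf(long[] upperBounds, long value) {
    int pos = Arrays.binarySearch(upperBounds, value);
    return pos >= 0 ? pos : -pos - 1; // insertion point = first bound >= value
  }

  /**
   * Counts docs per elementary interval, assuming every doc matches the query.
   * values[doc] == null means the doc has no value; lookups[0] tallies per-doc lookups.
   * The last entry of upperBounds is assumed to be Long.MAX_VALUE.
   */
  public static int[] count(Long[] values, Block[] blocks, long[] upperBounds, int[] lookups) {
    int[] counts = new int[upperBounds.length];
    for (Block block : blocks) {
      int interval = intervalOf(upperBounds, block.minValue());
      if (block.isDense() && interval == intervalOf(upperBounds, block.maxValue())) {
        // Fast path: min and max share an interval, so every doc in the block does too.
        counts[interval] += block.docCount();
        continue;
      }
      // Slow path: read each value and binary-search it individually.
      for (int doc = block.minDoc(); doc <= block.maxDoc(); doc++) {
        Long value = values[doc];
        if (value == null) {
          continue;
        }
        lookups[0]++;
        counts[intervalOf(upperBounds, value)]++;
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    Long[] values = {1L, 3L, 5L, 7L, 12L, 25L, 30L, 31L};
    Block[] blocks = {new Block(0, 3, 1, 7, 4), new Block(4, 7, 12, 31, 4)};
    long[] upperBounds = {9, 19, Long.MAX_VALUE}; // intervals: ..9, 10..19, 20..
    int[] lookups = new int[1];
    int[] counts = count(values, blocks, upperBounds, lookups);
    System.out.println(Arrays.toString(counts) + " lookups=" + lookups[0]); // [4, 1, 3] lookups=4
  }
}
```

In this toy input the first block is resolved with two binary searches on its min and max, while the second block spans two intervals and falls back to per-doc lookups. This may help explain why the index-sorted lastMod tasks benefit more than the ID tasks in the benchmark: sorting tends to give blocks narrow value ranges.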

Benchmark (luceneutil)

I used my luceneutil branch, https://github.com/slow-J/luceneutil/tree/github-16249-range-facet-bench, which cherry-picks two of @epotyom's commits (mainly mikemccand/luceneutil#582, which adds range-facet support).

Setup:
localrun.py, wikimediumall (33.3M docs), index sorted by lastMod_skipper with addDVSkippers=true. Baseline = main, candidate = this change, both DURING_COLLECTION, so the only difference is this optimization. 30 JVM iterations.

Command: python3 -u src/python/localrun.py -s rangeFacetsWikimediumAll -b lucene_baseline -c lucene_candidate -iterations 30 -warmups 20 2>&1 | tee "$BASE/run-timing7.txt"

Edit: new benchmark results after the changes for Egor's first two comments.
Edit 2: new benchmark results after the single-value unwrapping was removed.

QPS

| Task | QPS baseline | StdDev | QPS modified | StdDev | Pct diff | p-value |
|---|---|---|---|---|---|---|
| BrowseLastModOvlpRangeFacets | 1.26 | (7.7%) | 2.72 | (10.6%) | 115.5% (90% - 145%) | 0.000 |
| BrowseLastModRangeFacets | 2.21 | (6.0%) | 3.31 | (8.8%) | 50.0% (33% - 68%) | 0.000 |
| MedTermLastModOvlpRangeFacets | 3.82 | (13.5%) | 5.48 | (5.7%) | 43.5% (21% - 72%) | 0.000 |
| MedTermLastModRangeFacets | 4.15 | (13.6%) | 5.26 | (7.9%) | 26.5% (4% - 55%) | 0.000 |
| BrowseIDOvlpRangeFacets | 1.21 | (6.6%) | 1.10 | (6.7%) | -9.6% (-21% - 4%) | 0.000 |
| BrowseIDRangeFacets | 2.33 | (8.6%) | 2.57 | (5.1%) | 10.1% (-3% - 26%) | 0.000 |
| MedTermIDOvlpRangeFacets | 3.79 | (13.5%) | 4.61 | (11.1%) | 21.6% (-2% - 53%) | 0.000 |
| MedTermIDRangeFacets | 5.98 | (4.6%) | 5.92 | (2.7%) | -0.9% (-7% - 6%) | 0.340 |

Latency (ms), aggregated across all iterations (B = baseline, C = candidate)

| Task | P50 B | P50 C | P50 diff | P90 B | P90 C | P90 diff | P99 B | P99 C | P99 diff | P999 B | P999 C | P999 diff | P100 B | P100 C | P100 diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BrowseLastModOvlpRangeFacets | 844.184 | 386.006 | -54.3% | 1437.289 | 581.094 | -59.6% | 7523.983 | 828.460 | -89.0% | 9510.480 | 868.764 | -90.9% | 9555.393 | 888.500 | -90.7% |
| BrowseLastModRangeFacets | 474.762 | 319.836 | -32.6% | 854.574 | 546.789 | -36.0% | 4412.829 | 781.421 | -82.3% | 7775.105 | 862.760 | -88.9% | 7910.258 | 893.449 | -88.7% |
| MedTermLastModOvlpRangeFacets | 286.226 | 187.654 | -34.4% | 552.668 | 436.448 | -21.0% | 771.820 | 599.881 | -22.3% | 1327.279 | 705.213 | -46.9% | 1445.804 | 707.766 | -51.0% |
| MedTermLastModRangeFacets | 260.932 | 200.115 | -23.3% | 652.004 | 510.872 | -21.6% | 847.848 | 635.331 | -25.1% | 2966.134 | 743.950 | -74.9% | 3060.317 | 745.647 | -75.6% |
| BrowseIDOvlpRangeFacets | 860.895 | 976.209 | +13.4% | 1419.693 | 1279.444 | -9.9% | 8271.185 | 1476.704 | -82.1% | 9919.502 | 1531.237 | -84.6% | 9928.280 | 1536.195 | -84.5% |
| BrowseIDRangeFacets | 461.967 | 404.593 | -12.4% | 799.144 | 625.845 | -21.7% | 5972.427 | 860.420 | -85.6% | 8963.973 | 930.259 | -89.6% | 9483.903 | 942.619 | -90.1% |
| MedTermIDOvlpRangeFacets | 294.831 | 235.198 | -20.2% | 676.861 | 539.088 | -20.4% | 897.009 | 671.736 | -25.1% | 1835.175 | 742.857 | -59.5% | 2055.182 | 744.089 | -63.8% |
| MedTermIDRangeFacets | 169.089 | 170.565 | +0.9% | 495.786 | 401.676 | -19.0% | 697.206 | 591.299 | -15.2% | 1026.169 | 690.797 | -32.7% | 1647.263 | 695.272 | -57.8% |

@slow-J slow-J force-pushed the lucene-16249-skipper-range-facets branch from 03d7d2a to 066c419 Compare June 17, 2026 16:03
@github-actions github-actions Bot added this to the 10.5.0 milestone Jun 17, 2026
slow-J commented Jun 18, 2026

Contributor Author

I reran the benchmarks, this time correctly using localrun, and updated the results in #16268 (comment).

@slow-J slow-J force-pushed the lucene-16249-skipper-range-facets branch from 2e7144b to 0c72d5f Compare June 19, 2026 14:45

@epotyom epotyom left a comment

Contributor

Nice change! One suggestion below

@slow-J slow-J force-pushed the lucene-16249-skipper-range-facets branch from 1065433 to 7db2833 Compare June 23, 2026 10:39
@github-actions github-actions Bot modified the milestones: 10.5.0, 10.6.0 Jun 23, 2026
@slow-J slow-J requested a review from epotyom June 23, 2026 11:29
@slow-J slow-J force-pushed the lucene-16249-skipper-range-facets branch from 7db2833 to 88fe293 Compare June 29, 2026 11:40
}

/** Single-valued {@link LongValues} for {@link #skipField} in this segment. */
final LongValues skipFieldValues(LeafReaderContext context) throws IOException {

Contributor

Is this method part of the single-value unwrapping logic that we want to remove for now?

Contributor Author

Nope, this is part of the core skipper path.

…range-facets

# Conflicts:
#	lucene/CHANGES.txt
@slow-J slow-J marked this pull request as ready for review June 30, 2026 09:59

Development

Successfully merging this pull request may close these issues.

Can we use DocValuesSkipper for range facets?

2 participants