GH-50247: Reuse abstraction for null partitions in sorting functions by taepper · Pull Request #50248 · apache/arrow

taepper · 2026-06-25T01:48:42Z

Rationale for this change

@pitrou mentioned this as a follow-up in #46926

What changes are included in this PR?

Refactor the sorting methods to reuse the helper methods, so that we avoid maintaining two abstractions for null partitions. The new abstraction was seamless to implement in most cases, but a few spots required some care.

In particular, these functions were considerably simplified by the new abstraction:

MarkDuplicates: duplicate nulls and NaNs were detected by checking every single row for null one additional time, after we had already computed (and discarded) the nullness information.
GenericMergeImpl: merging of null ranges involved repartitioning null and NaN values in every merge invocation. Now we track this distinction and no longer need any merge function for null and NaN blocks.

Are these changes tested?

Yes, the compute test suite passes as before

Are there any user-facing changes?

No.

GitHub Issue: [C++] Reuse abstraction for null partitions in sorting functions #50247

github-actions · 2026-06-25T01:49:07Z

⚠️ GitHub issue #50247 has been automatically assigned in GitHub to PR creator.

pitrou

Thank you! This is excellent, and the simplification is very welcome. Just a couple minor comments below.

pitrou · 2026-06-25T07:59:55Z

-  IndexType* non_nulls_end;
-  IndexType* nulls_begin;
-  IndexType* nulls_end;
+struct GenericPartitionResultByNullLikeness {


Let's just keep the old name? Or name it GenericNullLikePartition which is a bit shorter?

Yes, that is definitely sensible. I like removing the Result from the name, as the struct is used to store that partition in various places. NullPartitionResult sort of implied that it is a single-use struct which is only returned by a NullPartition function.

Having NullPartition, NanPartition (as helpers), and NullLikePartition for the total struct sounds great!

pitrou · 2026-06-25T08:03:05Z

+                           null_range.size()}};
+  }
+
+  static GenericPartitionResultByNullLikeness fromCounts(std::span<IndexType> indices,


Style nit: FromCounts

pitrou · 2026-06-25T08:16:27Z

+                                     sorted[i].null_range.size()),
+                batch.num_rows());
      begin_offset = end_offset;
      // XXX this is an upper bound on the true null count


Is this XXX still true? Presumably it implied that null_count could also account for nan values, but that is not the case anymore?

That seems right. I also noticed that this null_count could be removed entirely (it is no longer used in Merge{,AtStart,AtEnd}).

pitrou · 2026-06-25T08:17:00Z

+      DCHECK_EQ(static_cast<int64_t>(sorted[i].non_null_like_range.size() +
+                                     sorted[i].null_range.size()),
+                batch.num_rows());


Shouldn't we also add nan_range.size() here?

Good catch!

pitrou · 2026-06-25T08:35:09Z

Hmm, there are some regressions in the test suite (see CI runs).

Also this runtime assertion on Windows CI might give a clue:

27/98 Test  #49: arrow-compute-vector-sort-test ...............Exit code 0xc0000409
***Exception:   0.43 sec
[==========] Running 284 tests from 130 test suites.
[----------] Global test environment set-up.
[----------] 2 tests from TestNthToIndicesForReal/0, where TypeParam = arrow::FloatType
[ RUN      ] TestNthToIndicesForReal/0.NthToIndicesDoesNotProvideDefaultOptions
[       OK ] TestNthToIndicesForReal/0.NthToIndicesDoesNotProvideDefaultOptions (1 ms)
[ RUN      ] TestNthToIndicesForReal/0.Real
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.44.35207\include\span(160) : Assertion failed: cannot compare incompatible span iterators

taepper added 12 commits June 24, 2026 16:27

replace PartitionNulls by PartitionNullsAndNans

a4f9728

more helpers and migrations

5fe6eb8

spanify one

6225331

spanify another

650d857

spanify working

5a62018

more spans

36f348f

another span

7c77d23

replace Partition usage

c0fd6ec

Change interface to PartitionResultByNullLikeness

245c10f

make comments not clip into new line

49ac605

bugfix

a6d2942

consolidate more functions

54c3aca

taepper requested a review from pitrou as a code owner June 25, 2026 01:48

github-actions Bot added Component: C++ awaiting review Awaiting review labels Jun 25, 2026

pitrou reviewed Jun 25, 2026

github-actions Bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jun 25, 2026

taepper wants to merge 12 commits into apache:main from taepper:better-null-partitions
