Add per window throughput by gareth-ellis · Pull Request #2126 · elastic/rally

gareth-ellis · 2026-05-20T20:05:08Z

Whilst working on #2087 I was reminded that our throughput calculation is questionable - where it is an average of all requests since the start - this means that the longer into a test we get, the less impact a change in real throughput has on the reported throughput.

This adds "per window" throughput as an option. I propose at a later date, we switch the default of this so that the per window is the default - but this can be discussed in the PR.

Copilot

Pull request overview

This PR adds an opt-in “per window” (per-bucket) throughput calculation mode so reported throughput reflects recent changes (e.g., spikes/drops) instead of a cumulative average since task start.

Changes:

Add a new reporting config option metrics.request.throughput.window to toggle windowed throughput.
Plumb the new option through Driver -> SamplePostprocessor -> ThroughputCalculator.
Add a unit test validating windowed vs cumulative throughput behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
`tests/driver/driver_test.py`	Adds a new test asserting windowed throughput reports bucket-local spikes correctly.
`esrally/types.py`	Whitelists the new config key so it can be set via Rally configuration.
`esrally/driver/driver.py`	Introduces the windowed throughput option and implements per-bucket throughput calculation logic.

Comments suppressed due to low confidence (1)

esrally/driver/driver.py:1687

The windowed throughput calculation is currently ineffective because finish_bucket() sets prev_interval = self.interval (and prev_total_count = self.total_count) before the windowed rate is computed. Immediately after finish_bucket(), self.interval - self.prev_interval becomes 0, so windowed_throughput falls back to cumulative throughput and will not reflect per-bucket spikes.

Consider computing the windowed delta using the previous bucket boundary values (e.g., store last_bucket_total_count/last_bucket_interval and update them after computing the windowed rate, or compute and return the windowed rate inside finish_bucket() before updating the snapshot).

        def finish_bucket(self, new_total):
            self.prev_total_count = self.total_count
            self.prev_interval = self.interval
            self.unprocessed = []
            self.total_count = new_total
            self.has_samples_in_sample_type = True
            self.bucket = int(self.interval) + self.bucket_interval

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Agent-Logs-Url: https://github.com/elastic/rally/sessions/e662b65c-f00b-4944-9e57-7dbbd48ca06b Co-authored-by: gareth-ellis <14981026+gareth-ellis@users.noreply.github.com>

Agent-Logs-Url: https://github.com/elastic/rally/sessions/3c808134-d09e-4bd5-b77d-1f73c2100dac Co-authored-by: gareth-ellis <14981026+gareth-ellis@users.noreply.github.com>

Agent-Logs-Url: https://github.com/elastic/rally/sessions/8eb678d2-662e-4e40-9328-52c387fe5dba Co-authored-by: gareth-ellis <14981026+gareth-ellis@users.noreply.github.com>

gbanasiak

As expected, this is much less stable than current running average. Here are 4 runs of indexing in geonames on my local computer - 2 running average (blue/green) vs. 2 window throughput (red/pink).

esrally race --track=geonames --challenge=append-no-conflicts-index-only --distribution-version=9.2.7 --config=...

Around 10s into the indexing task we can see throughput spike on both windowed runs. I'm guessing this is an artifact of processing early samples which arrived late due to initial ES queueing (?). Can we consider such read-outs as "real"? Say we initiate a bulk request and it takes 10s to complete. If bucket interval (now 1s) is much lower than maximum bulk latency we might always see those types of spikes, right? I'm wondering if post-processing is a better strategy.

We probably want to expose bucket interval to experiment.

At the end of the run we can see how throughput decreases, most likely due to some of the clients finishing earlier.

gbanasiak · 2026-06-16T16:53:21Z

+        @property
+        def windowed_throughput(self):
+            """Throughput based only on ops and time elapsed since the previous bucket boundary."""
+            return self._windowed_rate if self._windowed_rate is not None else self.throughput
+


Can we avoid fallback to running average (self.throughput)? I think it would be better to return None and handle it at call sites, e.g. by skipping generation of throughput sample.

gbanasiak · 2026-06-17T08:26:59Z

Here is a comparison of post-processed throughput calculated from service time samples (windowed-start-time) with proposed throughput (windowed-proposed) from one of the races shown earlier. I've also added service time samples. The time buckets are 1s.

It is not clear to me why there's a throughput dip at the beginning of the race with no corresponding dip in the throughput calculated from raw service time samples. If there's a slowdown and clients are waiting for bulk responses new samples should not be created.

We may need to trace throughput calculation closer to understand this.

Add per window throughput

954ea4c

gareth-ellis requested a review from a team as a code owner May 20, 2026 20:05

Copilot AI review requested due to automatic review settings May 20, 2026 20:05

Copilot started reviewing on behalf of gareth-ellis May 20, 2026 20:05 View session

Copilot AI reviewed May 20, 2026

View reviewed changes

Comment thread esrally/driver/driver.py

Comment thread esrally/types.py

Update method

bbd8730

gareth-ellis requested a review from Copilot May 20, 2026 20:15

Copilot started reviewing on behalf of gareth-ellis May 20, 2026 20:15 View session

Copilot AI reviewed May 20, 2026

View reviewed changes

Comment thread esrally/driver/driver.py

Comment thread esrally/driver/driver.py Outdated

Copilot started work on behalf of gareth-ellis May 20, 2026 20:38 View session

Copilot started work on behalf of gareth-ellis May 20, 2026 20:40 View session

Remove unused _prev_total_count from TaskStats

8db1cae

Agent-Logs-Url: https://github.com/elastic/rally/sessions/e662b65c-f00b-4944-9e57-7dbbd48ca06b Co-authored-by: gareth-ellis <14981026+gareth-ellis@users.noreply.github.com>

Copilot finished work on behalf of gareth-ellis May 20, 2026 20:43

Fix throughput window config parsing as boolean

6235fee

Agent-Logs-Url: https://github.com/elastic/rally/sessions/3c808134-d09e-4bd5-b77d-1f73c2100dac Co-authored-by: gareth-ellis <14981026+gareth-ellis@users.noreply.github.com>

Copilot finished work on behalf of gareth-ellis May 20, 2026 20:45

Copilot started work on behalf of gareth-ellis May 25, 2026 09:32 View session

docs: document throughput window reporting setting

89e9573

Agent-Logs-Url: https://github.com/elastic/rally/sessions/8eb678d2-662e-4e40-9328-52c387fe5dba Co-authored-by: gareth-ellis <14981026+gareth-ellis@users.noreply.github.com>

Copilot finished work on behalf of gareth-ellis May 25, 2026 09:36

gbanasiak reviewed Jun 16, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add per window throughput#2126

Add per window throughput#2126
gareth-ellis wants to merge 5 commits into
masterfrom
per-window-throughput

gareth-ellis commented May 20, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

gbanasiak left a comment

Uh oh!

gbanasiak Jun 16, 2026

Uh oh!

gbanasiak commented Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

gareth-ellis commented May 20, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

gbanasiak left a comment

Choose a reason for hiding this comment

Uh oh!

gbanasiak Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

gbanasiak commented Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants