Add per window throughput #2126

Open
gareth-ellis wants to merge 5 commits into master from per-window-throughput

@gareth-ellis

Member

Whilst working on #2087 I was reminded that our throughput calculation is questionable: it is an average over all requests since the start of the task. This means that the longer a test runs, the less impact a change in real throughput has on the reported throughput.
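To make the effect concrete, here is a minimal sketch that compares the two calculations on made-up per-bucket operation counts. The function names and the numbers are illustrative assumptions, not Rally's actual implementation.

```python
def cumulative_throughput(bucket_counts, bucket_seconds=1.0):
    """Running average: all ops seen so far divided by all elapsed time."""
    rates = []
    total = 0
    for i, count in enumerate(bucket_counts, start=1):
        total += count
        rates.append(total / (i * bucket_seconds))
    return rates


def windowed_throughput(bucket_counts, bucket_seconds=1.0):
    """Per-window rate: only the ops and time of the current bucket."""
    return [count / bucket_seconds for count in bucket_counts]


# Hypothetical data: nine steady 1s buckets, then real throughput doubles.
counts = [100] * 9 + [200]
cumulative = cumulative_throughput(counts)
windowed = windowed_throughput(counts)

print(cumulative[-1])  # 110.0, the doubling is barely visible
print(windowed[-1])    # 200.0, the doubling is fully visible
```

The later the change happens, the more history the running average has to dilute it with.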

This adds "per window" throughput as an option. I propose that, at a later date, we make per-window throughput the default, but this can be discussed in the PR.

@gareth-ellis gareth-ellis requested a review from a team as a code owner May 20, 2026 20:05
Copilot AI review requested due to automatic review settings May 20, 2026 20:05

Copilot AI left a comment

Contributor

Pull request overview

This PR adds an opt-in “per window” (per-bucket) throughput calculation mode so reported throughput reflects recent changes (e.g., spikes/drops) instead of a cumulative average since task start.

Changes:

  • Add a new reporting config option metrics.request.throughput.window to toggle windowed throughput.
  • Plumb the new option through Driver -> SamplePostprocessor -> ThroughputCalculator.
  • Add a unit test validating windowed vs cumulative throughput behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/driver/driver_test.py | Adds a new test asserting windowed throughput reports bucket-local spikes correctly. |
| esrally/types.py | Whitelists the new config key so it can be set via Rally configuration. |
| esrally/driver/driver.py | Introduces the windowed throughput option and implements per-bucket throughput calculation logic. |

Comments suppressed due to low confidence (1)

esrally/driver/driver.py:1687

  • The windowed throughput calculation is currently ineffective because finish_bucket() sets prev_interval = self.interval (and prev_total_count = self.total_count) before the windowed rate is computed. Immediately after finish_bucket(), self.interval - self.prev_interval becomes 0, so windowed_throughput falls back to cumulative throughput and will not reflect per-bucket spikes.

Consider computing the windowed delta using the previous bucket boundary values (e.g., store last_bucket_total_count/last_bucket_interval and update them after computing the windowed rate, or compute and return the windowed rate inside finish_bucket() before updating the snapshot).

        def finish_bucket(self, new_total):
            self.prev_total_count = self.total_count
            self.prev_interval = self.interval
            self.unprocessed = []
            self.total_count = new_total
            self.has_samples_in_sample_type = True
            self.bucket = int(self.interval) + self.bucket_interval
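The second suggestion above (compute the windowed rate inside finish_bucket() before updating the snapshot) could be sketched roughly as follows. WindowTracker, its two-argument finish_bucket(), and the windowed_rate attribute are hypothetical simplifications for illustration and do not mirror the actual class in esrally/driver/driver.py.

```python
class WindowTracker:
    """Hypothetical stand-in for the per-task state; not Rally's actual class."""

    def __init__(self):
        self.prev_total_count = 0
        self.prev_interval = 0.0
        self.windowed_rate = None

    def finish_bucket(self, new_total, new_interval):
        # Compute the deltas against the previous bucket boundary first ...
        delta_ops = new_total - self.prev_total_count
        delta_time = new_interval - self.prev_interval
        self.windowed_rate = delta_ops / delta_time if delta_time > 0 else None
        # ... and only then move the snapshot to the new boundary.
        self.prev_total_count = new_total
        self.prev_interval = new_interval


tracker = WindowTracker()
tracker.finish_bucket(new_total=100, new_interval=1.0)
first_rate = tracker.windowed_rate
tracker.finish_bucket(new_total=300, new_interval=2.0)
second_rate = tracker.windowed_rate
print(first_rate, second_rate)
```

If the snapshot were updated before the deltas were taken, delta_time would be 0 and no windowed rate could be derived, which is the failure mode described above.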

Comment thread esrally/driver/driver.py
Comment thread esrally/types.py

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread esrally/driver/driver.py
Comment thread esrally/driver/driver.py Outdated
Agent-Logs-Url: https://github.com/elastic/rally/sessions/e662b65c-f00b-4944-9e57-7dbbd48ca06b

Co-authored-by: gareth-ellis <14981026+gareth-ellis@users.noreply.github.com>
Agent-Logs-Url: https://github.com/elastic/rally/sessions/3c808134-d09e-4bd5-b77d-1f73c2100dac

Co-authored-by: gareth-ellis <14981026+gareth-ellis@users.noreply.github.com>
Agent-Logs-Url: https://github.com/elastic/rally/sessions/8eb678d2-662e-4e40-9328-52c387fe5dba

Co-authored-by: gareth-ellis <14981026+gareth-ellis@users.noreply.github.com>

@gbanasiak gbanasiak left a comment

Contributor

As expected, this is much less stable than the current running average. Here are 4 runs of indexing in geonames on my local computer: 2 with the running average (blue/green) vs. 2 with window throughput (red/pink).

esrally race --track=geonames --challenge=append-no-conflicts-index-only --distribution-version=9.2.7 --config=...
Screenshot 2026-06-16 at 18 36 00

Around 10s into the indexing task we can see a throughput spike on both windowed runs. I'm guessing this is an artifact of processing early samples which arrived late due to initial ES queueing (?). Can we consider such read-outs "real"? Say we initiate a bulk request and it takes 10s to complete. If the bucket interval (now 1s) is much lower than the maximum bulk latency, we might always see these types of spikes, right? I'm wondering if post-processing is a better strategy.
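One way to picture this hypothesis is to bucket the same samples once by request start time and once by completion time. The timestamps below are invented for illustration, and bucket_counts is a hypothetical helper, not part of Rally.

```python
from collections import Counter


def bucket_counts(timestamps, bucket_seconds=1.0):
    """Count how many samples fall into each fixed-width time bucket."""
    return Counter(int(ts // bucket_seconds) for ts in timestamps)


# Hypothetical (start, completion) times in seconds: the first four
# requests are queued and all complete shortly after t=10s.
samples = [
    (0.0, 10.2),
    (1.0, 10.4),
    (2.0, 10.6),
    (3.0, 10.8),
    (11.0, 11.5),
    (12.0, 12.5),
]

by_start = bucket_counts(start for start, _ in samples)
by_completion = bucket_counts(end for _, end in samples)

print(max(by_start.values()))  # 1: no spike when keyed by start time
print(by_completion[10])       # 4: the delayed requests pile up in one bucket
```

Under these assumptions the spike comes purely from when samples are accounted for, not from a change in the request rate.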

We probably want to expose the bucket interval so that we can experiment with it.

At the end of the run we can see how throughput decreases, most likely due to some of the clients finishing earlier.

Comment thread esrally/driver/driver.py
Comment on lines +1659 to +1663
    @property
    def windowed_throughput(self):
        """Throughput based only on ops and time elapsed since the previous bucket boundary."""
        return self._windowed_rate if self._windowed_rate is not None else self.throughput

Contributor

Can we avoid the fallback to the running average (self.throughput)? I think it would be better to return None and handle it at the call sites, e.g. by skipping generation of the throughput sample.
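A rough sketch of that shape, assuming a simplified state object: ThroughputState and collect_throughput_samples are hypothetical names used only for illustration, and the real call sites in the driver will look different.

```python
from typing import List, Optional


class ThroughputState:
    """Hypothetical holder for the windowed rate; not Rally's actual class."""

    def __init__(self, windowed_rate: Optional[float] = None):
        self._windowed_rate = windowed_rate

    @property
    def windowed_throughput(self) -> Optional[float]:
        # No fallback: None signals that no complete window is available yet.
        return self._windowed_rate


def collect_throughput_samples(states: List[ThroughputState]) -> List[float]:
    """Call-site handling: skip sample generation when no rate is available."""
    samples = []
    for state in states:
        rate = state.windowed_throughput
        if rate is None:
            continue
        samples.append(rate)
    return samples


collected = collect_throughput_samples(
    [ThroughputState(), ThroughputState(150.0), ThroughputState(0.0)]
)
print(collected)  # [150.0, 0.0]
```

Comparing against None explicitly, rather than relying on truthiness, keeps a genuine rate of 0.0 from being dropped.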

@gbanasiak

Contributor

Here is a comparison of post-processed throughput calculated from service time samples (windowed-start-time) with the proposed throughput (windowed-proposed) from one of the races shown earlier. I've also added the service time samples. The time buckets are 1s.

Screenshot 2026-06-17 at 10 09 40

It is not clear to me why there's a throughput dip at the beginning of the race with no corresponding dip in the throughput calculated from raw service time samples. If there's a slowdown and clients are waiting for bulk responses, new samples should not be created.

We may need to trace the throughput calculation more closely to understand this.


4 participants