Add per window throughput#2126
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds an opt-in “per window” (per-bucket) throughput calculation mode so reported throughput reflects recent changes (e.g., spikes/drops) instead of a cumulative average since task start.
Changes:
- Add a new reporting config option
metrics.request.throughput.windowto toggle windowed throughput. - Plumb the new option through
Driver -> SamplePostprocessor -> ThroughputCalculator. - Add a unit test validating windowed vs cumulative throughput behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
tests/driver/driver_test.py |
Adds a new test asserting windowed throughput reports bucket-local spikes correctly. |
esrally/types.py |
Whitelists the new config key so it can be set via Rally configuration. |
esrally/driver/driver.py |
Introduces the windowed throughput option and implements per-bucket throughput calculation logic. |
Comments suppressed due to low confidence (1)
esrally/driver/driver.py:1687
- The windowed throughput calculation is currently ineffective because
finish_bucket()setsprev_interval = self.interval(andprev_total_count = self.total_count) before the windowed rate is computed. Immediately afterfinish_bucket(),self.interval - self.prev_intervalbecomes 0, sowindowed_throughputfalls back to cumulative throughput and will not reflect per-bucket spikes.
Consider computing the windowed delta using the previous bucket boundary values (e.g., store last_bucket_total_count/last_bucket_interval and update them after computing the windowed rate, or compute and return the windowed rate inside finish_bucket() before updating the snapshot).
def finish_bucket(self, new_total):
self.prev_total_count = self.total_count
self.prev_interval = self.interval
self.unprocessed = []
self.total_count = new_total
self.has_samples_in_sample_type = True
self.bucket = int(self.interval) + self.bucket_interval
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Agent-Logs-Url: https://github.com/elastic/rally/sessions/e662b65c-f00b-4944-9e57-7dbbd48ca06b Co-authored-by: gareth-ellis <14981026+gareth-ellis@users.noreply.github.com>
Agent-Logs-Url: https://github.com/elastic/rally/sessions/3c808134-d09e-4bd5-b77d-1f73c2100dac Co-authored-by: gareth-ellis <14981026+gareth-ellis@users.noreply.github.com>
Agent-Logs-Url: https://github.com/elastic/rally/sessions/8eb678d2-662e-4e40-9328-52c387fe5dba Co-authored-by: gareth-ellis <14981026+gareth-ellis@users.noreply.github.com>
gbanasiak
left a comment
There was a problem hiding this comment.
As expected, this is much less stable than current running average. Here are 4 runs of indexing in geonames on my local computer - 2 running average (blue/green) vs. 2 window throughput (red/pink).
esrally race --track=geonames --challenge=append-no-conflicts-index-only --distribution-version=9.2.7 --config=...
Around 10s into the indexing task we can see throughput spike on both windowed runs. I'm guessing this is an artifact of processing early samples which arrived late due to initial ES queueing (?). Can we consider such read-outs as "real"? Say we initiate a bulk request and it takes 10s to complete. If bucket interval (now 1s) is much lower than maximum bulk latency we might always see those types of spikes, right? I'm wondering if post-processing is a better strategy.
We probably want to expose bucket interval to experiment.
At the end of the run we can see how throughput decreases, most likely due to some of the clients finishing earlier.
| @property | ||
| def windowed_throughput(self): | ||
| """Throughput based only on ops and time elapsed since the previous bucket boundary.""" | ||
| return self._windowed_rate if self._windowed_rate is not None else self.throughput | ||
|
|
There was a problem hiding this comment.
Can we avoid fallback to running average (self.throughput)? I think it would be better to return None and handle it at call sites, e.g. by skipping generation of throughput sample.

Whilst working on #2087 I was reminded that our throughput calculation is questionable - where it is an average of all requests since the start - this means that the longer into a test we get, the less impact a change in real throughput has on the reported throughput.
This adds "per window" throughput as an option. I propose at a later date, we switch the default of this so that the per window is the default - but this can be discussed in the PR.