Skip to content

1/6 Don't crash on non-UTF-8 ApiError bodies#2134

Merged
gareth-ellis merged 3 commits into
masterfrom
otlp-pr1-driver-utf8
Jun 1, 2026
Merged

1/6 Don't crash on non-UTF-8 ApiError bodies#2134
gareth-ellis merged 3 commits into
masterfrom
otlp-pr1-driver-utf8

Conversation

@gareth-ellis

@gareth-ellis gareth-ellis commented May 29, 2026

Copy link
Copy Markdown
Member

Summary

When an Elasticsearch ApiError's body is non-UTF-8 (e.g. binary protobuf returned by OTLP endpoints under coordinator-bytes backpressure), e.body.decode("utf-8") in execute_single raises UnicodeDecodeError and crashes the worker.

Replaces six .decode("utf-8") call sites in the driver's ApiError handler with .decode("utf-8", errors="replace") so undecodable bytes downgrade to U+FFFD instead of crashing. No semantic change for valid UTF-8 (the common case).

Part 1 of 6 in the OTLP ingest series, but a standalone hardening fix that's useful on its own. Has no dependencies on the rest of the series.

Series

Test plan

  • All existing driver tests pass
  • pre-commit clean

🤖 Generated with Claude Code

The ApiError handler in execute_single() decodes `e.body`, `e.error`, and
`e.info` as UTF-8 to build a human-readable error message. When the body is
binary (e.g., binary protobuf returned by ES OTLP endpoints on 4xx/5xx), the
strict decode raises UnicodeDecodeError, which crashes the worker mid-task.

Switch the six decode() calls to use errors="replace" so undecodable bytes
become U+FFFD instead of aborting the worker. No semantic change for valid
UTF-8 (the common case).

This is a latent bug independent of OTLP — any operation that surfaces a
binary error body would have hit it.
Copilot AI review requested due to automatic review settings May 29, 2026 14:10
@gareth-ellis gareth-ellis requested a review from a team as a code owner May 29, 2026 14:10

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens execute_single in the driver so Elasticsearch ApiError handling does not crash when error bodies contain non-UTF-8 bytes.

Changes:

  • Updates ApiError body/error/info decoding to use UTF-8 replacement for undecodable bytes.
  • Adds an inline comment explaining the binary response-body motivation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread esrally/driver/driver.py
if isinstance(e.body, bytes):
# could be an empty body
if error_body := e.body.decode("utf-8"):
if error_body := e.body.decode("utf-8", errors="replace"):

@pquentin pquentin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's wild that OTLP ingest does this. A bug IMHO. Still approving to unblock you.

Comment thread tests/driver/driver_test.py Outdated
body = io.BytesIO(b"\xff")
str_literal = str(body)
error_meta = elastic_transport.ApiResponseMeta(
status=499,

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot why 499 and not 400 like the test is called? A server is unlikely to return a 499, since that is typically used as a client timeout

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 835a0e1: changed that test fixture and expectation to use HTTP 400 so it matches the test intent.

@gareth-ellis gareth-ellis enabled auto-merge (squash) June 1, 2026 12:03
@gareth-ellis gareth-ellis merged commit 61526a4 into master Jun 1, 2026
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants