Fix IndexError in replace_blank on boundary whitespace by KennyUMN · Pull Request #320 · OpenBMB/VoxCPM

KennyUMN · 2026-05-31T04:26:43Z

What

replace_blank() in src/voxcpm/utils/text_normalize.py raised IndexError on text ending in a space, and silently mishandled text starting with a space.

Why

When the loop hits a space it inspects both neighbours:

if (text[i + 1].isascii() and text[i + 1] != " ") and (text[i - 1].isascii() and text[i - 1] != " "):

- A trailing space (i == len(text) - 1) makes text[i + 1] index out of range -> IndexError: string index out of range.
- A leading space (i == 0) makes text[i - 1] evaluate to text[-1] (the last character) via Python's negative indexing, so the blank can be spuriously kept instead of dropped.
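Both failure modes can be reproduced in isolation. The sketch below lifts only the pre-fix condition quoted above into a hypothetical helper, `neighbours_unchecked` (the name and the sample strings are illustrative, not from the repository), and probes it at each boundary.

```python
def neighbours_unchecked(text: str, i: int) -> bool:
    """Pre-fix condition from the PR description, with no bounds checks."""
    return (text[i + 1].isascii() and text[i + 1] != " ") and (
        text[i - 1].isascii() and text[i - 1] != " "
    )


# Trailing space: i is the last index, so text[i + 1] is out of range.
trailing = "hello "
try:
    neighbours_unchecked(trailing, len(trailing) - 1)
    trailing_error = None
except IndexError as exc:
    trailing_error = str(exc)

# Leading space: text[i - 1] is text[-1], i.e. the last character 'd',
# so the condition holds and the blank would be kept.
leading = " world"
leading_kept = neighbours_unchecked(leading, 0)

print(trailing_error)
print(leading_kept)
```

The leading-space case does not raise, which is why it is easy to miss: the check quietly consults a character at the other end of the string.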

The sibling function split_paragraph() in the same file already guards this exact access with if i + 1 < len(text), so the missing bound here is an oversight.

Change

prev_ok = i > 0 and text[i - 1].isascii() and text[i - 1] != " "
next_ok = i + 1 < len(text) and text[i + 1].isascii() and text[i + 1] != " "
if prev_ok and next_ok:
    out_str.append(c)

Boundary spaces are now dropped (they have no neighbour to sit between), matching the function's documented intent ("remove blank between chinese character"). Behaviour on all interior inputs is unchanged.
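For context, here is a minimal sketch of how the guarded check might sit inside the whole function. Only the three guarded lines and `out_str.append(c)` come from this PR; the surrounding loop, the `else` branch, the return value, and the name `replace_blank_sketch` are assumptions made for illustration, not the file's verified contents.

```python
def replace_blank_sketch(text: str) -> str:
    """Keep a space only when both neighbours are non-space ASCII characters."""
    out_str = []
    for i, c in enumerate(text):
        if c == " ":
            # Guards from the PR: short-circuit before indexing past either end.
            prev_ok = i > 0 and text[i - 1].isascii() and text[i - 1] != " "
            next_ok = i + 1 < len(text) and text[i + 1].isascii() and text[i + 1] != " "
            if prev_ok and next_ok:
                out_str.append(c)
        else:
            # Assumed: every non-space character is copied through unchanged.
            out_str.append(c)
    return "".join(out_str)


print(repr(replace_blank_sketch("hello ")))       # trailing space dropped
print(repr(replace_blank_sketch(" world")))       # leading space dropped
print(repr(replace_blank_sketch("hello world")))  # ASCII-interior space kept
print(repr(replace_blank_sketch("你 好")))         # space between CJK characters dropped
```

Because `and` short-circuits, `text[i - 1]` and `text[i + 1]` are never evaluated when the index would fall outside the string.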

Testing

Adds tests/test_text_normalize.py (mirrors the existing importlib + stubbed-deps style in tests/test_model_utils.py) covering trailing space, leading space, ASCII-interior spaces, CJK boundaries, and empty string. All pass; verified the pre-fix code raised IndexError on the trailing-space cases.
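The claim that interior behaviour is unchanged can also be checked by brute force. The sketch below is a hypothetical supplementary check, not one of the tests added by this PR: it compares an assumed reconstruction of the pre-fix loop against the guarded version on every short string over a small alphabet that neither starts nor ends with a space. Both function bodies are assumptions apart from the conditions quoted in the description.

```python
from itertools import product


def replace_blank_old(text: str) -> str:
    """Assumed pre-fix loop, using the unguarded condition from the description."""
    out = []
    for i, c in enumerate(text):
        if c == " ":
            if (text[i + 1].isascii() and text[i + 1] != " ") and (
                text[i - 1].isascii() and text[i - 1] != " "
            ):
                out.append(c)
        else:
            out.append(c)
    return "".join(out)


def replace_blank_new(text: str) -> str:
    """Assumed post-fix loop, using the guarded condition from the PR."""
    out = []
    for i, c in enumerate(text):
        if c == " ":
            prev_ok = i > 0 and text[i - 1].isascii() and text[i - 1] != " "
            next_ok = i + 1 < len(text) and text[i + 1].isascii() and text[i + 1] != " "
            if prev_ok and next_ok:
                out.append(c)
        else:
            out.append(c)
    return "".join(out)


ALPHABET = ["a", " ", "好"]  # one ASCII letter, the space, one CJK character

mismatches = []
for length in range(6):
    for chars in product(ALPHABET, repeat=length):
        s = "".join(chars)
        # Interior-only inputs: no space at either boundary.
        if s.startswith(" ") or s.endswith(" "):
            continue
        if replace_blank_old(s) != replace_blank_new(s):
            mismatches.append(s)

print(mismatches)
```

An empty `mismatches` list is what the description's claim predicts, since on such inputs every space has a real neighbour on both sides and the guards are always true.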

Note

replace_blank is currently called right after wetext's normalizer, which strips boundary whitespace, so this is a latent defect rather than a user-visible crash today. The fix makes the helper correct and crash-safe in isolation (and robust to any future caller / normalizer change).

replace_blank() indexed text[i + 1] and text[i - 1] unconditionally when it hit a space. A trailing space (i == len(text) - 1) therefore raised IndexError, and a leading space (i == 0) let text[i - 1] wrap around to text[-1] (the last character), which could spuriously preserve the blank. Guard both neighbour lookups so boundary spaces are dropped instead of crashing, mirroring the bounds check already present in split_paragraph(). Adds tests/test_text_normalize.py covering leading/trailing/interior cases.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Fixes replace_blank() edge cases for leading/trailing spaces and adds regression tests to prevent index/wraparound issues.

Changes:

Add bounds-checked neighbor logic in replace_blank() to avoid IndexError and text[-1] wraparound.
Introduce a focused test suite covering leading/trailing spaces, ASCII spacing rules, CJK adjacency, and empty input.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/test_text_normalize.py | Adds regression tests for `replace_blank()` and loads the module with stubbed third‑party deps. |
| src/voxcpm/utils/text_normalize.py | Updates `replace_blank()` to safely check neighbors at string boundaries. |


+# Stub heavy/third-party imports so the module loads without them. We only
+# exercise ``replace_blank``, which depends on nothing beyond the stdlib.
+for _name in ("regex", "inflect"):
+    sys.modules.setdefault(_name, types.ModuleType(_name))
+
+_wetext_stub = types.ModuleType("wetext")
+_wetext_stub.Normalizer = object
+sys.modules.setdefault("wetext", _wetext_stub)


+spec = importlib.util.spec_from_file_location("voxcpm.utils.text_normalize", TEXT_NORMALIZE_PATH)
+text_normalize = importlib.util.module_from_spec(spec)
+assert spec.loader is not None
+spec.loader.exec_module(text_normalize)
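The stub-then-load pattern in the snippets above can be tried without the repository. This is a self-contained sketch under stated assumptions: `heavy_dep_for_demo`, `demo_module`, and `shout` are hypothetical names, and the module source is written to a temporary file purely for illustration.

```python
import importlib.util
import sys
import tempfile
import types
from pathlib import Path

# Hypothetical module source with a top-level import that is not installed.
SOURCE = """
import heavy_dep_for_demo

def shout(text):
    return text.upper()
"""

# Register an empty stand-in so the top-level import succeeds.
stub = types.ModuleType("heavy_dep_for_demo")
sys.modules.setdefault("heavy_dep_for_demo", stub)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_module.py"
    path.write_text(SOURCE, encoding="utf-8")

    # Load the file directly by path, bypassing normal package import.
    spec = importlib.util.spec_from_file_location("demo_module", path)
    assert spec is not None and spec.loader is not None
    demo_module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(demo_module)

result = demo_module.shout("ok")
print(result)
```

`sys.modules.setdefault` leaves a real installation in place if one is already imported, so the stub only takes effect when the dependency is absent from `sys.modules`.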


+            prev_ok = i > 0 and text[i - 1].isascii() and text[i - 1] != " "
+            next_ok = i + 1 < len(text) and text[i + 1].isascii() and text[i + 1] != " "
+            if prev_ok and next_ok:


Copilot AI review requested due to automatic review settings May 31, 2026 04:26

Copilot AI reviewed May 31, 2026

KennyUMN wants to merge 1 commit into OpenBMB:main from KennyUMN:fix/replace-blank-boundary-index
