
fix: guard neighbour lookups in replace_blank (IndexError on trailing space) #318

Open
koriyoshi2041 wants to merge 1 commit into OpenBMB:main from koriyoshi2041:fix/replace-blank-bounds



@koriyoshi2041

replace_blank (src/voxcpm/utils/text_normalize.py) crashes on text ending in a space, and mishandles text starting with a space.

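```python
for i, c in enumerate(text):
    if c == " ":
        # No bounds checks: text[i + 1] overruns when i == len(text) - 1,
        # and text[i - 1] wraps around to text[-1] when i == 0.
        if (text[i + 1].isascii() and text[i + 1] != " ") and (text[i - 1].isascii() and text[i - 1] != " "):
            out_str.append(c)
    else:
        out_str.append(c)
```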

The intent is to keep a space only when it sits between two ASCII non-space characters, but the neighbour lookups aren't bounds-checked:

  • A trailing space makes text[i + 1] raise IndexError: string index out of range. replace_blank("hello ") crashes. This is reachable from TextNormalizer.normalize() for any zh text that ends in a space.
  • A leading space makes text[i - 1] read text[-1] (the last character), so the decision wraps around and a leading space can be wrongly kept.
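Both failure modes can be reproduced with a standalone copy of the quoted loop. This is a sketch: `replace_blank_original` is a hypothetical name, and the final `return "".join(out_str)` is an assumption, since the excerpt above stops before the function returns.

```python
def replace_blank_original(text: str) -> str:
    """Re-typed copy of the quoted loop (hypothetical name), kept buggy on purpose."""
    out_str = []
    for i, c in enumerate(text):
        if c == " ":
            # No bounds checks: text[i + 1] can overrun, text[i - 1] can wrap to text[-1].
            if (text[i + 1].isascii() and text[i + 1] != " ") and (text[i - 1].isascii() and text[i - 1] != " "):
                out_str.append(c)
        else:
            out_str.append(c)
    return "".join(out_str)  # assumed return; not shown in the excerpt


# Trailing space: at the last index, text[i + 1] is out of range.
try:
    replace_blank_original("hello ")
except IndexError as exc:
    print("trailing space:", type(exc).__name__, exc)

# Leading space: at i == 0, text[i - 1] reads text[-1] ("b"), so the space survives.
print(repr(replace_blank_original(" ab")))
```

Interior cases behave as intended with this copy: "a b" keeps its space and "中 文" becomes "中文", because "文" is not ASCII.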

Fix: only keep the space at interior positions (0 < i < len(text) - 1), so edge spaces are dropped (they don't sit between two characters) and no out-of-range access happens. Interior spacing is unchanged (e.g. "a b" is kept, "中 文" → "中文").
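A minimal sketch of the interior-position guard as a standalone function. It checks `0 < i < len(text) - 1` before touching either neighbour. This is one way to express the fix, not necessarily the committed diff, and the `"".join(out_str)` return is assumed rather than taken from the source.

```python
def replace_blank(text: str) -> str:
    """Keep a space only when both neighbours are ASCII non-space characters.

    Sketch of the guard described in this PR: edge spaces (i == 0 or
    i == len(text) - 1) are dropped because they lack a neighbour on one side.
    """
    out_str = []
    last = len(text) - 1
    for i, c in enumerate(text):
        if c != " ":
            out_str.append(c)
            continue
        # Range check first, so text[i - 1] and text[i + 1] are always valid.
        if 0 < i < last:
            prev_c, next_c = text[i - 1], text[i + 1]
            if prev_c.isascii() and prev_c != " " and next_c.isascii() and next_c != " ":
                out_str.append(c)
    return "".join(out_str)


for sample in ["hello ", " hello", "a b", "中 文", " "]:
    print(repr(sample), "->", repr(replace_blank(sample)))
```

Because the range check runs before any lookup, a lone space is simply dropped and the empty string passes through unchanged, with no indexing outside the string.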

Added tests/test_text_normalize.py covering trailing/leading/interior spaces and a CJK-adjacent space.
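The contents of the added test file aren't shown here, so the following is only a guess at its shape, written as plain pytest-style functions with hypothetical names. In the real tests/test_text_normalize.py the function would be imported from voxcpm.utils.text_normalize (which, as noted in this PR, pulls in torch). This sketch instead uses a local copy, with the guard folded into the original condition, so it runs on its own.

```python
# Local copy so the sketch runs without importing the voxcpm package.
def replace_blank(text: str) -> str:
    out_str = []
    for i, c in enumerate(text):
        if c == " ":
            # Only interior spaces have a neighbour on both sides.
            if 0 < i < len(text) - 1 and (
                text[i + 1].isascii() and text[i + 1] != " "
                and text[i - 1].isascii() and text[i - 1] != " "
            ):
                out_str.append(c)
        else:
            out_str.append(c)
    return "".join(out_str)


def test_trailing_space_does_not_raise():
    assert replace_blank("hello ") == "hello"


def test_leading_space_is_dropped():
    assert replace_blank(" hello") == "hello"


def test_interior_ascii_space_is_kept():
    assert replace_blank("a b") == "a b"


def test_cjk_adjacent_space_is_dropped():
    assert replace_blank("中 文") == "中文"


if __name__ == "__main__":
    # pytest collects the test_* functions; this lets the file run without pytest.
    test_trailing_space_does_not_raise()
    test_leading_space_is_dropped()
    test_interior_ascii_space_is_kept()
    test_cjk_adjacent_space_is_dropped()
    print("all checks passed")
```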

Note: I verified the function's behaviour directly (it's pure string logic), but couldn't run the full test locally: importing the package pulls in torch/torchaudio, and a native torchaudio library fails to load on my machine. CI should exercise the added test normally.

replace_blank keeps a space only when both neighbours are ASCII non-space
characters, but it indexed text[i + 1] / text[i - 1] without bounds checks.
A trailing space raised IndexError, and a leading space read text[-1]
(wrapping to the last character) and was wrongly kept. Restrict the check to
interior positions so edge spaces are dropped and no out-of-range access
occurs; interior spacing is unchanged.
