fix: guard neighbour lookups in replace_blank (IndexError on trailing space) #318
Open
koriyoshi2041 wants to merge 1 commit into
replace_blank keeps a space only when both neighbours are ASCII non-space characters, but it indexed text[i + 1] / text[i - 1] without bounds checks. A trailing space raised IndexError, and a leading space read text[-1] (wrapping to the last character) and was wrongly kept. Restrict the check to interior positions so edge spaces are dropped and no out-of-range access occurs; interior spacing is unchanged.
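To make the two failure modes concrete, here is a minimal sketch of the unguarded logic as described above. It is a reconstruction from this description, not the repository's code; the name `replace_blank_unguarded` is hypothetical.

```python
def replace_blank_unguarded(text: str) -> str:
    # Hypothetical reconstruction of the pre-fix behaviour (assumption,
    # based only on the description in this PR).
    out = []
    for i, c in enumerate(text):
        if c == " ":
            # No bounds checks: i + 1 can run past the end of the string,
            # and i - 1 is -1 at i == 0, which indexes the last character.
            if (text[i + 1].isascii() and text[i + 1] != " "
                    and text[i - 1].isascii() and text[i - 1] != " "):
                out.append(c)
        else:
            out.append(c)
    return "".join(out)


# Trailing space: text[i + 1] is out of range.
try:
    replace_blank_unguarded("hello ")
    trailing_error = None
except IndexError as exc:
    trailing_error = exc

# Leading space: text[-1] is "b", so the space is kept.
leading_result = replace_blank_unguarded(" ab")
```

With this sketch, `trailing_error` holds an `IndexError` and `leading_result` still starts with the space.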
`replace_blank` (`src/voxcpm/utils/text_normalize.py`) crashes on text ending in a space, and mishandles text starting with a space.

The intent is to keep a space only when it sits between two ASCII non-space characters, but the neighbour lookups aren't bounds-checked:

- A trailing space makes `text[i + 1]` raise `IndexError: string index out of range`. `replace_blank("hello ")` crashes. This is reachable from `TextNormalizer.normalize()` for any `zh` text that ends in a space.
- A leading space makes `text[i - 1]` read `text[-1]` (the last character), so the decision wraps around and a leading space can be wrongly kept.

Fix: only keep the space at interior positions (`0 < i < len(text) - 1`), so edge spaces are dropped (they aren't between two words) and no out-of-range access happens. Interior spacing (e.g. `"a b"` kept, `"中 文"` → `"中文"`) is unchanged.

Added `tests/test_text_normalize.py` covering trailing/leading/interior spaces and a CJK-adjacent space.

Note: I verified the function's behaviour directly (it's pure string logic), but couldn't run the full test locally because importing the package pulls in torch/torchaudio and a native `torchaudio` lib fails to load on my machine. CI should exercise the added test normally.
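For reviewers who want to check the logic without importing the package, the interior-only rule can be sketched standalone. This is an illustration of the approach under the assumptions stated in this description, not the exact diff; `replace_blank_guarded` is a hypothetical stand-in name.

```python
def replace_blank_guarded(text: str) -> str:
    # Stand-in for the patched function (assumption: same rule as the PR,
    # i.e. keep a space only at interior positions between two ASCII
    # non-space characters).
    out = []
    last = len(text) - 1
    for i, c in enumerate(text):
        if c != " ":
            out.append(c)
        elif 0 < i < last:
            prev_c, next_c = text[i - 1], text[i + 1]
            if (prev_c.isascii() and prev_c != " "
                    and next_c.isascii() and next_c != " "):
                out.append(c)
        # Spaces at index 0 or at the last index are always dropped.
    return "".join(out)


# Cases mirroring the ones listed above: trailing, leading, interior,
# and CJK-adjacent spaces.
cases = {
    "hello ": "hello",
    " hello": "hello",
    "a b": "a b",
    "中 文": "中文",
    "a 中": "a中",
}
for src, expected in cases.items():
    assert replace_blank_guarded(src) == expected
```

Because the bounds test runs before either neighbour is read, neither `text[i + 1]` nor `text[i - 1]` can go out of range or wrap around.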