feat: Add Seed Support for Reproducible Voice Generation (v1 & v2) #327
Open
elismasilva wants to merge 1 commit into
- Exposed the `seed` parameter in the `VoxCPMModel` and `VoxCPM2Model` generation methods.
- Added PyTorch RNG seed setting before inference runs.
- Handled `retry_badcase` seed adjustment by incrementing the seed value on retries.
- Exposed `self.last_successful_seed` as a model attribute for UI integrations.
- Propagated the `seed` parameter to the high-level pipeline class and CLI tools (`cli.py`).
- Added a `--seed` flag to the full-finetune and LoRA inference scripts.
- Configured validation audio generation in the training script to use a fixed seed for objective comparison on TensorBoard.
- Added unit tests in the CLI test files to validate seed parsing and propagation.
- Updated the English and Chinese READMEs with seed usage examples.
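The seeding and retry behavior described in the commit message can be sketched as follows. This is a minimal, hypothetical illustration, not the code from this PR: `generate_with_seed_retry`, `sample`, and `is_bad_case` are made-up names, and Python's `random` module stands in for the PyTorch RNG calls (`torch.manual_seed` / `torch.cuda.manual_seed_all`) so the sketch runs without torch installed.

```python
import random
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def seed_rng(seed: int) -> None:
    # Stand-in for torch.manual_seed(seed) / torch.cuda.manual_seed_all(seed);
    # this sketch only seeds Python's global `random` generator.
    random.seed(seed)


def generate_with_seed_retry(
    sample: Callable[[], T],
    is_bad_case: Callable[[T], bool],
    seed: Optional[int] = None,
    max_retries: int = 3,
) -> Tuple[T, Optional[int]]:
    """Hypothetical helper: reseed right before every attempt and bump the
    seed by +1 on each retry, so a bad case is not regenerated identically.

    Returns the output and the seed that produced it (the role played by
    `last_successful_seed` in the PR). With seed=None the RNG is untouched."""
    current_seed = seed
    attempt = 0
    while True:
        if current_seed is not None:
            seed_rng(current_seed)  # seed immediately before sampling
        output = sample()
        # Stop on a good result, or when the retry budget is exhausted.
        if not is_bad_case(output) or attempt >= max_retries:
            return output, current_seed
        attempt += 1
        if current_seed is not None:
            current_seed += 1  # a fresh seed for the next attempt
```

Incrementing (rather than re-drawing) the seed keeps the whole retry sequence reproducible: the same starting seed always walks through the same seeds, and the returned seed can be reused directly to regenerate the accepted output.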
Overview
This pull request introduces random seed (`seed`) support across the VoxCPM ecosystem, covering both the v1 and v2 architectures. The primary goal is reproducible speech synthesis: given the same seed and inputs, repeated generation runs should yield consistent voice timbre, style, and prosody.

Key Changes

Core Models (`VoxCPMModel` & `VoxCPM2Model`)
- Exposed the `seed` parameter in both the `_generate` and `_generate_with_prompt_cache` methods.
- Set the PyTorch RNG seeds (`torch.manual_seed` and `torch.cuda.manual_seed_all`) right before the sampling loop in the core inference pipeline.
- When bad-case retries are enabled (`retry_badcase=True`), the seed is incremented by +1 on each retry attempt. Otherwise a fixed seed would deterministically reproduce the same failed output on every retry.
- Exposed `self.last_successful_seed` as a model state attribute, so downstream user interfaces (e.g., Gradio, Streamlit) can retrieve and display the exact seed that produced the successful audio.

High-Level Pipeline Wrapper (`VoxCPM` in `core.py`)
- Propagated the `seed` argument through the public `generate` and `generate_streaming` API entry points.

Command Line Interface (`cli.py`) & Inference Scripts
- Added the `--seed` option to the `design`, `clone`, and `batch` subcommands, as well as to the legacy root CLI arguments.
- Added the `--seed` flag to both the full-finetune (`test_voxcpm_ft_infer.py`) and LoRA (`test_voxcpm_lora_infer.py`) inference test scripts.

Training & Validation
- Configured validation audio generation in `generate_sample_audio` to use a fixed seed (`seed=42`) during training. Periodic validation audio logged to TensorBoard therefore starts from the same initial noise at every step, so differences between checkpoints reflect training progress rather than sampling randomness.

Unit Tests
- Added tests to `test_cli.py` verifying that the command-line parser parses the `--seed` flag and passes it through to single-sample and batch generation tasks.

Documentation
- Updated the English (`README.md`) and Chinese (`README_zh.md`) documentation with usage examples for the new `seed` parameter.

How to Test
Python API:
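A minimal sketch of a reproducibility check, assuming only that the generation entry point accepts a `seed` keyword and returns an array-like waveform, as described for `generate` above. `check_seed_reproducible` and `fake_generate` are hypothetical names; with a real model, `generate` would be the model's `generate` method with every other argument fixed (for example via `functools.partial`).

```python
from typing import Any, Callable

import numpy as np


def check_seed_reproducible(generate: Callable[..., Any], seed: int = 42) -> bool:
    """Call `generate` twice with the same seed and report whether the two
    waveforms are bitwise identical (same shape and same samples)."""
    first = np.asarray(generate(seed=seed))
    second = np.asarray(generate(seed=seed))
    return np.array_equal(first, second)


def fake_generate(seed: int) -> np.ndarray:
    # Hypothetical stand-in for a TTS call: a seeded NumPy generator plays
    # the role of the model's sampling noise (one second at 16 kHz).
    rng = np.random.default_rng(seed)
    return rng.standard_normal(16000).astype(np.float32)


print(check_seed_reproducible(fake_generate))  # prints True
```

Bitwise equality can still fail on GPU with a fixed seed, because some CUDA kernels are nondeterministic; `np.allclose` is a looser comparison if exact matches are not required.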
CLI Usage:
```bash
voxcpm design \
  --text "Reproducible speech synthesis with a fixed seed." \
  --seed 42 \
  --output out.wav
```
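For illustration, a minimal `argparse` sketch of how one `--seed` flag can be shared across subcommands such as `design`, `clone`, and `batch`. This is not the actual `cli.py`: the parent-parser layout, the `build_parser` name, and the non-seed options shown are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Options shared by every generation subcommand (hypothetical layout).
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument(
        "--seed",
        type=int,
        default=None,
        help="Random seed for reproducible generation (default: unseeded).",
    )
    common.add_argument("--output", default="out.wav", help="Output WAV path.")

    parser = argparse.ArgumentParser(prog="voxcpm")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Each subcommand inherits --seed and --output from the parent parser.
    design = subparsers.add_parser("design", parents=[common])
    design.add_argument("--text", required=True)
    subparsers.add_parser("clone", parents=[common])
    subparsers.add_parser("batch", parents=[common])
    return parser


args = build_parser().parse_args(
    ["design", "--text", "Reproducible speech synthesis with a fixed seed.",
     "--seed", "42", "--output", "out.wav"]
)
print(args.command, args.seed)  # prints: design 42
```

A parent parser keeps the flag's type and default identical across subcommands; forwarding `args.seed` to generation is then a single keyword argument.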