feat: Add Seed Support for Reproducible Voice Generation (v1 & v2) #327

Open
elismasilva wants to merge 1 commit into OpenBMB:main from DEVAIEXP:feat-add-seed

@elismasilva

Overview

This Pull Request introduces random seed (seed) support across the VoxCPM ecosystem (covering both v1 and v2 architectures). The primary goal is to enable reproducible speech synthesis, ensuring consistent voice timbre, style, and prosody across generation runs.


Key Changes

  1. Core Models (VoxCPMModel & VoxCPM2Model)

    • Exposed an optional seed parameter in both _generate and _generate_with_prompt_cache methods.
    • Applied PyTorch CPU and CUDA random number generator seeds (torch.manual_seed and torch.cuda.manual_seed_all) right before the sampling loop in the core inference pipeline.
    • Badcase Handling: When automatic badcase recovery is active (retry_badcase=True), the seed is incremented by 1 on each retry attempt. Reusing the same seed would deterministically reproduce the same failed output, so every retry would fail in the same way.
    • Exposed self.last_successful_seed as a model state attribute, enabling downstream user interfaces (e.g., Gradio, Streamlit) to easily retrieve and update fields with the exact seed that yielded the successful audio.
  2. High-Level Pipeline Wrapper (VoxCPM in core.py)

    • Propagated the seed argument through the public generate and generate_streaming API entry points.
  3. Command Line Interface (cli.py) & Inference Scripts

    • Added the --seed option to the design, clone, and batch subcommands, as well as legacy root CLI arguments.
    • Integrated the --seed flag into both the full-finetune (test_voxcpm_ft_infer.py) and LoRA (test_voxcpm_lora_infer.py) inference test scripts.
  4. Training & Validation

    • Set a fixed default evaluation seed (seed=42) in generate_sample_audio during training. Periodic validation audio logged to TensorBoard then starts from the same initial noise at every step, so audible differences between checkpoints reflect changes in the model rather than sampling randomness.
  5. Unit Tests

    • Added new test cases to test_cli.py to verify that the command-line parser successfully parses the seed flag and accurately passes it to single-sample and batch generation tasks.
  6. Documentation

    • Updated the Python API quick-start examples and CLI usage blocks in both the English (README.md) and Chinese (README_zh.md) documentation to show how to use the new seed parameter.
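
The seeding and retry behavior described under Core Models can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `generate_with_retry`, `sample_once`, `is_badcase`, and `max_retries` are hypothetical names, and Python's `random.Random` stands in for the PyTorch generators that the PR seeds with torch.manual_seed and torch.cuda.manual_seed_all.

```python
import random
from typing import Callable, List, Optional, Tuple


def generate_with_retry(
    sample_once: Callable[[random.Random], List[float]],
    is_badcase: Callable[[List[float]], bool],
    seed: Optional[int] = None,
    max_retries: int = 3,
) -> Tuple[List[float], Optional[int]]:
    """Return (output, seed_used).

    seed_used plays the role of last_successful_seed in the PR description.
    If every attempt is a badcase, the last attempt is returned as-is.
    """
    output: List[float] = []
    current_seed = seed
    for attempt in range(max_retries + 1):
        # Offset the seed by the attempt index so a retry does not
        # replay exactly the same (failed) random draw.
        current_seed = None if seed is None else seed + attempt
        # Stand-in for torch.manual_seed / torch.cuda.manual_seed_all,
        # applied right before sampling.
        rng = random.Random(current_seed)
        output = sample_once(rng)
        if not is_badcase(output):
            break
    return output, current_seed


def toy_sampler(rng: random.Random) -> List[float]:
    # Hypothetical sampler: four draws from the seeded generator.
    return [rng.random() for _ in range(4)]
```

Returning the seed that was actually used lets a caller (for example a UI field) display the value that reproduces the accepted output, even when a retry changed it.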

How to Test

Python API:

wav = model.generate(
    text="This is a test of reproducible voice generation.",
    seed=42
)
print(f"Last seed: {model.tts_model.last_successful_seed}")  # seed that produced the returned audio; differs from 42 if a badcase retry incremented it
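
One simple way to check reproducibility is to generate twice with the same seed and compare the results. The helper below is a sketch only: `is_reproducible` and `fake_generate` are hypothetical names, and `fake_generate` is a stand-in so the example runs without the model. With the real model you would pass `model.generate` instead and compare the returned waveforms; bit-exact equality may also depend on hardware and on whether the underlying kernels are deterministic.

```python
import random
from typing import Callable, List, Sequence


def is_reproducible(
    generate: Callable[..., Sequence[float]], text: str, seed: int
) -> bool:
    """Call generate twice with identical arguments and compare the outputs."""
    first = generate(text=text, seed=seed)
    second = generate(text=text, seed=seed)
    return list(first) == list(second)


def fake_generate(text: str, seed: int) -> List[float]:
    # Hypothetical stand-in for model.generate: the output depends only on
    # the text length and the seed, never on global random state.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(len(text))]
```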

CLI Usage:

voxcpm design \
  --text "Reproducible speech synthesis with a fixed seed." \
  --seed 42 \
  --output out.wav
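
The parsing and propagation of `--seed` that the new unit tests verify could look roughly like the sketch below. It is not the actual cli.py: `build_parser`, `parse_generate_kwargs`, and the program name `voxcpm-sketch` are hypothetical, and only the `design` subcommand with the three flags shown above is modeled.

```python
import argparse
from typing import Any, Dict, List


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser that mirrors only the `design` flags shown above.
    parser = argparse.ArgumentParser(prog="voxcpm-sketch")
    subparsers = parser.add_subparsers(dest="command", required=True)
    design = subparsers.add_parser("design")
    design.add_argument("--text", required=True)
    design.add_argument("--output", required=True)
    # The seed is optional; None means "do not fix the random state".
    design.add_argument("--seed", type=int, default=None)
    return parser


def parse_generate_kwargs(argv: List[str]) -> Dict[str, Any]:
    """Parse argv and return the keyword arguments to forward to generation."""
    args = build_parser().parse_args(argv)
    return {"text": args.text, "seed": args.seed}
```

Keeping the default at None rather than a fixed integer means existing invocations without `--seed` keep their non-deterministic behavior.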

- Exposed 'seed' parameter in VoxCPMModel and VoxCPM2Model generation methods.
- Added PyTorch RNG seed setting before inference runs.
- Handled 'retry_badcase' seed adjustment by incrementing the seed value on retries.
- Exposed 'self.last_successful_seed' as a model attribute for UI integrations.
- Propagated 'seed' parameter to high-level pipeline class and CLI tools (cli.py).
- Added '--seed' flag to full-finetune and LoRA inference scripts.
- Configured validation audio generation in training script to use a fixed seed for objective comparison on TensorBoard.
- Added comprehensive unit tests in CLI test files to validate seed parsing and propagation.
- Updated English and Chinese READMEs with seed usage examples.