fix(lora): resume from checkpoint fails due to strict state_dict loading by duchengyao · Pull Request #1296 · fishaudio/fish-speech

duchengyao · 2026-06-04T14:59:59Z

Is this PR adding a new feature or fixing a bug?

Bug fix.

Is this pull request related to any issue? If yes, please link the issue.

Problem

TextToSemantic.on_save_checkpoint intentionally saves only the LoRA parameters to reduce checkpoint size (~100 MB instead of ~9 GB). As a result, resuming fails: Lightning's restore_model() calls load_state_dict with strict=True, and the frozen base model weights are missing from the saved state_dict.
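To make the mismatch concrete, here is a minimal sketch in plain Python that treats a state_dict as an ordinary dictionary. It is not the code in on_save_checkpoint. The two base keys are taken from the error message quoted in this description; the LoRA key names, the `"lora" in key` filter, and both helper functions are assumptions made for illustration.

```python
# Toy state_dict: values are placeholders, not real tensors.
# The two base keys come from the error message in this PR;
# the LoRA key names are hypothetical.
full_state = {
    "model.embeddings.weight": "base",
    "model.codebook_embeddings.weight": "base",
    "model.layers.0.attention.wqkv.lora_A": "lora",
    "model.layers.0.attention.wqkv.lora_B": "lora",
}


def keep_lora_only(state_dict):
    """Return a copy holding only entries whose key mentions 'lora' (assumed filter)."""
    return {k: v for k, v in state_dict.items() if "lora" in k}


def missing_keys(expected, saved):
    """Keys the model expects but the checkpoint lacks, i.e. what a strict load rejects."""
    return sorted(set(expected) - set(saved))


saved = keep_lora_only(full_state)
print(missing_keys(full_state, saved))
```

Every frozen base key ends up in the missing list, which is why a strict load rejects a LoRA-only checkpoint.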

Fix

Override load_state_dict in TextToSemantic to always use strict=False.

- Only the LoRA weights are loaded from the checkpoint; the base weights stay as loaded by from_pretrained.
- Optimizer states and LR schedulers are restored correctly.
- Full fine-tuning is unaffected, since non-LoRA checkpoints contain all keys.
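The override described above can be sketched on a toy module. This is not the actual diff to lit_module.py: `TinyLoRAModule` and its layer names are hypothetical stand-ins for TextToSemantic, and the only library behavior relied on is `torch.nn.Module.load_state_dict` with its `strict` argument.

```python
import torch
from torch import nn


class TinyLoRAModule(nn.Module):
    """Hypothetical stand-in for TextToSemantic: a frozen base layer plus LoRA factors."""

    def __init__(self):
        super().__init__()
        self.base = nn.Linear(4, 4)
        self.base.requires_grad_(False)  # frozen base weights
        self.lora_A = nn.Parameter(torch.zeros(2, 4))
        self.lora_B = nn.Parameter(torch.zeros(4, 2))

    def load_state_dict(self, state_dict, strict=True, **kwargs):
        # The caller's `strict` is deliberately ignored so that a
        # LoRA-only checkpoint (no base keys) can still be loaded.
        return super().load_state_dict(state_dict, strict=False, **kwargs)


model = TinyLoRAModule()

# Simulate a LoRA-only checkpoint whose values differ from the current ones.
lora_only = {
    k: v.clone() + 1.0 for k, v in model.state_dict().items() if "lora" in k
}
base_before = model.base.weight.clone()

# Passing strict=True mimics the resume call; the override makes it non-strict.
result = model.load_state_dict(lora_only, strict=True)
print(result.missing_keys)
```

One trade-off of always loading non-strictly is that a genuinely mismatched checkpoint no longer raises an error. The returned `missing_keys` and `unexpected_keys` can be inspected or logged if that matters.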

Before

RuntimeError: Error(s) in loading state_dict for TextToSemantic:
	Missing key(s): model.embeddings.weight, model.codebook_embeddings.weight, ...

After

LoRA training resumes from checkpoint with no errors.

Files changed

fish_speech/models/text2semantic/lit_module.py — add load_state_dict override

Testing

Tested on a single RTX 4090 (48GB) with LoRA r=8 on s2-pro:

- Train for N steps, then interrupt.
- Re-run the training script; it resumes from the latest checkpoint successfully.
- The loss curve is continuous, and the optimizer/scheduler states are restored.

fix(lora): resume from checkpoint fails due to strict state_dict loading

288b5d7

duchengyao wants to merge 1 commit into fishaudio:main from duchengyao:fix-lora-checkpoint-resume
