fix(room_io): replay unplayed audio tail on false interruptions #1884

Open
toubatbrian wants to merge 2 commits into main from brian/fix-false-interruption-audio-loss

@toubatbrian toubatbrian commented Jun 25, 2026

1. Problem

On a false interruption, the agent loses unplayed audio. ParticipantAudioOutput.pause() calls this.audioSource.clearQueue(), which permanently discards every frame already pushed to the native rtc-node AudioSource queue but not yet played. The agent pauses playback on any VAD-detected overlapping speech (interruptByAudioActivity). When that speech turns out not to be a real turn, the framework resume()s — but the cleared frames were never re-pushed, so that tail is gone (audible in both the live track and the recording, since both are fed from the same AudioSource).

The JS room output also never set queueSizeMs, inheriting rtc-node's 1000ms default (Python uses 200ms), so the worst-case discardable tail was up to 1s.

2. Analysis & fixes

pause() → clearQueue() is correct for a real interruption (the user cut the agent off, so the unplayed tail should die). The bug is that the same code path runs for a false interruption, where the audio should be preserved and resumed.

Fix in agents/src/voice/room_io/_output.ts:

  • Keep a rolling window of recently pushed frames (recentFrames, capped at queueSizeMs + headroom).
  • On pause(), before clearQueue(), capture the unplayed tail (the last audioSource.queuedDuration worth of frames) into replayFrames.
  • On the next captureFrame() after resume() (false interruption), replay replayFrames before pushing new audio — zero loss. Replayed frames are not re-counted in pushedDuration.
  • On a real interruption (clearBuffer()) and at segment end, discard replayFrames.
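
The four steps above can be sketched in isolation as follows. This is a minimal model, not the code in _output.ts: `TailFrame`, `FakeAudioSource`, `ReplayingOutput`, `windowMs`, and `demoFalseInterruption` are hypothetical names, and the fake source only imitates the queue behaviour described in this PR (push, play in order, clearQueue drops whatever is unplayed).

```typescript
// Hypothetical stand-ins for illustration; not the rtc-node or @livekit/agents types.
interface TailFrame {
  id: number;
  durationMs: number;
}

// Fake native queue: frames play in order, clearQueue() drops the unplayed rest.
class FakeAudioSource {
  private queue: TailFrame[] = [];
  captureFrame(frame: TailFrame): void {
    this.queue.push(frame);
  }
  // Simulate playback of the oldest `count` frames.
  play(count: number): TailFrame[] {
    return this.queue.splice(0, count);
  }
  get queuedDuration(): number {
    return this.queue.reduce((sum, f) => sum + f.durationMs, 0);
  }
  clearQueue(): void {
    this.queue = [];
  }
}

class ReplayingOutput {
  private source: FakeAudioSource;
  private windowMs: number; // assumed to be queueSizeMs plus some headroom
  private recentFrames: TailFrame[] = [];
  private replayFrames: TailFrame[] = [];
  pushedDurationMs = 0;

  constructor(source: FakeAudioSource, windowMs: number) {
    this.source = source;
    this.windowMs = windowMs;
  }

  captureFrame(frame: TailFrame): void {
    // After a false interruption, re-push the captured tail first.
    // These frames were counted when first pushed, so pushedDurationMs is not touched.
    for (const f of this.replayFrames) this.source.captureFrame(f);
    this.replayFrames = [];

    this.source.captureFrame(frame);
    this.pushedDurationMs += frame.durationMs;
    this.remember(frame);
  }

  pause(): void {
    // Walk backwards over recent frames until the queued duration is covered.
    let remaining = this.source.queuedDuration;
    const tail: TailFrame[] = [];
    for (let i = this.recentFrames.length - 1; i >= 0 && remaining > 0; i--) {
      const f = this.recentFrames[i]!;
      tail.unshift(f);
      remaining -= f.durationMs;
    }
    // Keep any tail still pending from an earlier pause (the queue is empty then).
    this.replayFrames = [...this.replayFrames, ...tail];
    this.source.clearQueue();
  }

  clearBuffer(): void {
    // Real interruption: the unplayed tail is meant to be dropped.
    this.replayFrames = [];
    this.source.clearQueue();
  }

  private remember(frame: TailFrame): void {
    this.recentFrames.push(frame);
    let total = this.recentFrames.reduce((sum, f) => sum + f.durationMs, 0);
    while (total > this.windowMs && this.recentFrames.length > 1) {
      total -= this.recentFrames.shift()!.durationMs;
    }
  }
}

// False interruption mid-utterance: returns the frame ids in the order they are heard.
function demoFalseInterruption(): number[] {
  const source = new FakeAudioSource();
  const out = new ReplayingOutput(source, 400);
  for (let id = 1; id <= 5; id++) out.captureFrame({ id, durationMs: 100 });
  const heard = source.play(2).map((f) => f.id);
  out.pause(); // frames 3, 4, 5 are still queued and get captured
  out.captureFrame({ id: 6, durationMs: 100 }); // resume: tail is replayed before frame 6
  return [...heard, ...source.play(10).map((f) => f.id)];
}
```

In this model `demoFalseInterruption()` yields frames 1 through 6 in order, while calling `clearBuffer()` between `pause()` and the next `captureFrame()` leaves only the new frame in the queue.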

Secondary fix in agents/src/voice/room_io/room_io.ts:

  • Set DEFAULT_ROOM_OUTPUT_OPTIONS.queueSizeMs = 200 to match Python and bound the worst-case discardable tail.

3. Validations

Unit tests (_output.test.ts, +3): false interruption replays the exact unplayed tail (zero loss); real interruption (clearBuffer) discards it; no-op when nothing was queued.

Live runtime validation (cue-cli voice mode, real ParticipantAudioOutput + rtc-node AudioSource), false vs real interruption mid-utterance:

| Check | Evidence | Result |
| --- | --- | --- |
| pause captures unplayed tail | real queuedMs:245.5 → captured 296ms (3 frames) | pass |
| resume replays it, zero loss | replayCount:3, replayMs:296 | pass |
| real interrupt discards tail | clearBuffer discardedReplayFrames:3 | pass |
| no cross-segment leak | false-intr segment end interrupted:false, replayFramesAtEnd:0 (full audio) | pass |

Scope / what this does NOT fix

This PR fixes the false-interruption audio loss (affects live + recording). It does not address the customer report in RM_oPQpspNxqjtb, where the cut is observability-only ("not actual calls") and occurs on every turn regardless of interruption. Runtime A/B (1.4.7 vs this branch) rules out the queue cap as that cause (only ~130–200ms ever queued at interruption, ~no difference between 1000ms and 200ms) and rules out the handoff-drain teardown (audio drains fully, clearBuffer never fires). That per-turn recording clip traces to recorder_io's wall-clock clamp of playbackPosition (onPlaybackFinished clamps to wall-clock elapsed, which runs slightly short of the actual audio duration) and is being investigated separately.
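
To make the out-of-scope cause concrete, here is a rough sketch of how a wall-clock clamp can shorten a recorded turn. It assumes the clamp is a plain minimum of the reported position and the elapsed wall-clock time; `clampPlaybackPosition` and the millisecond values are hypothetical and are not taken from recorder_io.

```typescript
// Hypothetical model of a wall-clock clamp; not the recorder_io implementation.
function clampPlaybackPosition(
  reportedPositionMs: number, // how much audio the output says it played
  startedAtMs: number, // wall-clock time when playback started
  finishedAtMs: number, // wall-clock time when onPlaybackFinished fired
): number {
  const elapsedMs = finishedAtMs - startedAtMs;
  // The recorded position can never exceed the elapsed wall-clock time.
  return Math.min(reportedPositionMs, elapsedMs);
}

// Illustrative numbers only: 3000 ms of audio, but the timer saw 2950 ms.
const clampedMs = clampPlaybackPosition(3000, 0, 2950);
const clippedMs = 3000 - clampedMs;
```

With these made-up inputs the clamp keeps 2950 ms and clips 50 ms from the end of the turn, with no interruption involved, which matches the per-turn pattern described above.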

pause() cleared the entire native AudioSource queue, permanently dropping
up to queueSizeMs of generated-but-unplayed audio. On a false interruption
(pause then resume) those frames were never replayed, so up to ~1s of agent
speech was lost mid-sentence from both the live call and the recording.

Keep a rolling window of recently pushed frames, capture the unplayed tail
on pause(), and replay it on resume(), while discarding it on a real
interruption (clearBuffer()). Also cap the default room output queue to
200ms to match Python.

Co-authored-by: Cursor <cursoragent@cursor.com>

changeset-bot Bot commented Jun 25, 2026

🦋 Changeset detected

Latest commit: 53e6381

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 35 packages

| Name | Type |
| --- | --- |
| @livekit/agents | Patch |
| @livekit/agents-plugin-anam | Patch |
| @livekit/agents-plugin-assemblyai | Patch |
| @livekit/agents-plugin-baseten | Patch |
| @livekit/agents-plugin-bey | Patch |
| @livekit/agents-plugin-cartesia | Patch |
| @livekit/agents-plugin-cerebras | Patch |
| @livekit/agents-plugin-deepgram | Patch |
| @livekit/agents-plugin-did | Patch |
| @livekit/agents-plugin-elevenlabs | Patch |
| @livekit/agents-plugin-fishaudio | Patch |
| @livekit/agents-plugin-google | Patch |
| @livekit/agents-plugin-hedra | Patch |
| @livekit/agents-plugin-hume | Patch |
| @livekit/agents-plugin-inworld | Patch |
| @livekit/agents-plugin-lemonslice | Patch |
| @livekit/agents-plugin-liveavatar | Patch |
| @livekit/agents-plugin-livekit | Patch |
| @livekit/agents-plugin-minimax | Patch |
| @livekit/agents-plugin-mistral | Patch |
| @livekit/agents-plugin-mistralai | Patch |
| @livekit/agents-plugin-neuphonic | Patch |
| @livekit/agents-plugin-openai | Patch |
| @livekit/agents-plugin-perplexity | Patch |
| @livekit/agents-plugin-phonic | Patch |
| @livekit/agents-plugin-resemble | Patch |
| @livekit/agents-plugin-rime | Patch |
| @livekit/agents-plugin-runway | Patch |
| @livekit/agents-plugin-sarvam | Patch |
| @livekit/agents-plugin-silero | Patch |
| @livekit/agents-plugin-soniox | Patch |
| @livekit/agents-plugin-tavus | Patch |
| @livekit/agents-plugins-test | Patch |
| @livekit/agents-plugin-trugen | Patch |
| @livekit/agents-plugin-xai | Patch |

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8c89a2ae04

Comment thread agents/src/voice/room_io/_output.ts Outdated
devin-ai-integration[bot]

This comment was marked as resolved.

@chenghao-mou chenghao-mou self-requested a review June 26, 2026 09:25
…izeMs

Clear replayFrames unconditionally when a playback segment finishes. It was
only cleared on interruption, so an end-of-utterance false interruption (which
completes the segment non-interrupted) left the captured tail behind and
prepended it to the next utterance. Mid-utterance false interruptions still
recover their tail because the next captureFrame consumes it before flush.

Also note the queueSizeMs default change (1000ms -> 200ms) in the docstring and
changeset, and add a regression test for the end-of-utterance leak.
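
The leak described in this commit message can be reproduced in a small model. This is a sketch under simplifying assumptions: `SegFrame`, `SegmentedOutput`, `finishSegment`, `clearOnFinish`, and `nextUtterance` are hypothetical names, and the whole queue is treated as the unplayed tail.

```typescript
// Hypothetical model of the end-of-utterance leak; not the code in _output.ts.
type SegFrame = { id: string };

class SegmentedOutput {
  queue: SegFrame[] = []; // stand-in for the native queue
  replayFrames: SegFrame[] = [];
  private clearOnFinish: boolean;

  constructor(clearOnFinish: boolean) {
    this.clearOnFinish = clearOnFinish;
  }

  captureFrame(frame: SegFrame): void {
    // Any pending tail is replayed before the new frame.
    this.queue.push(...this.replayFrames, frame);
    this.replayFrames = [];
  }

  pause(): void {
    // Simplification: everything still queued counts as the unplayed tail.
    this.replayFrames = this.queue;
    this.queue = [];
  }

  // Called when a playback segment finishes.
  finishSegment(interrupted: boolean): void {
    // Old behaviour: drop the tail only on interruption.
    // New behaviour (clearOnFinish = true): drop it unconditionally.
    if (interrupted || this.clearOnFinish) this.replayFrames = [];
    this.queue = [];
  }
}

// Returns the frames queued at the start of utterance B.
function nextUtterance(clearOnFinish: boolean): string[] {
  const out = new SegmentedOutput(clearOnFinish);
  out.captureFrame({ id: "a-last" });
  out.pause(); // false interruption at the very end of utterance A
  out.finishSegment(false); // segment completes with no further captureFrame
  out.captureFrame({ id: "b-first" });
  return out.queue.map((f) => f.id);
}
```

In this model `nextUtterance(false)` starts utterance B with the stale `a-last` frame, while `nextUtterance(true)` starts it with `b-first` only.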

Co-authored-by: Cursor <cursoragent@cursor.com>