fix: drain unused audio tee branches to prevent unbounded memory growth #1878

Open
tsushanth wants to merge 2 commits into livekit:main from tsushanth:fix/audio-recognition-tee-leak
Conversation

@tsushanth

Contributor

When no VAD or STT consumer is configured — the common case with realtime LLM turn detection — audio_recognition.ts unconditionally calls .tee() on the primary input stream and assigns both branches to vadInputStream and sttInputStream. Neither branch is ever read, so the broadcast TransformStream queues every incoming audio frame with no backpressure release. In a sustained session this produces roughly 300 KB/s of RSS growth until the process OOMs or the connection drops.

The fix makes .tee() conditional on which consumers are actually present. When both VAD and STT are configured the stream is tee'd into both branches as before. When only VAD is present the stream is tee'd once. When only STT is present the primary stream is passed directly without any tee. When neither consumer exists a lightweight background reader drains the primary stream so the upstream broadcast transform stays unblocked without accumulating frames.
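The routing described above can be sketched roughly as follows. This is an illustration only, not the code from audio_recognition.ts: `routeAudio`, `drain`, `Routed`, and `Frame` are hypothetical names, and the VAD-only case is simplified to a direct hand-off, whereas the description above says that case still tees.

```typescript
// Hypothetical sketch; names and shapes are assumptions, not the PR's code.
type Frame = Uint8Array;

interface Routed {
  vadInput?: ReadableStream<Frame>;
  sttInput?: ReadableStream<Frame>;
  // Resolves with the number of frames discarded once the source closes.
  drained?: Promise<number>;
}

// Read and discard every chunk so nothing accumulates upstream.
async function drain<T>(stream: ReadableStream<T>): Promise<number> {
  const reader = stream.getReader();
  let count = 0;
  try {
    while (true) {
      const { done } = await reader.read();
      if (done) return count;
      count++;
    }
  } finally {
    reader.releaseLock();
  }
}

function routeAudio(
  primary: ReadableStream<Frame>,
  opts: { vad: boolean; stt: boolean },
): Routed {
  if (opts.vad && opts.stt) {
    // Both consumers exist, so both tee branches will be read.
    const [vadInput, sttInput] = primary.tee();
    return { vadInput, sttInput };
  }
  // Simplification: a single consumer receives the stream without a tee.
  if (opts.vad) return { vadInput: primary };
  if (opts.stt) return { sttInput: primary };
  // No consumer at all: drain in the background instead of buffering.
  return { drained: drain(primary) };
}
```

The point of the shape is that a branch is only ever created when something is going to read it; the drain path exists so the producer is never left writing into a queue that nobody empties.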

The corresponding null guards in forwardInputAudioToStt and createVadTask let TypeScript enforce the invariant and provide an explicit early-return path for callers that race before the stream is established.

A second, smaller leak lives in RealtimeSession.forwardEvents in the OpenAI realtime plugin. Each call allocated a Future, registered an 'abort' listener on the signal, and then raced the channel get against that future. Because Queue.get already accepts an AbortSignal directly, the Future and its listener were redundant and kept the listener registered on the signal until the channel's pending promise resolved — preventing GC of the closure in long-lived sessions. The fix passes the signal into Queue.get directly and removes the auxiliary future entirely.
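To illustrate why passing the signal straight into the queue avoids a lingering listener, here is a minimal sketch. `SignalQueue` is a hypothetical stand-in written for this example; the real Queue class and `RealtimeSession.forwardEvents` may differ in signature and behavior.

```typescript
// Hypothetical stand-in for the agent's Queue; the real class may differ.
function abortError(): Error {
  const err = new Error('The operation was aborted');
  err.name = 'AbortError';
  return err;
}

class SignalQueue<T> {
  private items: T[] = [];
  private waiters: Array<(item: T) => void> = [];

  put(item: T): void {
    const waiter = this.waiters.shift();
    if (waiter) waiter(item);
    else this.items.push(item);
  }

  get(signal?: AbortSignal): Promise<T> {
    if (signal?.aborted) return Promise.reject(abortError());
    if (this.items.length > 0) return Promise.resolve(this.items.shift() as T);
    return new Promise<T>((resolve, reject) => {
      const onAbort = () => {
        // Drop the pending waiter so a later put() is not swallowed.
        this.waiters = this.waiters.filter((w) => w !== waiter);
        reject(abortError());
      };
      const waiter = (item: T) => {
        // The listener is removed as soon as the get() settles.
        signal?.removeEventListener('abort', onAbort);
        resolve(item);
      };
      this.waiters.push(waiter);
      signal?.addEventListener('abort', onAbort, { once: true });
    });
  }
}

// Hypothetical forwarding loop: no auxiliary future, no Promise.race.
async function forwardEvents<T>(
  queue: SignalQueue<T>,
  signal: AbortSignal,
  emit: (event: T) => void,
): Promise<void> {
  while (!signal.aborted) {
    try {
      emit(await queue.get(signal));
    } catch (err) {
      if ((err as { name?: string })?.name === 'AbortError') return;
      throw err;
    }
  }
}
```

With this shape, each iteration registers at most one listener, and that listener is removed when the item arrives or when the signal fires, so no closure outlives the call that created it.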

Fixes #1462

DOMException (thrown by AbortSignal/AbortController) is not instanceof
Error in Node and Bun, so guards of the form:

  if (err instanceof Error && err.name === 'AbortError')

never match. The instanceof check is redundant for the name property
since any thrown value can carry it. Replace all six occurrences across
agents and plugins with:

  if ((err as { name?: string })?.name === 'AbortError')

This silences the spurious error-level log noise on normal turn and
session teardown that triggered livekit#1712.
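
A minimal sketch of the name-based guard from this commit message, wrapped in a helper for illustration. `isAbortError` and `runUntilAborted` are hypothetical names, not functions from the repository; the check depends only on the `name` property, so it does not matter what class the thrown value belongs to.

```typescript
// Hypothetical helper: matches any thrown value whose name is 'AbortError'.
function isAbortError(err: unknown): boolean {
  return (err as { name?: string } | null | undefined)?.name === 'AbortError';
}

// Hypothetical usage: treat an abort as a normal outcome, rethrow the rest.
async function runUntilAborted(
  task: () => Promise<void>,
): Promise<'done' | 'aborted'> {
  try {
    await task();
    return 'done';
  } catch (err) {
    if (isAbortError(err)) return 'aborted';
    throw err;
  }
}
```

Optional chaining keeps the guard safe for `null`, `undefined`, and primitive values, which are all legal things to throw in JavaScript.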

When no VAD or STT consumer is configured (e.g. realtime LLM turn
detection only), audio_recognition.ts was unconditionally calling
.tee() on the primary input stream and assigning both branches to
vadInputStream and sttInputStream. Neither branch was ever consumed,
causing the broadcast transform to buffer indefinitely — roughly
300 KB/s RSS growth under sustained input.

Make .tee() conditional: tee into both branches only when both VAD
and STT consumers exist, tee into one branch when only VAD exists,
skip the tee entirely when only STT exists (pass primaryInputStream
directly), and drain the stream with a background reader when neither
consumer is present so the broadcast transform keeps flowing.

Also remove the abortFuture / Promise.race pattern in
RealtimeSession.forwardEvents: Queue.get already accepts an AbortSignal,
so the auxiliary Future and event-listener were redundant and prevented
the signal listener from being GC'd until the queue resolved.
@changeset-bot

changeset-bot Bot commented Jun 25, 2026


⚠️ No Changeset found

Latest commit: 21aa3f5

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@devin-ai-integration devin-ai-integration Bot left a comment

Contributor

Devin Review found 1 potential issue.

Comment on lines +501 to 507
  } else if (opts.vad) {
    const [vadInputStream, sttInputStream] = primaryInputStream.tee();
    this.vadInputStream = vadInputStream;
    this.sttInputStream = mergeReadableStreams(
      replaceSttInputWithSilence(sttInputStream),
      this.silenceAudioTransform.readable,
    );

🚩 VAD-only (no STT) case still creates unconsumed sttInputStream

The else if (opts.vad) branch at line 501 creates a two-way tee and sets both vadInputStream and sttInputStream. If opts.stt is falsy, sttInputStream is never consumed by forwardInputAudioToStt (which now has a null guard), so the merged stream created at lines 504-507, including the tee branch feeding it, has no reader. With web streams, an unconsumed tee branch buffers indefinitely. The old code behaved the same way (it always tee'd regardless of STT presence), so this is a pre-existing concern rather than a regression. In practice, VAD-without-STT configurations are likely very rare.
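
The buffering behavior mentioned here can be observed with a small standalone sketch. `readAll` and `demoUnreadBranch` are hypothetical names made up for this illustration, and the numeric chunks stand in for audio frames.

```typescript
// Collect every chunk from a stream into an array.
async function readAll<T>(stream: ReadableStream<T>): Promise<T[]> {
  const reader = stream.getReader();
  const chunks: T[] = [];
  while (true) {
    const { done, value } = await reader.read();
    if (done) return chunks;
    chunks.push(value as T);
  }
}

// Hypothetical demo: consume one tee branch fully, then inspect the other.
async function demoUnreadBranch(
  frames: number,
): Promise<{ consumed: number; retained: number }> {
  const source = new ReadableStream<number>({
    start(controller) {
      for (let i = 0; i < frames; i++) controller.enqueue(i);
      controller.close();
    },
  });
  const [readBranch, idleBranch] = source.tee();

  // Only one branch is read while the source is flowing.
  const consumed = (await readAll(readBranch)).length;

  // Every chunk is still available on the idle branch afterwards, which
  // means all of them were held in memory for it in the meantime.
  const retained = (await readAll(idleBranch)).length;
  return { consumed, retained };
}
```

In a live session the source never closes, so an idle branch like this one keeps growing for as long as audio arrives.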


Development

Successfully merging this pull request may close these issues.

AudioRecognition leaks all inbound audio frames when using realtime_llm turn detection without local VAD/STT

1 participant