Cloud turn detector timeout and late EOU detection warnings after switching to LiveKit Inference VAD/turn detector #6177

Description

@kaushal-aubie

Bug Description

After switching from the plugin-based VAD and turn detector to LiveKit Inference VAD + LiveKit Inference audio turn detector, I started seeing warnings/errors that did not occur with the previous setup.

Current issues

  1. Late EOU detection:
eou detection ran after the audio eot turn was already flushed (likely a late stt final).
consider raising `min_delay` in the endpointing options to accommodate slow stt.
  2. Cloud turn detector timeout/failure:
cloud turn detector failed (eot prediction timed out); falling back to local mini model
  3. Higher worker/job memory usage than with the previous setup: around 970+ MB now, versus below 750 MB before.

Previous setup

  • Silero VAD plugin
  • LiveKit plugin turn detector

No warnings were observed.

Current setup

  • LiveKit Inference VAD
  • LiveKit Inference audio turn detector

Configuration

VAD: (the turn detector throws an error when set below this value; this should be documented on the website)

MIN_SILENCE_DURATION=0.25

Endpointing: (I understand these values are aggressive, but I need them to keep latency as low as possible)

MIN_ENDPOINTING_DELAY=0.15
MAX_ENDPOINTING_DELAY=1.2
ENDPOINTING_MODE=fixed
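
A minimal sketch of how these values could be read and sanity-checked at startup, so that a bad value fails early instead of surfacing as a runtime error. `EndpointingConfig` and `load_endpointing_config` are hypothetical names and not part of LiveKit. The 0.25 s floor for `MIN_SILENCE_DURATION` is only what was observed in this issue, not a documented limit.

```python
import os
from dataclasses import dataclass

# Floor observed in this issue; an assumption, not a documented LiveKit limit.
OBSERVED_MIN_SILENCE_FLOOR = 0.25


@dataclass(frozen=True)
class EndpointingConfig:
    min_silence_duration: float
    min_endpointing_delay: float
    max_endpointing_delay: float
    endpointing_mode: str


def load_endpointing_config(env=None) -> EndpointingConfig:
    """Read the settings from environment-style variables and validate them."""
    env = os.environ if env is None else env
    cfg = EndpointingConfig(
        min_silence_duration=float(env.get("MIN_SILENCE_DURATION", "0.25")),
        min_endpointing_delay=float(env.get("MIN_ENDPOINTING_DELAY", "0.15")),
        max_endpointing_delay=float(env.get("MAX_ENDPOINTING_DELAY", "1.2")),
        endpointing_mode=env.get("ENDPOINTING_MODE", "fixed"),
    )
    if cfg.min_silence_duration < OBSERVED_MIN_SILENCE_FLOOR:
        raise ValueError("MIN_SILENCE_DURATION is below the observed 0.25 s floor")
    if not 0 <= cfg.min_endpointing_delay <= cfg.max_endpointing_delay:
        raise ValueError("expected 0 <= MIN_ENDPOINTING_DELAY <= MAX_ENDPOINTING_DELAY")
    return cfg
```

The resulting object would then be passed to whatever constructs the VAD and endpointing options; that wiring is left out here because it depends on the agent code.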

Deepgram STT settings. Endpointing is not overridden here, so the LiveKit default of 25 ms applies.

    kwargs: dict = {
        "model": (c and c.model) or "nova-3",
        "language": (c and c.language) or "en",
        "filler_words": True,
        "smart_format": True,
    }

Observations

The warning suggests that EOU detection is completing after the turn has already been flushed, possibly because the endpointing delay is too aggressive or the cloud turn detector response arrives late.
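
The suspected race can be sketched with a simplified timing model. This is an assumption about the mechanism based only on the warning text, not on the LiveKit implementation: the turn is treated as flushed `min_delay` seconds after speech stops, and the STT final as arriving `stt_final_latency` seconds after speech stops. Both function names and the latency numbers are hypothetical.

```python
def eou_is_late(min_delay: float, stt_final_latency: float) -> bool:
    """Return True if the STT final lands after the turn was flushed.

    Simplified model (an assumption): the audio end-of-turn is flushed
    `min_delay` seconds after speech stops, and the STT final transcript
    arrives `stt_final_latency` seconds after speech stops.
    """
    return stt_final_latency > min_delay


def smallest_safe_min_delay(stt_final_latencies: list[float], margin: float = 0.05) -> float:
    """Pick a min_delay that covers the slowest observed STT final plus a margin."""
    return max(stt_final_latencies) + margin


if __name__ == "__main__":
    # With the 0.15 s min delay from this issue and a hypothetical 0.30 s STT final:
    print(eou_is_late(min_delay=0.15, stt_final_latency=0.30))  # True
```

Under this model, a `min_delay` of 0.15 s would produce the warning whenever the STT final takes longer than 150 ms, which would be consistent with the hint in the log to raise `min_delay`.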

I reduced the delays for lower latency, but after moving to the inference models these warnings started appearing.

Questions

  1. Is this expected behaviour when using LiveKit Inference turn detector compared to the plugin version?
  2. What is the recommended min_delay/max_delay range when using cloud turn detection with external STT providers?
  3. Is there any known reason for increased memory usage after enabling LiveKit Inference VAD + turn detector?

I would appreciate it if these issues could be fixed.

Expected Behavior

  1. Memory usage should not increase by 200+ MB, since it is the same VAD and turn detector.
  2. There should be no failures or timeouts.
  3. The website should document the minimum and recommended settings, and explain what could cause higher RAM usage.

Reproduction Steps

Switch from the LiveKit Silero VAD plugin and the plugin turn detector to the LiveKit Inference VAD and turn detector.
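
To make the memory comparison between the two setups repeatable, a helper like the one below could be logged from inside the job under each configuration. It is a sketch using only the Python standard library (`resource`, Unix-only). `peak_rss_mb` is a hypothetical helper, and where to call it inside the agent is an assumption left to the reader.

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Peak resident set size of the current process, in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


if __name__ == "__main__":
    print(f"peak RSS: {peak_rss_mb():.1f} MB")
```

This reports the peak for the current process only, so memory held by child processes would need `resource.RUSAGE_CHILDREN` or an external tool.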

Operating System

LiveKit Cloud

Models Used

Deepgram nova-3, gpt 5.4 mini, Cartesia TTS

Package Versions

"livekit-agents>=1.6.1",
"livekit-api>=1.1.0"

Session/Room/Call IDs

RM_HQMqnZ3afh2U, RM_GfoXD29Yfto6, RM_7BcQza5BEoAA

Proposed Solution

Additional Context

No response

Screenshots and Recordings

No response

Labels

bug (Something isn't working)
