AMD: detection_timeout still armed at track subscription poisons long-ring outbound SIP calls (follow-up to #5848) #6187

Description

@mirandamon

Summary

On outbound SIP calls with carrier early media, AMD emits uncertain / detection_timeout with speech_duration=0 before the callee answers, so the agent falls through and treats a voicemail (or a late-answering human) as a live human. This is the detection-timeout analog of the no-speech-timer bug already fixed in #5848.

Root cause

#5848 correctly deferred the no-speech timer to the SIP active state, and its own description notes the remainder: "Detection timeout is still armed when track is subscribed."

In voice/amd/detector.py _setup, start_detection_timer() is still called immediately after wait_for_track_publication, with the comment:

# outer budget runs from track-up so AMD bails out even if the
# call never reaches the active state
self._classifier.start_detection_timer()

With carrier early media (e.g. Twilio), the audio track is subscribed during ringback, not at answer, so the detection_timeout budget (default 20s) runs down during the ringing phase. US cell voicemail commonly answers at ~25-30s, which is after the 20s budget has already expired. The classifier then emits uncertain / detection_timeout with zero speech, before the SIP leg ever reaches active.

start_listening() (the no-speech timer + transcript processing) is correctly gated on sip.callStatus == "active" via _wait_for_sip_answer; only the outer detection timer is not. (Verified against main @ b2eefbe.)

Reproduction (real call)

| t (after dial) | Event |
|---|---|
| +6s | detection budget armed (audio track up, early media) |
| +7s | SIP ringing |
| +26s | detection budget expires -> uncertain / detection_timeout, zero audio |
| +32s | SIP active (mailbox answers, 6s too late) |
| +36s | greeting transcribed: "...automated voice messaging system" |

The same early-media reasoning that justified deferring the no-speech timer in #5848 applies identically to the detection timer: a clock armed at track subscription is poisoned by the pre-answer ringing phase.
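
The arithmetic behind the timeline can be checked with a small sketch. It is not code from livekit-agents: `CallTimeline` and its fields are hypothetical names, and the numbers are the ones from the reproduction (track up at +6s, answer at +32s, default 20s budget).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallTimeline:
    """Hypothetical model of one call; all times are seconds after dial."""

    track_up_s: float        # audio track subscribed (early media)
    answer_s: float          # SIP leg reaches "active"
    timeout_s: float = 20.0  # detection_timeout budget

    def expiry(self, anchor: str) -> float:
        """When the budget expires if the clock is armed at `anchor`."""
        if anchor == "track_up":
            return self.track_up_s + self.timeout_s
        if anchor == "answer":
            return self.answer_s + self.timeout_s
        raise ValueError(f"unknown anchor: {anchor!r}")

    def post_answer_budget(self, anchor: str) -> float:
        """Seconds of detection time left once the callee has answered."""
        return max(0.0, self.expiry(anchor) - self.answer_s)


call = CallTimeline(track_up_s=6.0, answer_s=32.0)
print(call.expiry("track_up"), call.post_answer_budget("track_up"))  # 26.0 0.0
print(call.expiry("answer"), call.post_answer_budget("answer"))      # 52.0 20.0
```

Armed at track-up, the budget is gone 6s before the mailbox answers. Armed at answer, the full 20s is available for the greeting.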

Proposed fix

Extend #5848's answer-anchoring to the detection timeout, symmetric to the no-speech timer. For SIP participants, start (or reset) the detection timer from _start_listening() (i.e. at active) rather than at track-up, so the budget measures post-answer detection time. The never-answered hang-guard that the track-up start currently provides can be preserved with a separate, longer pre-answer ring-wait bound, so a call that never reaches active still bails without consuming the detection budget during ringback.
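
A minimal asyncio sketch of that shape, to make the two-bound idea concrete. It is not the detector's actual code: `run_detection`, `simulate`, `ring_wait_s`, and the returned strings are hypothetical, and the `answered` event stands in for whatever signals sip.callStatus == "active".

```python
import asyncio
from typing import Awaitable, Callable


async def run_detection(
    answered: asyncio.Event,
    classify: Callable[[], Awaitable[str]],
    *,
    ring_wait_s: float = 60.0,          # hypothetical pre-answer bound
    detection_timeout_s: float = 20.0,  # budget, measured from answer
) -> str:
    # Phase 1: wait for answer under its own, longer bound, so a call that
    # never reaches "active" still bails out.
    try:
        await asyncio.wait_for(answered.wait(), timeout=ring_wait_s)
    except asyncio.TimeoutError:
        return "no_answer"

    # Phase 2: the detection budget starts only now, after answer.
    try:
        return await asyncio.wait_for(classify(), timeout=detection_timeout_s)
    except asyncio.TimeoutError:
        return "detection_timeout"


async def simulate(answer_after_s: float, classify_takes_s: float, **kw) -> str:
    """Toy driver: the 'callee' answers after a delay, then gets classified."""
    answered = asyncio.Event()

    async def classify() -> str:
        await asyncio.sleep(classify_takes_s)
        return "machine"

    asyncio.get_running_loop().call_later(answer_after_s, answered.set)
    return await run_detection(answered, classify, **kw)


if __name__ == "__main__":
    # The ring (0.15s) outlasts the detection budget (0.08s), yet the call is
    # still classified because the budget is anchored at answer.
    print(asyncio.run(simulate(0.15, 0.01, ring_wait_s=0.5, detection_timeout_s=0.08)))
```

The alternative shape, resetting a timer that was already armed at track-up, would behave the same after answer. The difference is only in how the never-answered case is bounded.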

Happy to open a PR for whichever shape you prefer (the pre-answer-bound-vs-reset choice is yours to call).

Workaround

Callers can pass detection_options={"timeout": 45.0} so the budget outlasts a realistic ring, but that is a guessed ceiling, not a fix: any ring long enough to use up the 45s breaks it again.

Version

livekit-agents 1.5.17 (behavior confirmed unchanged on main @ b2eefbe, 2026-06-22).
