Nova Sonic 2 sessionStart rejected: endpointingSensitivity must be nested under turnDetectionConfiguration #6200

Description

@St-Luciferr

Bug Description

Summary

The experimental AWS Nova Sonic realtime plugin serializes the turn-detection
setting as a flat endpointingSensitivity field directly under sessionStart.
Amazon Nova Sonic 2 (amazon.nova-2-sonic-v1:0) now strictly validates this event
and requires the value nested under turnDetectionConfiguration, rejecting the flat
field with ValidationException: Invalid input request. This kills the session at
startup and cascades into a "speech not done in time after interruption" error
followed by shutdown.

Affected code

livekit-plugins/livekit-plugins-aws/.../experimental/realtime/events.py

class SessionStart(BaseModel):
    inferenceConfiguration: InferenceConfiguration
    endpointingSensitivity: TURN_DETECTION | None = "MEDIUM"   # flat (Sonic 1 shape)

Emitted event:

{"event":{"sessionStart":{"inferenceConfiguration":{...},"endpointingSensitivity":"HIGH"}}}
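The difference between the flat shape and the nested shape named in the summary can be sketched as a plain dict transform. This is only an illustration using the standard library: `nest_endpointing_sensitivity` is a hypothetical helper, not part of the plugin, and the inference values are placeholders.

```python
import copy
import json


def nest_endpointing_sensitivity(event: dict) -> dict:
    """Return a copy of a sessionStart event with the flat
    endpointingSensitivity moved under turnDetectionConfiguration."""
    patched = copy.deepcopy(event)  # leave the caller's event untouched
    session_start = patched["event"]["sessionStart"]
    sensitivity = session_start.pop("endpointingSensitivity", None)
    if sensitivity is not None:
        session_start["turnDetectionConfiguration"] = {
            "endpointingSensitivity": sensitivity
        }
    return patched


# Flat event in the shape the plugin currently emits (placeholder values).
flat_event = {
    "event": {
        "sessionStart": {
            "inferenceConfiguration": {
                "maxTokens": 1024,
                "topP": 0.9,
                "temperature": 0.7,
            },
            "endpointingSensitivity": "HIGH",
        }
    }
}

nested_event = nest_endpointing_sensitivity(flat_event)
print(json.dumps(nested_event))
```

The helper copies the event first so the original flat payload stays available for comparison.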

Notes

- The plugin's serialization is unchanged across recent releases (verified 1.3.10
through 1.5.8 and current main), so this is an AWS-side validation tightening,
not a plugin regression. Previously the flat field was silently ignored, which
also means turn_detection=... was effectively a no-op on Sonic 2 until now.
- Fix needs to stay model-aware: Nova 2 wants the nested form; Nova Sonic 1
(amazon.nova-sonic-v1:0) predates controllable endpointing, so it should keep
the legacy flat field (or omit it) to avoid a regression.
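One way to keep the fix model-aware is a small lookup from model id to payload shape. The sketch below uses only the standard library rather than the plugin's pydantic models. `SHAPE_BY_MODEL` and `build_session_start` are hypothetical names, and omitting the field for unknown model ids is an assumption of this sketch, not something AWS documents here.

```python
import json
from typing import Optional

# Payload shape per model id, following the notes above.
# Unknown ids fall back to omitting the field (assumption of this sketch).
SHAPE_BY_MODEL = {
    "amazon.nova-2-sonic-v1:0": "nested",
    "amazon.nova-sonic-v1:0": "flat",
}


def build_session_start(model_id: str, sensitivity: Optional[str]) -> str:
    """Serialize a sessionStart event with the shape chosen per model."""
    session_start = {
        "inferenceConfiguration": {
            "maxTokens": 1024,
            "topP": 0.9,
            "temperature": 0.7,
        }
    }
    shape = SHAPE_BY_MODEL.get(model_id, "omit")
    if sensitivity is not None:
        if shape == "nested":
            session_start["turnDetectionConfiguration"] = {
                "endpointingSensitivity": sensitivity
            }
        elif shape == "flat":
            session_start["endpointingSensitivity"] = sensitivity
        # shape == "omit": send nothing, so no null or unknown key is emitted
    return json.dumps({"event": {"sessionStart": session_start}})


print(build_session_start("amazon.nova-2-sonic-v1:0", "HIGH"))
print(build_session_start("amazon.nova-sonic-v1:0", "HIGH"))
```

A table keeps the per-model decision in one place, so switching Sonic 1 from the flat field to omitting it would be a one-line change.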


### Expected Behavior

Expected (per AWS Nova 2 docs)
https://docs.aws.amazon.com/nova/latest/nova2-userguide/sonic-turn-taking.html
```json
{"event":{"sessionStart":{"inferenceConfiguration":{...},"turnDetectionConfiguration":{"endpointingSensitivity":"HIGH"}}}}
```

### Reproduction Steps

1. Create the model with RealtimeModel.with_nova_sonic_2(voice="tiffany", turn_detection="HIGH").
2. Start a session.
3. Bedrock returns ValidationException: Invalid input request on the sessionStart event; the session closes immediately.

Operating System

linux

Models Used

amazon.nova-2-sonic-v1:0

Package Versions

- livekit-plugins-aws 1.5.8 (also present on main)
- Model: amazon.nova-2-sonic-v1:0

Session/Room/Call IDs

No response

Proposed Solution

This is the non-regressing version: Nova 2 gets the nested form, Sonic 1 keeps the flat form. It threads the model id into the event builder (the call site in realtime_model.py already has self._realtime_model._opts.model).

events.py

# add near SessionStart
class TurnDetectionConfiguration(BaseModel):
    endpointingSensitivity: TURN_DETECTION


class SessionStart(BaseModel):
    inferenceConfiguration: InferenceConfiguration
    # Nova Sonic 1 used a flat field; Nova Sonic 2 requires it nested under
    # turnDetectionConfiguration. Exactly one is populated per model.
    endpointingSensitivity: TURN_DETECTION | None = None
    turnDetectionConfiguration: TurnDetectionConfiguration | None = None

class SonicEventBuilder:
    def __init__(
        self,
        prompt_name: str,
        audio_content_name: str,
        model: str = "amazon.nova-2-sonic-v1:0",
    ):
        ...
        self._nova_sonic_2 = model == "amazon.nova-2-sonic-v1:0"

    def create_session_start_event(
        self,
        max_tokens: int = 1024,
        top_p: float = 0.9,
        temperature: float = 0.7,
        endpointing_sensitivity: TURN_DETECTION | None = "MEDIUM",
    ) -> str:
        inference = InferenceConfiguration(
            maxTokens=max_tokens, topP=top_p, temperature=temperature
        )
        if self._nova_sonic_2 and endpointing_sensitivity is not None:
            session_start = SessionStart(
                inferenceConfiguration=inference,
                turnDetectionConfiguration=TurnDetectionConfiguration(
                    endpointingSensitivity=endpointing_sensitivity
                ),
            )
        else:
            session_start = SessionStart(
                inferenceConfiguration=inference,
                endpointingSensitivity=endpointing_sensitivity,
            )
        event = Event(event=SessionStartEvent(sessionStart=session_start))
        return event.model_dump_json(exclude_none=True)   # was exclude_none=False

realtime_model.py: pass the model id at both seb(...) construction sites (lines ~500 and ~741):
self._event_builder = seb(
    prompt_name=str(uuid.uuid4()),
    audio_content_name=str(uuid.uuid4()),
    model=self._realtime_model._opts.model,
)
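A regression check for the change above could inspect the serialized event rather than the pydantic models. The following is a rough sketch using only the standard library. `check_session_start_shape` is a hypothetical helper, and the rules it encodes are taken from this issue, not from an official schema.

```python
import json
from typing import List

NOVA_SONIC_2 = "amazon.nova-2-sonic-v1:0"


def check_session_start_shape(event_json: str, model_id: str) -> List[str]:
    """Return a list of problems; an empty list means the payload
    matches the shape this issue describes for the given model."""
    session_start = json.loads(event_json)["event"]["sessionStart"]
    has_flat = "endpointingSensitivity" in session_start
    has_nested = "turnDetectionConfiguration" in session_start
    problems = []
    if has_flat and has_nested:
        problems.append("both flat and nested forms are present")
    if model_id == NOVA_SONIC_2 and has_flat:
        problems.append("Nova Sonic 2 payload contains the flat field")
    if model_id != NOVA_SONIC_2 and has_nested:
        problems.append("nested form sent to a model other than Nova Sonic 2")
    # exclude_none=True should keep unset optional fields out of the JSON.
    for key, value in session_start.items():
        if value is None:
            problems.append(f"{key} serialized as null")
    return problems


old = '{"event":{"sessionStart":{"inferenceConfiguration":{},"endpointingSensitivity":"HIGH"}}}'
new = '{"event":{"sessionStart":{"inferenceConfiguration":{},"turnDetectionConfiguration":{"endpointingSensitivity":"HIGH"}}}}'
print(check_session_start_shape(old, NOVA_SONIC_2))
print(check_session_start_shape(new, NOVA_SONIC_2))
```

Feeding it the output of create_session_start_event for each model id would catch both the flat field leaking into Nova Sonic 2 payloads and null keys leaking through if exclude_none were reverted.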

### Additional Context

_No response_

### Screenshots and Recordings

_No response_
