chore(registry-dash): SSE frame logging to diagnose AI tool-pill loss (SRE-1963)#143

Closed
UDtorrey wants to merge 1 commit into master from UDtorrey/sre-1963-pill-loss-diagnostics

UDtorrey commented May 6, 2026

Collaborator

Summary

PR #141 was supposed to make tool pills persist in the chat scrollback, but on alpha they still vanish: pills appear IN_FLIGHT during execution, then by the end of the response no pills remain (and turn-0 / turn-1 assistant text gets concatenated with no per-turn split).

Backend GCP logs (ud-registry-alpha-gke, k8s_container) confirm the orchestrator emitted ToolUseEvent + ToolResultEvent for every tool call across every turn — so the SSE wire definitely carries the frames; the frontend is losing them somewhere.

Static analysis of ai-analysis.service.ts couldn't pinpoint the failure path. The frontend has no logging, and the SSE handler's catch block silently swallows every parse error, so any code change right now is a guess.
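To illustrate how a silent catch can hide lost frames, here is a minimal sketch. It assumes, purely for illustration, a handler that parses each network chunk independently; nothing in this PR confirms that ai-analysis.service.ts works this way, and `parseChunksSilently` is a hypothetical name.

```typescript
// Hypothetical handler: parses every `data:` line of each chunk on its own
// and swallows parse errors. The `dropped` counter exists only so the loss
// can be observed here; a truly silent catch would not even count it.
function parseChunksSilently(chunks: string[]): { parsed: unknown[]; dropped: number } {
  const parsed: unknown[] = [];
  let dropped = 0;
  for (const chunk of chunks) {
    for (const line of chunk.split('\n')) {
      if (!line.startsWith('data: ')) continue; // continuation bytes are skipped unnoticed
      try {
        parsed.push(JSON.parse(line.slice('data: '.length)));
      } catch {
        dropped++;
      }
    }
  }
  return { parsed, dropped };
}

// One frame delivered whole, then the same frame split across a chunk boundary.
const whole = parseChunksSilently(['data: {"type":"tool_use","tool":"search"}\n\n']);
const split = parseChunksSilently(['data: {"type":"tool_', 'use","tool":"search"}\n\n']);
```

Under that assumption, the split delivery yields no parsed frame at all, and nothing is reported unless the catch logs it.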

What changed

Two diagnostic lines in console-webapp/src/app/registry-dash/ai/ai-analysis.service.ts. No behavior change.

  1. Un-silence the parse-error catch: console.warn the dropped frame and the parse error, so chunk-boundary truncation or unexpected bytes in args surface instead of disappearing.
  2. Trace every frame at the dispatch: console.debug the type + tool/status of every frame the frontend actually sees, in order.
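The two diagnostics could look roughly like the sketch below. This is not the actual diff: the `SseFrame` shape, the `Logger` interface, and the `parseFrame` function are hypothetical stand-ins, and only the `[ai-chat]` log prefixes are taken from the test plan.

```typescript
// Hypothetical frame shape; the real type in ai-analysis.service.ts may differ.
interface SseFrame {
  type: string;
  tool?: string;
  status?: string;
}

// Minimal logger surface so the sketch can run without a browser console.
interface Logger {
  warn(...args: unknown[]): void;
  debug(...args: unknown[]): void;
}

// Parses one SSE data payload; returns null when the payload is dropped.
function parseFrame(raw: string, log: Logger = console): SseFrame | null {
  let frame: SseFrame;
  try {
    frame = JSON.parse(raw) as SseFrame;
  } catch (err) {
    // Diagnostic 1: the catch now reports the payload and the error.
    log.warn('[ai-chat] dropped malformed SSE frame', raw, err);
    return null;
  }
  // Diagnostic 2: trace every frame that reaches the dispatch, in arrival order.
  log.debug('[ai-chat] frame', frame.type, frame.tool ?? '-', frame.status ?? '-');
  return frame;
}
```

Passing the logger as a parameter is only a convenience for exercising the sketch; in the browser the default `console` would be used.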

Cross-referenced against the GCP turn-by-turn tool log we already have, this will tell us instantly whether:

  • frames are silently failing JSON.parse (the console.warn will fire);
  • frames never reach the dispatch (frame log shows a count below the GCP backend log);
  • frames arrive correctly but pills still vanish (state mutation or CSS issue, easy to confirm with ng.getComponent(...).aiService.conversationHistory() in the console).
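The count comparison in the second bullet can be done by hand, but a small helper makes the check mechanical. The sketch below is an assumption-laden illustration: `countByType` and `missingFrames` are hypothetical names, and it assumes frontend frames carry a `type` string that can be matched against the backend event names, which may not hold on the real wire format.

```typescript
type Counts = Record<string, number>;

// Tally frames seen by the frontend, keyed by their type.
function countByType(frames: { type: string }[]): Counts {
  const counts: Counts = {};
  for (const f of frames) {
    counts[f.type] = (counts[f.type] ?? 0) + 1;
  }
  return counts;
}

// For each type the backend log says was emitted, report how many never
// showed up in the frontend frame log. An empty result means nothing is missing.
function missingFrames(expected: Counts, seen: Counts): Counts {
  const missing: Counts = {};
  for (const type of Object.keys(expected)) {
    const deficit = expected[type] - (seen[type] ?? 0);
    if (deficit > 0) missing[type] = deficit;
  }
  return missing;
}
```

An empty result from `missingFrames` would point away from wire or parse loss and toward the third case (state mutation or rendering).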

In every case the next change is one specific, targeted fix to the actual failure mode rather than a defensive sweep.

Linear: https://linear.app/unstoppable-domains/issue/SRE-1963

Test plan

  • tsc clean (no behavior change; logging only)
  • Deploy to alpha
  • Repro the original prompt ("further drill in to renewal explosion" or any multi-tool prompt) with DevTools console open
  • Capture the [ai-chat] frame … and [ai-chat] dropped malformed SSE frame … output and compare to the matching GCP backend turn log for the session
  • Open a follow-up PR with the targeted fix, removing the console.debug and ideally keeping the console.warn as a permanent guardrail

🤖 Generated with Claude Code

… (SRE-1963)

PR #141 was supposed to make tool pills persist in the chat scrollback,
but on alpha they still vanish: pills appear IN_FLIGHT during execution,
then by the end of the response no pills remain (and turn-0 / turn-1
assistant text gets concatenated with no per-turn split).

Backend GCP logs (ud-registry-alpha-gke, k8s_container) confirm the
orchestrator emitted ToolUseEvent + ToolResultEvent for every tool
call — so the SSE wire definitely carries the frames; the frontend
loses them somewhere.

Static analysis of ai-analysis.service.ts couldn't pinpoint the failure
path. The frontend has no logging, and the SSE handler's catch block
silently swallows every parse error, so any code change is currently
a guess. Add minimal diagnostic surface so the next repro on alpha
tells us which of three failure modes is real:

- console.warn on the previously-silent catch — surfaces JSON.parse
  failures and the offending payload (e.g. chunk-boundary truncation
  or unexpected bytes in `args`).
- console.debug at the dispatch entry — prints the type + tool/status
  of every frame the frontend actually sees, in order. Cross-referenced
  against the GCP turn-by-turn tool log, this immediately shows whether
  frames are being dropped, mis-typed, or fully arriving but lost
  downstream (state mutation or render).

No behavior change. Once the diagnostic points at the actual failure
mode the fix will be one targeted change; the console.debug should be
removed at that point and the console.warn left in as a permanent
guardrail for future SSE parse regressions.
Copilot AI review requested due to automatic review settings May 6, 2026 18:22
UDtorrey requested a review from a team as a code owner May 6, 2026 18:22
UDtorrey closed this Jun 12, 2026