fix(tinyagents): keep attempted tool name in the timeline for unavailable tools#4419
Conversation
…vailable tools The tinyagents harness (tinyhumansai#4249) rewrites a call for a tool the agent can't see to UNKNOWN_TOOL_SENTINEL before the progress event fires. The observability bridge then emitted ToolCallStarted{tool_name: sentinel} with the same call_id, and the frontend (keying tool-timeline rows by call_id) overwrote the real streamed name (e.g. web_fetch) with the sentinel — dropping the attempted tool from the UI timeline. The pre-tinyagents engine emitted the real name first, so this regressed the Playwright chat-tool-call specs on every PR. Skip forwarding the sentinel ToolStarted; the streamed tool_args_delta row keeps the attempted name. The sentinel ToolCompleted only updates status by call_id. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughAdds a guard in the tinyagents-to-openhuman observability bridge to skip forwarding ChangesSentinel Tool-Start Suppression
Estimated code review effort: 1 (Trivial) | ~5 minutes Suggested labels: Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/openhuman/tinyagents/observability.rs (1)
262-296: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueGuard correctly suppresses sentinel
ToolStartedforwarding.The match-guard
tool_name.as_str() != UNKNOWN_TOOL_SENTINELcorrectly falls through to_ => {}for sentinel events while still forwarding real tool names via the same arm body, matching the PR's intended fix.One minor nit: the explanatory comment block (Lines 265-274) about why the sentinel is skipped lives inside the branch that only executes for non-sentinel tools, which can read as misplaced since the actual skip happens via the guard clause above it, not in this body.
✏️ Optional: relocate the rationale comment next to the guard
AgentEvent::ToolStarted { call_id, tool_name } + // Skip the sentinel Started event. When the model calls a tool the + // agent can't see, `UnknownToolRewriteMiddleware` rewrites the name + // to `UNKNOWN_TOOL_SENTINEL` *before* this event fires. The frontend + // keys tool-timeline rows by `call_id` and overwrites the name on + // Started, so a real streamed `web_fetch` row would be clobbered to + // the sentinel — dropping the attempted tool name from the timeline. + // The streamed `tool_args_delta` row (carrying the attempted name) + // survives, and the sentinel `ToolCompleted` only updates status by + // `call_id`. if tool_name.as_str() != super::tools::UNKNOWN_TOOL_SENTINEL => { - // Skip the sentinel Started event. When the model calls a tool the - // agent can't see, `UnknownToolRewriteMiddleware` rewrites the name - // to `UNKNOWN_TOOL_SENTINEL` *before* this event fires. The frontend - // keys tool-timeline rows by `call_id` and overwrites the name on - // Started, so a real streamed `web_fetch` row would be clobbered to - // the sentinel — dropping the attempted tool name from the timeline - // (regression vs. the pre-tinyagents engine, which emitted the real - // name before the availability block). The streamed - // `tool_args_delta` row (carrying the attempted name) survives, and - // the sentinel `ToolCompleted` only updates status by `call_id`. let iteration = self.iteration();🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/tinyagents/observability.rs` around lines 262 - 296, The rationale comment for skipping the sentinel ToolStarted event is currently placed inside the non-sentinel branch, which makes it look like it explains the forwarded path instead of the guard that actually filters UNKNOWN_TOOL_SENTINEL. Move or rewrite the explanation in AgentEvent::ToolStarted handling so it sits next to the match guard on tool_name.as_str() != super::tools::UNKNOWN_TOOL_SENTINEL, keeping the comment focused on why sentinel events are excluded before the body runs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/openhuman/tinyagents/observability.rs`:
- Around line 262-296: The rationale comment for skipping the sentinel
ToolStarted event is currently placed inside the non-sentinel branch, which
makes it look like it explains the forwarded path instead of the guard that
actually filters UNKNOWN_TOOL_SENTINEL. Move or rewrite the explanation in
AgentEvent::ToolStarted handling so it sits next to the match guard on
tool_name.as_str() != super::tools::UNKNOWN_TOOL_SENTINEL, keeping the comment
focused on why sentinel events are excluded before the body runs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5ba9cd7c-b1ed-4aad-af55-b06304cd96b4
📒 Files selected for processing (1)
src/openhuman/tinyagents/observability.rs
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: edb81275be
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| AgentEvent::UsageRecorded { usage } => self.record_usage(usage), | ||
| AgentEvent::ToolStarted { call_id, tool_name } => { | ||
| AgentEvent::ToolStarted { call_id, tool_name } | ||
| if tool_name.as_str() != super::tools::UNKNOWN_TOOL_SENTINEL => |
There was a problem hiding this comment.
Preserve a row when no args delta exists
When an unavailable tool call comes from a buffered/non-streaming provider, this guard drops the only start event. The fallback row the comment relies on is emitted only from streamed ProviderDelta::ToolCallStart/ToolCallArgsDelta in ThinkingForwarder, while ProviderModel::invoke uses stream: None and the streaming adapter documents that non-streaming providers return only the terminal Completed item. In those contexts the later ToolCompleted only updates an existing row by call_id, so the unavailable tool attempt disappears from the timeline/ledger instead of showing the attempted tool; please synthesize the row from requested_tool or only suppress the sentinel when a prior args-delta row exists.
Useful? React with 👍 / 👎.
M3gA-Mind
left a comment
There was a problem hiding this comment.
Reviewed the change on src/openhuman/tinyagents/observability.rs. Overall this is a well-scoped, well-explained fix with a good regression guard — the match-guard + _ => {} fall-through correctly drops the sentinel ToolStarted so it can't clobber the streamed timeline row keyed by call_id. The sentinel is a distinctive constant (__openhuman_unknown_tool__), so there's no real-tool collision risk, and humanize_tool_name is now correctly never applied to the sentinel. I verified the ordering premise: the real name reaches the UI via ToolCallArgsDelta (emitted in tinyagents/model.rs::note_tool_call) during model streaming, which happens before UnknownToolRewriteMiddleware::before_tool rewrites the call — so on the streaming path the attempted name genuinely survives.
A few things worth confirming before merge:
1. Completed-only / non-streaming tool-call path (main concern). The surviving "real name" row is created by note_tool_call, which only fires when the provider streams ProviderDelta::ToolCallStart (model.rs:276). But model.rs:412 notes "Native tool calls always ride on Completed." For a provider/model that delivers the tool call only on the terminal Completed item (no streamed start delta), no ToolCallArgsDelta row is ever created — so skipping the sentinel ToolStarted would leave the unavailable-tool attempt with no timeline row at all, which is arguably worse than the pre-fix sentinel-named row. Could you confirm which providers/models actually hit this path in practice? If any do, consider a fallback that still surfaces the attempt (the middleware stashes the real name in arguments.requested_tool — middleware.rs:~785), rather than dropping it silently.
2. The sentinel still leaks through ToolCompleted (main concern). The ToolStarted arm is now guarded, but the ToolCompleted arm (observability.rs:297) is unchanged and still forwards ToolCallCompleted { tool_name: "__openhuman_unknown_tool__", .. }. The comment says the frontend "only updates status by call_id" on completion — but that's a cross-boundary (TS) contract that isn't asserted anywhere in this Rust-only PR. If the frontend ever reads/renders tool_name on completion (status label, analytics, humanize), the raw sentinel string could surface. Skipping the Completed event isn't safe (the row still needs its done-status), so forwarding-for-status is the right call — but please either link the exact frontend handler that guarantees the name is ignored on completion, or add a paired frontend test pinning that contract. As-is, the end-to-end "keep the attempted name" guarantee depends on an unverified frontend assumption.
3. Test coverage. sentinel_tool_started_is_not_forwarded is a solid guard for the Started path, but it only covers that half. Consider adding: (a) a companion assertion that the sentinel ToolCompleted still forwards a ToolCallCompleted (documents the status-only contract from concern #2), and ideally (b) an end-to-end ordering test — a real-name ToolCallArgsDelta (or ToolCallStart) followed by a sentinel ToolStarted — so the "the real row survives" guarantee is pinned against future refactors rather than implied. Minor: an empty tool_name passes the guard ("" != sentinel) and would forward a blank-named Started; that's pre-existing behavior, just noting it isn't affected here.
Local validation (from gh pr checkout):
cargo fmt --checkon the file — clean.cargo clippy --manifest-path Cargo.toml --lib— no warnings onobservability.rs(only pre-existing unused-import warnings in unrelated modules).cargo test ... tinyagents::observability— the crate compiled, but I could not run the tests locally: linking the test binary fails on this Apple-Silicon host with an unrelated toolchain issue (whisper-rs-sysMetal symbol_OBJC_CLASS_$_MTLResidencySetDescriptorundefined). That's an environment limitation, not a code problem — the PR's Rust compiled fine and reached the link stage; CI is the authoritative gate for the test run.
None of the above is blocking from a code-correctness standpoint for the common streaming path — concerns #1 and #2 are about edge coverage and an unstated frontend contract. Nice, targeted fix.
Summary
Fixes a repo-wide Playwright regression: the chat tool-call specs (
chat-tool-call-flow,chat-multi-tool-round) fail on every open PR because the tool timeline drops the attempted tool name for a tool the agent can't see.Problem
The tinyagents harness (#4249) now routes every chat turn. When the model calls a tool that isn't in the agent's visible set — e.g.
web_fetch,web_search_tool,file_read, which the E2E specs deliberately force via the mock —UnknownToolRewriteMiddlewarerewrites the call name toUNKNOWN_TOOL_SENTINEL(__openhuman_unknown_tool__) before the progress event fires.The observability bridge then emitted
AgentProgress::ToolCallStarted { tool_name: sentinel }with the samecall_id. The frontend keys tool-timeline rows bycall_idand overwrites the name on Started (ChatRuntimeProvider.tsx), so the real streamedweb_fetchrow was clobbered to the sentinel.toolTimelineNames(...).some(n => n.includes('web_fetch'))then returnsfalse, and the specs time out after 20s.The pre-tinyagents engine emitted the real tool name before the availability block, so the timeline showed the attempted tool even for a blocked call — hence green before #4249. This now fails on #4388, #4389, #4390, and #4393, confirming it's a
mainregression, not any single PR.Fix
Skip forwarding the sentinel
ToolStartedin the observability bridge (src/openhuman/tinyagents/observability.rs) via a match guard. The streamedtool_args_deltarow keeps the attempted tool name, and the sentinelToolCompletedonly updates status bycall_id— so the timeline once again shows the tool the model attempted, matching the pre-#4249 behaviour. No test is weakened.Tests
sentinel_tool_started_is_not_forwardedunit test: a sentinelToolStartedemits noToolCallStarted, while a real tool name still forwards.chat-tool-call-flow/chat-multi-tool-roundspecs (validated in CI's web lane).Submission Checklist
## Related— N/A: no coverage-matrix feature ID applies.Closes #NNN— N/A: regression fix for Improve agent harness with LangGraph-style state machine or Polaris-like architecture #4249; no separate tracking issue.Impact
E2E (Playwright / web lane)check across all open PRs.Related
Summary by CodeRabbit