Skip to content

fix(tinyagents): keep attempted tool name in the timeline for unavailable tools#4419

Merged
senamakel merged 1 commit into
tinyhumansai:mainfrom
sanil-23:fix/tool-timeline-unknown-tool-sentinel
Jul 2, 2026
Merged

fix(tinyagents): keep attempted tool name in the timeline for unavailable tools#4419
senamakel merged 1 commit into
tinyhumansai:mainfrom
sanil-23:fix/tool-timeline-unknown-tool-sentinel

Conversation

@sanil-23

@sanil-23 sanil-23 commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes a repo-wide Playwright regression: the chat tool-call specs (chat-tool-call-flow, chat-multi-tool-round) fail on every open PR because the tool timeline drops the attempted tool name for a tool the agent can't see.

Problem

The tinyagents harness (#4249) now routes every chat turn. When the model calls a tool that isn't in the agent's visible set — e.g. web_fetch, web_search_tool, file_read, which the E2E specs deliberately force via the mock — UnknownToolRewriteMiddleware rewrites the call name to UNKNOWN_TOOL_SENTINEL (__openhuman_unknown_tool__) before the progress event fires.

The observability bridge then emitted AgentProgress::ToolCallStarted { tool_name: sentinel } with the same call_id. The frontend keys tool-timeline rows by call_id and overwrites the name on Started (ChatRuntimeProvider.tsx), so the real streamed web_fetch row was clobbered to the sentinel. toolTimelineNames(...).some(n => n.includes('web_fetch')) then returns false, and the specs time out after 20s.

The pre-tinyagents engine emitted the real tool name before the availability block, so the timeline showed the attempted tool even for a blocked call — hence green before #4249. This now fails on #4388, #4389, #4390, and #4393, confirming it's a main regression, not any single PR.

Fix

Skip forwarding the sentinel ToolStarted in the observability bridge (src/openhuman/tinyagents/observability.rs) via a match guard. The streamed tool_args_delta row keeps the attempted tool name, and the sentinel ToolCompleted only updates status by call_id — so the timeline once again shows the tool the model attempted, matching the pre-#4249 behaviour. No test is weakened.

Tests

  • New sentinel_tool_started_is_not_forwarded unit test: a sentinel ToolStarted emits no ToolCallStarted, while a real tool name still forwards.
  • Restores the Playwright chat-tool-call-flow / chat-multi-tool-round specs (validated in CI's web lane).

Submission Checklist

  • Tests added or updated (happy path + failure/edge case) — unit test above + the restored Playwright specs.
  • Diff coverage ≥ 80% — the changed lines (the guard) are exercised by the new unit test.
  • Coverage matrix updated — N/A: behaviour-only observability fix, no feature row changed.
  • All affected feature IDs listed under ## Related — N/A: no coverage-matrix feature ID applies.
  • No new external network dependencies introduced.
  • Manual smoke checklist updated if this touches release-cut surfaces — N/A.
  • Linked issue closed via Closes #NNN — N/A: regression fix for Improve agent harness with LangGraph-style state machine or Polaris-like architecture #4249; no separate tracking issue.

Impact

  • Runtime impact: agent tool-timeline UI on the tinyagents path. No behaviour change for tools the agent can see.
  • Unblocks the E2E (Playwright / web lane) check across all open PRs.
  • No persistence, migration, security, or network changes.

Related

Summary by CodeRabbit

  • Bug Fixes
    • Prevented placeholder tool-start events from appearing in the activity timeline, so only real tool calls are shown.
    • Improved event handling to avoid overwriting or clobbering visible progress updates.
  • Tests
    • Added regression coverage to ensure placeholder tool events are ignored while valid tool names still appear correctly.

…vailable tools

The tinyagents harness (tinyhumansai#4249) rewrites a call for a tool the agent can't see
to UNKNOWN_TOOL_SENTINEL before the progress event fires. The observability
bridge then emitted ToolCallStarted{tool_name: sentinel} with the same call_id,
and the frontend (keying tool-timeline rows by call_id) overwrote the real
streamed name (e.g. web_fetch) with the sentinel — dropping the attempted tool
from the UI timeline. The pre-tinyagents engine emitted the real name first, so
this regressed the Playwright chat-tool-call specs on every PR.

Skip forwarding the sentinel ToolStarted; the streamed tool_args_delta row keeps
the attempted name. The sentinel ToolCompleted only updates status by call_id.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sanil-23 sanil-23 requested a review from a team July 2, 2026 12:55
@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Adds a guard in the tinyagents-to-openhuman observability bridge to skip forwarding AgentEvent::ToolStarted events whose tool name matches the sentinel UNKNOWN_TOOL_SENTINEL, preventing spurious progress events. Includes a new regression test validating the suppression.

Changes

Sentinel Tool-Start Suppression

Layer / File(s) Summary
Guard and test for sentinel ToolStarted events
src/openhuman/tinyagents/observability.rs
Adds a sentinel check to skip forwarding ToolStarted events matching UNKNOWN_TOOL_SENTINEL, and adds a test confirming only real tool names produce ToolCallStarted events.

Estimated code review effort: 1 (Trivial) | ~5 minutes

Suggested labels: rust-core, agent, bug

Poem

A sentinel hopped through the code so sly,
Clobbering timelines, oh me, oh my!
Now guarded and quiet, it skips right along,
While real tools still sing their starting song. 🐰
Test in place, all snug and neat—
This rabbit's PR review is complete!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main fix: preserving the attempted tool name in the timeline for unavailable tools.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot added agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. bug rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. labels Jul 2, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/openhuman/tinyagents/observability.rs (1)

262-296: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Guard correctly suppresses sentinel ToolStarted forwarding.

The match-guard tool_name.as_str() != UNKNOWN_TOOL_SENTINEL correctly falls through to _ => {} for sentinel events while still forwarding real tool names via the same arm body, matching the PR's intended fix.

One minor nit: the explanatory comment block (Lines 265-274) about why the sentinel is skipped lives inside the branch that only executes for non-sentinel tools, which can read as misplaced since the actual skip happens via the guard clause above it, not in this body.

✏️ Optional: relocate the rationale comment next to the guard
             AgentEvent::ToolStarted { call_id, tool_name }
+                // Skip the sentinel Started event. When the model calls a tool the
+                // agent can't see, `UnknownToolRewriteMiddleware` rewrites the name
+                // to `UNKNOWN_TOOL_SENTINEL` *before* this event fires. The frontend
+                // keys tool-timeline rows by `call_id` and overwrites the name on
+                // Started, so a real streamed `web_fetch` row would be clobbered to
+                // the sentinel — dropping the attempted tool name from the timeline.
+                // The streamed `tool_args_delta` row (carrying the attempted name)
+                // survives, and the sentinel `ToolCompleted` only updates status by
+                // `call_id`.
                 if tool_name.as_str() != super::tools::UNKNOWN_TOOL_SENTINEL =>
             {
-                // Skip the sentinel Started event. When the model calls a tool the
-                // agent can't see, `UnknownToolRewriteMiddleware` rewrites the name
-                // to `UNKNOWN_TOOL_SENTINEL` *before* this event fires. The frontend
-                // keys tool-timeline rows by `call_id` and overwrites the name on
-                // Started, so a real streamed `web_fetch` row would be clobbered to
-                // the sentinel — dropping the attempted tool name from the timeline
-                // (regression vs. the pre-tinyagents engine, which emitted the real
-                // name before the availability block). The streamed
-                // `tool_args_delta` row (carrying the attempted name) survives, and
-                // the sentinel `ToolCompleted` only updates status by `call_id`.
                 let iteration = self.iteration();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/tinyagents/observability.rs` around lines 262 - 296, The
rationale comment for skipping the sentinel ToolStarted event is currently
placed inside the non-sentinel branch, which makes it look like it explains the
forwarded path instead of the guard that actually filters UNKNOWN_TOOL_SENTINEL.
Move or rewrite the explanation in AgentEvent::ToolStarted handling so it sits
next to the match guard on tool_name.as_str() !=
super::tools::UNKNOWN_TOOL_SENTINEL, keeping the comment focused on why sentinel
events are excluded before the body runs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/openhuman/tinyagents/observability.rs`:
- Around line 262-296: The rationale comment for skipping the sentinel
ToolStarted event is currently placed inside the non-sentinel branch, which
makes it look like it explains the forwarded path instead of the guard that
actually filters UNKNOWN_TOOL_SENTINEL. Move or rewrite the explanation in
AgentEvent::ToolStarted handling so it sits next to the match guard on
tool_name.as_str() != super::tools::UNKNOWN_TOOL_SENTINEL, keeping the comment
focused on why sentinel events are excluded before the body runs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5ba9cd7c-b1ed-4aad-af55-b06304cd96b4

📥 Commits

Reviewing files that changed from the base of the PR and between 8c58d3a and edb8127.

📒 Files selected for processing (1)
  • src/openhuman/tinyagents/observability.rs

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: edb81275be

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

AgentEvent::UsageRecorded { usage } => self.record_usage(usage),
AgentEvent::ToolStarted { call_id, tool_name } => {
AgentEvent::ToolStarted { call_id, tool_name }
if tool_name.as_str() != super::tools::UNKNOWN_TOOL_SENTINEL =>

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve a row when no args delta exists

When an unavailable tool call comes from a buffered/non-streaming provider, this guard drops the only start event. The fallback row the comment relies on is emitted only from streamed ProviderDelta::ToolCallStart/ToolCallArgsDelta in ThinkingForwarder, while ProviderModel::invoke uses stream: None and the streaming adapter documents that non-streaming providers return only the terminal Completed item. In those contexts the later ToolCompleted only updates an existing row by call_id, so the unavailable tool attempt disappears from the timeline/ledger instead of showing the attempted tool; please synthesize the row from requested_tool or only suppress the sentinel when a prior args-delta row exists.

Useful? React with 👍 / 👎.

@M3gA-Mind M3gA-Mind left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the change on src/openhuman/tinyagents/observability.rs. Overall this is a well-scoped, well-explained fix with a good regression guard — the match-guard + _ => {} fall-through correctly drops the sentinel ToolStarted so it can't clobber the streamed timeline row keyed by call_id. The sentinel is a distinctive constant (__openhuman_unknown_tool__), so there's no real-tool collision risk, and humanize_tool_name is now correctly never applied to the sentinel. I verified the ordering premise: the real name reaches the UI via ToolCallArgsDelta (emitted in tinyagents/model.rs::note_tool_call) during model streaming, which happens before UnknownToolRewriteMiddleware::before_tool rewrites the call — so on the streaming path the attempted name genuinely survives.

A few things worth confirming before merge:

1. Completed-only / non-streaming tool-call path (main concern). The surviving "real name" row is created by note_tool_call, which only fires when the provider streams ProviderDelta::ToolCallStart (model.rs:276). But model.rs:412 notes "Native tool calls always ride on Completed." For a provider/model that delivers the tool call only on the terminal Completed item (no streamed start delta), no ToolCallArgsDelta row is ever created — so skipping the sentinel ToolStarted would leave the unavailable-tool attempt with no timeline row at all, which is arguably worse than the pre-fix sentinel-named row. Could you confirm which providers/models actually hit this path in practice? If any do, consider a fallback that still surfaces the attempt (the middleware stashes the real name in arguments.requested_toolmiddleware.rs:~785), rather than dropping it silently.

2. The sentinel still leaks through ToolCompleted (main concern). The ToolStarted arm is now guarded, but the ToolCompleted arm (observability.rs:297) is unchanged and still forwards ToolCallCompleted { tool_name: "__openhuman_unknown_tool__", .. }. The comment says the frontend "only updates status by call_id" on completion — but that's a cross-boundary (TS) contract that isn't asserted anywhere in this Rust-only PR. If the frontend ever reads/renders tool_name on completion (status label, analytics, humanize), the raw sentinel string could surface. Skipping the Completed event isn't safe (the row still needs its done-status), so forwarding-for-status is the right call — but please either link the exact frontend handler that guarantees the name is ignored on completion, or add a paired frontend test pinning that contract. As-is, the end-to-end "keep the attempted name" guarantee depends on an unverified frontend assumption.

3. Test coverage. sentinel_tool_started_is_not_forwarded is a solid guard for the Started path, but it only covers that half. Consider adding: (a) a companion assertion that the sentinel ToolCompleted still forwards a ToolCallCompleted (documents the status-only contract from concern #2), and ideally (b) an end-to-end ordering test — a real-name ToolCallArgsDelta (or ToolCallStart) followed by a sentinel ToolStarted — so the "the real row survives" guarantee is pinned against future refactors rather than implied. Minor: an empty tool_name passes the guard ("" != sentinel) and would forward a blank-named Started; that's pre-existing behavior, just noting it isn't affected here.

Local validation (from gh pr checkout):

  • cargo fmt --check on the file — clean.
  • cargo clippy --manifest-path Cargo.toml --lib — no warnings on observability.rs (only pre-existing unused-import warnings in unrelated modules).
  • cargo test ... tinyagents::observability — the crate compiled, but I could not run the tests locally: linking the test binary fails on this Apple-Silicon host with an unrelated toolchain issue (whisper-rs-sys Metal symbol _OBJC_CLASS_$_MTLResidencySetDescriptor undefined). That's an environment limitation, not a code problem — the PR's Rust compiled fine and reached the link stage; CI is the authoritative gate for the test run.

None of the above is blocking from a code-correctness standpoint for the common streaming path — concerns #1 and #2 are about edge coverage and an unstated frontend contract. Nice, targeted fix.

@senamakel senamakel merged commit 4a359bb into tinyhumansai:main Jul 2, 2026
19 of 23 checks passed
@github-project-automation github-project-automation Bot moved this from Todo to Done in Team Openhuman Jul 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. bug rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants