Skip to content

TinyAgents migration parity & bug fixes (#4451–#4469)#4495

Merged
senamakel merged 26 commits into
tinyhumansai:mainfrom
senamakel:agent/parity-all
Jul 4, 2026
Merged

TinyAgents migration parity & bug fixes (#4451–#4469)#4495
senamakel merged 26 commits into
tinyhumansai:mainfrom
senamakel:agent/parity-all

Conversation

@senamakel

@senamakel senamakel commented Jul 4, 2026

Copy link
Copy Markdown
Member

Implements the TinyAgents migration parity & bug-fix tracker (#4470) — the audit of the 1.3→1.5 migration. All 19 child issues addressed in one branch: 18 fixed (each seam-side, no vendor/tinyagents changes), 1 deferred (#4468, pure test-parity — see below).

Built on current main (post #4473 / #4480 / #4483). Each fix was re-checked against live code (some issues were partly reshaped by the migration waves) and stacked sequentially, each passing cargo check --lib before landing; a final full cargo check --lib on the complete branch is green.

Phase 1 — critical correctness & security

Phase 2 — durable-state corruption & leaks

Phase 3 — follow-up PR repairs

Phase 4 — parity restoration

Deferred

⚠️ Tests deferred — this PR is implementation + compile-verification only

Per the working constraint, no tests were written/run in this pass; every fix is verified via cargo check --lib (implementation compiles). A follow-up test pass must:

  1. Add the unit/e2e tests each issue's acceptance criteria call for (the per-commit messages note what's needed), incl. Test parity matrix: port behavior clusters that lost all coverage in the tinyagents migration (spec Phase 11) #4468's loop-level parity matrix.
  2. Reconcile the adapter-inventory assertions in src/openhuman/tinyagents/tests.rs (tool_middleware_len / lifecycle counts): several fixes added middlewares and bumped these, but cargo check --lib does not compile #[cfg(test)], so the cumulative counts are unverified and likely need adjustment once tests compile.

Generated via multi-agent orchestration. Full per-issue rationale is in each commit body.

https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

Summary by CodeRabbit

  • New Features

    • Expanded tool-failure UI messaging (localized) with new cases for denied actions and expired approvals, including clearer “cause” and “next step”.
    • Enabled the memory update tool (update_memory_md) by default so agents can persist memory changes via the standard path.
    • Improved sub-agent handling and reporting when execution halts due to repeated failures.
  • Bug Fixes

    • Preserved tool-failure “why + what to do next” details across snapshot/restart flows.
    • Improved transcript/progress handling and prevented empty assistant messages from polluting future requests.
    • Strengthened credential redaction for more secret formats.
  • Chores / Other

    • Improved tracing privacy: content is captured only when explicitly enabled.

senamakel added 18 commits July 4, 2026 03:21
… tool errors (tinyhumansai#4451)

The tinyagents harness runs a fatal JSON-schema gate
(`ToolSchema::validate_call`) on every tool call between `before_tool` and
the tool-wrap onion; any missing-required/wrong-type/bad-enum call returns
`TinyAgentsError::Validation`, which propagates out of `run_loop` and aborts
the whole turn with `chat_error`. The legacy engine had no such gate — bad
args came back as recoverable tool results the model self-corrected on.

Fix entirely seam-side (vendor/tinyagents is upstream/read-only):

- Add `SchemaGuardMiddleware` (in src/openhuman/tinyagents/middleware.rs)
  implementing both `Middleware` (before_tool) and `ToolMiddleware`
  (wrap_tool). before_tool re-runs the same validation via the identical
  `spec_to_schema` conversion the runner uses; on failure it records a
  descriptive `invalid arguments for <tool>: <detail>. Expected schema: ...`
  message keyed by call id and rewrites the args to a schema-satisfying stub
  so the crate's fatal gate passes. wrap_tool then short-circuits the flagged
  call with a synthetic failed `ToolResult` before the stub reaches the tool,
  so the loop continues and the model self-corrects. Installed as the
  outermost tool-wrap middleware.
- Fix `ArgRecoveryMiddleware` to stop coercing non-object args to `{}` when
  the schema has required fields: it now decodes JSON-encoded-string args
  (stripping markdown fences), coerces to `{}` only for permissive schemas,
  and otherwise leaves the args for the schema-guard tool-error path (the old
  behaviour guaranteed a fatal "<field> is required" abort).
- Update the middleware-inventory assertions (lifecycle 14->15, tool 2->3).

Tests deferred: tests/agent_harness_e2e.rs needs chat + subagent regressions
where a scripted provider emits one malformed call then a corrected one,
asserting the turn continues with no chat_error.

Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
Sub-agent tool exposure had inverted from fail-closed to fail-open after
the TinyAgents migration: the registration predicate
`allowed.is_empty() || allowed.contains(name)` treated an EMPTY allowlist
as "register every tool", so a tool-less sub-agent (`ToolScope::Named([])`,
a zero-match `skill_filter`, or a `named` list that resolved to nothing)
silently inherited the parent's full tool surface — shell, file-write, and
spawn/delegate — a prompt-injection privilege-escalation path.

Change the plumbed allowlist type to `Option<HashSet<String>>` through the
shared seam (`run_turn_via_tinyagents_shared` -> `assemble_turn_harness`):
`None` = no filter (register all visible tools); `Some(set)` = register
exactly those tools, so `Some(empty)` is a genuine deny-all. The sub-agent
path now passes `Some(allowed_names)` (always a concrete, resolved set), so
an empty resolution denies all instead of inheriting everything. The
chat/channel paths keep their historical "empty = all" convention by
mapping an empty/absent set to `None`. A warn is logged when the allowlist
resolves empty so misconfigured agents are diagnosable.

Also re-assert the "sub-agents never spawn" invariant at registration time:
spawn/delegate/worker-thread tool names are stripped from any sub-agent run
regardless of allowlist contents (defense-in-depth beyond the caller-side
`allowed_indices` strip). The tool-exposure shadow middleware is updated to
the same fail-closed `Option` semantics so its divergence reference matches
what is actually registered.

Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…uts (tinyhumansai#4453)

The legacy engine scrubbed every tool output before it entered model
context; the tinyagents path dropped that call site and scrub_credentials
had zero production callers. Secrets in tool output reached model context,
on-disk transcripts, worker-thread mirrors, and the tool-outcome sink.

Re-introduce scrubbing as CredentialScrubMiddleware, a wrap_tool middleware
registered as the innermost tool wrap on the shared assemble_turn_harness
seam, so it scrubs the RAW tool result before the after_tool chain
(summarization/caps), the transcript push, and the tool-outcome capture
sink — covering parent, subagent, persisted transcript, and ToolCallOutcome
by construction. Extend the scrub patterns to catch bare AKIA/ASIA AWS keys,
sk- OpenAI keys, and space-separated Bearer tokens; also scrub JSON raw
payloads recursively. Update the adapter-inventory assertion (3 -> 4).

Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…age (tinyhumansai#4454)

The trace exporter attached the user prompt and model reply to the turn span
unconditionally, so spans_to_ndjson/export_spans serialized plaintext to the
NDJSON file (or printed it at info level to the app log) even with the default
capture_content=false. Gate content at the span-STORAGE level instead: the
SpanCollector now carries a capture_content flag and drops TurnContent unless
opted in, so no exporter — NDJSON, app log, or Langfuse push — can ever
serialize prompt/reply text. The web progress bridge wires the flag from
config; the TurnContent emit in session/turn/core.rs also respects the gate as
defense in depth; and export_spans no longer logs span content at info (metadata
at info, NDJSON body at debug only).

Separately fix Langfuse usageDetails double-counting: input_tokens is inclusive
of cached tokens (cost.rs treats cached as a subset of input), but Langfuse sums
usageDetails components as disjoint buckets. Emit input = input_tokens -
cached_input_tokens so non_cached_input + cache_read + output reconcile to the
explicit total.

Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…rs (tinyhumansai#4455)

Mid-turn steer/collect messages are injected as `Message::user(...)` and
appended to the harness working transcript. Post-run persistence sliced the
returned conversation with the "suffix after the last user message" convention
(`messages_since_last_user`), so an injected steer MOVED that boundary — every
pre-steer assistant/tool round plus the steer text itself vanished from
persisted history, the next-turn KV-cache prefix, and subagent checkpoints.

Replace the last-user-suffix convention on the persistence path with an
explicit boundary: capture `base_len = input.len()` (the request transcript
length) BEFORE `run_loop` consumes it, then persist `run.messages[base_len..]`
via the new `convert::messages_since_request`. The crate returns the full
transcript in `run.messages` (the loop seeds `messages = input` and only
appends; the compression/trim middleware rewrites only the per-call
`request.messages.clone()`), so slicing at `base_len` yields the complete
post-request transcript — pre-steer rounds, the framed steer messages, and
post-steer rounds, in execution order. Applied to both the thin
`run_turn_via_tinyagents` and the shared `run_turn_via_tinyagents_shared` used
by the parent (`session/turn/core.rs`) and subagent
(`subagent_runner/ops/graph.rs`) paths, so checkpoints, cap digests, and worker
mirrors see the full transcript. Steer framing (`[User steering message]:` /
`[Additional context from user]:`) is preserved automatically since the
injected user messages now fall inside the persisted slice.

`messages_since_last_user` is retained (test-covered) but gated
`#[cfg_attr(not(test), allow(dead_code))]` as it no longer has a production
caller.

Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…inyhumansai#4456)

run_turn_via_tinyagents_shared spawned a 50ms steering-forwarder poll
loop whose cleanup (abort + steering-registry deregister) only ran on
normal return. Cancellation here is drop-based (web select! cancel token;
sub-agent AbortHandle), so a cancelled turn detached the task to loop
forever, pinning the session-owned Arc<RunQueue> + SteeringHandle and
racing the next turn's forwarder into a dead handle. Aborted sub-agent
registry entries were also never removed.

Wrap the forwarder in an abort-on-drop RAII guard
(SteeringForwarderGuard) held across the drive future: its Drop aborts
the poll task, deregisters the sub-agent steering handle, and drains
residual delivered-but-unapplied steers back into the session RunQueue
so a late steer becomes the next turn's input instead of vanishing.
Cleanup now runs identically on normal return, error, and
drop-cancellation. A process-wide ACTIVE_FORWARDERS counter
(active_steering_forwarders()) lets tests assert zero live tasks after a
cancel. Re-emit delivery/requeue observability via new
RunQueueMessageDelivered + RunQueueSteerRequeued DomainEvents.

Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…le error_slot (tinyhumansai#4457)

Turn-finalization audit fixes for the TinyAgents migration:

A. EmptyProviderResponse no longer leaves a blank assistant row. On the
   tool_calls==0 empty-completion path, turn/core.rs returned Err with the
   empty Chat(assistant("")) still folded into history (via history.extend),
   so the next request carried an empty-content assistant message and
   strict providers (Anthropic) 400'd the whole thread. Pop the trailing
   empty assistant row before returning, matching the tinyhumansai#4093 branch.

B. Stale provider error no longer masks the real run failure. ProviderModel
   now clears error_slot on every successful (incl. fallback-recovered)
   model call (buffered + streaming), and the runner maps the model-call cap
   / spawn-depth failures BEFORE consulting error_slot, so a run that failed
   over successfully then failed for an unrelated reason surfaces the real
   classification instead of a leftover provider error.

C. Exactly one TurnCompleted per turn, after the wrap-up streams. The seam
   emitted TurnCompleted per parent turn AND the chat session emitted it
   again after summarize_turn_wrapup — the web bridge recorded two ledger
   events + two Completed upserts and turn_active=false landed before the
   cap/tinyhumansai#4093 checkpoint finished streaming. New defer_turn_completed_to_caller
   seam param suppresses the seam emit on the chat path (core.rs is the single
   post-wrap-up emitter); channel/CLI keeps the seam emit.

Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…rovider calls (tinyhumansai#4460)

ProviderModel::stream ran every streamed provider call in a detached
tokio::spawn producer, which lost the caller's task-locals and left the
call running after cancellation:

- thread_id task-local was None inside the spawn, so the managed backend's
  thread_id extension was omitted on every streamed request.
- the resolved-route audit slot was out of scope, so record_resolved_provider_route
  no-op'd and channel audit fell back to the requested route.
- a hard AbortHandle cancel dropped only the consumer stream; the spawned
  provider.chat ran to completion and was still billed.

Fix, entirely in the openhuman seam:
- Capture thread_id (current_thread_id) and the resolved-route audit slot
  (new current_route_slot) on the caller's task before spawning, and
  re-establish them inside via with_thread_id / with_route_slot so the
  detached provider.chat sees the same ambient context.
- Expose RouteSlot + current_route_slot + with_route_slot from resolved_route
  (explicit out-slot instead of a lost task-local).
- Add AbortOnDrop guard (tinyagents/abort_guard.rs) holding the producer
  JoinHandle and move it into the consumer stream state, so dropping the
  stream/turn future aborts the in-flight provider call. Debug-log aborts.

Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…rites (tinyhumansai#4458)

The memory read->dedupe->write->update-index protocol can only close its
write cycle via a successful `update_memory_md` call, and every durable
write appends an instruction to call it — but `UpdateMemoryMdTool` was
never registered in `all_tools_with_runtime`; it was only re-exported. The
archivist's `[tools] named` allowlist selects it, but subagents only filter
the parent set, so it silently resolved away, producing a permanent
unsatisfiable "call update_memory_md" nag loop (unknown-tool error → the
tracker never observes IndexUpdate → escalating drift warnings + an
after_agent stale-index warning every run).

Changes:
- Register `UpdateMemoryMdTool` in `all_tools_with_runtime`, always-on like
  the other memory tools; per-agent visibility stays governed by each
  agent's `named` allowlist. Targets the workspace MEMORY.md/SKILL.md.
- Make writes atomic + serialized: temp-file + atomic rename, guarded by a
  process-global per-workspace async lock keyed on the canonicalized
  workspace path. Concurrent index updates now queue instead of racing, and
  a killed process can never leave a truncated MEMORY.md. (Becomes live the
  moment registration is fixed.)
- Classify `remember_preference`/`save_preference` with an explicit,
  documented `MemoryOp::Other` arm: they persist via `Memory::store` but
  into dedicated preference namespaces surfaced by prompt injection /
  per-query recall, not the MEMORY.md wiki — classifying them as writes
  would resurrect the same nag loop.

Deferred (final test pass): registry-inventory assertion mirroring the
adapter-inventory test, plus concurrency/atomicity tests.

Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
… failure in snapshot (tinyhumansai#4459)

Repairs tinyhumansai#4413 tool-failure classification.

A/B/E: the classifier now short-circuits POLICY_BLOCKED_MARKER and
POLICY_DENIED_MARKER *before* the `timed out` sniff, so a user deny →
`Denied` and an approval-TTL expiry → `ApprovalExpired` (both
`FailureCategory::UserDeclined`, non-retryable, honest copy) instead of a
recoverable Unknown "try again" or a Timeout "will retry". Middleware now
sniffs both `error` and `content`, and the dead `"policy denied"` needle is
removed.

C: `ToolTimelineEntry` gains `failure: Option<PersistedToolFailure>`
(camelCase, matching the TS `PersistedToolFailure` contract) threaded through
`mirror.rs`, so a failed tool's explanation survives thread switch / restart.

D: `SubagentToolCallCompleted` gains a `failure` field populated from the
already-computed classification (observability.rs) and forwarded on the web
event + persisted `SubagentToolCall`, so failed sub-agent rows carry an
explanation.

Frontend: TS types (`PersistedSubagentToolCall.failure`,
`SubagentToolCallEntry.failure`, slice mapping), the localized-failure-class
set gains `denied`/`approvalExpired`, and i18n keys land in en.ts + all
locales.

Tests deferred to the final pass.

Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…yhumansai#4461)

The crate ContextCompressionMiddleware installed in assemble_turn_harness
regressed compaction vs the legacy engine in two ways:

1. A summarizer failure propagated (before_model does summarize(..).await?),
   mapping any provider hiccup to TinyAgentsError::Model and aborting the whole
   turn — on exactly the longest, most valuable threads.
2. Once over the 90% threshold, every model call re-ran a full-transcript
   summarizer LLM call: the crate rewrites only the per-call request clone, so
   the loop's working transcript never shrinks and re-summarizes each iteration.

Wrap the LLM-backed ProviderModelSummarizer in a per-turn
FaultTolerantCachingSummarizer adapter (seam-side, no crate change):

- On Err: warn, trip a per-turn circuit breaker, and return a deterministic
  (LLM-free) front-drop trim of the input instead of propagating. The turn
  continues; later compactions skip the known-bad LLM. Restores the legacy
  warn + circuit-breaker + deterministic-trim behavior.
- Single-slot content-hash cache (keyed by message count + slice hash) makes a
  repeat identical input slice free — retries / re-issued requests no longer
  re-dispatch the summarizer LLM until the transcript grows again.
- Fallback produces a proper SummaryRecord with provenance, so the existing
  compaction-provenance drain surfaces it (observability preserved).

Compaction failure can no longer fail a run; long over-threshold turns stop
paying a fresh summarizer LLM call for identical input.

Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…eviction (tinyhumansai#4462)

Replace the crate `MessageTrimMiddleware` with a seam-owned
`ImageAwareMessageTrimMiddleware` that restores the deleted
`harness/token_budget.rs` semantics regressed by the TinyAgents migration:

1. Image marker cost: inline `[IMAGE:…]` markers and native
   `ContentBlock::Image` blocks are each charged a flat 1200 tokens instead of
   pricing the base64 payload at chars/4 (~2M tokens), so a single large image
   no longer reads as massively over budget and evicts the whole transcript.
2. Trim reserve: restore the legacy proportional clamp
   `clamp(window/10, >=512, <=max(8192, window/4))` — an 8k local model's input
   budget is ~7373 again, not the fixed `window - 16384` floored at 1024.
3. System survival + order: system messages are never dropped and never
   reordered to the front; only non-system history is evictable (oldest-first),
   leading orphaned tool results are snapped past.
4. Observability: any eviction emits a grep-able `[tinyagents::mw] message_trim`
   warn carrying messages-dropped and tokens/messages before-and-after.

Fixed entirely in the openhuman seam; vendor/tinyagents untouched.

Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…om, terminal-inference halt, autonomous iter-cap lift (tinyhumansai#4463)

Restores loop-guard behaviors dropped with the legacy `tool_loop.rs` in the
TinyAgents migration; the crate `no_progress` ladder tracks *failures* only and
does not replace them. All fixes are seam-side (`src/openhuman/**`); vendored
crate untouched.

- RepeatProgressMiddleware (new, `after_model` + batch-aware `after_tool`):
  halts on 3× identical *successful* `(tool, args)` batches (tinyhumansai#4088) and 4×
  identical outputs (tinyhumansai#4095); `wait_subagent` polling exemption ported. Shares
  the halt-summary slot + steering handle with the failure breaker.
- RepeatedToolFailureMiddleware: differentiated headroom restored — recoverable
  classes (timeout/transient markers + tinyhumansai#4445 classifier: Timeout/
  ServiceUnavailable/ModelConnection) get 8 identical / 12 varied instead of the
  crate's fixed 3/6; POLICY_DENIED_MARKER rejoins POLICY_BLOCKED_MARKER for the
  2-repeat hard-reject fast-trip (tinyhumansai#4463 part 6).
- Terminal delegated-inference fast-halt (tinyhumansai#3104): budget-exhausted /
  provider-config-rejection delegation failures halt on first occurrence, gated
  on the delegated-inference envelope so recoverable tool stderr isn't caught.
- Autonomous iter-cap lift (part 7): `autonomous_iter_cap()` was a dead knob —
  wired back in via `subagent_iter_cap_with_autonomous_lift` at the subagent
  `effective_max_iterations()` sites, so task-dispatcher / skill subagents run
  up to TASK_RUN_MAX_ITERATIONS again instead of the normal per-agent cap.
- Updated the adapter-inventory middleware-count assertions (15→16, 11→12).

Tests deferred to the final parity test pass.

Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…apture sees capped (tinyhumansai#4464)

The tinyagents crate runs after_tool hooks in REVERSE registration order
(MiddlewareStack::run_after_tool iterates .iter().rev()). Two seam
registrations assumed the opposite:

(A) TurnContextMiddleware::install pushed HandoffMiddleware before
ToolOutputMiddleware, so the 16 KiB byte cap truncated FIRST and a capped
~4k-token payload could never cross the 50k-token handoff threshold — the
ResultHandoffCache/extract_from_result drill-in was unreachable. Flip the
push order (tool-output budget first, handoff last) so the effective
after_tool chain is handoff(raw) -> summarizer/tokenjuice/caps.

(B) ToolOutcomeCaptureMiddleware was pushed AFTER context_mw.install, so its
after_tool ran BEFORE the caps and the sink held full raw output per call for
the whole turn (multi-MB) while failure classification / output_summary saw
pre-cap content. Move its push BEFORE context_mw.install so it runs AFTER the
caps and records the final capped content.

Comments now state the crate's reverse-order rule explicitly. Keeps tinyhumansai#4453
credential-scrub ordering consistent.

Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…tinyhumansai#4465)

The tinyagents migration (model.rs -> agent/harness/parse.rs) kept the
XML/JSON/markdown/GLM grammars but dropped the legacy **P-Format**
positional grammar (`<tool_call>name[arg1|arg2]</tool_call>`) — even
though `PFormat` is the default `ToolCallFormat` and ~10 builtin agent
prompts still teach `name[a|b]`. A model that followed its own
instructions emitted calls the parser logged as "malformed <tool_call>
JSON" and silently dropped, so the turn continued as if no tool ran.

Restore parity (option (a)) entirely in the openhuman seam:

- `parse::parse_tool_calls_with_pformat(text, &PFormatRegistry)` walks
  the `<tool_call>`-family tags and, per tag, prefers the registry-driven
  positional parse (existing `agent::pformat::parse_call`) and falls back
  to the JSON entry the canonical parser produced — the exact per-tag
  selection the legacy `PFormatToolDispatcher` did. Empty registry
  short-circuits to `parse_tool_calls`, so non-PFormat paths are
  behaviour-neutral. `ParsedToolCall` gains `Clone` for the fallback copy.
- `tinyagents::model` builds a `PFormatRegistry` from the advertised
  `request.tools` schemas and routes the text-mode fallback through the
  new parser in both the buffered (`invoke`) and streaming (`stream`)
  paths.

Verbose `[agent_parse]` debug logging on the recovery path (tool name
only — never the argument body, which may carry user data).

vendor/tinyagents untouched (upstream/read-only).

Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…yhumansai#4466)

The TinyAgents-migrated sub-agent path lost several behaviours the legacy
per-iteration observer provided. Fix them entirely in the openhuman seam:

1. Failed runs persisted nothing. The graph `?`-returned on a harness error
   before `persist_subagent_transcript` / `mirror_worker_thread`, so a crash
   mid-run left no `session_raw` transcript (breaking `transcript_ingest`) and
   an empty worker thread. Add a `TranscriptSnapshotMiddleware` that mirrors
   each `before_model` transcript into an openhuman-owned buffer; on the error
   path persist those completed rounds + mirror them to the worker thread with
   a failure marker, then return the error.

2. Worker-thread mirror metadata was reduced to `{scope, agent_id, task_id}`.
   Restore the legacy `iteration` / `final` / `tool_calls` / `tool_call_id` /
   `tool_name` / `mode` fields.

3. Lossy streaming: the progress bridge used `try_send` and dropped deltas on a
   full channel. Keep the synchronous fast path but, on `Full`, defer to an
   awaited `send()` on a spawned task (backpressure) instead of silently
   dropping — the `EventListener` callback is sync so it cannot `.await` inline.

4. Sub-agent TokenJuice compaction was silently off (`TurnContextMiddleware::
   defaults()` = compression Off). Build the sub-agent context middleware from
   the live `[context]` config + `definition.effective_tokenjuice_compression()`
   (compaction, byte budget, microcompact, autocompact opt-out), same as chat.

5. Breaker halt reported `Completed`: circuit-breaker-halted children returned
   `hit_cap=false`, so parents treated a halted child as finished. Carry a
   `breaker_halt` reason on the turn outcome and map it to `Incomplete`.

cargo check --lib: clean. Tests deferred to the final test pass.

Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…-run usage, tool timing + DomainEvents (tinyhumansai#4467)

Restore cost-accounting and telemetry parity lost on the tinyagents path:

- Charged USD (item 1): thread the provider `UsageInfo` (backend-charged
  USD + context window + cache-creation/reasoning tokens the crate `Usage`
  drops) from the model adapter to the event bridge via a shared FIFO carry;
  the bridge now prefers the provider's charged amount over the catalogue
  estimate (charged > estimate precedence) and records a real context window.
- Cap-summary usage (item 2): fold all four token fields, price the call, and
  feed the global cost tracker directly for the sub-agent cap-hit checkpoint
  call (it bypasses the harness bridge, so its spend was previously lost).
- Failed unobserved runs (item 3): attach the event bridge for every turn —
  including unobserved background/cron turns — so per-call usage reaches the
  cost tracker during the run and a later Err return no longer drops spend.
- Tool timing (item 4): measure real `elapsed_ms` in `execute_openhuman_tool`
  and carry `elapsed_ms`/`output_chars` through the outcome side-channel so
  `ToolCallCompleted` shows real duration + output size instead of 0/0.
- SSE per-tool events (item 5): re-publish `DomainEvent::ToolExecutionStarted`
  / `ToolExecutionCompleted` around session tool execution.
- Child arg deltas (item 6): drop child-scoped `ToolCallArgsDelta` so a child
  run's argument composition no longer renders as the parent's own activity.
- Unknown tools (item 7): default a missing tool outcome to `success: false`
  (hallucinated/unknown tool) in the post-turn records and derive the cap
  digest's per-tool `[ok|failed]` status from the captured outcomes.

Entirely in the openhuman seam; vendor/tinyagents untouched. cargo check --lib
clean. Tests deferred to the final parity test pass.

Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…nyhumansai#4469)

Batchable low-severity parity items from the TinyAgents-migration audit:

- item 1: correct the GoalBudgetStopHook "hard-stops the moment" docs
  (runtime.rs, turn/core.rs) to the actual graceful-pause semantics —
  the stop vote is drained at the next iteration boundary, so the current
  tool round + one wrap-up call still run (bounded overshoot).
- item 2: forward is_local_provider / is_local_provider_for_model /
  loaded_context_window / prompt_cache_capabilities through
  TextModeProvider so a local provider behind text mode keeps its
  n_keep>=n_ctx guard and KV-cache pricing.
- item 3: recover poisoned mutexes via PoisonError::into_inner on the
  error_slot recovery reads (tinyagents/mod.rs) and the EarlyExitHook
  slots (tinyagents/tools.rs) instead of panicking.
- item 5: cap-summary output cap now honours the sub-agent definition's
  own max_output_tokens budget instead of the global
  AGENT_TURN_MAX_OUTPUT_TOKENS floor.
- item 6: enforce a minimum envelope allowance in aggregate tool-result
  spill so a saturated allowed_len can't blank the [tool_result_preview]
  envelope and strip its artifact_path pointer.
- item 8: remove the dead rolling_segment_recap method (zero callers
  since the migration) + refresh its module/helper docs.
- item 9: fix the misleading "falling back to inline truncation" log —
  the envelope is kept regardless for its artifact_path pointer.
- item 10: drop stale model_vision_context doc links
  (turn_attachments_context.rs, tokenjuice/savings.rs).
- item 11: remove deleted harness/interrupt.rs from agent/README.md.
- item 12: payload-summarizer threshold doc default 500000 -> 4000.
- item 13: correct the TraceSpan doc — raw NDJSON is an OTel-style span
  dump, not directly Langfuse-ingestible (that is spans_to_langfuse_batch).

Items 4 (max_iterations==0 -> 10) and 7 (LazyToolkitResolver / mid-turn
resync) are already documented / tracked in-code; no change needed.

Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
@senamakel senamakel requested a review from a team July 4, 2026 10:19
@coderabbitai

coderabbitai Bot commented Jul 4, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Warning

Review limit reached

You’ve reached a temporary PR review limit under our Fair Usage Limits Policy.

Your recent review volume is higher than typical usage, so adaptive limits are currently applied.

Next review available in: 23 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.
You're only billed for reviews past your plan's rate limits ($0.25/file).

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b643ee8d-0ef5-4b03-bd34-a28e27afd38a

📥 Commits

Reviewing files that changed from the base of the PR and between 26dbf59 and 4aa6937.

📒 Files selected for processing (4)
  • src/openhuman/agent/harness/parse.rs
  • src/openhuman/agent/progress_tracing/tests.rs
  • src/openhuman/tinyagents/steering_forwarder.rs
  • src/openhuman/tools/impl/filesystem/update_memory_md.rs
📝 Walkthrough

Walkthrough

This PR adds Denied/ApprovalExpired tool-failure handling across classification, persistence, progress propagation, and UI/i18n; restores TinyAgents steering, middleware, parsing, usage, and subagent execution paths; gates stored turn content and corrects Langfuse usage accounting; and adds atomic memory writes plus artifact preview budgeting fixes.

Changes

Tool Failure Classification and Persistence

Layer / File(s) Summary
Failure contract and classification
src/openhuman/tool_status/types.rs, src/openhuman/tool_status/ops.rs
Adds Denied/ApprovalExpired failure classes and UserDeclined, updates classification precedence, and adds user-facing copy.
Failure persistence across snapshots
src/openhuman/threads/turn_state/types.rs, src/openhuman/threads/turn_state/mirror.rs, src/openhuman/threads/turn_state/mirror_tests.rs, src/openhuman/threads/turn_state/store_tests.rs
Adds persisted failure payloads and threads them through timeline entries, subagent calls, mirror updates, and tests.
Progress bridge, runtime, and locale copy
src/openhuman/agent/progress.rs, src/openhuman/channels/providers/web/progress_bridge.rs, src/bin/harness_subagent_audit.rs, app/src/store/chatRuntimeSlice.ts, app/src/types/turnState.ts, app/src/pages/conversations/components/ProcessingTranscriptView.tsx, app/src/lib/i18n/*.ts
Carries failure data through progress events, web payloads, runtime parsing, UI rendering, and locale strings.

Estimated code review effort: 5 (Critical) | ~120 minutes

TinyAgents Harness Restoration

Layer / File(s) Summary
Run queue events and steering forwarder
src/core/event_bus/events.rs, src/openhuman/tinyagents/steering_forwarder.rs
Adds run-queue delivery/requeue events and the RAII steering forwarder guard that drains, injects, and requeues messages.
Turn orchestration and channel/session wiring
src/openhuman/tinyagents/mod.rs, src/openhuman/tinyagents/convert.rs, src/openhuman/agent/harness/graph.rs, src/openhuman/agent/harness/session/turn/core.rs, src/openhuman/agent/harness/session/turn/graph.rs, src/openhuman/agent/harness/turn_attachments_context.rs, src/openhuman/thread_goals/runtime.rs
Updates allowlist semantics, defer-completion behavior, request boundaries, empty-response handling, turn-content gating, and related orchestration comments.
Middleware stack, credential scrubbing, and schema guard
src/openhuman/tinyagents/middleware.rs, src/openhuman/agent/harness/credentials.rs, src/openhuman/agent/harness/mod.rs, src/openhuman/agent/harness/memory_protocol.rs
Restores loop guards, transcript snapshots, tool exposure, credential redaction, schema validation, and argument recovery.
Provider model parsing and usage carry
src/openhuman/tinyagents/model.rs, src/openhuman/tinyagents/abort_guard.rs, src/openhuman/agent/harness/parse.rs, src/openhuman/inference/provider/mod.rs, src/openhuman/inference/provider/resolved_route.rs, src/openhuman/agent/harness/agent_graph.rs
Adds abort-on-drop task handling, P-Format-aware parsing, route-slot propagation, and provider usage carry plumbing.
Fault-tolerant summarizer and observability bridge
src/openhuman/tinyagents/summarize.rs, src/openhuman/tinyagents/observability.rs
Adds cached deterministic fallback summarization and records richer tool-failure and usage data through the bridge.
Subagent runner tokenjuice and breaker halt
src/openhuman/agent/harness/subagent_runner/*, src/openhuman/tinyagents/tests.rs
Threads TokenJuice compression and breaker-halt state through subagent execution, worker-thread mirroring, and related tests.
Tool telemetry and harness tests
src/openhuman/tinyagents/tools.rs, src/openhuman/tinyagents/tests.rs
Publishes tool execution timing events and updates harness tests for allowlist, completion-deferment, and inventory changes.

Estimated code review effort: 5 (Critical) | ~150 minutes

Sequence Diagram(s)

sequenceDiagram
  participant ToolStatusOps
  participant TurnStateMirror
  participant ProgressBridge
  participant ProcessingTranscriptView
  ToolStatusOps->>TurnStateMirror: classify and persist failure
  TurnStateMirror->>ProgressBridge: forward failure payload
  ProgressBridge->>ProcessingTranscriptView: render localized failure copy
Loading

Observability Content-Capture Gating and Cost Accounting

Layer / File(s) Summary
SpanCollector content capture gate
src/openhuman/agent/progress_tracing.rs, src/openhuman/agent/progress_tracing/tests.rs, src/openhuman/channels/providers/web/progress_bridge.rs
Adds a storage-time content-capture flag, gates TurnContent attachment and export logging, and updates related tests and bridge wiring.
Langfuse usage bucket accounting
src/openhuman/agent/progress_tracing/langfuse.rs
Changes usageDetails.input to the non-cached input-token bucket while preserving total reconciliation.

Estimated code review effort: 3 (Moderate) | ~25 minutes

Memory Tool Atomicity and Supporting Fixes

Layer / File(s) Summary
Atomic memory writes and tool registration
src/openhuman/tools/impl/filesystem/update_memory_md.rs, src/openhuman/tools/ops.rs, src/openhuman/agent/harness/memory_protocol.rs
Adds per-workspace write locking and atomic rename writes for memory updates, registers the memory tool, and reclassifies preference writes.
Tool-result artifact preview budgeting
src/openhuman/agent/harness/tool_result_artifacts/mod.rs
Introduces a minimum envelope allowance and updates persistence/spill handling so artifact pointers survive very small budgets.
Documentation and comment updates
src/openhuman/agent/harness/archivist/recap.rs, src/openhuman/agent/README.md, src/openhuman/thread_goals/runtime.rs, src/openhuman/agent/harness/turn_attachments_context.rs, src/openhuman/tokenjuice/savings.rs, src/openhuman/tinyagents/payload_summarizer.rs
Updates comments/docs for the retired rolling recap entry point, public surface, pause semantics, turn attachment context, tokenjuice savings, and payload summarizer threshold.

Estimated code review effort: 3 (Moderate) | ~30 minutes

Possibly related issues

Possibly related PRs

Suggested labels: feature, rust-core, agent, memory, bug

Suggested reviewers: M3gA-Mind

Poem

A rabbit hops through code so bright,
Denied and expired now read right.
Steers return and drains unwind,
TinyAgents leave less behind.
Content hides till flags say "yes" —
A tidy burrow, more or less 🐇

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the PR’s broad TinyAgents migration parity and bug-fix scope.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

Comment @coderabbitai help to get the list of available commands.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 294a364372

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/openhuman/tinyagents/steering_forwarder.rs Outdated
Comment thread src/openhuman/agent/harness/parse.rs Outdated
senamakel added 3 commits July 4, 2026 10:29
# Conflicts:
#	src/openhuman/tinyagents/tests.rs
…er upstream merge

Post-merge integration fix: tinyhumansai#4459 added a `failure` field to
AgentProgress::SubagentToolCallCompleted; the merged unknown-tool
recovery path (observability.rs) needed to populate it.

Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
@coderabbitai coderabbitai Bot added agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. bug rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. labels Jul 4, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 277f6f6101

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +279 to +280
handle.spawn(async move {
let _ = tx.send(progress).await;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve progress ordering under backpressure

When the progress channel is full, each event is handed to its own spawned task. Those tasks are scheduled independently, so a later completion can be enqueued before the earlier start event; TurnStateMirror only updates an existing row for ToolCallCompleted by call_id, so an out-of-order completion is dropped and the UI/persisted turn can leave that tool row stuck in running. Use a single ordered drain/worker or otherwise serialize the fallback sends instead of spawning one task per event.

Useful? React with 👍 / 👎.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/openhuman/channels/providers/web/progress_bridge.rs (1)

416-426: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Align the live failure payload with the persisted shape ClassifiedFailure serializes as snake_case here (and in the subagent completion path), but PersistedToolFailure is camelCase (causePlain/nextAction). The same failure data ends up under different field names across live vs. reloaded turns; route the live payload through the same camelCase type or add a matching serde rename on ClassifiedFailure.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/channels/providers/web/progress_bridge.rs` around lines 416 -
426, The live ToolCallCompleted failure payload is being serialized from
ClassifiedFailure with snake_case field names, while the persisted/reloaded path
uses PersistedToolFailure with camelCase names. Update the failure handling in
progress_bridge.rs so the UI and ledger use the same shape as the persisted
data, either by routing the live serialization through PersistedToolFailure or
by adding matching serde rename annotations on ClassifiedFailure; make the same
adjustment in the subagent completion path if it uses the same failure payload.
🧹 Nitpick comments (2)
src/openhuman/agent/harness/archivist/recap.rs (1)

71-79: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Stale doc references removed "rolling recap" caller pattern.

The unchanged doc block above still describes callers driving "the live prompt (rolling recap → compaction)" as if rolling_segment_recap still exists, but this PR removes that entry point entirely. Worth a follow-up cleanup to avoid confusing future readers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/harness/archivist/recap.rs` around lines 71 - 79, The doc
comment on the recap helper still references the removed live-prompt “rolling
recap → compaction” caller pattern, which is now stale. Update the documentation
on the recap function in recap.rs to remove that obsolete wording and describe
only the remaining finalize/fallback behavior, so the comments match the current
API and no longer mention the deleted rolling_segment_recap path.
src/bin/harness_subagent_audit.rs (1)

579-610: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Consider logging the new failure field for audit visibility.

Other branches in this handler log diagnostic fields (tool_name, success, elapsed_ms, etc.), but the newly added failure field is discarded here. Since this is an audit/diagnostics binary, surfacing failure.class/cause_plain when success is false could aid debugging subagent tool failures, consistent with the #4459 propagation work highlighted elsewhere in this PR.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/bin/harness_subagent_audit.rs` around lines 579 - 610, The
SubagentToolCallCompleted branch in harness_subagent_audit.rs is dropping the
newly added failure diagnostics, so update the handler to log and record the
failure field alongside tool_name, success, and elapsed_ms. Use the
AgentProgress::SubagentToolCallCompleted match arm and the
SubagentToolCompletedEvent push to surface failure.class and cause_plain when
success is false, keeping the audit output consistent with the rest of the
diagnostics in this handler.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/agent/harness/archivist/recap.rs`:
- Around line 1-6: The remaining test target still calls the removed
rolling_segment_recap entry point, so update those call sites before or
alongside deleting it. Find the ArchivistHook-related test usage in
tests/agent_archivist_debug_round21_raw_coverage_e2e.rs and replace
hook.rolling_segment_recap(...) and enabled.rolling_segment_recap(...) with the
current recap path used by ArchivistHook, keeping the test aligned with the
shared on_segment_closed-based summarization flow.

In `@src/openhuman/agent/harness/parse.rs`:
- Around line 758-806: The P-Format recovery loop in parse.rs only advances
through one json_calls entry per <tool_call> tag, so multi-call tags can lose
additional canonical calls and desynchronize later fallbacks. Update the combine
logic inside the while loop around find_first_tag, parse_call, and json_idx so
each recovered tag consumes and appends all canonical calls produced for that
tag rather than a single json_calls[json_idx] item. Keep the existing P-Format
recovery behavior, but ensure the JSON fallback stays in sync with the full set
of canonical calls for the current tag.

In `@src/openhuman/agent/harness/tool_result_artifacts/mod.rs`:
- Around line 381-388: The aggregate spill path in
tool_result_artifacts::mod::bound_text_to_budget is over-allocating because the
per-item floor is always forced to MIN_ENVELOPE_ALLOWANCE_BYTES, which can make
total exceed the overall budget. Update the logic around envelope_allowance so
the minimum is capped by the remaining aggregate budget for the spill batch, and
keep the existing preview-envelope protection only when it fits within the total
limit. Use the bound_text_to_budget call site and the envelope_allowance
calculation in the spill loop to ensure total stays within the requested budget
and the total <= budget assertion continues to hold.

In `@src/openhuman/tinyagents/observability.rs`:
- Around line 265-290: The overflow path in send() can reorder AgentProgress
events because each TrySendError::Full case spawns an independent task and
awaits tx.send(progress) out of sequence. Update the observability flow around
send() and spawn_progress_bridge so backpressure is handled by a single ordered
forwarder/queue rather than per-event spawned sends, preserving the original
progress order for text and tool updates.

In `@src/openhuman/tinyagents/steering_forwarder.rs`:
- Around line 225-298: Residual drained messages lose their original queue mode
because the requeue path in steering_forwarder.rs strips both STEER_PREFIX and
COLLECT_PREFIX but always pushes recovered texts back through RunQueue::push
with QueueMode::Steer. Update the drain/requeue flow around handle.drain(),
requeue_texts, and the detached rt.spawn block to preserve the matched prefix’s
mode and pass that mode through when constructing QueuedMessage, so collect
context is requeued as QueueMode::Collect instead of being mislabeled as steer.

In `@src/openhuman/tools/impl/filesystem/update_memory_md.rs`:
- Around line 55-85: The atomic_write helper in update_memory_md leaves behind a
temp file when tokio::fs::write fails before the rename step. Update
atomic_write to perform best-effort cleanup of tmp_path on the initial write
error as well as in the rename failure branch, so any partially written
.{file}.{pid}.{seq}.tmp is removed regardless of which filesystem operation
fails.
- Around line 15-46: The current workspace write guard in workspace_write_lock
and WORKSPACE_WRITE_LOCKS only prevents races within one process, so concurrent
cron runs can still overwrite MEMORY.md/SKILL.md during update_memory_md.
Replace or augment the in-process mutex with an inter-process locking mechanism
around the read-modify-write flow in update_memory_md so separate
tokio::process::Command invocations contend on the same workspace lock and
serialize writes across processes.

---

Outside diff comments:
In `@src/openhuman/channels/providers/web/progress_bridge.rs`:
- Around line 416-426: The live ToolCallCompleted failure payload is being
serialized from ClassifiedFailure with snake_case field names, while the
persisted/reloaded path uses PersistedToolFailure with camelCase names. Update
the failure handling in progress_bridge.rs so the UI and ledger use the same
shape as the persisted data, either by routing the live serialization through
PersistedToolFailure or by adding matching serde rename annotations on
ClassifiedFailure; make the same adjustment in the subagent completion path if
it uses the same failure payload.

---

Nitpick comments:
In `@src/bin/harness_subagent_audit.rs`:
- Around line 579-610: The SubagentToolCallCompleted branch in
harness_subagent_audit.rs is dropping the newly added failure diagnostics, so
update the handler to log and record the failure field alongside tool_name,
success, and elapsed_ms. Use the AgentProgress::SubagentToolCallCompleted match
arm and the SubagentToolCompletedEvent push to surface failure.class and
cause_plain when success is false, keeping the audit output consistent with the
rest of the diagnostics in this handler.

In `@src/openhuman/agent/harness/archivist/recap.rs`:
- Around line 71-79: The doc comment on the recap helper still references the
removed live-prompt “rolling recap → compaction” caller pattern, which is now
stale. Update the documentation on the recap function in recap.rs to remove that
obsolete wording and describe only the remaining finalize/fallback behavior, so
the comments match the current API and no longer mention the deleted
rolling_segment_recap path.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 43fb4bad-d9d4-40d5-b9b7-843a1f216bb1

📥 Commits

Reviewing files that changed from the base of the PR and between 760e5cb and 277f6f6.

📒 Files selected for processing (63)
  • app/src/lib/i18n/ar.ts
  • app/src/lib/i18n/bn.ts
  • app/src/lib/i18n/de.ts
  • app/src/lib/i18n/en.ts
  • app/src/lib/i18n/es.ts
  • app/src/lib/i18n/fr.ts
  • app/src/lib/i18n/hi.ts
  • app/src/lib/i18n/id.ts
  • app/src/lib/i18n/it.ts
  • app/src/lib/i18n/ko.ts
  • app/src/lib/i18n/pl.ts
  • app/src/lib/i18n/pt.ts
  • app/src/lib/i18n/ru.ts
  • app/src/lib/i18n/zh-CN.ts
  • app/src/pages/conversations/components/ProcessingTranscriptView.tsx
  • app/src/store/chatRuntimeSlice.ts
  • app/src/types/turnState.ts
  • src/bin/harness_subagent_audit.rs
  • src/core/event_bus/events.rs
  • src/openhuman/agent/README.md
  • src/openhuman/agent/harness/agent_graph.rs
  • src/openhuman/agent/harness/archivist/recap.rs
  • src/openhuman/agent/harness/credentials.rs
  • src/openhuman/agent/harness/graph.rs
  • src/openhuman/agent/harness/memory_protocol.rs
  • src/openhuman/agent/harness/mod.rs
  • src/openhuman/agent/harness/parse.rs
  • src/openhuman/agent/harness/session/turn/core.rs
  • src/openhuman/agent/harness/session/turn/graph.rs
  • src/openhuman/agent/harness/subagent_runner/autonomous.rs
  • src/openhuman/agent/harness/subagent_runner/mod.rs
  • src/openhuman/agent/harness/subagent_runner/ops/graph.rs
  • src/openhuman/agent/harness/subagent_runner/ops/runner.rs
  • src/openhuman/agent/harness/tool_result_artifacts/mod.rs
  • src/openhuman/agent/harness/turn_attachments_context.rs
  • src/openhuman/agent/progress.rs
  • src/openhuman/agent/progress_tracing.rs
  • src/openhuman/agent/progress_tracing/langfuse.rs
  • src/openhuman/agent/progress_tracing/tests.rs
  • src/openhuman/channels/providers/web/progress_bridge.rs
  • src/openhuman/inference/provider/mod.rs
  • src/openhuman/inference/provider/resolved_route.rs
  • src/openhuman/thread_goals/runtime.rs
  • src/openhuman/threads/turn_state/mirror.rs
  • src/openhuman/threads/turn_state/mirror_tests.rs
  • src/openhuman/threads/turn_state/store_tests.rs
  • src/openhuman/threads/turn_state/types.rs
  • src/openhuman/tinyagents/abort_guard.rs
  • src/openhuman/tinyagents/convert.rs
  • src/openhuman/tinyagents/middleware.rs
  • src/openhuman/tinyagents/mod.rs
  • src/openhuman/tinyagents/model.rs
  • src/openhuman/tinyagents/observability.rs
  • src/openhuman/tinyagents/payload_summarizer.rs
  • src/openhuman/tinyagents/steering_forwarder.rs
  • src/openhuman/tinyagents/summarize.rs
  • src/openhuman/tinyagents/tests.rs
  • src/openhuman/tinyagents/tools.rs
  • src/openhuman/tokenjuice/savings.rs
  • src/openhuman/tool_status/ops.rs
  • src/openhuman/tool_status/types.rs
  • src/openhuman/tools/impl/filesystem/update_memory_md.rs
  • src/openhuman/tools/ops.rs
💤 Files with no reviewable changes (1)
  • src/openhuman/agent/README.md

Comment thread src/openhuman/agent/harness/archivist/recap.rs Outdated
Comment thread src/openhuman/agent/harness/parse.rs
Comment thread src/openhuman/agent/harness/tool_result_artifacts/mod.rs
Comment on lines 265 to 290
fn send(&self, progress: AgentProgress) {
if let Some(tx) = &self.on_progress {
let _ = tx.try_send(progress);
use tokio::sync::mpsc::error::TrySendError;
let Some(tx) = &self.on_progress else {
return;
};
match tx.try_send(progress) {
Ok(()) => {}
Err(TrySendError::Closed(_)) => {}
Err(TrySendError::Full(progress)) => {
// Backpressure, not capacity loss: hand the delta to an awaited
// `send()` on a spawned task rather than dropping it. Guard on a
// live runtime so a non-async construction path can't panic.
if let Ok(handle) = tokio::runtime::Handle::try_current() {
let tx = tx.clone();
handle.spawn(async move {
let _ = tx.send(progress).await;
});
} else {
tracing::debug!(
model = %self.model,
"[tinyagents] progress channel full and no runtime to defer send; dropping one delta"
);
}
}
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Map and inspect the relevant file.
ast-grep outline src/openhuman/tinyagents/observability.rs --view expanded || true

echo '--- send()/record_usage() around the cited lines ---'
sed -n '220,340p' src/openhuman/tinyagents/observability.rs

echo '--- find progress bridge / consumer ---'
rg -n "spawn_progress_bridge|on_progress|AgentProgress|TextDelta|ThinkingDelta|ToolCompleted|UsageRecorded" src/openhuman -S

Repository: tinyhumansai/openhuman

Length of output: 50378


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo '--- spawn_progress_bridge definition ---'
rg -n -A80 -B20 "fn spawn_progress_bridge" src/openhuman/channels/providers/web.rs src/openhuman -S

echo '--- turn-state mirror ordering assumptions ---'
sed -n '1,120p' src/openhuman/threads/turn_state/mirror.rs
echo '--- relevant observe arms ---'
sed -n '340,460p' src/openhuman/threads/turn_state/mirror.rs

echo '--- send() tests / ordering-related tests in observability.rs ---'
sed -n '840,1025p' src/openhuman/tinyagents/observability.rs

Repository: tinyhumansai/openhuman

Length of output: 10977


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo '--- progress bridge event loop / forwarding ---'
sed -n '197,320p' src/openhuman/channels/providers/web/progress_bridge.rs

echo '--- turn-state mirror relevant event handling ---'
sed -n '340,460p' src/openhuman/threads/turn_state/mirror.rs

echo '--- any tests asserting ordering on a sequential receiver ---'
rg -n -A8 -B8 "TextDelta|ThinkingDelta|ToolCallStarted|ToolCallCompleted|SubagentTextDelta|SubagentThinkingDelta" src/openhuman/tinyagents/observability.rs src/openhuman/threads/turn_state/mirror_tests.rs src/openhuman/channels/providers/web/progress_bridge.rs -S

Repository: tinyhumansai/openhuman

Length of output: 50378


Deferred send() can reorder progress events under backpressure.

spawn_progress_bridge drains AgentProgress sequentially, so a full-channel event handed to a spawned send().await can land after later fast-path sends and scramble text/tool ordering. Route overflow delivery through a single ordered forwarder instead of spawning per-event tasks.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/tinyagents/observability.rs` around lines 265 - 290, The
overflow path in send() can reorder AgentProgress events because each
TrySendError::Full case spawns an independent task and awaits tx.send(progress)
out of sequence. Update the observability flow around send() and
spawn_progress_bridge so backpressure is handled by a single ordered
forwarder/queue rather than per-event spawned sends, preserving the original
progress order for text and tool updates.

Comment thread src/openhuman/tinyagents/steering_forwarder.rs
Comment on lines +15 to +46
/// Process-global registry of per-workspace write locks (#4458).
///
/// `update_memory_md` performs a read-modify-write on `MEMORY.md`/`SKILL.md`.
/// The per-run `MemoryProtocolTracker` has zero cross-run awareness, so two
/// concurrent runs (parallel forks, cron) racing the same workspace file would
/// otherwise clobber each other's append. We serialize every write to a given
/// workspace directory through a shared async mutex, keyed by the canonicalized
/// (falling back to raw) workspace path, so concurrent index updates queue
/// instead of racing. Combined with the temp-file + atomic-rename write below,
/// a killed process can never leave a truncated file.
static WORKSPACE_WRITE_LOCKS: LazyLock<Mutex<HashMap<PathBuf, Arc<tokio::sync::Mutex<()>>>>> =
LazyLock::new(|| Mutex::new(HashMap::new()));

/// Monotonic counter for temp-file uniqueness within a process.
static TEMP_FILE_SEQ: AtomicU64 = AtomicU64::new(0);

/// Return (creating if needed) the shared async write lock for `workspace_dir`.
///
/// The lock key is the canonicalized workspace path when it resolves (so two
/// spellings of the same directory share one lock), else the raw path.
fn workspace_write_lock(workspace_dir: &Path) -> Arc<tokio::sync::Mutex<()>> {
let key = workspace_dir
.canonicalize()
.unwrap_or_else(|_| workspace_dir.to_path_buf());
let mut map = WORKSPACE_WRITE_LOCKS
.lock()
.unwrap_or_else(|poisoned| poisoned.into_inner());
Arc::clone(
map.entry(key)
.or_insert_with(|| Arc::new(tokio::sync::Mutex::new(()))),
)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check whether cron/background/parallel-agent execution spawns a new OS
# process (Command::new / std::process) vs staying in-process (tokio::spawn).
rg -n "std::process::Command|tokio::process::Command" src/openhuman -g '*.rs' | rg -i "cron|subagent|parallel|daemon"
fd -a cron src/openhuman
rg -n "fn run\b" -A 8 src/openhuman/tools/impl/system/cron 2>/dev/null

Repository: tinyhumansai/openhuman

Length of output: 328


Use cross-process locking for workspace writes
WORKSPACE_WRITE_LOCKS only serializes writers inside one OS process. src/openhuman/cron/scheduler.rs launches work via tokio::process::Command, so separate cron invocations won't share this mutex and can still clobber the same MEMORY.md/SKILL.md during the read-modify-write cycle. If those paths can overlap, this needs an inter-process lock.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/tools/impl/filesystem/update_memory_md.rs` around lines 15 -
46, The current workspace write guard in workspace_write_lock and
WORKSPACE_WRITE_LOCKS only prevents races within one process, so concurrent cron
runs can still overwrite MEMORY.md/SKILL.md during update_memory_md. Replace or
augment the in-process mutex with an inter-process locking mechanism around the
read-modify-write flow in update_memory_md so separate tokio::process::Command
invocations contend on the same workspace lock and serialize writes across
processes.

Comment thread src/openhuman/tools/impl/filesystem/update_memory_md.rs
senamakel added 2 commits July 4, 2026 11:35
…ings

Post-merge CI fix: the tinyhumansai#4459 denied/approvalExpired locale strings were
not prettier-wrapped. No key/text changes, formatting only.

Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
# Conflicts:
#	src/openhuman/agent/progress_tracing.rs
#	src/openhuman/agent/progress_tracing/langfuse.rs
#	src/openhuman/channels/providers/web/progress_bridge.rs
@coderabbitai coderabbitai Bot added feature Net-new user-facing capability or product behavior. memory Memory store, memory tree, recall, summarization, and embeddings in src/openhuman/memory/. labels Jul 4, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
src/openhuman/tinyagents/tests.rs (1)

512-536: 🎯 Functional Correctness | 🔵 Trivial

Unverified middleware-count assertions — flagged by the author as pending.

The inline NOTE explicitly states these counts (15, 4, and the analogous 13 at Line 591) are derived by addition/inference and "are NOT compiled by cargo check --lib (this is a #[cfg(test)] block) and are pending the deferred test pass". Given the PR's stated scope ("Tests were not written or run in this pass"), these hardcoded numbers are a real risk of drifting from the actual runtime middleware inventory once the test suite is compiled/run.

Since the PR explicitly earmarks a follow-up test pass, consider tracking this as a checklist item so the counts get validated rather than silently trusted.

Also applies to: 588-593

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/tinyagents/tests.rs` around lines 512 - 536, The middleware
inventory assertions in the TinyAgents tests are still unverified and were
explicitly marked as pending, so replace the hardcoded counts with values that
are validated by the test suite or defer them behind a clear follow-up check.
Update the assertions around assembled.harness.middleware(),
mw.tool_middleware_len(), and the matching inventory check near the later block
to reflect the actual compiled middleware stack, and keep the note synchronized
with the verified source of truth.
src/openhuman/agent/progress_tracing/langfuse.rs (1)

262-278: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Add test coverage for the cached-token subtraction path.

The core behavior change here is "input_tokens is INCLUSIVE of cached prompt tokens... so emit the NON-cached input (input - cached)", but none of the new tests set gen_ai.usage.cached_input_tokens to a nonzero value alongside a nonzero input_tokens to confirm usageDetails.input is actually reduced. All existing tests use cached = 0, so the saturating_sub branch is currently unexercised.

✅ Suggested test addition
+    #[test]
+    fn usage_details_input_excludes_cached_tokens() {
+        let mut gen = span(
+            "trace-1", "gen-1", Some("root"), "llm.agentic-v1",
+            SpanKind::Generation, SpanStatus::Ok, 1_000, Some(1_500),
+        );
+        gen.attributes.clear();
+        gen.attributes.insert("gen_ai.usage.input_tokens".into(), json!(1_000));
+        gen.attributes.insert("gen_ai.usage.output_tokens".into(), json!(200));
+        gen.attributes.insert("gen_ai.usage.cached_input_tokens".into(), json!(300));
+        let payload = spans_to_langfuse_batch(&[gen], false, "production");
+        let usage = &payload["batch"][1]["body"]["usageDetails"];
+        assert_eq!(usage["input"], 700, "non-cached input only");
+        assert_eq!(usage["cache_read_input_tokens"], 300);
+        assert_eq!(usage["total"], 1_200, "total = raw input + output");
+    }

Also applies to: 712-781

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/progress_tracing/langfuse.rs` around lines 262 - 278, Add
test coverage in the Langfuse tracing tests for the cached-token subtraction
path in langfuse:: usage conversion. Create a case that sets both
gen_ai.usage.input_tokens and gen_ai.usage.cached_input_tokens to nonzero
values, then assert usageDetails.input is reduced to input - cached while
cache_read_input_tokens reflects the cached amount and total still reconciles
correctly. Reuse the existing langfuse conversion helpers/tests so the new
assertion directly exercises the non_cached_input branch and the saturating_sub
logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/openhuman/agent/progress_tracing/langfuse.rs`:
- Around line 262-278: Add test coverage in the Langfuse tracing tests for the
cached-token subtraction path in langfuse:: usage conversion. Create a case that
sets both gen_ai.usage.input_tokens and gen_ai.usage.cached_input_tokens to
nonzero values, then assert usageDetails.input is reduced to input - cached
while cache_read_input_tokens reflects the cached amount and total still
reconciles correctly. Reuse the existing langfuse conversion helpers/tests so
the new assertion directly exercises the non_cached_input branch and the
saturating_sub logic.

In `@src/openhuman/tinyagents/tests.rs`:
- Around line 512-536: The middleware inventory assertions in the TinyAgents
tests are still unverified and were explicitly marked as pending, so replace the
hardcoded counts with values that are validated by the test suite or defer them
behind a clear follow-up check. Update the assertions around
assembled.harness.middleware(), mw.tool_middleware_len(), and the matching
inventory check near the later block to reflect the actual compiled middleware
stack, and keep the note synchronized with the verified source of truth.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f69142c2-dc79-467b-8c2b-4ea28f04f73a

📥 Commits

Reviewing files that changed from the base of the PR and between 277f6f6 and 26dbf59.

📒 Files selected for processing (23)
  • app/src/lib/i18n/ar.ts
  • app/src/lib/i18n/bn.ts
  • app/src/lib/i18n/de.ts
  • app/src/lib/i18n/es.ts
  • app/src/lib/i18n/fr.ts
  • app/src/lib/i18n/hi.ts
  • app/src/lib/i18n/id.ts
  • app/src/lib/i18n/it.ts
  • app/src/lib/i18n/ko.ts
  • app/src/lib/i18n/pl.ts
  • app/src/lib/i18n/pt.ts
  • app/src/lib/i18n/ru.ts
  • app/src/lib/i18n/zh-CN.ts
  • src/bin/harness_subagent_audit.rs
  • src/openhuman/agent/harness/tool_result_artifacts/mod.rs
  • src/openhuman/agent/progress.rs
  • src/openhuman/agent/progress_tracing.rs
  • src/openhuman/agent/progress_tracing/langfuse.rs
  • src/openhuman/agent/progress_tracing/tests.rs
  • src/openhuman/channels/providers/web/progress_bridge.rs
  • src/openhuman/threads/turn_state/mirror.rs
  • src/openhuman/tinyagents/observability.rs
  • src/openhuman/tinyagents/tests.rs
✅ Files skipped from review due to trivial changes (10)
  • app/src/lib/i18n/de.ts
  • app/src/lib/i18n/ar.ts
  • app/src/lib/i18n/ru.ts
  • app/src/lib/i18n/zh-CN.ts
  • app/src/lib/i18n/id.ts
  • app/src/lib/i18n/pt.ts
  • app/src/lib/i18n/es.ts
  • src/bin/harness_subagent_audit.rs
  • app/src/lib/i18n/fr.ts
  • app/src/lib/i18n/pl.ts
🚧 Files skipped from review as they are similar to previous changes (11)
  • app/src/lib/i18n/hi.ts
  • app/src/lib/i18n/it.ts
  • app/src/lib/i18n/bn.ts
  • src/openhuman/agent/progress_tracing/tests.rs
  • app/src/lib/i18n/ko.ts
  • src/openhuman/channels/providers/web/progress_bridge.rs
  • src/openhuman/agent/progress.rs
  • src/openhuman/agent/harness/tool_result_artifacts/mod.rs
  • src/openhuman/threads/turn_state/mirror.rs
  • src/openhuman/tinyagents/observability.rs
  • src/openhuman/agent/progress_tracing.rs

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26dbf59924

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1168 to +1169
let mut provider_model = ProviderModel::new(provider, model, temperature)
.with_usage_carry(provider_usage_carry.clone());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Share usage carry with fallback route models

When the primary route fails and RunPolicy.fallback selects one of the route models registered by routes::build_route_models, that ProviderModel is still constructed with its default private usage_carry (see routes.rs), while the bridge only drains the provider_usage_carry created here. Successful fallback calls therefore never provide their provider-reported charged USD/context/cache/reasoning breakdown to the bridge, so the turn falls back to catalog estimates despite the new carry path; thread this same carry into the fallback route models as well.

Useful? React with 👍 / 👎.

Comment on lines +366 to +375
let cache_creation_tokens = carried
.as_ref()
.map(|u| u.cache_creation_tokens)
.filter(|t| *t > 0)
.unwrap_or(usage.cache_creation_tokens);
let reasoning_tokens = carried
.as_ref()
.map(|u| u.reasoning_tokens)
.filter(|t| *t > 0)
.unwrap_or(usage.reasoning_tokens);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use carried token breakdown in progress events

For providers where cache-creation or reasoning tokens only exist on the carried UsageInfo, these local values are recorded into UsageInfo but the ModelCallCompleted event below still sends usage.cache_creation_tokens and usage.reasoning_tokens. That leaves the per-call progress/Langfuse generation telemetry at zero for those fields even though the cost tracker got the right values; use the carried cache_creation_tokens/reasoning_tokens variables in the log and progress event too.

Useful? React with 👍 / 👎.

// bounded size for the wire (#4007); `output_chars` +
// `elapsed_ms` still ride along in `subagent` below.
output: Some(cap_wire_output(output)),
failure: failure_json,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Forward subagent failure payload into live state

This socket event now includes the structured failure, but the frontend ChatSubagentToolResultEvent type still has no failure field and onSubagentToolResult only updates status/timing/result. When a sub-agent tool fails during a live turn, the emitted explanation is dropped before it reaches SubagentToolCallEntry, so the child row loses the new “why/next” copy until a snapshot reload; add the field to the event type and parse/store it in the live handler.

Useful? React with 👍 / 👎.

senamakel added 3 commits July 4, 2026 12:41
…ests

The tinyhumansai#4459 failure field on AgentProgress::SubagentToolCallCompleted must be
set in the two test constructions upstream tinyhumansai#4498's telemetry tests added
(cargo check --lib does not compile #[cfg(test)], so this surfaced only in
the coverage job's cargo test).

Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…p cleanup

- Restore ArchivistHook::rolling_segment_recap (tinyhumansai#4469 removed it but an E2E
  test target still calls it; the removal was optional 'remove or re-wire').
- Preserve QueueMode when re-queueing residual steer/collect on forwarder drop
  so a delivered collect line isn't re-framed as user steering (tinyhumansai#4456).
- Clean up the staged temp file if the initial write fails in update_memory_md
  (tinyhumansai#4458).

Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…yhumansai#4465)

A <tool_call> tag that parses as P-Format has no json_calls entry (the JSON
pass can't parse it), so advancing json_idx shifted every later JSON tag onto
the wrong index and silently dropped a real JSON call. Both review bots flagged
this. Leave json_idx untouched for P-Format tags.

Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4aa69372e4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

next_span_seq: 0,
id_prefix: uuid::Uuid::new_v4().simple().to_string(),
// Metadata-only by default; opt in via `with_content_capture`.
capture_content: false,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Wire the capture flag into SpanCollector

When observability.agent_tracing.capture_content is enabled, the progress bridge sets the flag on TraceContext and then constructs the collector with SpanCollector::new(trace_ctx), but this new storage gate is always initialized to false unless the separate with_content_capture() builder is called. In the live path that builder is never used, so AgentProgress::TurnContent emitted by the session turn is dropped at storage time and opt-in prompt/reply capture never reaches the trace/exporter; initialize this from ctx.capture_content or call the builder at the construction site.

Useful? React with 👍 / 👎.

parsed_tool_calls = combined.len(),
"[agent_parse] P-Format-aware parse produced combined tool-call set"
);
(narrative, combined)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve canonical calls outside literal tool tags

When a P-Format response also contains a canonical call that is not a literal <tool_call> tag, such as a ```tool_call block or GLM-style line after a recovered name[...] tag, `parse_tool_calls(response)` can put that call in `json_calls`, but the loop above only walks `TOOL_CALL_OPEN_TAGS` and never consumes those non-tag calls. Because `combined` is nonempty, returning it here drops the remaining canonical calls and executes only the recovered P-Format call; append any unconsumed canonical calls (or merge by source span) before returning.

Useful? React with 👍 / 👎.

@senamakel senamakel merged commit f89c5e0 into tinyhumansai:main Jul 4, 2026
15 checks passed
@github-project-automation github-project-automation Bot moved this from Todo to Done in Team Openhuman Jul 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. bug feature Net-new user-facing capability or product behavior. memory Memory store, memory tree, recall, summarization, and embeddings in src/openhuman/memory/. rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

1 participant