TinyAgents migration parity & bug fixes (#4451–#4469) by senamakel · Pull Request #4495 · tinyhumansai/openhuman

senamakel · 2026-07-04T10:19:25Z

Implements the TinyAgents migration parity & bug-fix tracker (#4470) — the audit of the 1.3→1.5 migration. All 19 child issues addressed in one branch: 18 fixed (each seam-side, no vendor/tinyagents changes), 1 deferred (#4468, pure test-parity — see below).

Built on current main (post #4473 / #4480 / #4483). Each fix was re-checked against live code (some issues were partly reshaped by the migration waves) and stacked sequentially, each passing cargo check --lib before landing; a final full cargo check --lib on the complete branch is green.

Phase 1 — critical correctness & security

Harness: fatal tool-arg schema validation aborts the whole turn (legacy loop self-corrected) #4451 — fatal tool-arg schema validation → recoverable model-visible tool errors (SchemaGuardMiddleware); ArgRecoveryMiddleware no longer coerces to {} when required fields exist.
Subagent empty tool allowlist is fail-open — child inherits parent's full tool surface (shell/spawn) #4452 — subagent tool allowlist made fail-closed (Option<HashSet>: None=all, Some(empty)=deny-all); spawn/delegate tools stripped at registration.
Credential scrubbing dropped from tool outputs on the tinyagents path #4453 — credential scrubbing re-introduced as an innermost tool-wrap middleware (covers parent, subagent, transcript, ToolCallOutcome); AWS/OpenAI/Bearer patterns added.
Trace exporter ignores capture_content — prompts/replies written to NDJSON/app log; Langfuse cached-token double-count #4454 — trace content gated at span-storage on capture_content; Langfuse usageDetails made disjoint (input − cached).

⚠️ Tests deferred — this PR is implementation + compile-verification only

Per the working constraint, no tests were written/run in this pass; every fix is verified via cargo check --lib (implementation compiles). A follow-up test pass must:

Add the unit/e2e tests each issue's acceptance criteria call for (the per-commit messages note what's needed), incl. Test parity matrix: port behavior clusters that lost all coverage in the tinyagents migration (spec Phase 11) #4468's loop-level parity matrix.
Reconcile the adapter-inventory assertions in src/openhuman/tinyagents/tests.rs (tool_middleware_len / lifecycle counts): several fixes added middlewares and bumped these, but cargo check --lib does not compile #[cfg(test)], so the cumulative counts are unverified and likely need adjustment once tests compile.

Generated via multi-agent orchestration. Full per-issue rationale is in each commit body.

https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

Summary by CodeRabbit

New Features
- Expanded tool-failure UI messaging (localized) with new cases for denied actions and expired approvals, including clearer “cause” and “next step”.
- Enabled the memory update tool (update_memory_md) by default so agents can persist memory changes via the standard path.
- Improved sub-agent handling and reporting when execution halts due to repeated failures.
Bug Fixes
- Preserved tool-failure “why + what to do next” details across snapshot/restart flows.
- Improved transcript/progress handling and prevented empty assistant messages from polluting future requests.
- Strengthened credential redaction for more secret formats.
Chores / Other
- Improved tracing privacy: content is captured only when explicitly enabled.

… tool errors (tinyhumansai#4451) The tinyagents harness runs a fatal JSON-schema gate (`ToolSchema::validate_call`) on every tool call between `before_tool` and the tool-wrap onion; any missing-required/wrong-type/bad-enum call returns `TinyAgentsError::Validation`, which propagates out of `run_loop` and aborts the whole turn with `chat_error`. The legacy engine had no such gate — bad args came back as recoverable tool results the model self-corrected on. Fix entirely seam-side (vendor/tinyagents is upstream/read-only): - Add `SchemaGuardMiddleware` (in src/openhuman/tinyagents/middleware.rs) implementing both `Middleware` (before_tool) and `ToolMiddleware` (wrap_tool). before_tool re-runs the same validation via the identical `spec_to_schema` conversion the runner uses; on failure it records a descriptive `invalid arguments for <tool>: <detail>. Expected schema: ...` message keyed by call id and rewrites the args to a schema-satisfying stub so the crate's fatal gate passes. wrap_tool then short-circuits the flagged call with a synthetic failed `ToolResult` before the stub reaches the tool, so the loop continues and the model self-corrects. Installed as the outermost tool-wrap middleware. - Fix `ArgRecoveryMiddleware` to stop coercing non-object args to `{}` when the schema has required fields: it now decodes JSON-encoded-string args (stripping markdown fences), coerces to `{}` only for permissive schemas, and otherwise leaves the args for the schema-guard tool-error path (the old behaviour guaranteed a fatal "<field> is required" abort). - Update the middleware-inventory assertions (lifecycle 14->15, tool 2->3). Tests deferred: tests/agent_harness_e2e.rs needs chat + subagent regressions where a scripted provider emits one malformed call then a corrected one, asserting the turn continues with no chat_error. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

Sub-agent tool exposure had inverted from fail-closed to fail-open after the TinyAgents migration: the registration predicate `allowed.is_empty() || allowed.contains(name)` treated an EMPTY allowlist as "register every tool", so a tool-less sub-agent (`ToolScope::Named([])`, a zero-match `skill_filter`, or a `named` list that resolved to nothing) silently inherited the parent's full tool surface — shell, file-write, and spawn/delegate — a prompt-injection privilege-escalation path. Change the plumbed allowlist type to `Option<HashSet<String>>` through the shared seam (`run_turn_via_tinyagents_shared` -> `assemble_turn_harness`): `None` = no filter (register all visible tools); `Some(set)` = register exactly those tools, so `Some(empty)` is a genuine deny-all. The sub-agent path now passes `Some(allowed_names)` (always a concrete, resolved set), so an empty resolution denies all instead of inheriting everything. The chat/channel paths keep their historical "empty = all" convention by mapping an empty/absent set to `None`. A warn is logged when the allowlist resolves empty so misconfigured agents are diagnosable. Also re-assert the "sub-agents never spawn" invariant at registration time: spawn/delegate/worker-thread tool names are stripped from any sub-agent run regardless of allowlist contents (defense-in-depth beyond the caller-side `allowed_indices` strip). The tool-exposure shadow middleware is updated to the same fail-closed `Option` semantics so its divergence reference matches what is actually registered. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

…uts (tinyhumansai#4453) The legacy engine scrubbed every tool output before it entered model context; the tinyagents path dropped that call site and scrub_credentials had zero production callers. Secrets in tool output reached model context, on-disk transcripts, worker-thread mirrors, and the tool-outcome sink. Re-introduce scrubbing as CredentialScrubMiddleware, a wrap_tool middleware registered as the innermost tool wrap on the shared assemble_turn_harness seam, so it scrubs the RAW tool result before the after_tool chain (summarization/caps), the transcript push, and the tool-outcome capture sink — covering parent, subagent, persisted transcript, and ToolCallOutcome by construction. Extend the scrub patterns to catch bare AKIA/ASIA AWS keys, sk- OpenAI keys, and space-separated Bearer tokens; also scrub JSON raw payloads recursively. Update the adapter-inventory assertion (3 -> 4). Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

…age (tinyhumansai#4454) The trace exporter attached the user prompt and model reply to the turn span unconditionally, so spans_to_ndjson/export_spans serialized plaintext to the NDJSON file (or printed it at info level to the app log) even with the default capture_content=false. Gate content at the span-STORAGE level instead: the SpanCollector now carries a capture_content flag and drops TurnContent unless opted in, so no exporter — NDJSON, app log, or Langfuse push — can ever serialize prompt/reply text. The web progress bridge wires the flag from config; the TurnContent emit in session/turn/core.rs also respects the gate as defense in depth; and export_spans no longer logs span content at info (metadata at info, NDJSON body at debug only). Separately fix Langfuse usageDetails double-counting: input_tokens is inclusive of cached tokens (cost.rs treats cached as a subset of input), but Langfuse sums usageDetails components as disjoint buckets. Emit input = input_tokens - cached_input_tokens so non_cached_input + cache_read + output reconcile to the explicit total. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

…rs (tinyhumansai#4455) Mid-turn steer/collect messages are injected as `Message::user(...)` and appended to the harness working transcript. Post-run persistence sliced the returned conversation with the "suffix after the last user message" convention (`messages_since_last_user`), so an injected steer MOVED that boundary — every pre-steer assistant/tool round plus the steer text itself vanished from persisted history, the next-turn KV-cache prefix, and subagent checkpoints. Replace the last-user-suffix convention on the persistence path with an explicit boundary: capture `base_len = input.len()` (the request transcript length) BEFORE `run_loop` consumes it, then persist `run.messages[base_len..]` via the new `convert::messages_since_request`. The crate returns the full transcript in `run.messages` (the loop seeds `messages = input` and only appends; the compression/trim middleware rewrites only the per-call `request.messages.clone()`), so slicing at `base_len` yields the complete post-request transcript — pre-steer rounds, the framed steer messages, and post-steer rounds, in execution order. Applied to both the thin `run_turn_via_tinyagents` and the shared `run_turn_via_tinyagents_shared` used by the parent (`session/turn/core.rs`) and subagent (`subagent_runner/ops/graph.rs`) paths, so checkpoints, cap digests, and worker mirrors see the full transcript. Steer framing (`[User steering message]:` / `[Additional context from user]:`) is preserved automatically since the injected user messages now fall inside the persisted slice. `messages_since_last_user` is retained (test-covered) but gated `#[cfg_attr(not(test), allow(dead_code))]` as it no longer has a production caller. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

…inyhumansai#4456) run_turn_via_tinyagents_shared spawned a 50ms steering-forwarder poll loop whose cleanup (abort + steering-registry deregister) only ran on normal return. Cancellation here is drop-based (web select! cancel token; sub-agent AbortHandle), so a cancelled turn detached the task to loop forever, pinning the session-owned Arc<RunQueue> + SteeringHandle and racing the next turn's forwarder into a dead handle. Aborted sub-agent registry entries were also never removed. Wrap the forwarder in an abort-on-drop RAII guard (SteeringForwarderGuard) held across the drive future: its Drop aborts the poll task, deregisters the sub-agent steering handle, and drains residual delivered-but-unapplied steers back into the session RunQueue so a late steer becomes the next turn's input instead of vanishing. Cleanup now runs identically on normal return, error, and drop-cancellation. A process-wide ACTIVE_FORWARDERS counter (active_steering_forwarders()) lets tests assert zero live tasks after a cancel. Re-emit delivery/requeue observability via new RunQueueMessageDelivered + RunQueueSteerRequeued DomainEvents. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

…le error_slot (tinyhumansai#4457) Turn-finalization audit fixes for the TinyAgents migration: A. EmptyProviderResponse no longer leaves a blank assistant row. On the tool_calls==0 empty-completion path, turn/core.rs returned Err with the empty Chat(assistant("")) still folded into history (via history.extend), so the next request carried an empty-content assistant message and strict providers (Anthropic) 400'd the whole thread. Pop the trailing empty assistant row before returning, matching the tinyhumansai#4093 branch. B. Stale provider error no longer masks the real run failure. ProviderModel now clears error_slot on every successful (incl. fallback-recovered) model call (buffered + streaming), and the runner maps the model-call cap / spawn-depth failures BEFORE consulting error_slot, so a run that failed over successfully then failed for an unrelated reason surfaces the real classification instead of a leftover provider error. C. Exactly one TurnCompleted per turn, after the wrap-up streams. The seam emitted TurnCompleted per parent turn AND the chat session emitted it again after summarize_turn_wrapup — the web bridge recorded two ledger events + two Completed upserts and turn_active=false landed before the cap/tinyhumansai#4093 checkpoint finished streaming. New defer_turn_completed_to_caller seam param suppresses the seam emit on the chat path (core.rs is the single post-wrap-up emitter); channel/CLI keeps the seam emit. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

…rovider calls (tinyhumansai#4460) ProviderModel::stream ran every streamed provider call in a detached tokio::spawn producer, which lost the caller's task-locals and left the call running after cancellation: - thread_id task-local was None inside the spawn, so the managed backend's thread_id extension was omitted on every streamed request. - the resolved-route audit slot was out of scope, so record_resolved_provider_route no-op'd and channel audit fell back to the requested route. - a hard AbortHandle cancel dropped only the consumer stream; the spawned provider.chat ran to completion and was still billed. Fix, entirely in the openhuman seam: - Capture thread_id (current_thread_id) and the resolved-route audit slot (new current_route_slot) on the caller's task before spawning, and re-establish them inside via with_thread_id / with_route_slot so the detached provider.chat sees the same ambient context. - Expose RouteSlot + current_route_slot + with_route_slot from resolved_route (explicit out-slot instead of a lost task-local). - Add AbortOnDrop guard (tinyagents/abort_guard.rs) holding the producer JoinHandle and move it into the consumer stream state, so dropping the stream/turn future aborts the in-flight provider call. Debug-log aborts. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

…rites (tinyhumansai#4458) The memory read->dedupe->write->update-index protocol can only close its write cycle via a successful `update_memory_md` call, and every durable write appends an instruction to call it — but `UpdateMemoryMdTool` was never registered in `all_tools_with_runtime`; it was only re-exported. The archivist's `[tools] named` allowlist selects it, but subagents only filter the parent set, so it silently resolved away, producing a permanent unsatisfiable "call update_memory_md" nag loop (unknown-tool error → the tracker never observes IndexUpdate → escalating drift warnings + an after_agent stale-index warning every run). Changes: - Register `UpdateMemoryMdTool` in `all_tools_with_runtime`, always-on like the other memory tools; per-agent visibility stays governed by each agent's `named` allowlist. Targets the workspace MEMORY.md/SKILL.md. - Make writes atomic + serialized: temp-file + atomic rename, guarded by a process-global per-workspace async lock keyed on the canonicalized workspace path. Concurrent index updates now queue instead of racing, and a killed process can never leave a truncated MEMORY.md. (Becomes live the moment registration is fixed.) - Classify `remember_preference`/`save_preference` with an explicit, documented `MemoryOp::Other` arm: they persist via `Memory::store` but into dedicated preference namespaces surfaced by prompt injection / per-query recall, not the MEMORY.md wiki — classifying them as writes would resurrect the same nag loop. Deferred (final test pass): registry-inventory assertion mirroring the adapter-inventory test, plus concurrency/atomicity tests. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

… failure in snapshot (tinyhumansai#4459) Repairs tinyhumansai#4413 tool-failure classification. A/B/E: the classifier now short-circuits POLICY_BLOCKED_MARKER and POLICY_DENIED_MARKER *before* the `timed out` sniff, so a user deny → `Denied` and an approval-TTL expiry → `ApprovalExpired` (both `FailureCategory::UserDeclined`, non-retryable, honest copy) instead of a recoverable Unknown "try again" or a Timeout "will retry". Middleware now sniffs both `error` and `content`, and the dead `"policy denied"` needle is removed. C: `ToolTimelineEntry` gains `failure: Option<PersistedToolFailure>` (camelCase, matching the TS `PersistedToolFailure` contract) threaded through `mirror.rs`, so a failed tool's explanation survives thread switch / restart. D: `SubagentToolCallCompleted` gains a `failure` field populated from the already-computed classification (observability.rs) and forwarded on the web event + persisted `SubagentToolCall`, so failed sub-agent rows carry an explanation. Frontend: TS types (`PersistedSubagentToolCall.failure`, `SubagentToolCallEntry.failure`, slice mapping), the localized-failure-class set gains `denied`/`approvalExpired`, and i18n keys land in en.ts + all locales. Tests deferred to the final pass. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

…yhumansai#4461) The crate ContextCompressionMiddleware installed in assemble_turn_harness regressed compaction vs the legacy engine in two ways: 1. A summarizer failure propagated (before_model does summarize(..).await?), mapping any provider hiccup to TinyAgentsError::Model and aborting the whole turn — on exactly the longest, most valuable threads. 2. Once over the 90% threshold, every model call re-ran a full-transcript summarizer LLM call: the crate rewrites only the per-call request clone, so the loop's working transcript never shrinks and re-summarizes each iteration. Wrap the LLM-backed ProviderModelSummarizer in a per-turn FaultTolerantCachingSummarizer adapter (seam-side, no crate change): - On Err: warn, trip a per-turn circuit breaker, and return a deterministic (LLM-free) front-drop trim of the input instead of propagating. The turn continues; later compactions skip the known-bad LLM. Restores the legacy warn + circuit-breaker + deterministic-trim behavior. - Single-slot content-hash cache (keyed by message count + slice hash) makes a repeat identical input slice free — retries / re-issued requests no longer re-dispatch the summarizer LLM until the transcript grows again. - Fallback produces a proper SummaryRecord with provenance, so the existing compaction-provenance drain surfaces it (observability preserved). Compaction failure can no longer fail a run; long over-threshold turns stop paying a fresh summarizer LLM call for identical input. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

…eviction (tinyhumansai#4462) Replace the crate `MessageTrimMiddleware` with a seam-owned `ImageAwareMessageTrimMiddleware` that restores the deleted `harness/token_budget.rs` semantics regressed by the TinyAgents migration: 1. Image marker cost: inline `[IMAGE:…]` markers and native `ContentBlock::Image` blocks are each charged a flat 1200 tokens instead of pricing the base64 payload at chars/4 (~2M tokens), so a single large image no longer reads as massively over budget and evicts the whole transcript. 2. Trim reserve: restore the legacy proportional clamp `clamp(window/10, >=512, <=max(8192, window/4))` — an 8k local model's input budget is ~7373 again, not the fixed `window - 16384` floored at 1024. 3. System survival + order: system messages are never dropped and never reordered to the front; only non-system history is evictable (oldest-first), leading orphaned tool results are snapped past. 4. Observability: any eviction emits a grep-able `[tinyagents::mw] message_trim` warn carrying messages-dropped and tokens/messages before-and-after. Fixed entirely in the openhuman seam; vendor/tinyagents untouched. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

…om, terminal-inference halt, autonomous iter-cap lift (tinyhumansai#4463) Restores loop-guard behaviors dropped with the legacy `tool_loop.rs` in the TinyAgents migration; the crate `no_progress` ladder tracks *failures* only and does not replace them. All fixes are seam-side (`src/openhuman/**`); vendored crate untouched. - RepeatProgressMiddleware (new, `after_model` + batch-aware `after_tool`): halts on 3× identical *successful* `(tool, args)` batches (tinyhumansai#4088) and 4× identical outputs (tinyhumansai#4095); `wait_subagent` polling exemption ported. Shares the halt-summary slot + steering handle with the failure breaker. - RepeatedToolFailureMiddleware: differentiated headroom restored — recoverable classes (timeout/transient markers + tinyhumansai#4445 classifier: Timeout/ ServiceUnavailable/ModelConnection) get 8 identical / 12 varied instead of the crate's fixed 3/6; POLICY_DENIED_MARKER rejoins POLICY_BLOCKED_MARKER for the 2-repeat hard-reject fast-trip (tinyhumansai#4463 part 6). - Terminal delegated-inference fast-halt (tinyhumansai#3104): budget-exhausted / provider-config-rejection delegation failures halt on first occurrence, gated on the delegated-inference envelope so recoverable tool stderr isn't caught. - Autonomous iter-cap lift (part 7): `autonomous_iter_cap()` was a dead knob — wired back in via `subagent_iter_cap_with_autonomous_lift` at the subagent `effective_max_iterations()` sites, so task-dispatcher / skill subagents run up to TASK_RUN_MAX_ITERATIONS again instead of the normal per-agent cap. - Updated the adapter-inventory middleware-count assertions (15→16, 11→12). Tests deferred to the final parity test pass. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

…apture sees capped (tinyhumansai#4464) The tinyagents crate runs after_tool hooks in REVERSE registration order (MiddlewareStack::run_after_tool iterates .iter().rev()). Two seam registrations assumed the opposite: (A) TurnContextMiddleware::install pushed HandoffMiddleware before ToolOutputMiddleware, so the 16 KiB byte cap truncated FIRST and a capped ~4k-token payload could never cross the 50k-token handoff threshold — the ResultHandoffCache/extract_from_result drill-in was unreachable. Flip the push order (tool-output budget first, handoff last) so the effective after_tool chain is handoff(raw) -> summarizer/tokenjuice/caps. (B) ToolOutcomeCaptureMiddleware was pushed AFTER context_mw.install, so its after_tool ran BEFORE the caps and the sink held full raw output per call for the whole turn (multi-MB) while failure classification / output_summary saw pre-cap content. Move its push BEFORE context_mw.install so it runs AFTER the caps and records the final capped content. Comments now state the crate's reverse-order rule explicitly. Keeps tinyhumansai#4453 credential-scrub ordering consistent. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

…tinyhumansai#4465) The tinyagents migration (model.rs -> agent/harness/parse.rs) kept the XML/JSON/markdown/GLM grammars but dropped the legacy **P-Format** positional grammar (`<tool_call>name[arg1|arg2]</tool_call>`) — even though `PFormat` is the default `ToolCallFormat` and ~10 builtin agent prompts still teach `name[a|b]`. A model that followed its own instructions emitted calls the parser logged as "malformed <tool_call> JSON" and silently dropped, so the turn continued as if no tool ran. Restore parity (option (a)) entirely in the openhuman seam: - `parse::parse_tool_calls_with_pformat(text, &PFormatRegistry)` walks the `<tool_call>`-family tags and, per tag, prefers the registry-driven positional parse (existing `agent::pformat::parse_call`) and falls back to the JSON entry the canonical parser produced — the exact per-tag selection the legacy `PFormatToolDispatcher` did. Empty registry short-circuits to `parse_tool_calls`, so non-PFormat paths are behaviour-neutral. `ParsedToolCall` gains `Clone` for the fallback copy. - `tinyagents::model` builds a `PFormatRegistry` from the advertised `request.tools` schemas and routes the text-mode fallback through the new parser in both the buffered (`invoke`) and streaming (`stream`) paths. Verbose `[agent_parse]` debug logging on the recovery path (tool name only — never the argument body, which may carry user data). vendor/tinyagents untouched (upstream/read-only). Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

…yhumansai#4466) The TinyAgents-migrated sub-agent path lost several behaviours the legacy per-iteration observer provided. Fix them entirely in the openhuman seam: 1. Failed runs persisted nothing. The graph `?`-returned on a harness error before `persist_subagent_transcript` / `mirror_worker_thread`, so a crash mid-run left no `session_raw` transcript (breaking `transcript_ingest`) and an empty worker thread. Add a `TranscriptSnapshotMiddleware` that mirrors each `before_model` transcript into an openhuman-owned buffer; on the error path persist those completed rounds + mirror them to the worker thread with a failure marker, then return the error. 2. Worker-thread mirror metadata was reduced to `{scope, agent_id, task_id}`. Restore the legacy `iteration` / `final` / `tool_calls` / `tool_call_id` / `tool_name` / `mode` fields. 3. Lossy streaming: the progress bridge used `try_send` and dropped deltas on a full channel. Keep the synchronous fast path but, on `Full`, defer to an awaited `send()` on a spawned task (backpressure) instead of silently dropping — the `EventListener` callback is sync so it cannot `.await` inline. 4. Sub-agent TokenJuice compaction was silently off (`TurnContextMiddleware:: defaults()` = compression Off). Build the sub-agent context middleware from the live `[context]` config + `definition.effective_tokenjuice_compression()` (compaction, byte budget, microcompact, autocompact opt-out), same as chat. 5. Breaker halt reported `Completed`: circuit-breaker-halted children returned `hit_cap=false`, so parents treated a halted child as finished. Carry a `breaker_halt` reason on the turn outcome and map it to `Incomplete`. cargo check --lib: clean. Tests deferred to the final test pass. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

…-run usage, tool timing + DomainEvents (tinyhumansai#4467) Restore cost-accounting and telemetry parity lost on the tinyagents path: - Charged USD (item 1): thread the provider `UsageInfo` (backend-charged USD + context window + cache-creation/reasoning tokens the crate `Usage` drops) from the model adapter to the event bridge via a shared FIFO carry; the bridge now prefers the provider's charged amount over the catalogue estimate (charged > estimate precedence) and records a real context window. - Cap-summary usage (item 2): fold all four token fields, price the call, and feed the global cost tracker directly for the sub-agent cap-hit checkpoint call (it bypasses the harness bridge, so its spend was previously lost). - Failed unobserved runs (item 3): attach the event bridge for every turn — including unobserved background/cron turns — so per-call usage reaches the cost tracker during the run and a later Err return no longer drops spend. - Tool timing (item 4): measure real `elapsed_ms` in `execute_openhuman_tool` and carry `elapsed_ms`/`output_chars` through the outcome side-channel so `ToolCallCompleted` shows real duration + output size instead of 0/0. - SSE per-tool events (item 5): re-publish `DomainEvent::ToolExecutionStarted` / `ToolExecutionCompleted` around session tool execution. - Child arg deltas (item 6): drop child-scoped `ToolCallArgsDelta` so a child run's argument composition no longer renders as the parent's own activity. - Unknown tools (item 7): default a missing tool outcome to `success: false` (hallucinated/unknown tool) in the post-turn records and derive the cap digest's per-tool `[ok|failed]` status from the captured outcomes. Entirely in the openhuman seam; vendor/tinyagents untouched. cargo check --lib clean. Tests deferred to the final parity test pass. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

…nyhumansai#4469) Batchable low-severity parity items from the TinyAgents-migration audit: - item 1: correct the GoalBudgetStopHook "hard-stops the moment" docs (runtime.rs, turn/core.rs) to the actual graceful-pause semantics — the stop vote is drained at the next iteration boundary, so the current tool round + one wrap-up call still run (bounded overshoot). - item 2: forward is_local_provider / is_local_provider_for_model / loaded_context_window / prompt_cache_capabilities through TextModeProvider so a local provider behind text mode keeps its n_keep>=n_ctx guard and KV-cache pricing. - item 3: recover poisoned mutexes via PoisonError::into_inner on the error_slot recovery reads (tinyagents/mod.rs) and the EarlyExitHook slots (tinyagents/tools.rs) instead of panicking. - item 5: cap-summary output cap now honours the sub-agent definition's own max_output_tokens budget instead of the global AGENT_TURN_MAX_OUTPUT_TOKENS floor. - item 6: enforce a minimum envelope allowance in aggregate tool-result spill so a saturated allowed_len can't blank the [tool_result_preview] envelope and strip its artifact_path pointer. - item 8: remove the dead rolling_segment_recap method (zero callers since the migration) + refresh its module/helper docs. - item 9: fix the misleading "falling back to inline truncation" log — the envelope is kept regardless for its artifact_path pointer. - item 10: drop stale model_vision_context doc links (turn_attachments_context.rs, tokenjuice/savings.rs). - item 11: remove deleted harness/interrupt.rs from agent/README.md. - item 12: payload-summarizer threshold doc default 500000 -> 4000. - item 13: correct the TraceSpan doc — raw NDJSON is an OTel-style span dump, not directly Langfuse-ingestible (that is spans_to_langfuse_batch). Items 4 (max_iterations==0 -> 10) and 7 (LazyToolkitResolver / mid-turn resync) are already documented / tracked in-code; no change needed. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

coderabbitai · 2026-07-04T10:19:42Z

Warning

Review limit reached

You’ve reached a temporary PR review limit under our Fair Usage Limits Policy.

Your recent review volume is higher than typical usage, so adaptive limits are currently applied.

Next review available in: 23 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.
You're only billed for reviews past your plan's rate limits ($0.25/file).

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b643ee8d-0ef5-4b03-bd34-a28e27afd38a

📥 Commits

Reviewing files that changed from the base of the PR and between 26dbf59 and 4aa6937.

📒 Files selected for processing (4)

src/openhuman/agent/harness/parse.rs
src/openhuman/agent/progress_tracing/tests.rs
src/openhuman/tinyagents/steering_forwarder.rs
src/openhuman/tools/impl/filesystem/update_memory_md.rs

📝 Walkthrough

Walkthrough

This PR adds Denied/ApprovalExpired tool-failure handling across classification, persistence, progress propagation, and UI/i18n; restores TinyAgents steering, middleware, parsing, usage, and subagent execution paths; gates stored turn content and corrects Langfuse usage accounting; and adds atomic memory writes plus artifact preview budgeting fixes.

Changes

Tool Failure Classification and Persistence

Layer / File(s)	Summary
Failure contract and classification `src/openhuman/tool_status/types.rs`, `src/openhuman/tool_status/ops.rs`	Adds `Denied`/`ApprovalExpired` failure classes and `UserDeclined`, updates classification precedence, and adds user-facing copy.
Failure persistence across snapshots `src/openhuman/threads/turn_state/types.rs`, `src/openhuman/threads/turn_state/mirror.rs`, `src/openhuman/threads/turn_state/mirror_tests.rs`, `src/openhuman/threads/turn_state/store_tests.rs`	Adds persisted failure payloads and threads them through timeline entries, subagent calls, mirror updates, and tests.
Progress bridge, runtime, and locale copy `src/openhuman/agent/progress.rs`, `src/openhuman/channels/providers/web/progress_bridge.rs`, `src/bin/harness_subagent_audit.rs`, `app/src/store/chatRuntimeSlice.ts`, `app/src/types/turnState.ts`, `app/src/pages/conversations/components/ProcessingTranscriptView.tsx`, `app/src/lib/i18n/*.ts`	Carries failure data through progress events, web payloads, runtime parsing, UI rendering, and locale strings.

Estimated code review effort: 5 (Critical) | ~120 minutes

TinyAgents Harness Restoration

Layer / File(s)	Summary
Run queue events and steering forwarder `src/core/event_bus/events.rs`, `src/openhuman/tinyagents/steering_forwarder.rs`	Adds run-queue delivery/requeue events and the RAII steering forwarder guard that drains, injects, and requeues messages.
Turn orchestration and channel/session wiring `src/openhuman/tinyagents/mod.rs`, `src/openhuman/tinyagents/convert.rs`, `src/openhuman/agent/harness/graph.rs`, `src/openhuman/agent/harness/session/turn/core.rs`, `src/openhuman/agent/harness/session/turn/graph.rs`, `src/openhuman/agent/harness/turn_attachments_context.rs`, `src/openhuman/thread_goals/runtime.rs`	Updates allowlist semantics, defer-completion behavior, request boundaries, empty-response handling, turn-content gating, and related orchestration comments.
Middleware stack, credential scrubbing, and schema guard `src/openhuman/tinyagents/middleware.rs`, `src/openhuman/agent/harness/credentials.rs`, `src/openhuman/agent/harness/mod.rs`, `src/openhuman/agent/harness/memory_protocol.rs`	Restores loop guards, transcript snapshots, tool exposure, credential redaction, schema validation, and argument recovery.
Provider model parsing and usage carry `src/openhuman/tinyagents/model.rs`, `src/openhuman/tinyagents/abort_guard.rs`, `src/openhuman/agent/harness/parse.rs`, `src/openhuman/inference/provider/mod.rs`, `src/openhuman/inference/provider/resolved_route.rs`, `src/openhuman/agent/harness/agent_graph.rs`	Adds abort-on-drop task handling, P-Format-aware parsing, route-slot propagation, and provider usage carry plumbing.
Fault-tolerant summarizer and observability bridge `src/openhuman/tinyagents/summarize.rs`, `src/openhuman/tinyagents/observability.rs`	Adds cached deterministic fallback summarization and records richer tool-failure and usage data through the bridge.
Subagent runner tokenjuice and breaker halt `src/openhuman/agent/harness/subagent_runner/*`, `src/openhuman/tinyagents/tests.rs`	Threads TokenJuice compression and breaker-halt state through subagent execution, worker-thread mirroring, and related tests.
Tool telemetry and harness tests `src/openhuman/tinyagents/tools.rs`, `src/openhuman/tinyagents/tests.rs`	Publishes tool execution timing events and updates harness tests for allowlist, completion-deferment, and inventory changes.

Estimated code review effort: 5 (Critical) | ~150 minutes

Sequence Diagram(s)

sequenceDiagram
  participant ToolStatusOps
  participant TurnStateMirror
  participant ProgressBridge
  participant ProcessingTranscriptView
  ToolStatusOps->>TurnStateMirror: classify and persist failure
  TurnStateMirror->>ProgressBridge: forward failure payload
  ProgressBridge->>ProcessingTranscriptView: render localized failure copy

Observability Content-Capture Gating and Cost Accounting

Layer / File(s)	Summary
SpanCollector content capture gate `src/openhuman/agent/progress_tracing.rs`, `src/openhuman/agent/progress_tracing/tests.rs`, `src/openhuman/channels/providers/web/progress_bridge.rs`	Adds a storage-time content-capture flag, gates `TurnContent` attachment and export logging, and updates related tests and bridge wiring.
Langfuse usage bucket accounting `src/openhuman/agent/progress_tracing/langfuse.rs`	Changes `usageDetails.input` to the non-cached input-token bucket while preserving total reconciliation.

Estimated code review effort: 3 (Moderate) | ~25 minutes

Memory Tool Atomicity and Supporting Fixes

Layer / File(s)	Summary
Atomic memory writes and tool registration `src/openhuman/tools/impl/filesystem/update_memory_md.rs`, `src/openhuman/tools/ops.rs`, `src/openhuman/agent/harness/memory_protocol.rs`	Adds per-workspace write locking and atomic rename writes for memory updates, registers the memory tool, and reclassifies preference writes.
Tool-result artifact preview budgeting `src/openhuman/agent/harness/tool_result_artifacts/mod.rs`	Introduces a minimum envelope allowance and updates persistence/spill handling so artifact pointers survive very small budgets.
Documentation and comment updates `src/openhuman/agent/harness/archivist/recap.rs`, `src/openhuman/agent/README.md`, `src/openhuman/thread_goals/runtime.rs`, `src/openhuman/agent/harness/turn_attachments_context.rs`, `src/openhuman/tokenjuice/savings.rs`, `src/openhuman/tinyagents/payload_summarizer.rs`	Updates comments/docs for the retired rolling recap entry point, public surface, pause semantics, turn attachment context, tokenjuice savings, and payload summarizer threshold.

Estimated code review effort: 3 (Moderate) | ~30 minutes

Possibly related issues

Tool-failure classification: approval denial → 'try again', TTL expiry → auto-retry Timeout, failure missing from snapshot #4459: Matches the same tool-failure classification and persistence flow.
TinyAgents migration (v0.58.7 → main) parity & bug-fix tracker — harness audit findings #4470: Covers the new Denied/ApprovalExpired failure path and surfacing work.
Subagent persistence/observability: failed runs persist nothing, degraded mirror, lossy deltas, tokenjuice off, breaker halt reports Completed #4466: Relates to subagent failure propagation, tokenjuice compression, and breaker-halt handling.
Steering forwarder task leaks on cancelled turns; late steer messages silently lost #4456: Matches the steering-forwarder abort/requeue fix.
Trace exporter ignores capture_content — prompts/replies written to NDJSON/app log; Langfuse cached-token double-count #4454: Covers tracing content-capture gating and Langfuse accounting.
Memory protocol mandates unregistered update_memory_md tool — permanent index-drift nag loop; non-atomic MEMORY.md writes #4458: Matches the atomic memory write and tool registration changes.
Context compaction: summarizer failure aborts the turn; re-summarizes on every model call once over threshold #4461: Relates to the fault-tolerant summarizer and TinyAgents context-compaction path.
after_tool middleware order inverted: progressive-disclosure handoff is dead code, outcome capture sees pre-cap content #4464: Touches the same after-tool ordering and handoff/capture wiring.

Possibly related PRs

tinyhumansai/openhuman#4413: Extends the same tool-failure UI/i18n flow with denied/approvalExpired keys and rendering.
tinyhumansai/openhuman#4445: Extends the same tool-failure classification and progress-bridge pipeline.
tinyhumansai/openhuman#4434: Shares the same TurnContent capture and tracing changes.

Suggested labels: feature, rust-core, agent, memory, bug

Suggested reviewers: M3gA-Mind

Poem

A rabbit hops through code so bright,
Denied and expired now read right.
Steers return and drains unwind,
TinyAgents leave less behind.
Content hides till flags say "yes" —
A tidy burrow, more or less 🐇

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately reflects the PR’s broad TinyAgents migration parity and bug-fix scope.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{Comment @coderabbitai help to get the list of available commands.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 294a364372

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

# Conflicts: # src/openhuman/tinyagents/tests.rs

…er upstream merge Post-merge integration fix: tinyhumansai#4459 added a `failure` field to AgentProgress::SubagentToolCallCompleted; the merged unknown-tool recovery path (observability.rs) needed to populate it. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 277f6f6101

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-07-04T11:20:49Z

+                    handle.spawn(async move {
+                        let _ = tx.send(progress).await;


Preserve progress ordering under backpressure

When the progress channel is full, each event is handed to its own spawned task. Those tasks are scheduled independently, so a later completion can be enqueued before the earlier start event; TurnStateMirror only updates an existing row for ToolCallCompleted by call_id, so an out-of-order completion is dropped and the UI/persisted turn can leave that tool row stuck in running. Use a single ordered drain/worker or otherwise serialize the fallback sends instead of spawning one task per event.

Useful? React with 👍 / 👎.

coderabbitai

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/openhuman/channels/providers/web/progress_bridge.rs (1)
416-426: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Align the live failure payload with the persisted shape ClassifiedFailure serializes as snake_case here (and in the subagent completion path), but PersistedToolFailure is camelCase (causePlain/nextAction). The same failure data ends up under different field names across live vs. reloaded turns; route the live payload through the same camelCase type or add a matching serde rename on ClassifiedFailure.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/channels/providers/web/progress_bridge.rs` around lines 416 -
426, The live ToolCallCompleted failure payload is being serialized from
ClassifiedFailure with snake_case field names, while the persisted/reloaded path
uses PersistedToolFailure with camelCase names. Update the failure handling in
progress_bridge.rs so the UI and ledger use the same shape as the persisted
data, either by routing the live serialization through PersistedToolFailure or
by adding matching serde rename annotations on ClassifiedFailure; make the same
adjustment in the subagent completion path if it uses the same failure payload.

🧹 Nitpick comments (2)

src/openhuman/agent/harness/archivist/recap.rs (1)
71-79: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Stale doc references removed "rolling recap" caller pattern.

The unchanged doc block above still describes callers driving "the live prompt (rolling recap → compaction)" as if rolling_segment_recap still exists, but this PR removes that entry point entirely. Worth a follow-up cleanup to avoid confusing future readers.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/harness/archivist/recap.rs` around lines 71 - 79, The doc
comment on the recap helper still references the removed live-prompt “rolling
recap → compaction” caller pattern, which is now stale. Update the documentation
on the recap function in recap.rs to remove that obsolete wording and describe
only the remaining finalize/fallback behavior, so the comments match the current
API and no longer mention the deleted rolling_segment_recap path.
src/bin/harness_subagent_audit.rs (1)
579-610: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Consider logging the new failure field for audit visibility.

Other branches in this handler log diagnostic fields (tool_name, success, elapsed_ms, etc.), but the newly added failure field is discarded here. Since this is an audit/diagnostics binary, surfacing failure.class/cause_plain when success is false could aid debugging subagent tool failures, consistent with the #4459 propagation work highlighted elsewhere in this PR.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/bin/harness_subagent_audit.rs` around lines 579 - 610, The
SubagentToolCallCompleted branch in harness_subagent_audit.rs is dropping the
newly added failure diagnostics, so update the handler to log and record the
failure field alongside tool_name, success, and elapsed_ms. Use the
AgentProgress::SubagentToolCallCompleted match arm and the
SubagentToolCompletedEvent push to surface failure.class and cause_plain when
success is false, keeping the audit output consistent with the rest of the
diagnostics in this handler.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/agent/harness/archivist/recap.rs`:
- Around line 1-6: The remaining test target still calls the removed
rolling_segment_recap entry point, so update those call sites before or
alongside deleting it. Find the ArchivistHook-related test usage in
tests/agent_archivist_debug_round21_raw_coverage_e2e.rs and replace
hook.rolling_segment_recap(...) and enabled.rolling_segment_recap(...) with the
current recap path used by ArchivistHook, keeping the test aligned with the
shared on_segment_closed-based summarization flow.

In `@src/openhuman/agent/harness/parse.rs`:
- Around line 758-806: The P-Format recovery loop in parse.rs only advances
through one json_calls entry per <tool_call> tag, so multi-call tags can lose
additional canonical calls and desynchronize later fallbacks. Update the combine
logic inside the while loop around find_first_tag, parse_call, and json_idx so
each recovered tag consumes and appends all canonical calls produced for that
tag rather than a single json_calls[json_idx] item. Keep the existing P-Format
recovery behavior, but ensure the JSON fallback stays in sync with the full set
of canonical calls for the current tag.

In `@src/openhuman/agent/harness/tool_result_artifacts/mod.rs`:
- Around line 381-388: The aggregate spill path in
tool_result_artifacts::mod::bound_text_to_budget is over-allocating because the
per-item floor is always forced to MIN_ENVELOPE_ALLOWANCE_BYTES, which can make
total exceed the overall budget. Update the logic around envelope_allowance so
the minimum is capped by the remaining aggregate budget for the spill batch, and
keep the existing preview-envelope protection only when it fits within the total
limit. Use the bound_text_to_budget call site and the envelope_allowance
calculation in the spill loop to ensure total stays within the requested budget
and the total <= budget assertion continues to hold.

In `@src/openhuman/tinyagents/observability.rs`:
- Around line 265-290: The overflow path in send() can reorder AgentProgress
events because each TrySendError::Full case spawns an independent task and
awaits tx.send(progress) out of sequence. Update the observability flow around
send() and spawn_progress_bridge so backpressure is handled by a single ordered
forwarder/queue rather than per-event spawned sends, preserving the original
progress order for text and tool updates.

In `@src/openhuman/tinyagents/steering_forwarder.rs`:
- Around line 225-298: Residual drained messages lose their original queue mode
because the requeue path in steering_forwarder.rs strips both STEER_PREFIX and
COLLECT_PREFIX but always pushes recovered texts back through RunQueue::push
with QueueMode::Steer. Update the drain/requeue flow around handle.drain(),
requeue_texts, and the detached rt.spawn block to preserve the matched prefix’s
mode and pass that mode through when constructing QueuedMessage, so collect
context is requeued as QueueMode::Collect instead of being mislabeled as steer.

In `@src/openhuman/tools/impl/filesystem/update_memory_md.rs`:
- Around line 55-85: The atomic_write helper in update_memory_md leaves behind a
temp file when tokio::fs::write fails before the rename step. Update
atomic_write to perform best-effort cleanup of tmp_path on the initial write
error as well as in the rename failure branch, so any partially written
.{file}.{pid}.{seq}.tmp is removed regardless of which filesystem operation
fails.
- Around line 15-46: The current workspace write guard in workspace_write_lock
and WORKSPACE_WRITE_LOCKS only prevents races within one process, so concurrent
cron runs can still overwrite MEMORY.md/SKILL.md during update_memory_md.
Replace or augment the in-process mutex with an inter-process locking mechanism
around the read-modify-write flow in update_memory_md so separate
tokio::process::Command invocations contend on the same workspace lock and
serialize writes across processes.

---

Outside diff comments:
In `@src/openhuman/channels/providers/web/progress_bridge.rs`:
- Around line 416-426: The live ToolCallCompleted failure payload is being
serialized from ClassifiedFailure with snake_case field names, while the
persisted/reloaded path uses PersistedToolFailure with camelCase names. Update
the failure handling in progress_bridge.rs so the UI and ledger use the same
shape as the persisted data, either by routing the live serialization through
PersistedToolFailure or by adding matching serde rename annotations on
ClassifiedFailure; make the same adjustment in the subagent completion path if
it uses the same failure payload.

---

Nitpick comments:
In `@src/bin/harness_subagent_audit.rs`:
- Around line 579-610: The SubagentToolCallCompleted branch in
harness_subagent_audit.rs is dropping the newly added failure diagnostics, so
update the handler to log and record the failure field alongside tool_name,
success, and elapsed_ms. Use the AgentProgress::SubagentToolCallCompleted match
arm and the SubagentToolCompletedEvent push to surface failure.class and
cause_plain when success is false, keeping the audit output consistent with the
rest of the diagnostics in this handler.

In `@src/openhuman/agent/harness/archivist/recap.rs`:
- Around line 71-79: The doc comment on the recap helper still references the
removed live-prompt “rolling recap → compaction” caller pattern, which is now
stale. Update the documentation on the recap function in recap.rs to remove that
obsolete wording and describe only the remaining finalize/fallback behavior, so
the comments match the current API and no longer mention the deleted
rolling_segment_recap path.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 43fb4bad-d9d4-40d5-b9b7-843a1f216bb1

📥 Commits

Reviewing files that changed from the base of the PR and between 760e5cb and 277f6f6.

📒 Files selected for processing (63)

app/src/lib/i18n/ar.ts
app/src/lib/i18n/bn.ts
app/src/lib/i18n/de.ts
app/src/lib/i18n/en.ts
app/src/lib/i18n/es.ts
app/src/lib/i18n/fr.ts
app/src/lib/i18n/hi.ts
app/src/lib/i18n/id.ts
app/src/lib/i18n/it.ts
app/src/lib/i18n/ko.ts
app/src/lib/i18n/pl.ts
app/src/lib/i18n/pt.ts
app/src/lib/i18n/ru.ts
app/src/lib/i18n/zh-CN.ts
app/src/pages/conversations/components/ProcessingTranscriptView.tsx
app/src/store/chatRuntimeSlice.ts
app/src/types/turnState.ts
src/bin/harness_subagent_audit.rs
src/core/event_bus/events.rs
src/openhuman/agent/README.md
src/openhuman/agent/harness/agent_graph.rs
src/openhuman/agent/harness/archivist/recap.rs
src/openhuman/agent/harness/credentials.rs
src/openhuman/agent/harness/graph.rs
src/openhuman/agent/harness/memory_protocol.rs
src/openhuman/agent/harness/mod.rs
src/openhuman/agent/harness/parse.rs
src/openhuman/agent/harness/session/turn/core.rs
src/openhuman/agent/harness/session/turn/graph.rs
src/openhuman/agent/harness/subagent_runner/autonomous.rs
src/openhuman/agent/harness/subagent_runner/mod.rs
src/openhuman/agent/harness/subagent_runner/ops/graph.rs
src/openhuman/agent/harness/subagent_runner/ops/runner.rs
src/openhuman/agent/harness/tool_result_artifacts/mod.rs
src/openhuman/agent/harness/turn_attachments_context.rs
src/openhuman/agent/progress.rs
src/openhuman/agent/progress_tracing.rs
src/openhuman/agent/progress_tracing/langfuse.rs
src/openhuman/agent/progress_tracing/tests.rs
src/openhuman/channels/providers/web/progress_bridge.rs
src/openhuman/inference/provider/mod.rs
src/openhuman/inference/provider/resolved_route.rs
src/openhuman/thread_goals/runtime.rs
src/openhuman/threads/turn_state/mirror.rs
src/openhuman/threads/turn_state/mirror_tests.rs
src/openhuman/threads/turn_state/store_tests.rs
src/openhuman/threads/turn_state/types.rs
src/openhuman/tinyagents/abort_guard.rs
src/openhuman/tinyagents/convert.rs
src/openhuman/tinyagents/middleware.rs
src/openhuman/tinyagents/mod.rs
src/openhuman/tinyagents/model.rs
src/openhuman/tinyagents/observability.rs
src/openhuman/tinyagents/payload_summarizer.rs
src/openhuman/tinyagents/steering_forwarder.rs
src/openhuman/tinyagents/summarize.rs
src/openhuman/tinyagents/tests.rs
src/openhuman/tinyagents/tools.rs
src/openhuman/tokenjuice/savings.rs
src/openhuman/tool_status/ops.rs
src/openhuman/tool_status/types.rs
src/openhuman/tools/impl/filesystem/update_memory_md.rs
src/openhuman/tools/ops.rs

💤 Files with no reviewable changes (1)

src/openhuman/agent/README.md

coderabbitai · 2026-07-04T11:30:37Z

    fn send(&self, progress: AgentProgress) {
-        if let Some(tx) = &self.on_progress {
-            let _ = tx.try_send(progress);
+        use tokio::sync::mpsc::error::TrySendError;
+        let Some(tx) = &self.on_progress else {
+            return;
+        };
+        match tx.try_send(progress) {
+            Ok(()) => {}
+            Err(TrySendError::Closed(_)) => {}
+            Err(TrySendError::Full(progress)) => {
+                // Backpressure, not capacity loss: hand the delta to an awaited
+                // `send()` on a spawned task rather than dropping it. Guard on a
+                // live runtime so a non-async construction path can't panic.
+                if let Ok(handle) = tokio::runtime::Handle::try_current() {
+                    let tx = tx.clone();
+                    handle.spawn(async move {
+                        let _ = tx.send(progress).await;
+                    });
+                } else {
+                    tracing::debug!(
+                        model = %self.model,
+                        "[tinyagents] progress channel full and no runtime to defer send; dropping one delta"
+                    );
+                }
+            }
        }
    }


🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail # Map and inspect the relevant file. ast-grep outline src/openhuman/tinyagents/observability.rs --view expanded || true echo '--- send()/record_usage() around the cited lines ---' sed -n '220,340p' src/openhuman/tinyagents/observability.rs echo '--- find progress bridge / consumer ---' rg -n "spawn_progress_bridge|on_progress|AgentProgress|TextDelta|ThinkingDelta|ToolCompleted|UsageRecorded" src/openhuman -S

Repository: tinyhumansai/openhuman

Length of output: 50378

🏁 Script executed:

#!/bin/bash set -euo pipefail echo '--- spawn_progress_bridge definition ---' rg -n -A80 -B20 "fn spawn_progress_bridge" src/openhuman/channels/providers/web.rs src/openhuman -S echo '--- turn-state mirror ordering assumptions ---' sed -n '1,120p' src/openhuman/threads/turn_state/mirror.rs echo '--- relevant observe arms ---' sed -n '340,460p' src/openhuman/threads/turn_state/mirror.rs echo '--- send() tests / ordering-related tests in observability.rs ---' sed -n '840,1025p' src/openhuman/tinyagents/observability.rs

Repository: tinyhumansai/openhuman

Length of output: 10977

🏁 Script executed:

#!/bin/bash set -euo pipefail echo '--- progress bridge event loop / forwarding ---' sed -n '197,320p' src/openhuman/channels/providers/web/progress_bridge.rs echo '--- turn-state mirror relevant event handling ---' sed -n '340,460p' src/openhuman/threads/turn_state/mirror.rs echo '--- any tests asserting ordering on a sequential receiver ---' rg -n -A8 -B8 "TextDelta|ThinkingDelta|ToolCallStarted|ToolCallCompleted|SubagentTextDelta|SubagentThinkingDelta" src/openhuman/tinyagents/observability.rs src/openhuman/threads/turn_state/mirror_tests.rs src/openhuman/channels/providers/web/progress_bridge.rs -S

Repository: tinyhumansai/openhuman

Length of output: 50378

Deferred send() can reorder progress events under backpressure.

spawn_progress_bridge drains AgentProgress sequentially, so a full-channel event handed to a spawned send().await can land after later fast-path sends and scramble text/tool ordering. Route overflow delivery through a single ordered forwarder instead of spawning per-event tasks.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/tinyagents/observability.rs` around lines 265 - 290, The overflow path in send() can reorder AgentProgress events because each TrySendError::Full case spawns an independent task and awaits tx.send(progress) out of sequence. Update the observability flow around send() and spawn_progress_bridge so backpressure is handled by a single ordered forwarder/queue rather than per-event spawned sends, preserving the original progress order for text and tool updates.

coderabbitai · 2026-07-04T11:30:37Z

+/// Process-global registry of per-workspace write locks (#4458).
+///
+/// `update_memory_md` performs a read-modify-write on `MEMORY.md`/`SKILL.md`.
+/// The per-run `MemoryProtocolTracker` has zero cross-run awareness, so two
+/// concurrent runs (parallel forks, cron) racing the same workspace file would
+/// otherwise clobber each other's append. We serialize every write to a given
+/// workspace directory through a shared async mutex, keyed by the canonicalized
+/// (falling back to raw) workspace path, so concurrent index updates queue
+/// instead of racing. Combined with the temp-file + atomic-rename write below,
+/// a killed process can never leave a truncated file.
+static WORKSPACE_WRITE_LOCKS: LazyLock<Mutex<HashMap<PathBuf, Arc<tokio::sync::Mutex<()>>>>> =
+    LazyLock::new(|| Mutex::new(HashMap::new()));
+
+/// Monotonic counter for temp-file uniqueness within a process.
+static TEMP_FILE_SEQ: AtomicU64 = AtomicU64::new(0);
+
+/// Return (creating if needed) the shared async write lock for `workspace_dir`.
+///
+/// The lock key is the canonicalized workspace path when it resolves (so two
+/// spellings of the same directory share one lock), else the raw path.
+fn workspace_write_lock(workspace_dir: &Path) -> Arc<tokio::sync::Mutex<()>> {
+    let key = workspace_dir
+        .canonicalize()
+        .unwrap_or_else(|_| workspace_dir.to_path_buf());
+    let mut map = WORKSPACE_WRITE_LOCKS
+        .lock()
+        .unwrap_or_else(|poisoned| poisoned.into_inner());
+    Arc::clone(
+        map.entry(key)
+            .or_insert_with(|| Arc::new(tokio::sync::Mutex::new(()))),
+    )
+}


🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check whether cron/background/parallel-agent execution spawns a new OS # process (Command::new / std::process) vs staying in-process (tokio::spawn). rg -n "std::process::Command|tokio::process::Command" src/openhuman -g '*.rs' | rg -i "cron|subagent|parallel|daemon" fd -a cron src/openhuman rg -n "fn run\b" -A 8 src/openhuman/tools/impl/system/cron 2>/dev/null

Repository: tinyhumansai/openhuman

Length of output: 328

Use cross-process locking for workspace writes
WORKSPACE_WRITE_LOCKS only serializes writers inside one OS process. src/openhuman/cron/scheduler.rs launches work via tokio::process::Command, so separate cron invocations won't share this mutex and can still clobber the same MEMORY.md/SKILL.md during the read-modify-write cycle. If those paths can overlap, this needs an inter-process lock.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/tools/impl/filesystem/update_memory_md.rs` around lines 15 - 46, The current workspace write guard in workspace_write_lock and WORKSPACE_WRITE_LOCKS only prevents races within one process, so concurrent cron runs can still overwrite MEMORY.md/SKILL.md during update_memory_md. Replace or augment the in-process mutex with an inter-process locking mechanism around the read-modify-write flow in update_memory_md so separate tokio::process::Command invocations contend on the same workspace lock and serialize writes across processes.

…ings Post-merge CI fix: the tinyhumansai#4459 denied/approvalExpired locale strings were not prettier-wrapped. No key/text changes, formatting only. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

# Conflicts: # src/openhuman/agent/progress_tracing.rs # src/openhuman/agent/progress_tracing/langfuse.rs # src/openhuman/channels/providers/web/progress_bridge.rs

coderabbitai

🧹 Nitpick comments (2)

src/openhuman/tinyagents/tests.rs (1)
512-536: 🎯 Functional Correctness | 🔵 Trivial

Unverified middleware-count assertions — flagged by the author as pending.

The inline NOTE explicitly states these counts (15, 4, and the analogous 13 at Line 591) are derived by addition/inference and "are NOT compiled by cargo check --lib (this is a #[cfg(test)] block) and are pending the deferred test pass". Given the PR's stated scope ("Tests were not written or run in this pass"), these hardcoded numbers are a real risk of drifting from the actual runtime middleware inventory once the test suite is compiled/run.

Since the PR explicitly earmarks a follow-up test pass, consider tracking this as a checklist item so the counts get validated rather than silently trusted.

Also applies to: 588-593
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/tinyagents/tests.rs` around lines 512 - 536, The middleware
inventory assertions in the TinyAgents tests are still unverified and were
explicitly marked as pending, so replace the hardcoded counts with values that
are validated by the test suite or defer them behind a clear follow-up check.
Update the assertions around assembled.harness.middleware(),
mw.tool_middleware_len(), and the matching inventory check near the later block
to reflect the actual compiled middleware stack, and keep the note synchronized
with the verified source of truth.
src/openhuman/agent/progress_tracing/langfuse.rs (1)
262-278: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Add test coverage for the cached-token subtraction path.

The core behavior change here is "input_tokens is INCLUSIVE of cached prompt tokens... so emit the NON-cached input (input - cached)", but none of the new tests set gen_ai.usage.cached_input_tokens to a nonzero value alongside a nonzero input_tokens to confirm usageDetails.input is actually reduced. All existing tests use cached = 0, so the saturating_sub branch is currently unexercised.
✅ Suggested test addition
+    #[test]
+    fn usage_details_input_excludes_cached_tokens() {
+        let mut gen = span(
+            "trace-1", "gen-1", Some("root"), "llm.agentic-v1",
+            SpanKind::Generation, SpanStatus::Ok, 1_000, Some(1_500),
+        );
+        gen.attributes.clear();
+        gen.attributes.insert("gen_ai.usage.input_tokens".into(), json!(1_000));
+        gen.attributes.insert("gen_ai.usage.output_tokens".into(), json!(200));
+        gen.attributes.insert("gen_ai.usage.cached_input_tokens".into(), json!(300));
+        let payload = spans_to_langfuse_batch(&[gen], false, "production");
+        let usage = &payload["batch"][1]["body"]["usageDetails"];
+        assert_eq!(usage["input"], 700, "non-cached input only");
+        assert_eq!(usage["cache_read_input_tokens"], 300);
+        assert_eq!(usage["total"], 1_200, "total = raw input + output");
+    }
Also applies to: 712-781
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/progress_tracing/langfuse.rs` around lines 262 - 278, Add
test coverage in the Langfuse tracing tests for the cached-token subtraction
path in langfuse:: usage conversion. Create a case that sets both
gen_ai.usage.input_tokens and gen_ai.usage.cached_input_tokens to nonzero
values, then assert usageDetails.input is reduced to input - cached while
cache_read_input_tokens reflects the cached amount and total still reconciles
correctly. Reuse the existing langfuse conversion helpers/tests so the new
assertion directly exercises the non_cached_input branch and the saturating_sub
logic.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/openhuman/agent/progress_tracing/langfuse.rs`:
- Around line 262-278: Add test coverage in the Langfuse tracing tests for the
cached-token subtraction path in langfuse:: usage conversion. Create a case that
sets both gen_ai.usage.input_tokens and gen_ai.usage.cached_input_tokens to
nonzero values, then assert usageDetails.input is reduced to input - cached
while cache_read_input_tokens reflects the cached amount and total still
reconciles correctly. Reuse the existing langfuse conversion helpers/tests so
the new assertion directly exercises the non_cached_input branch and the
saturating_sub logic.

In `@src/openhuman/tinyagents/tests.rs`:
- Around line 512-536: The middleware inventory assertions in the TinyAgents
tests are still unverified and were explicitly marked as pending, so replace the
hardcoded counts with values that are validated by the test suite or defer them
behind a clear follow-up check. Update the assertions around
assembled.harness.middleware(), mw.tool_middleware_len(), and the matching
inventory check near the later block to reflect the actual compiled middleware
stack, and keep the note synchronized with the verified source of truth.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f69142c2-dc79-467b-8c2b-4ea28f04f73a

📥 Commits

Reviewing files that changed from the base of the PR and between 277f6f6 and 26dbf59.

📒 Files selected for processing (23)

app/src/lib/i18n/ar.ts
app/src/lib/i18n/bn.ts
app/src/lib/i18n/de.ts
app/src/lib/i18n/es.ts
app/src/lib/i18n/fr.ts
app/src/lib/i18n/hi.ts
app/src/lib/i18n/id.ts
app/src/lib/i18n/it.ts
app/src/lib/i18n/ko.ts
app/src/lib/i18n/pl.ts
app/src/lib/i18n/pt.ts
app/src/lib/i18n/ru.ts
app/src/lib/i18n/zh-CN.ts
src/bin/harness_subagent_audit.rs
src/openhuman/agent/harness/tool_result_artifacts/mod.rs
src/openhuman/agent/progress.rs
src/openhuman/agent/progress_tracing.rs
src/openhuman/agent/progress_tracing/langfuse.rs
src/openhuman/agent/progress_tracing/tests.rs
src/openhuman/channels/providers/web/progress_bridge.rs
src/openhuman/threads/turn_state/mirror.rs
src/openhuman/tinyagents/observability.rs
src/openhuman/tinyagents/tests.rs

✅ Files skipped from review due to trivial changes (10)

app/src/lib/i18n/de.ts
app/src/lib/i18n/ar.ts
app/src/lib/i18n/ru.ts
app/src/lib/i18n/zh-CN.ts
app/src/lib/i18n/id.ts
app/src/lib/i18n/pt.ts
app/src/lib/i18n/es.ts
src/bin/harness_subagent_audit.rs
app/src/lib/i18n/fr.ts
app/src/lib/i18n/pl.ts

🚧 Files skipped from review as they are similar to previous changes (11)

app/src/lib/i18n/hi.ts
app/src/lib/i18n/it.ts
app/src/lib/i18n/bn.ts
src/openhuman/agent/progress_tracing/tests.rs
app/src/lib/i18n/ko.ts
src/openhuman/channels/providers/web/progress_bridge.rs
src/openhuman/agent/progress.rs
src/openhuman/agent/harness/tool_result_artifacts/mod.rs
src/openhuman/threads/turn_state/mirror.rs
src/openhuman/tinyagents/observability.rs
src/openhuman/agent/progress_tracing.rs

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26dbf59924

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-07-04T12:24:44Z

+    let mut provider_model = ProviderModel::new(provider, model, temperature)
+        .with_usage_carry(provider_usage_carry.clone());


Share usage carry with fallback route models

When the primary route fails and RunPolicy.fallback selects one of the route models registered by routes::build_route_models, that ProviderModel is still constructed with its default private usage_carry (see routes.rs), while the bridge only drains the provider_usage_carry created here. Successful fallback calls therefore never provide their provider-reported charged USD/context/cache/reasoning breakdown to the bridge, so the turn falls back to catalog estimates despite the new carry path; thread this same carry into the fallback route models as well.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-04T12:24:44Z

+        let cache_creation_tokens = carried
+            .as_ref()
+            .map(|u| u.cache_creation_tokens)
+            .filter(|t| *t > 0)
+            .unwrap_or(usage.cache_creation_tokens);
+        let reasoning_tokens = carried
+            .as_ref()
+            .map(|u| u.reasoning_tokens)
+            .filter(|t| *t > 0)
+            .unwrap_or(usage.reasoning_tokens);


Use carried token breakdown in progress events

For providers where cache-creation or reasoning tokens only exist on the carried UsageInfo, these local values are recorded into UsageInfo but the ModelCallCompleted event below still sends usage.cache_creation_tokens and usage.reasoning_tokens. That leaves the per-call progress/Langfuse generation telemetry at zero for those fields even though the cost tracker got the right values; use the carried cache_creation_tokens/reasoning_tokens variables in the log and progress event too.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-04T12:24:44Z

                        // bounded size for the wire (#4007); `output_chars` +
                        // `elapsed_ms` still ride along in `subagent` below.
                        output: Some(cap_wire_output(output)),
+                        failure: failure_json,


Forward subagent failure payload into live state

This socket event now includes the structured failure, but the frontend ChatSubagentToolResultEvent type still has no failure field and onSubagentToolResult only updates status/timing/result. When a sub-agent tool fails during a live turn, the emitted explanation is dropped before it reaches SubagentToolCallEntry, so the child row loses the new “why/next” copy until a snapshot reload; add the field to the event type and parse/store it in the live handler.

Useful? React with 👍 / 👎.

…ests The tinyhumansai#4459 failure field on AgentProgress::SubagentToolCallCompleted must be set in the two test constructions upstream tinyhumansai#4498's telemetry tests added (cargo check --lib does not compile #[cfg(test)], so this surfaced only in the coverage job's cargo test). Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

…p cleanup - Restore ArchivistHook::rolling_segment_recap (tinyhumansai#4469 removed it but an E2E test target still calls it; the removal was optional 'remove or re-wire'). - Preserve QueueMode when re-queueing residual steer/collect on forwarder drop so a delivered collect line isn't re-framed as user steering (tinyhumansai#4456). - Clean up the staged temp file if the initial write fails in update_memory_md (tinyhumansai#4458). Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

…yhumansai#4465) A <tool_call> tag that parses as P-Format has no json_calls entry (the JSON pass can't parse it), so advancing json_idx shifted every later JSON tag onto the wrong index and silently dropped a real JSON call. Both review bots flagged this. Leave json_idx untouched for P-Format tags. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4aa69372e4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-07-04T12:59:39Z

            next_span_seq: 0,
            id_prefix: uuid::Uuid::new_v4().simple().to_string(),
+            // Metadata-only by default; opt in via `with_content_capture`.
+            capture_content: false,


Wire the capture flag into SpanCollector

When observability.agent_tracing.capture_content is enabled, the progress bridge sets the flag on TraceContext and then constructs the collector with SpanCollector::new(trace_ctx), but this new storage gate is always initialized to false unless the separate with_content_capture() builder is called. In the live path that builder is never used, so AgentProgress::TurnContent emitted by the session turn is dropped at storage time and opt-in prompt/reply capture never reaches the trace/exporter; initialize this from ctx.capture_content or call the builder at the construction site.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-04T12:59:39Z

+        parsed_tool_calls = combined.len(),
+        "[agent_parse] P-Format-aware parse produced combined tool-call set"
+    );
+    (narrative, combined)


Preserve canonical calls outside literal tool tags

When a P-Format response also contains a canonical call that is not a literal <tool_call> tag, such as a ```tool_call block or GLM-style line after a recovered name[...] tag, `parse_tool_calls(response)` can put that call in `json_calls`, but the loop above only walks `TOOL_CALL_OPEN_TAGS` and never consumes those non-tag calls. Because `combined` is nonempty, returning it here drops the remaining canonical calls and executes only the recovered P-Format call; append any unconsumed canonical calls (or merge by source span) before returning.

Useful? React with 👍 / 👎.

senamakel added 18 commits July 4, 2026 03:21

senamakel requested a review from a team July 4, 2026 10:19

chatgpt-codex-connector Bot reviewed Jul 4, 2026

View reviewed changes

Comment thread src/openhuman/tinyagents/steering_forwarder.rs Outdated

Comment thread src/openhuman/agent/harness/parse.rs Outdated

senamakel added 3 commits July 4, 2026 10:29

Merge remote-tracking branch 'upstream/main' into agent/parity-all

a35d861

# Conflicts: # src/openhuman/tinyagents/tests.rs

Merge remote-tracking branch 'upstream/main' into agent/parity-all

518ac5b

coderabbitai Bot added agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. bug rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. labels Jul 4, 2026

github-project-automation Bot added this to Team Openhuman Jul 4, 2026

github-project-automation Bot moved this to Todo in Team Openhuman Jul 4, 2026

chatgpt-codex-connector Bot reviewed Jul 4, 2026

View reviewed changes

coderabbitai Bot requested changes Jul 4, 2026

View reviewed changes

senamakel added 2 commits July 4, 2026 11:35

Merge remote-tracking branch 'upstream/main' into agent/parity-all

26dbf59

# Conflicts: # src/openhuman/agent/progress_tracing.rs # src/openhuman/agent/progress_tracing/langfuse.rs # src/openhuman/channels/providers/web/progress_bridge.rs

coderabbitai Bot added feature Net-new user-facing capability or product behavior. memory Memory store, memory tree, recall, summarization, and embeddings in src/openhuman/memory/. labels Jul 4, 2026

coderabbitai Bot reviewed Jul 4, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jul 4, 2026

View reviewed changes

senamakel added 3 commits July 4, 2026 12:41

chatgpt-codex-connector Bot reviewed Jul 4, 2026

View reviewed changes

senamakel merged commit f89c5e0 into tinyhumansai:main Jul 4, 2026
15 checks passed

github-project-automation Bot moved this from Todo to Done in Team Openhuman Jul 4, 2026

This was referenced Jul 4, 2026

fix(agent): follow-up review fixes for tinyagents parity (#4451–#4469) #4504

Merged

test(agent): parity test backfill — credential scrub + failure classification (#4468) #4509

Merged

		let mut provider_model = ProviderModel::new(provider, model, temperature)
		.with_usage_carry(provider_usage_carry.clone());

Uh oh!

Conversation

senamakel commented Jul 4, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Phase 1 — critical correctness & security

Phase 2 — durable-state corruption & leaks

Phase 3 — follow-up PR repairs

Phase 4 — parity restoration

Deferred

⚠️ Tests deferred — this PR is implementation + compile-verification only

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jul 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Sequence Diagram(s)

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jul 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Jul 4, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Jul 4, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jul 4, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jul 4, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jul 4, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jul 4, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jul 4, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

senamakel commented Jul 4, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jul 4, 2026 •

edited

Loading