TinyAgents migration parity & bug fixes (#4451–#4469)#4495
Conversation
… tool errors (tinyhumansai#4451) The tinyagents harness runs a fatal JSON-schema gate (`ToolSchema::validate_call`) on every tool call between `before_tool` and the tool-wrap onion; any missing-required/wrong-type/bad-enum call returns `TinyAgentsError::Validation`, which propagates out of `run_loop` and aborts the whole turn with `chat_error`. The legacy engine had no such gate — bad args came back as recoverable tool results the model self-corrected on. Fix entirely seam-side (vendor/tinyagents is upstream/read-only): - Add `SchemaGuardMiddleware` (in src/openhuman/tinyagents/middleware.rs) implementing both `Middleware` (before_tool) and `ToolMiddleware` (wrap_tool). before_tool re-runs the same validation via the identical `spec_to_schema` conversion the runner uses; on failure it records a descriptive `invalid arguments for <tool>: <detail>. Expected schema: ...` message keyed by call id and rewrites the args to a schema-satisfying stub so the crate's fatal gate passes. wrap_tool then short-circuits the flagged call with a synthetic failed `ToolResult` before the stub reaches the tool, so the loop continues and the model self-corrects. Installed as the outermost tool-wrap middleware. - Fix `ArgRecoveryMiddleware` to stop coercing non-object args to `{}` when the schema has required fields: it now decodes JSON-encoded-string args (stripping markdown fences), coerces to `{}` only for permissive schemas, and otherwise leaves the args for the schema-guard tool-error path (the old behaviour guaranteed a fatal "<field> is required" abort). - Update the middleware-inventory assertions (lifecycle 14->15, tool 2->3). Tests deferred: tests/agent_harness_e2e.rs needs chat + subagent regressions where a scripted provider emits one malformed call then a corrected one, asserting the turn continues with no chat_error. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
Sub-agent tool exposure had inverted from fail-closed to fail-open after the TinyAgents migration: the registration predicate `allowed.is_empty() || allowed.contains(name)` treated an EMPTY allowlist as "register every tool", so a tool-less sub-agent (`ToolScope::Named([])`, a zero-match `skill_filter`, or a `named` list that resolved to nothing) silently inherited the parent's full tool surface — shell, file-write, and spawn/delegate — a prompt-injection privilege-escalation path. Change the plumbed allowlist type to `Option<HashSet<String>>` through the shared seam (`run_turn_via_tinyagents_shared` -> `assemble_turn_harness`): `None` = no filter (register all visible tools); `Some(set)` = register exactly those tools, so `Some(empty)` is a genuine deny-all. The sub-agent path now passes `Some(allowed_names)` (always a concrete, resolved set), so an empty resolution denies all instead of inheriting everything. The chat/channel paths keep their historical "empty = all" convention by mapping an empty/absent set to `None`. A warn is logged when the allowlist resolves empty so misconfigured agents are diagnosable. Also re-assert the "sub-agents never spawn" invariant at registration time: spawn/delegate/worker-thread tool names are stripped from any sub-agent run regardless of allowlist contents (defense-in-depth beyond the caller-side `allowed_indices` strip). The tool-exposure shadow middleware is updated to the same fail-closed `Option` semantics so its divergence reference matches what is actually registered. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…uts (tinyhumansai#4453) The legacy engine scrubbed every tool output before it entered model context; the tinyagents path dropped that call site and scrub_credentials had zero production callers. Secrets in tool output reached model context, on-disk transcripts, worker-thread mirrors, and the tool-outcome sink. Re-introduce scrubbing as CredentialScrubMiddleware, a wrap_tool middleware registered as the innermost tool wrap on the shared assemble_turn_harness seam, so it scrubs the RAW tool result before the after_tool chain (summarization/caps), the transcript push, and the tool-outcome capture sink — covering parent, subagent, persisted transcript, and ToolCallOutcome by construction. Extend the scrub patterns to catch bare AKIA/ASIA AWS keys, sk- OpenAI keys, and space-separated Bearer tokens; also scrub JSON raw payloads recursively. Update the adapter-inventory assertion (3 -> 4). Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…age (tinyhumansai#4454) The trace exporter attached the user prompt and model reply to the turn span unconditionally, so spans_to_ndjson/export_spans serialized plaintext to the NDJSON file (or printed it at info level to the app log) even with the default capture_content=false. Gate content at the span-STORAGE level instead: the SpanCollector now carries a capture_content flag and drops TurnContent unless opted in, so no exporter — NDJSON, app log, or Langfuse push — can ever serialize prompt/reply text. The web progress bridge wires the flag from config; the TurnContent emit in session/turn/core.rs also respects the gate as defense in depth; and export_spans no longer logs span content at info (metadata at info, NDJSON body at debug only). Separately fix Langfuse usageDetails double-counting: input_tokens is inclusive of cached tokens (cost.rs treats cached as a subset of input), but Langfuse sums usageDetails components as disjoint buckets. Emit input = input_tokens - cached_input_tokens so non_cached_input + cache_read + output reconcile to the explicit total. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…rs (tinyhumansai#4455) Mid-turn steer/collect messages are injected as `Message::user(...)` and appended to the harness working transcript. Post-run persistence sliced the returned conversation with the "suffix after the last user message" convention (`messages_since_last_user`), so an injected steer MOVED that boundary — every pre-steer assistant/tool round plus the steer text itself vanished from persisted history, the next-turn KV-cache prefix, and subagent checkpoints. Replace the last-user-suffix convention on the persistence path with an explicit boundary: capture `base_len = input.len()` (the request transcript length) BEFORE `run_loop` consumes it, then persist `run.messages[base_len..]` via the new `convert::messages_since_request`. The crate returns the full transcript in `run.messages` (the loop seeds `messages = input` and only appends; the compression/trim middleware rewrites only the per-call `request.messages.clone()`), so slicing at `base_len` yields the complete post-request transcript — pre-steer rounds, the framed steer messages, and post-steer rounds, in execution order. Applied to both the thin `run_turn_via_tinyagents` and the shared `run_turn_via_tinyagents_shared` used by the parent (`session/turn/core.rs`) and subagent (`subagent_runner/ops/graph.rs`) paths, so checkpoints, cap digests, and worker mirrors see the full transcript. Steer framing (`[User steering message]:` / `[Additional context from user]:`) is preserved automatically since the injected user messages now fall inside the persisted slice. `messages_since_last_user` is retained (test-covered) but gated `#[cfg_attr(not(test), allow(dead_code))]` as it no longer has a production caller. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…inyhumansai#4456) run_turn_via_tinyagents_shared spawned a 50ms steering-forwarder poll loop whose cleanup (abort + steering-registry deregister) only ran on normal return. Cancellation here is drop-based (web select! cancel token; sub-agent AbortHandle), so a cancelled turn detached the task to loop forever, pinning the session-owned Arc<RunQueue> + SteeringHandle and racing the next turn's forwarder into a dead handle. Aborted sub-agent registry entries were also never removed. Wrap the forwarder in an abort-on-drop RAII guard (SteeringForwarderGuard) held across the drive future: its Drop aborts the poll task, deregisters the sub-agent steering handle, and drains residual delivered-but-unapplied steers back into the session RunQueue so a late steer becomes the next turn's input instead of vanishing. Cleanup now runs identically on normal return, error, and drop-cancellation. A process-wide ACTIVE_FORWARDERS counter (active_steering_forwarders()) lets tests assert zero live tasks after a cancel. Re-emit delivery/requeue observability via new RunQueueMessageDelivered + RunQueueSteerRequeued DomainEvents. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…le error_slot (tinyhumansai#4457) Turn-finalization audit fixes for the TinyAgents migration: A. EmptyProviderResponse no longer leaves a blank assistant row. On the tool_calls==0 empty-completion path, turn/core.rs returned Err with the empty Chat(assistant("")) still folded into history (via history.extend), so the next request carried an empty-content assistant message and strict providers (Anthropic) 400'd the whole thread. Pop the trailing empty assistant row before returning, matching the tinyhumansai#4093 branch. B. Stale provider error no longer masks the real run failure. ProviderModel now clears error_slot on every successful (incl. fallback-recovered) model call (buffered + streaming), and the runner maps the model-call cap / spawn-depth failures BEFORE consulting error_slot, so a run that failed over successfully then failed for an unrelated reason surfaces the real classification instead of a leftover provider error. C. Exactly one TurnCompleted per turn, after the wrap-up streams. The seam emitted TurnCompleted per parent turn AND the chat session emitted it again after summarize_turn_wrapup — the web bridge recorded two ledger events + two Completed upserts and turn_active=false landed before the cap/tinyhumansai#4093 checkpoint finished streaming. New defer_turn_completed_to_caller seam param suppresses the seam emit on the chat path (core.rs is the single post-wrap-up emitter); channel/CLI keeps the seam emit. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…rovider calls (tinyhumansai#4460) ProviderModel::stream ran every streamed provider call in a detached tokio::spawn producer, which lost the caller's task-locals and left the call running after cancellation: - thread_id task-local was None inside the spawn, so the managed backend's thread_id extension was omitted on every streamed request. - the resolved-route audit slot was out of scope, so record_resolved_provider_route no-op'd and channel audit fell back to the requested route. - a hard AbortHandle cancel dropped only the consumer stream; the spawned provider.chat ran to completion and was still billed. Fix, entirely in the openhuman seam: - Capture thread_id (current_thread_id) and the resolved-route audit slot (new current_route_slot) on the caller's task before spawning, and re-establish them inside via with_thread_id / with_route_slot so the detached provider.chat sees the same ambient context. - Expose RouteSlot + current_route_slot + with_route_slot from resolved_route (explicit out-slot instead of a lost task-local). - Add AbortOnDrop guard (tinyagents/abort_guard.rs) holding the producer JoinHandle and move it into the consumer stream state, so dropping the stream/turn future aborts the in-flight provider call. Debug-log aborts. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…rites (tinyhumansai#4458) The memory read->dedupe->write->update-index protocol can only close its write cycle via a successful `update_memory_md` call, and every durable write appends an instruction to call it — but `UpdateMemoryMdTool` was never registered in `all_tools_with_runtime`; it was only re-exported. The archivist's `[tools] named` allowlist selects it, but subagents only filter the parent set, so it silently resolved away, producing a permanent unsatisfiable "call update_memory_md" nag loop (unknown-tool error → the tracker never observes IndexUpdate → escalating drift warnings + an after_agent stale-index warning every run). Changes: - Register `UpdateMemoryMdTool` in `all_tools_with_runtime`, always-on like the other memory tools; per-agent visibility stays governed by each agent's `named` allowlist. Targets the workspace MEMORY.md/SKILL.md. - Make writes atomic + serialized: temp-file + atomic rename, guarded by a process-global per-workspace async lock keyed on the canonicalized workspace path. Concurrent index updates now queue instead of racing, and a killed process can never leave a truncated MEMORY.md. (Becomes live the moment registration is fixed.) - Classify `remember_preference`/`save_preference` with an explicit, documented `MemoryOp::Other` arm: they persist via `Memory::store` but into dedicated preference namespaces surfaced by prompt injection / per-query recall, not the MEMORY.md wiki — classifying them as writes would resurrect the same nag loop. Deferred (final test pass): registry-inventory assertion mirroring the adapter-inventory test, plus concurrency/atomicity tests. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
… failure in snapshot (tinyhumansai#4459) Repairs tinyhumansai#4413 tool-failure classification. A/B/E: the classifier now short-circuits POLICY_BLOCKED_MARKER and POLICY_DENIED_MARKER *before* the `timed out` sniff, so a user deny → `Denied` and an approval-TTL expiry → `ApprovalExpired` (both `FailureCategory::UserDeclined`, non-retryable, honest copy) instead of a recoverable Unknown "try again" or a Timeout "will retry". Middleware now sniffs both `error` and `content`, and the dead `"policy denied"` needle is removed. C: `ToolTimelineEntry` gains `failure: Option<PersistedToolFailure>` (camelCase, matching the TS `PersistedToolFailure` contract) threaded through `mirror.rs`, so a failed tool's explanation survives thread switch / restart. D: `SubagentToolCallCompleted` gains a `failure` field populated from the already-computed classification (observability.rs) and forwarded on the web event + persisted `SubagentToolCall`, so failed sub-agent rows carry an explanation. Frontend: TS types (`PersistedSubagentToolCall.failure`, `SubagentToolCallEntry.failure`, slice mapping), the localized-failure-class set gains `denied`/`approvalExpired`, and i18n keys land in en.ts + all locales. Tests deferred to the final pass. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…yhumansai#4461) The crate ContextCompressionMiddleware installed in assemble_turn_harness regressed compaction vs the legacy engine in two ways: 1. A summarizer failure propagated (before_model does summarize(..).await?), mapping any provider hiccup to TinyAgentsError::Model and aborting the whole turn — on exactly the longest, most valuable threads. 2. Once over the 90% threshold, every model call re-ran a full-transcript summarizer LLM call: the crate rewrites only the per-call request clone, so the loop's working transcript never shrinks and re-summarizes each iteration. Wrap the LLM-backed ProviderModelSummarizer in a per-turn FaultTolerantCachingSummarizer adapter (seam-side, no crate change): - On Err: warn, trip a per-turn circuit breaker, and return a deterministic (LLM-free) front-drop trim of the input instead of propagating. The turn continues; later compactions skip the known-bad LLM. Restores the legacy warn + circuit-breaker + deterministic-trim behavior. - Single-slot content-hash cache (keyed by message count + slice hash) makes a repeat identical input slice free — retries / re-issued requests no longer re-dispatch the summarizer LLM until the transcript grows again. - Fallback produces a proper SummaryRecord with provenance, so the existing compaction-provenance drain surfaces it (observability preserved). Compaction failure can no longer fail a run; long over-threshold turns stop paying a fresh summarizer LLM call for identical input. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…eviction (tinyhumansai#4462) Replace the crate `MessageTrimMiddleware` with a seam-owned `ImageAwareMessageTrimMiddleware` that restores the deleted `harness/token_budget.rs` semantics regressed by the TinyAgents migration: 1. Image marker cost: inline `[IMAGE:…]` markers and native `ContentBlock::Image` blocks are each charged a flat 1200 tokens instead of pricing the base64 payload at chars/4 (~2M tokens), so a single large image no longer reads as massively over budget and evicts the whole transcript. 2. Trim reserve: restore the legacy proportional clamp `clamp(window/10, >=512, <=max(8192, window/4))` — an 8k local model's input budget is ~7373 again, not the fixed `window - 16384` floored at 1024. 3. System survival + order: system messages are never dropped and never reordered to the front; only non-system history is evictable (oldest-first), leading orphaned tool results are snapped past. 4. Observability: any eviction emits a grep-able `[tinyagents::mw] message_trim` warn carrying messages-dropped and tokens/messages before-and-after. Fixed entirely in the openhuman seam; vendor/tinyagents untouched. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…om, terminal-inference halt, autonomous iter-cap lift (tinyhumansai#4463) Restores loop-guard behaviors dropped with the legacy `tool_loop.rs` in the TinyAgents migration; the crate `no_progress` ladder tracks *failures* only and does not replace them. All fixes are seam-side (`src/openhuman/**`); vendored crate untouched. - RepeatProgressMiddleware (new, `after_model` + batch-aware `after_tool`): halts on 3× identical *successful* `(tool, args)` batches (tinyhumansai#4088) and 4× identical outputs (tinyhumansai#4095); `wait_subagent` polling exemption ported. Shares the halt-summary slot + steering handle with the failure breaker. - RepeatedToolFailureMiddleware: differentiated headroom restored — recoverable classes (timeout/transient markers + tinyhumansai#4445 classifier: Timeout/ ServiceUnavailable/ModelConnection) get 8 identical / 12 varied instead of the crate's fixed 3/6; POLICY_DENIED_MARKER rejoins POLICY_BLOCKED_MARKER for the 2-repeat hard-reject fast-trip (tinyhumansai#4463 part 6). - Terminal delegated-inference fast-halt (tinyhumansai#3104): budget-exhausted / provider-config-rejection delegation failures halt on first occurrence, gated on the delegated-inference envelope so recoverable tool stderr isn't caught. - Autonomous iter-cap lift (part 7): `autonomous_iter_cap()` was a dead knob — wired back in via `subagent_iter_cap_with_autonomous_lift` at the subagent `effective_max_iterations()` sites, so task-dispatcher / skill subagents run up to TASK_RUN_MAX_ITERATIONS again instead of the normal per-agent cap. - Updated the adapter-inventory middleware-count assertions (15→16, 11→12). Tests deferred to the final parity test pass. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…apture sees capped (tinyhumansai#4464) The tinyagents crate runs after_tool hooks in REVERSE registration order (MiddlewareStack::run_after_tool iterates .iter().rev()). Two seam registrations assumed the opposite: (A) TurnContextMiddleware::install pushed HandoffMiddleware before ToolOutputMiddleware, so the 16 KiB byte cap truncated FIRST and a capped ~4k-token payload could never cross the 50k-token handoff threshold — the ResultHandoffCache/extract_from_result drill-in was unreachable. Flip the push order (tool-output budget first, handoff last) so the effective after_tool chain is handoff(raw) -> summarizer/tokenjuice/caps. (B) ToolOutcomeCaptureMiddleware was pushed AFTER context_mw.install, so its after_tool ran BEFORE the caps and the sink held full raw output per call for the whole turn (multi-MB) while failure classification / output_summary saw pre-cap content. Move its push BEFORE context_mw.install so it runs AFTER the caps and records the final capped content. Comments now state the crate's reverse-order rule explicitly. Keeps tinyhumansai#4453 credential-scrub ordering consistent. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…tinyhumansai#4465) The tinyagents migration (model.rs -> agent/harness/parse.rs) kept the XML/JSON/markdown/GLM grammars but dropped the legacy **P-Format** positional grammar (`<tool_call>name[arg1|arg2]</tool_call>`) — even though `PFormat` is the default `ToolCallFormat` and ~10 builtin agent prompts still teach `name[a|b]`. A model that followed its own instructions emitted calls the parser logged as "malformed <tool_call> JSON" and silently dropped, so the turn continued as if no tool ran. Restore parity (option (a)) entirely in the openhuman seam: - `parse::parse_tool_calls_with_pformat(text, &PFormatRegistry)` walks the `<tool_call>`-family tags and, per tag, prefers the registry-driven positional parse (existing `agent::pformat::parse_call`) and falls back to the JSON entry the canonical parser produced — the exact per-tag selection the legacy `PFormatToolDispatcher` did. Empty registry short-circuits to `parse_tool_calls`, so non-PFormat paths are behaviour-neutral. `ParsedToolCall` gains `Clone` for the fallback copy. - `tinyagents::model` builds a `PFormatRegistry` from the advertised `request.tools` schemas and routes the text-mode fallback through the new parser in both the buffered (`invoke`) and streaming (`stream`) paths. Verbose `[agent_parse]` debug logging on the recovery path (tool name only — never the argument body, which may carry user data). vendor/tinyagents untouched (upstream/read-only). Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…yhumansai#4466) The TinyAgents-migrated sub-agent path lost several behaviours the legacy per-iteration observer provided. Fix them entirely in the openhuman seam: 1. Failed runs persisted nothing. The graph `?`-returned on a harness error before `persist_subagent_transcript` / `mirror_worker_thread`, so a crash mid-run left no `session_raw` transcript (breaking `transcript_ingest`) and an empty worker thread. Add a `TranscriptSnapshotMiddleware` that mirrors each `before_model` transcript into an openhuman-owned buffer; on the error path persist those completed rounds + mirror them to the worker thread with a failure marker, then return the error. 2. Worker-thread mirror metadata was reduced to `{scope, agent_id, task_id}`. Restore the legacy `iteration` / `final` / `tool_calls` / `tool_call_id` / `tool_name` / `mode` fields. 3. Lossy streaming: the progress bridge used `try_send` and dropped deltas on a full channel. Keep the synchronous fast path but, on `Full`, defer to an awaited `send()` on a spawned task (backpressure) instead of silently dropping — the `EventListener` callback is sync so it cannot `.await` inline. 4. Sub-agent TokenJuice compaction was silently off (`TurnContextMiddleware:: defaults()` = compression Off). Build the sub-agent context middleware from the live `[context]` config + `definition.effective_tokenjuice_compression()` (compaction, byte budget, microcompact, autocompact opt-out), same as chat. 5. Breaker halt reported `Completed`: circuit-breaker-halted children returned `hit_cap=false`, so parents treated a halted child as finished. Carry a `breaker_halt` reason on the turn outcome and map it to `Incomplete`. cargo check --lib: clean. Tests deferred to the final test pass. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…-run usage, tool timing + DomainEvents (tinyhumansai#4467) Restore cost-accounting and telemetry parity lost on the tinyagents path: - Charged USD (item 1): thread the provider `UsageInfo` (backend-charged USD + context window + cache-creation/reasoning tokens the crate `Usage` drops) from the model adapter to the event bridge via a shared FIFO carry; the bridge now prefers the provider's charged amount over the catalogue estimate (charged > estimate precedence) and records a real context window. - Cap-summary usage (item 2): fold all four token fields, price the call, and feed the global cost tracker directly for the sub-agent cap-hit checkpoint call (it bypasses the harness bridge, so its spend was previously lost). - Failed unobserved runs (item 3): attach the event bridge for every turn — including unobserved background/cron turns — so per-call usage reaches the cost tracker during the run and a later Err return no longer drops spend. - Tool timing (item 4): measure real `elapsed_ms` in `execute_openhuman_tool` and carry `elapsed_ms`/`output_chars` through the outcome side-channel so `ToolCallCompleted` shows real duration + output size instead of 0/0. - SSE per-tool events (item 5): re-publish `DomainEvent::ToolExecutionStarted` / `ToolExecutionCompleted` around session tool execution. - Child arg deltas (item 6): drop child-scoped `ToolCallArgsDelta` so a child run's argument composition no longer renders as the parent's own activity. - Unknown tools (item 7): default a missing tool outcome to `success: false` (hallucinated/unknown tool) in the post-turn records and derive the cap digest's per-tool `[ok|failed]` status from the captured outcomes. Entirely in the openhuman seam; vendor/tinyagents untouched. cargo check --lib clean. Tests deferred to the final parity test pass. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…nyhumansai#4469) Batchable low-severity parity items from the TinyAgents-migration audit: - item 1: correct the GoalBudgetStopHook "hard-stops the moment" docs (runtime.rs, turn/core.rs) to the actual graceful-pause semantics — the stop vote is drained at the next iteration boundary, so the current tool round + one wrap-up call still run (bounded overshoot). - item 2: forward is_local_provider / is_local_provider_for_model / loaded_context_window / prompt_cache_capabilities through TextModeProvider so a local provider behind text mode keeps its n_keep>=n_ctx guard and KV-cache pricing. - item 3: recover poisoned mutexes via PoisonError::into_inner on the error_slot recovery reads (tinyagents/mod.rs) and the EarlyExitHook slots (tinyagents/tools.rs) instead of panicking. - item 5: cap-summary output cap now honours the sub-agent definition's own max_output_tokens budget instead of the global AGENT_TURN_MAX_OUTPUT_TOKENS floor. - item 6: enforce a minimum envelope allowance in aggregate tool-result spill so a saturated allowed_len can't blank the [tool_result_preview] envelope and strip its artifact_path pointer. - item 8: remove the dead rolling_segment_recap method (zero callers since the migration) + refresh its module/helper docs. - item 9: fix the misleading "falling back to inline truncation" log — the envelope is kept regardless for its artifact_path pointer. - item 10: drop stale model_vision_context doc links (turn_attachments_context.rs, tokenjuice/savings.rs). - item 11: remove deleted harness/interrupt.rs from agent/README.md. - item 12: payload-summarizer threshold doc default 500000 -> 4000. - item 13: correct the TraceSpan doc — raw NDJSON is an OTel-style span dump, not directly Langfuse-ingestible (that is spans_to_langfuse_batch). Items 4 (max_iterations==0 -> 10) and 7 (LazyToolkitResolver / mid-turn resync) are already documented / tracked in-code; no change needed. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
|
Warning Review limit reachedYou’ve reached a temporary PR review limit under our Fair Usage Limits Policy. Next review available in: 23 minutes Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available. How can I continue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews. How do review limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please refer docs for additional details. Review details⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughThis PR adds Denied/ApprovalExpired tool-failure handling across classification, persistence, progress propagation, and UI/i18n; restores TinyAgents steering, middleware, parsing, usage, and subagent execution paths; gates stored turn content and corrects Langfuse usage accounting; and adds atomic memory writes plus artifact preview budgeting fixes. ChangesTool Failure Classification and Persistence
Estimated code review effort: 5 (Critical) | ~120 minutes TinyAgents Harness Restoration
Estimated code review effort: 5 (Critical) | ~150 minutes Sequence Diagram(s)sequenceDiagram
participant ToolStatusOps
participant TurnStateMirror
participant ProgressBridge
participant ProcessingTranscriptView
ToolStatusOps->>TurnStateMirror: classify and persist failure
TurnStateMirror->>ProgressBridge: forward failure payload
ProgressBridge->>ProcessingTranscriptView: render localized failure copy
Observability Content-Capture Gating and Cost Accounting
Estimated code review effort: 3 (Moderate) | ~25 minutes Memory Tool Atomicity and Supporting Fixes
Estimated code review effort: 3 (Moderate) | ~30 minutes Possibly related issues
Possibly related PRs
Suggested labels: Suggested reviewers: Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 294a364372
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
# Conflicts: # src/openhuman/tinyagents/tests.rs
…er upstream merge Post-merge integration fix: tinyhumansai#4459 added a `failure` field to AgentProgress::SubagentToolCallCompleted; the merged unknown-tool recovery path (observability.rs) needed to populate it. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 277f6f6101
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| handle.spawn(async move { | ||
| let _ = tx.send(progress).await; |
There was a problem hiding this comment.
Preserve progress ordering under backpressure
When the progress channel is full, each event is handed to its own spawned task. Those tasks are scheduled independently, so a later completion can be enqueued before the earlier start event; TurnStateMirror only updates an existing row for ToolCallCompleted by call_id, so an out-of-order completion is dropped and the UI/persisted turn can leave that tool row stuck in running. Use a single ordered drain/worker or otherwise serialize the fallback sends instead of spawning one task per event.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/openhuman/channels/providers/web/progress_bridge.rs (1)
416-426: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winAlign the live
failurepayload with the persisted shapeClassifiedFailureserializes as snake_case here (and in the subagent completion path), butPersistedToolFailureis camelCase (causePlain/nextAction). The same failure data ends up under different field names across live vs. reloaded turns; route the live payload through the same camelCase type or add a matching serde rename onClassifiedFailure.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/channels/providers/web/progress_bridge.rs` around lines 416 - 426, The live ToolCallCompleted failure payload is being serialized from ClassifiedFailure with snake_case field names, while the persisted/reloaded path uses PersistedToolFailure with camelCase names. Update the failure handling in progress_bridge.rs so the UI and ledger use the same shape as the persisted data, either by routing the live serialization through PersistedToolFailure or by adding matching serde rename annotations on ClassifiedFailure; make the same adjustment in the subagent completion path if it uses the same failure payload.
🧹 Nitpick comments (2)
src/openhuman/agent/harness/archivist/recap.rs (1)
71-79: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueStale doc references removed "rolling recap" caller pattern.
The unchanged doc block above still describes callers driving "the live prompt (rolling recap → compaction)" as if
rolling_segment_recapstill exists, but this PR removes that entry point entirely. Worth a follow-up cleanup to avoid confusing future readers.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/agent/harness/archivist/recap.rs` around lines 71 - 79, The doc comment on the recap helper still references the removed live-prompt “rolling recap → compaction” caller pattern, which is now stale. Update the documentation on the recap function in recap.rs to remove that obsolete wording and describe only the remaining finalize/fallback behavior, so the comments match the current API and no longer mention the deleted rolling_segment_recap path.src/bin/harness_subagent_audit.rs (1)
579-610: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueConsider logging the new
failurefield for audit visibility.Other branches in this handler log diagnostic fields (tool_name, success, elapsed_ms, etc.), but the newly added
failurefield is discarded here. Since this is an audit/diagnostics binary, surfacingfailure.class/cause_plainwhensuccessis false could aid debugging subagent tool failures, consistent with the#4459propagation work highlighted elsewhere in this PR.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/bin/harness_subagent_audit.rs` around lines 579 - 610, The SubagentToolCallCompleted branch in harness_subagent_audit.rs is dropping the newly added failure diagnostics, so update the handler to log and record the failure field alongside tool_name, success, and elapsed_ms. Use the AgentProgress::SubagentToolCallCompleted match arm and the SubagentToolCompletedEvent push to surface failure.class and cause_plain when success is false, keeping the audit output consistent with the rest of the diagnostics in this handler.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/agent/harness/archivist/recap.rs`:
- Around line 1-6: The remaining test target still calls the removed
rolling_segment_recap entry point, so update those call sites before or
alongside deleting it. Find the ArchivistHook-related test usage in
tests/agent_archivist_debug_round21_raw_coverage_e2e.rs and replace
hook.rolling_segment_recap(...) and enabled.rolling_segment_recap(...) with the
current recap path used by ArchivistHook, keeping the test aligned with the
shared on_segment_closed-based summarization flow.
In `@src/openhuman/agent/harness/parse.rs`:
- Around line 758-806: The P-Format recovery loop in parse.rs only advances
through one json_calls entry per <tool_call> tag, so multi-call tags can lose
additional canonical calls and desynchronize later fallbacks. Update the combine
logic inside the while loop around find_first_tag, parse_call, and json_idx so
each recovered tag consumes and appends all canonical calls produced for that
tag rather than a single json_calls[json_idx] item. Keep the existing P-Format
recovery behavior, but ensure the JSON fallback stays in sync with the full set
of canonical calls for the current tag.
In `@src/openhuman/agent/harness/tool_result_artifacts/mod.rs`:
- Around line 381-388: The aggregate spill path in
tool_result_artifacts::mod::bound_text_to_budget is over-allocating because the
per-item floor is always forced to MIN_ENVELOPE_ALLOWANCE_BYTES, which can make
total exceed the overall budget. Update the logic around envelope_allowance so
the minimum is capped by the remaining aggregate budget for the spill batch, and
keep the existing preview-envelope protection only when it fits within the total
limit. Use the bound_text_to_budget call site and the envelope_allowance
calculation in the spill loop to ensure total stays within the requested budget
and the total <= budget assertion continues to hold.
In `@src/openhuman/tinyagents/observability.rs`:
- Around line 265-290: The overflow path in send() can reorder AgentProgress
events because each TrySendError::Full case spawns an independent task and
awaits tx.send(progress) out of sequence. Update the observability flow around
send() and spawn_progress_bridge so backpressure is handled by a single ordered
forwarder/queue rather than per-event spawned sends, preserving the original
progress order for text and tool updates.
In `@src/openhuman/tinyagents/steering_forwarder.rs`:
- Around line 225-298: Residual drained messages lose their original queue mode
because the requeue path in steering_forwarder.rs strips both STEER_PREFIX and
COLLECT_PREFIX but always pushes recovered texts back through RunQueue::push
with QueueMode::Steer. Update the drain/requeue flow around handle.drain(),
requeue_texts, and the detached rt.spawn block to preserve the matched prefix’s
mode and pass that mode through when constructing QueuedMessage, so collect
context is requeued as QueueMode::Collect instead of being mislabeled as steer.
In `@src/openhuman/tools/impl/filesystem/update_memory_md.rs`:
- Around line 55-85: The atomic_write helper in update_memory_md leaves behind a
temp file when tokio::fs::write fails before the rename step. Update
atomic_write to perform best-effort cleanup of tmp_path on the initial write
error as well as in the rename failure branch, so any partially written
.{file}.{pid}.{seq}.tmp is removed regardless of which filesystem operation
fails.
- Around line 15-46: The current workspace write guard in workspace_write_lock
and WORKSPACE_WRITE_LOCKS only prevents races within one process, so concurrent
cron runs can still overwrite MEMORY.md/SKILL.md during update_memory_md.
Replace or augment the in-process mutex with an inter-process locking mechanism
around the read-modify-write flow in update_memory_md so separate
tokio::process::Command invocations contend on the same workspace lock and
serialize writes across processes.
---
Outside diff comments:
In `@src/openhuman/channels/providers/web/progress_bridge.rs`:
- Around line 416-426: The live ToolCallCompleted failure payload is being
serialized from ClassifiedFailure with snake_case field names, while the
persisted/reloaded path uses PersistedToolFailure with camelCase names. Update
the failure handling in progress_bridge.rs so the UI and ledger use the same
shape as the persisted data, either by routing the live serialization through
PersistedToolFailure or by adding matching serde rename annotations on
ClassifiedFailure; make the same adjustment in the subagent completion path if
it uses the same failure payload.
---
Nitpick comments:
In `@src/bin/harness_subagent_audit.rs`:
- Around line 579-610: The SubagentToolCallCompleted branch in
harness_subagent_audit.rs is dropping the newly added failure diagnostics, so
update the handler to log and record the failure field alongside tool_name,
success, and elapsed_ms. Use the AgentProgress::SubagentToolCallCompleted match
arm and the SubagentToolCompletedEvent push to surface failure.class and
cause_plain when success is false, keeping the audit output consistent with the
rest of the diagnostics in this handler.
In `@src/openhuman/agent/harness/archivist/recap.rs`:
- Around line 71-79: The doc comment on the recap helper still references the
removed live-prompt “rolling recap → compaction” caller pattern, which is now
stale. Update the documentation on the recap function in recap.rs to remove that
obsolete wording and describe only the remaining finalize/fallback behavior, so
the comments match the current API and no longer mention the deleted
rolling_segment_recap path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 43fb4bad-d9d4-40d5-b9b7-843a1f216bb1
📒 Files selected for processing (63)
app/src/lib/i18n/ar.tsapp/src/lib/i18n/bn.tsapp/src/lib/i18n/de.tsapp/src/lib/i18n/en.tsapp/src/lib/i18n/es.tsapp/src/lib/i18n/fr.tsapp/src/lib/i18n/hi.tsapp/src/lib/i18n/id.tsapp/src/lib/i18n/it.tsapp/src/lib/i18n/ko.tsapp/src/lib/i18n/pl.tsapp/src/lib/i18n/pt.tsapp/src/lib/i18n/ru.tsapp/src/lib/i18n/zh-CN.tsapp/src/pages/conversations/components/ProcessingTranscriptView.tsxapp/src/store/chatRuntimeSlice.tsapp/src/types/turnState.tssrc/bin/harness_subagent_audit.rssrc/core/event_bus/events.rssrc/openhuman/agent/README.mdsrc/openhuman/agent/harness/agent_graph.rssrc/openhuman/agent/harness/archivist/recap.rssrc/openhuman/agent/harness/credentials.rssrc/openhuman/agent/harness/graph.rssrc/openhuman/agent/harness/memory_protocol.rssrc/openhuman/agent/harness/mod.rssrc/openhuman/agent/harness/parse.rssrc/openhuman/agent/harness/session/turn/core.rssrc/openhuman/agent/harness/session/turn/graph.rssrc/openhuman/agent/harness/subagent_runner/autonomous.rssrc/openhuman/agent/harness/subagent_runner/mod.rssrc/openhuman/agent/harness/subagent_runner/ops/graph.rssrc/openhuman/agent/harness/subagent_runner/ops/runner.rssrc/openhuman/agent/harness/tool_result_artifacts/mod.rssrc/openhuman/agent/harness/turn_attachments_context.rssrc/openhuman/agent/progress.rssrc/openhuman/agent/progress_tracing.rssrc/openhuman/agent/progress_tracing/langfuse.rssrc/openhuman/agent/progress_tracing/tests.rssrc/openhuman/channels/providers/web/progress_bridge.rssrc/openhuman/inference/provider/mod.rssrc/openhuman/inference/provider/resolved_route.rssrc/openhuman/thread_goals/runtime.rssrc/openhuman/threads/turn_state/mirror.rssrc/openhuman/threads/turn_state/mirror_tests.rssrc/openhuman/threads/turn_state/store_tests.rssrc/openhuman/threads/turn_state/types.rssrc/openhuman/tinyagents/abort_guard.rssrc/openhuman/tinyagents/convert.rssrc/openhuman/tinyagents/middleware.rssrc/openhuman/tinyagents/mod.rssrc/openhuman/tinyagents/model.rssrc/openhuman/tinyagents/observability.rssrc/openhuman/tinyagents/payload_summarizer.rssrc/openhuman/tinyagents/steering_forwarder.rssrc/openhuman/tinyagents/summarize.rssrc/openhuman/tinyagents/tests.rssrc/openhuman/tinyagents/tools.rssrc/openhuman/tokenjuice/savings.rssrc/openhuman/tool_status/ops.rssrc/openhuman/tool_status/types.rssrc/openhuman/tools/impl/filesystem/update_memory_md.rssrc/openhuman/tools/ops.rs
💤 Files with no reviewable changes (1)
- src/openhuman/agent/README.md
| fn send(&self, progress: AgentProgress) { | ||
| if let Some(tx) = &self.on_progress { | ||
| let _ = tx.try_send(progress); | ||
| use tokio::sync::mpsc::error::TrySendError; | ||
| let Some(tx) = &self.on_progress else { | ||
| return; | ||
| }; | ||
| match tx.try_send(progress) { | ||
| Ok(()) => {} | ||
| Err(TrySendError::Closed(_)) => {} | ||
| Err(TrySendError::Full(progress)) => { | ||
| // Backpressure, not capacity loss: hand the delta to an awaited | ||
| // `send()` on a spawned task rather than dropping it. Guard on a | ||
| // live runtime so a non-async construction path can't panic. | ||
| if let Ok(handle) = tokio::runtime::Handle::try_current() { | ||
| let tx = tx.clone(); | ||
| handle.spawn(async move { | ||
| let _ = tx.send(progress).await; | ||
| }); | ||
| } else { | ||
| tracing::debug!( | ||
| model = %self.model, | ||
| "[tinyagents] progress channel full and no runtime to defer send; dropping one delta" | ||
| ); | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Map and inspect the relevant file.
ast-grep outline src/openhuman/tinyagents/observability.rs --view expanded || true
echo '--- send()/record_usage() around the cited lines ---'
sed -n '220,340p' src/openhuman/tinyagents/observability.rs
echo '--- find progress bridge / consumer ---'
rg -n "spawn_progress_bridge|on_progress|AgentProgress|TextDelta|ThinkingDelta|ToolCompleted|UsageRecorded" src/openhuman -SRepository: tinyhumansai/openhuman
Length of output: 50378
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo '--- spawn_progress_bridge definition ---'
rg -n -A80 -B20 "fn spawn_progress_bridge" src/openhuman/channels/providers/web.rs src/openhuman -S
echo '--- turn-state mirror ordering assumptions ---'
sed -n '1,120p' src/openhuman/threads/turn_state/mirror.rs
echo '--- relevant observe arms ---'
sed -n '340,460p' src/openhuman/threads/turn_state/mirror.rs
echo '--- send() tests / ordering-related tests in observability.rs ---'
sed -n '840,1025p' src/openhuman/tinyagents/observability.rsRepository: tinyhumansai/openhuman
Length of output: 10977
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo '--- progress bridge event loop / forwarding ---'
sed -n '197,320p' src/openhuman/channels/providers/web/progress_bridge.rs
echo '--- turn-state mirror relevant event handling ---'
sed -n '340,460p' src/openhuman/threads/turn_state/mirror.rs
echo '--- any tests asserting ordering on a sequential receiver ---'
rg -n -A8 -B8 "TextDelta|ThinkingDelta|ToolCallStarted|ToolCallCompleted|SubagentTextDelta|SubagentThinkingDelta" src/openhuman/tinyagents/observability.rs src/openhuman/threads/turn_state/mirror_tests.rs src/openhuman/channels/providers/web/progress_bridge.rs -SRepository: tinyhumansai/openhuman
Length of output: 50378
Deferred send() can reorder progress events under backpressure.
spawn_progress_bridge drains AgentProgress sequentially, so a full-channel event handed to a spawned send().await can land after later fast-path sends and scramble text/tool ordering. Route overflow delivery through a single ordered forwarder instead of spawning per-event tasks.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/openhuman/tinyagents/observability.rs` around lines 265 - 290, The
overflow path in send() can reorder AgentProgress events because each
TrySendError::Full case spawns an independent task and awaits tx.send(progress)
out of sequence. Update the observability flow around send() and
spawn_progress_bridge so backpressure is handled by a single ordered
forwarder/queue rather than per-event spawned sends, preserving the original
progress order for text and tool updates.
| /// Process-global registry of per-workspace write locks (#4458). | ||
| /// | ||
| /// `update_memory_md` performs a read-modify-write on `MEMORY.md`/`SKILL.md`. | ||
| /// The per-run `MemoryProtocolTracker` has zero cross-run awareness, so two | ||
| /// concurrent runs (parallel forks, cron) racing the same workspace file would | ||
| /// otherwise clobber each other's append. We serialize every write to a given | ||
| /// workspace directory through a shared async mutex, keyed by the canonicalized | ||
| /// (falling back to raw) workspace path, so concurrent index updates queue | ||
| /// instead of racing. Combined with the temp-file + atomic-rename write below, | ||
| /// a killed process can never leave a truncated file. | ||
| static WORKSPACE_WRITE_LOCKS: LazyLock<Mutex<HashMap<PathBuf, Arc<tokio::sync::Mutex<()>>>>> = | ||
| LazyLock::new(|| Mutex::new(HashMap::new())); | ||
|
|
||
| /// Monotonic counter for temp-file uniqueness within a process. | ||
| static TEMP_FILE_SEQ: AtomicU64 = AtomicU64::new(0); | ||
|
|
||
| /// Return (creating if needed) the shared async write lock for `workspace_dir`. | ||
| /// | ||
| /// The lock key is the canonicalized workspace path when it resolves (so two | ||
| /// spellings of the same directory share one lock), else the raw path. | ||
| fn workspace_write_lock(workspace_dir: &Path) -> Arc<tokio::sync::Mutex<()>> { | ||
| let key = workspace_dir | ||
| .canonicalize() | ||
| .unwrap_or_else(|_| workspace_dir.to_path_buf()); | ||
| let mut map = WORKSPACE_WRITE_LOCKS | ||
| .lock() | ||
| .unwrap_or_else(|poisoned| poisoned.into_inner()); | ||
| Arc::clone( | ||
| map.entry(key) | ||
| .or_insert_with(|| Arc::new(tokio::sync::Mutex::new(()))), | ||
| ) | ||
| } |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check whether cron/background/parallel-agent execution spawns a new OS
# process (Command::new / std::process) vs staying in-process (tokio::spawn).
rg -n "std::process::Command|tokio::process::Command" src/openhuman -g '*.rs' | rg -i "cron|subagent|parallel|daemon"
fd -a cron src/openhuman
rg -n "fn run\b" -A 8 src/openhuman/tools/impl/system/cron 2>/dev/nullRepository: tinyhumansai/openhuman
Length of output: 328
Use cross-process locking for workspace writes
WORKSPACE_WRITE_LOCKS only serializes writers inside one OS process. src/openhuman/cron/scheduler.rs launches work via tokio::process::Command, so separate cron invocations won't share this mutex and can still clobber the same MEMORY.md/SKILL.md during the read-modify-write cycle. If those paths can overlap, this needs an inter-process lock.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/openhuman/tools/impl/filesystem/update_memory_md.rs` around lines 15 -
46, The current workspace write guard in workspace_write_lock and
WORKSPACE_WRITE_LOCKS only prevents races within one process, so concurrent cron
runs can still overwrite MEMORY.md/SKILL.md during update_memory_md. Replace or
augment the in-process mutex with an inter-process locking mechanism around the
read-modify-write flow in update_memory_md so separate tokio::process::Command
invocations contend on the same workspace lock and serialize writes across
processes.
…ings Post-merge CI fix: the tinyhumansai#4459 denied/approvalExpired locale strings were not prettier-wrapped. No key/text changes, formatting only. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
# Conflicts: # src/openhuman/agent/progress_tracing.rs # src/openhuman/agent/progress_tracing/langfuse.rs # src/openhuman/channels/providers/web/progress_bridge.rs
There was a problem hiding this comment.
🧹 Nitpick comments (2)
src/openhuman/tinyagents/tests.rs (1)
512-536: 🎯 Functional Correctness | 🔵 TrivialUnverified middleware-count assertions — flagged by the author as pending.
The inline NOTE explicitly states these counts (
15,4, and the analogous13at Line 591) are derived by addition/inference and "are NOT compiled bycargo check --lib(this is a #[cfg(test)] block) and are pending the deferred test pass". Given the PR's stated scope ("Tests were not written or run in this pass"), these hardcoded numbers are a real risk of drifting from the actual runtime middleware inventory once the test suite is compiled/run.Since the PR explicitly earmarks a follow-up test pass, consider tracking this as a checklist item so the counts get validated rather than silently trusted.
Also applies to: 588-593
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/tinyagents/tests.rs` around lines 512 - 536, The middleware inventory assertions in the TinyAgents tests are still unverified and were explicitly marked as pending, so replace the hardcoded counts with values that are validated by the test suite or defer them behind a clear follow-up check. Update the assertions around assembled.harness.middleware(), mw.tool_middleware_len(), and the matching inventory check near the later block to reflect the actual compiled middleware stack, and keep the note synchronized with the verified source of truth.src/openhuman/agent/progress_tracing/langfuse.rs (1)
262-278: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winAdd test coverage for the cached-token subtraction path.
The core behavior change here is "
input_tokensis INCLUSIVE of cached prompt tokens... so emit the NON-cached input (input - cached)", but none of the new tests setgen_ai.usage.cached_input_tokensto a nonzero value alongside a nonzeroinput_tokensto confirmusageDetails.inputis actually reduced. All existing tests usecached = 0, so thesaturating_subbranch is currently unexercised.✅ Suggested test addition
+ #[test] + fn usage_details_input_excludes_cached_tokens() { + let mut gen = span( + "trace-1", "gen-1", Some("root"), "llm.agentic-v1", + SpanKind::Generation, SpanStatus::Ok, 1_000, Some(1_500), + ); + gen.attributes.clear(); + gen.attributes.insert("gen_ai.usage.input_tokens".into(), json!(1_000)); + gen.attributes.insert("gen_ai.usage.output_tokens".into(), json!(200)); + gen.attributes.insert("gen_ai.usage.cached_input_tokens".into(), json!(300)); + let payload = spans_to_langfuse_batch(&[gen], false, "production"); + let usage = &payload["batch"][1]["body"]["usageDetails"]; + assert_eq!(usage["input"], 700, "non-cached input only"); + assert_eq!(usage["cache_read_input_tokens"], 300); + assert_eq!(usage["total"], 1_200, "total = raw input + output"); + }Also applies to: 712-781
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/agent/progress_tracing/langfuse.rs` around lines 262 - 278, Add test coverage in the Langfuse tracing tests for the cached-token subtraction path in langfuse:: usage conversion. Create a case that sets both gen_ai.usage.input_tokens and gen_ai.usage.cached_input_tokens to nonzero values, then assert usageDetails.input is reduced to input - cached while cache_read_input_tokens reflects the cached amount and total still reconciles correctly. Reuse the existing langfuse conversion helpers/tests so the new assertion directly exercises the non_cached_input branch and the saturating_sub logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/openhuman/agent/progress_tracing/langfuse.rs`:
- Around line 262-278: Add test coverage in the Langfuse tracing tests for the
cached-token subtraction path in langfuse:: usage conversion. Create a case that
sets both gen_ai.usage.input_tokens and gen_ai.usage.cached_input_tokens to
nonzero values, then assert usageDetails.input is reduced to input - cached
while cache_read_input_tokens reflects the cached amount and total still
reconciles correctly. Reuse the existing langfuse conversion helpers/tests so
the new assertion directly exercises the non_cached_input branch and the
saturating_sub logic.
In `@src/openhuman/tinyagents/tests.rs`:
- Around line 512-536: The middleware inventory assertions in the TinyAgents
tests are still unverified and were explicitly marked as pending, so replace the
hardcoded counts with values that are validated by the test suite or defer them
behind a clear follow-up check. Update the assertions around
assembled.harness.middleware(), mw.tool_middleware_len(), and the matching
inventory check near the later block to reflect the actual compiled middleware
stack, and keep the note synchronized with the verified source of truth.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f69142c2-dc79-467b-8c2b-4ea28f04f73a
📒 Files selected for processing (23)
app/src/lib/i18n/ar.tsapp/src/lib/i18n/bn.tsapp/src/lib/i18n/de.tsapp/src/lib/i18n/es.tsapp/src/lib/i18n/fr.tsapp/src/lib/i18n/hi.tsapp/src/lib/i18n/id.tsapp/src/lib/i18n/it.tsapp/src/lib/i18n/ko.tsapp/src/lib/i18n/pl.tsapp/src/lib/i18n/pt.tsapp/src/lib/i18n/ru.tsapp/src/lib/i18n/zh-CN.tssrc/bin/harness_subagent_audit.rssrc/openhuman/agent/harness/tool_result_artifacts/mod.rssrc/openhuman/agent/progress.rssrc/openhuman/agent/progress_tracing.rssrc/openhuman/agent/progress_tracing/langfuse.rssrc/openhuman/agent/progress_tracing/tests.rssrc/openhuman/channels/providers/web/progress_bridge.rssrc/openhuman/threads/turn_state/mirror.rssrc/openhuman/tinyagents/observability.rssrc/openhuman/tinyagents/tests.rs
✅ Files skipped from review due to trivial changes (10)
- app/src/lib/i18n/de.ts
- app/src/lib/i18n/ar.ts
- app/src/lib/i18n/ru.ts
- app/src/lib/i18n/zh-CN.ts
- app/src/lib/i18n/id.ts
- app/src/lib/i18n/pt.ts
- app/src/lib/i18n/es.ts
- src/bin/harness_subagent_audit.rs
- app/src/lib/i18n/fr.ts
- app/src/lib/i18n/pl.ts
🚧 Files skipped from review as they are similar to previous changes (11)
- app/src/lib/i18n/hi.ts
- app/src/lib/i18n/it.ts
- app/src/lib/i18n/bn.ts
- src/openhuman/agent/progress_tracing/tests.rs
- app/src/lib/i18n/ko.ts
- src/openhuman/channels/providers/web/progress_bridge.rs
- src/openhuman/agent/progress.rs
- src/openhuman/agent/harness/tool_result_artifacts/mod.rs
- src/openhuman/threads/turn_state/mirror.rs
- src/openhuman/tinyagents/observability.rs
- src/openhuman/agent/progress_tracing.rs
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 26dbf59924
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let mut provider_model = ProviderModel::new(provider, model, temperature) | ||
| .with_usage_carry(provider_usage_carry.clone()); |
There was a problem hiding this comment.
Share usage carry with fallback route models
When the primary route fails and RunPolicy.fallback selects one of the route models registered by routes::build_route_models, that ProviderModel is still constructed with its default private usage_carry (see routes.rs), while the bridge only drains the provider_usage_carry created here. Successful fallback calls therefore never provide their provider-reported charged USD/context/cache/reasoning breakdown to the bridge, so the turn falls back to catalog estimates despite the new carry path; thread this same carry into the fallback route models as well.
Useful? React with 👍 / 👎.
| let cache_creation_tokens = carried | ||
| .as_ref() | ||
| .map(|u| u.cache_creation_tokens) | ||
| .filter(|t| *t > 0) | ||
| .unwrap_or(usage.cache_creation_tokens); | ||
| let reasoning_tokens = carried | ||
| .as_ref() | ||
| .map(|u| u.reasoning_tokens) | ||
| .filter(|t| *t > 0) | ||
| .unwrap_or(usage.reasoning_tokens); |
There was a problem hiding this comment.
Use carried token breakdown in progress events
For providers where cache-creation or reasoning tokens only exist on the carried UsageInfo, these local values are recorded into UsageInfo but the ModelCallCompleted event below still sends usage.cache_creation_tokens and usage.reasoning_tokens. That leaves the per-call progress/Langfuse generation telemetry at zero for those fields even though the cost tracker got the right values; use the carried cache_creation_tokens/reasoning_tokens variables in the log and progress event too.
Useful? React with 👍 / 👎.
| // bounded size for the wire (#4007); `output_chars` + | ||
| // `elapsed_ms` still ride along in `subagent` below. | ||
| output: Some(cap_wire_output(output)), | ||
| failure: failure_json, |
There was a problem hiding this comment.
Forward subagent failure payload into live state
This socket event now includes the structured failure, but the frontend ChatSubagentToolResultEvent type still has no failure field and onSubagentToolResult only updates status/timing/result. When a sub-agent tool fails during a live turn, the emitted explanation is dropped before it reaches SubagentToolCallEntry, so the child row loses the new “why/next” copy until a snapshot reload; add the field to the event type and parse/store it in the live handler.
Useful? React with 👍 / 👎.
…ests The tinyhumansai#4459 failure field on AgentProgress::SubagentToolCallCompleted must be set in the two test constructions upstream tinyhumansai#4498's telemetry tests added (cargo check --lib does not compile #[cfg(test)], so this surfaced only in the coverage job's cargo test). Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…p cleanup - Restore ArchivistHook::rolling_segment_recap (tinyhumansai#4469 removed it but an E2E test target still calls it; the removal was optional 'remove or re-wire'). - Preserve QueueMode when re-queueing residual steer/collect on forwarder drop so a delivered collect line isn't re-framed as user steering (tinyhumansai#4456). - Clean up the staged temp file if the initial write fails in update_memory_md (tinyhumansai#4458). Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
…yhumansai#4465) A <tool_call> tag that parses as P-Format has no json_calls entry (the JSON pass can't parse it), so advancing json_idx shifted every later JSON tag onto the wrong index and silently dropped a real JSON call. Both review bots flagged this. Leave json_idx untouched for P-Format tags. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4aa69372e4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| next_span_seq: 0, | ||
| id_prefix: uuid::Uuid::new_v4().simple().to_string(), | ||
| // Metadata-only by default; opt in via `with_content_capture`. | ||
| capture_content: false, |
There was a problem hiding this comment.
Wire the capture flag into SpanCollector
When observability.agent_tracing.capture_content is enabled, the progress bridge sets the flag on TraceContext and then constructs the collector with SpanCollector::new(trace_ctx), but this new storage gate is always initialized to false unless the separate with_content_capture() builder is called. In the live path that builder is never used, so AgentProgress::TurnContent emitted by the session turn is dropped at storage time and opt-in prompt/reply capture never reaches the trace/exporter; initialize this from ctx.capture_content or call the builder at the construction site.
Useful? React with 👍 / 👎.
| parsed_tool_calls = combined.len(), | ||
| "[agent_parse] P-Format-aware parse produced combined tool-call set" | ||
| ); | ||
| (narrative, combined) |
There was a problem hiding this comment.
Preserve canonical calls outside literal tool tags
When a P-Format response also contains a canonical call that is not a literal <tool_call> tag, such as a ```tool_call block or GLM-style line after a recovered name[...] tag, `parse_tool_calls(response)` can put that call in `json_calls`, but the loop above only walks `TOOL_CALL_OPEN_TAGS` and never consumes those non-tag calls. Because `combined` is nonempty, returning it here drops the remaining canonical calls and executes only the recovered P-Format call; append any unconsumed canonical calls (or merge by source span) before returning.
Useful? React with 👍 / 👎.
Implements the TinyAgents migration parity & bug-fix tracker (#4470) — the audit of the 1.3→1.5 migration. All 19 child issues addressed in one branch: 18 fixed (each seam-side, no
vendor/tinyagentschanges), 1 deferred (#4468, pure test-parity — see below).Built on current
main(post #4473 / #4480 / #4483). Each fix was re-checked against live code (some issues were partly reshaped by the migration waves) and stacked sequentially, each passingcargo check --libbefore landing; a final fullcargo check --libon the complete branch is green.Phase 1 — critical correctness & security
SchemaGuardMiddleware);ArgRecoveryMiddlewareno longer coerces to{}when required fields exist.Option<HashSet>:None=all,Some(empty)=deny-all); spawn/delegate tools stripped at registration.ToolCallOutcome); AWS/OpenAI/Bearer patterns added.capture_content; LangfuseusageDetailsmade disjoint (input − cached).Phase 2 — durable-state corruption & leaks
TurnCompleted; drop empty-assistant row onEmptyProviderResponse; clear staleerror_slot.Phase 3 — follow-up PR repairs
update_memory_md; atomic, serializedMEMORY.mdwrites.UserDeclined(no auto-retry); tool failure persisted in the turn snapshot.Phase 4 — parity restoration
after_toolmiddleware order (handoff sees raw; capture sees capped).Deferred
Per the working constraint, no tests were written/run in this pass; every fix is verified via
cargo check --lib(implementation compiles). A follow-up test pass must:src/openhuman/tinyagents/tests.rs(tool_middleware_len/ lifecycle counts): several fixes added middlewares and bumped these, butcargo check --libdoes not compile#[cfg(test)], so the cumulative counts are unverified and likely need adjustment once tests compile.Generated via multi-agent orchestration. Full per-issue rationale is in each commit body.
https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR
Summary by CodeRabbit
New Features
update_memory_md) by default so agents can persist memory changes via the standard path.Bug Fixes
Chores / Other