Skip to content

fix: keep binary tool I/O out of the LLM context with FileArtifact#6047

Open
mattatcha wants to merge 9 commits into
mainfrom
matcha/age-103-file-artifact-binary-bypass
Open

fix: keep binary tool I/O out of the LLM context with FileArtifact#6047
mattatcha wants to merge 9 commits into
mainfrom
matcha/age-103-file-artifact-binary-bypass

Conversation

@mattatcha
Copy link
Copy Markdown
Collaborator

@mattatcha mattatcha commented Jun 5, 2026

Summary

LLMs can't reproduce long opaque strings byte-perfect, so a binary file routed through a tool-call argument — e.g. a generated PPTX/PDF passed as base64 content to an upload connector — drifts a few characters, invalidates the base64, and corrupts the uploaded file.

This adds FileArtifact: a tool returns it instead of a base64 string, and the agent executor keeps the bytes out of the model context behind a short, namespaced crewai+file:// handle, expanding it back to exact base64 just before the consuming tool runs. The bytes never enter the model, so they can't be corrupted.

Key Changes

  • New crewai.tools.FileArtifact + an execution-scoped, TTL-backed store with namespaced handles (resolution only ever fires on tokens we mint, never on user data).
  • Honored on both native executors (the default AgentExecutor and the legacy CrewAgentExecutor) and the ReAct path: store-on-output, resolve-handle-on-input, including cache hits. Original args are never mutated, so events/cache/logs keep the short handle.
  • Per-run cleanup wired into crew kickoff; TTL prune as the backstop.
  • Tests: unit coverage + both-executor integration (SHA-256 round-trip of a 30 KB payload).

Usage

from crewai.tools import FileArtifact

# In a tool's _run, instead of returning base64:
return FileArtifact(data=pptx_bytes, filename="deck.pptx", mime_type="application/vnd...")

The consuming connector tool needs no change — the executor resolves the handle into its content argument generically.


Note

Medium Risk
Touches all tool execution paths and in-memory artifact lifecycle; incorrect handle resolution or scope cleanup could break producer→consumer tool chains or leak memory, but changes are additive and well-tested.

Overview
Adds FileArtifact so tools can pass binary data without routing base64 through the LLM. When a tool returns a FileArtifact, executors store bytes in a process-local, crew-scoped store and surface a short crewai+file:// handle to the model; resolve_artifact_handles expands those handles to base64 only at tool invoke time, while events, cache keys, and logged args keep the handle unchanged.

Wiring spans legacy CrewAgentExecutor, experimental AgentExecutor, ReAct (agent_utils), and ToolUsage (sync/async invoke + _format_result). Crew kickoff (kickoff / akickoff) calls clear_artifact_scope alongside existing file cleanup. FileArtifact is exported from crewai.tools. New test_file_artifact.py covers store/resolve/scoping/TTL and both native executors end-to-end.

Reviewed by Cursor Bugbot for commit 0fba816. Bugbot is set up for automated code reviews on this repo. Configure here.

Summary by CodeRabbit

  • New Features
    • Binary file artifact support for tools with stable placeholder handles and preserved metadata (filename, MIME, size).
    • Tools now resolve artifact handles in inputs and persist artifact outputs across native and experimental executions.
    • Per-execution artifact scoping with automatic cleanup when a run finishes.
    • FileArtifact is exposed in the public tools API.
  • Tests
    • Comprehensive tests for artifact storage, resolution, scoping, TTL pruning, and round‑tripping.

Binary file data (PPTX, PDF, images) returned by one tool and echoed by
the model as a base64 argument to another tool drifts by a few characters,
invalidating the base64 and corrupting the resulting file.

Tools can now return a FileArtifact instead of a base64 string. The agent
executor stores the bytes out-of-band (execution-scoped, TTL-backed) and
shows the model a short, namespaced `crewai+file://` handle, expanding it
back to exact base64 just before the consuming tool runs. Wired into the
native (default AgentExecutor + legacy) and ReAct execution paths with
per-run cleanup.
@linear
Copy link
Copy Markdown

linear Bot commented Jun 5, 2026

AGE-103

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 5, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds FileArtifact-based out-of-band binary passing: in-memory scoped store with TTL, model-facing placeholders, resolver/store helpers, wiring into native and structured tool execution (including cached results), crew-scoped cleanup, and tests validating end-to-end behavior.

Changes

Binary Artifact Out-of-Band Passing for Tools

Layer / File(s) Summary
Artifact Storage and Helper Functions
lib/crewai/src/crewai/tools/file_artifact.py, lib/crewai/src/crewai/tools/__init__.py
FileArtifact dataclass for bytes+metadata; process-local _ArtifactStore mints crewai+file://<uuid> handles, stores bytes with optional scope and TTL, and prunes expired entries. Public helpers: store_artifact(), resolve_artifact_handles(), store_if_artifact(), clear_artifact_scope(), and artifact_scope_id().
Native Tool Executor Artifact Integration
lib/crewai/src/crewai/agents/crew_agent_executor.py, lib/crewai/src/crewai/experimental/agent_executor.py, lib/crewai/src/crewai/utilities/agent_utils.py
Both legacy and experimental executors and utility helpers compute scope_id from crew/task/agent, resolve handles in arguments before calling native functions, and pass cached and fresh native tool outputs through store_if_artifact() to persist/represent artifacts.
Structured Tool Artifact Integration
lib/crewai/src/crewai/tools/tool_usage.py
ToolUsage resolves artifact handles in filtered tool arguments for both invoke/ainvoke paths and adds _artifact_scope_id so _format_result() calls store_if_artifact() with the proper scope.
Crew Execution Lifecycle Cleanup
lib/crewai/src/crewai/crew.py
Adds clear_artifact_scope(self.id) to kickoff() and akickoff() finally blocks after clear_files(self.id) to remove crew-scoped artifacts at execution end.
Artifact System Test Suite
lib/crewai/tests/tools/test_file_artifact.py
Comprehensive tests: base64 round-trip, metadata defaults and size, placeholder contents and escaping, scope-id preference/fallback, recursive resolve without mutation, conditional store behavior, scoped clearing, native executor producer/consumer wiring, TTL pruning, and test isolation fixture.

Sequence Diagram(s)

sequenceDiagram
  participant ProducerTool
  participant Executor
  participant ArtifactStore
  participant Model
  participant ConsumerTool
  ProducerTool->>Executor: return FileArtifact(bytes, filename, mime)
  Executor->>ArtifactStore: store_artifact(artifact, scope_id)
  ArtifactStore-->>Executor: placeholder (crewai+file://uuid + metadata)
  Executor->>Model: include placeholder string in model context
  Model-->>Executor: model-triggered consumer tool call with crewei+file://uuid
  Executor->>ArtifactStore: resolve_artifact_handles(arguments)
  ArtifactStore-->>Executor: arguments with base64 payload
  Executor->>ConsumerTool: invoke consumer tool with resolved bytes
  Note over Executor: At crew end
  Executor->>ArtifactStore: clear_artifact_scope(crew_id)
  ArtifactStore->>ArtifactStore: delete crew-scoped artifacts
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

size/M, enhancement

Suggested reviewers

  • greysonlalonde

Poem

🐇 I hide bytes in handles, neat and small,

crewi strings that hop from tool to call.
Producers store, consumers fetch with cheer,
Scopes swept clean when crew time disappears.
Hooray — bytes travel light and clear!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 30.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: introducing FileArtifact to keep binary tool I/O out of LLM context, which is the core objective of this PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch matcha/age-103-file-artifact-binary-bypass

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread lib/crewai/src/crewai/tools/file_artifact.py Outdated
Comment thread lib/crewai/src/crewai/agents/crew_agent_executor.py Outdated
The handle regex matches hex case-insensitively, but store keys are
lowercase uuid4 strings. Normalize the captured uuid to lowercase before
lookup so a handle echoed by the model with uppercase hex still resolves
to its bytes instead of leaking the raw token to the downstream tool.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/crewai/src/crewai/crew.py (1)

1-1: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Fix formatting issues before merge.

The pipeline is failing because the code needs to be formatted with ruff. Run the following command to fix:

uv run ruff format lib/
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/src/crewai/crew.py` at line 1, The file
lib/crewai/src/crewai/crew.py (starts with "from __future__ import annotations")
has formatting issues flagged by ruff; run the formatter command (uv run ruff
format lib/) to auto-fix formatting across lib/, verify crew.py and any changed
files are properly formatted, then stage and commit the formatted changes before
merging.

Source: Pipeline failures

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/crewai/src/crewai/tools/file_artifact.py`:
- Around line 23-240: The file lib/crewai/src/crewai/tools/file_artifact.py
fails ruff formatting; run the project formatter (e.g., ruff format) on this
module and commit the resulting changes so CI passes. Focus on formatting the
whole file (classes FileArtifact, _ArtifactStore, functions store_artifact,
resolve_artifact_handles, store_if_artifact, clear_artifact_scope,
artifact_scope_id, and _human_size) and then re-run ruff --check to verify no
further edits are required.
- Around line 95-152: _Entry currently only records stored_at so TTL from the
most recent store() call is used for pruning and resolve() never enforces
expiry; add a ttl: int field to _Entry, set it in _ArtifactStore.store (store
the passed ttl per entry), update _prune_locked to compare each entry.stored_at
+ entry.ttl against the cutoff (or check per-entry expiry) rather than using the
incoming ttl, and make _ArtifactStore.resolve check the entry's ttl/stored_at
and return None (and optionally delete the entry) when it is expired; leave
clear_scope behavior unchanged.

In `@lib/crewai/src/crewai/tools/tool_usage.py`:
- Around line 331-337: The file has formatting issues causing ruff format
--check to fail; run ruff (or your project's formatter) to reformat this module,
focusing on the sections around resolve_artifact_handles, calling.arguments, and
tool.ainvoke usages (the artifact-handling blocks and the other indicated
regions) so the code meets ruff style rules and CI passes; after formatting,
re-run ruff format --check to verify the module is clean.

In `@lib/crewai/tests/tools/test_file_artifact.py`:
- Around line 217-223: Run the Ruff formatter over the repository (or at least
this file) to fix lint/formatting errors causing CI to fail; e.g., run `uv run
ruff format lib/crewai/tests/tools/test_file_artifact.py` (or `uv run ruff
format lib/`) and commit the changes so the block that constructs the
SimpleNamespace tool_call and the call to
executor._execute_single_native_tool_call is reformatted to satisfy Ruff. Ensure
the SimpleNamespace instantiation (tool_call, function, arguments) adheres to
the project's formatting rules after running Ruff.

---

Outside diff comments:
In `@lib/crewai/src/crewai/crew.py`:
- Line 1: The file lib/crewai/src/crewai/crew.py (starts with "from __future__
import annotations") has formatting issues flagged by ruff; run the formatter
command (uv run ruff format lib/) to auto-fix formatting across lib/, verify
crew.py and any changed files are properly formatted, then stage and commit the
formatted changes before merging.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d08f95f3-8469-4030-845e-9570b02a20cb

📥 Commits

Reviewing files that changed from the base of the PR and between 906cd97 and f300c1f.

📒 Files selected for processing (7)
  • lib/crewai/src/crewai/agents/crew_agent_executor.py
  • lib/crewai/src/crewai/crew.py
  • lib/crewai/src/crewai/experimental/agent_executor.py
  • lib/crewai/src/crewai/tools/__init__.py
  • lib/crewai/src/crewai/tools/file_artifact.py
  • lib/crewai/src/crewai/tools/tool_usage.py
  • lib/crewai/tests/tools/test_file_artifact.py

Comment thread lib/crewai/src/crewai/tools/file_artifact.py Outdated
Comment thread lib/crewai/src/crewai/tools/file_artifact.py Outdated
Comment thread lib/crewai/src/crewai/tools/tool_usage.py Outdated
Comment on lines +217 to +223
tool_call = SimpleNamespace(
id="c",
function=SimpleNamespace(
name=func_name, arguments=args if isinstance(args, str) else json.dumps(args)
),
)
return executor._execute_single_native_tool_call(tool_call)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Run Ruff formatting to unblock CI.

lint-run is currently failing on formatting; this block (notably Line 220) is a likely reformat target. Please run uv run ruff format lib/ (or at least this file) before merge.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/tests/tools/test_file_artifact.py` around lines 217 - 223, Run the
Ruff formatter over the repository (or at least this file) to fix
lint/formatting errors causing CI to fail; e.g., run `uv run ruff format
lib/crewai/tests/tools/test_file_artifact.py` (or `uv run ruff format lib/`) and
commit the changes so the block that constructs the SimpleNamespace tool_call
and the call to executor._execute_single_native_tool_call is reformatted to
satisfy Ruff. Ensure the SimpleNamespace instantiation (tool_call, function,
arguments) adheres to the project's formatting rules after running Ruff.

Source: Pipeline failures

- Store each artifact's own expiry (expires_at) instead of a shared
  stored_at; prune by per-entry expiry on store and enforce it on
  resolve. Previously a short-TTL store could evict long-TTL entries and
  expired handles stayed resolvable until the next write.
- Derive the artifact scope from the executor's crew or the agent's crew
  so the native and ReAct paths agree with the crew-id Crew passes to
  clear_artifact_scope, preventing crew-bound artifacts from lingering
  under a task scope until TTL.
Comment thread lib/crewai/src/crewai/tools/file_artifact.py
mattatcha added 2 commits June 4, 2026 19:46
- Wire resolve_artifact_handles/store_if_artifact into
  agent_utils.execute_single_native_tool_call, the native tool path used
  by StepExecutor (Plan-and-Execute mode); previously a FileArtifact
  produced or consumed there bypassed the bypass and leaked raw bytes.
- Fold the crew/agent.crew fallback into artifact_scope_id(crew, task,
  agent) so all four execution paths derive the cleanup scope identically
  instead of repeating 'self.crew or getattr(self.agent, ...)'.
- Sanitize ']' and newlines (not just quotes) in the placeholder so an
  odd filename can't break the bracketed attribute line; extend
  _human_size with TB/PB.
The tool-result cache stores the raw FileArtifact and hands back the same
instance on every cache hit, where store_if_artifact ran again and minted a
fresh handle plus another byte copy. Repeated identical cached calls therefore
stacked duplicate copies of the same payload until scope/TTL cleanup. The store
now reuses an existing handle when the same FileArtifact instance is stored
again under the same scope, keying off object identity (the cache keeps the
object alive, so the id is stable within a run).
Comment thread lib/crewai/src/crewai/agents/crew_agent_executor.py
mattatcha added 2 commits June 4, 2026 20:06
store_if_artifact ran before the after_tool_call hooks, so a hook that
replaced the result with a FileArtifact put raw bytes / a dataclass repr
into the tool message and events. Re-run store_if_artifact on the final
result after the hook loop in all three native tool paths (no-op for the
normal string case).
Keying the object-identity dedup map on id(artifact) alone could orphan a
mapping if the same instance were ever stored under two scopes (the second
store overwrote the first's entry, and per-handle cleanup then skipped it).
Key on (id(artifact), scope) so each scope keeps its own handle and cleanup is
exact and unconditional. Also clear _handle_by_obj in the test fixture so stale
id->handle mappings don't accumulate across the session.
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit ad798af. Configure here.

Comment thread lib/crewai/src/crewai/crew.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant