fix: keep binary tool I/O out of the LLM context with FileArtifact by mattatcha · Pull Request #6047 · crewAIInc/crewAI

mattatcha · 2026-06-05T00:08:07Z

Summary

LLMs can't reproduce long opaque strings byte-perfect, so a binary file routed through a tool-call argument — e.g. a generated PPTX/PDF passed as base64 content to an upload connector — drifts a few characters, invalidates the base64, and corrupts the uploaded file.

This adds FileArtifact: a tool returns it instead of a base64 string, and the agent executor keeps the bytes out of the model context behind a short, namespaced crewai+file:// handle, expanding it back to exact base64 just before the consuming tool runs. The bytes never enter the model, so they can't be corrupted.

Key Changes

New crewai.tools.FileArtifact + an execution-scoped, TTL-backed store with namespaced handles (resolution only ever fires on tokens we mint, never on user data).
Honored on both native executors (the default AgentExecutor and the legacy CrewAgentExecutor) and the ReAct path: store-on-output, resolve-handle-on-input, including cache hits. Original args are never mutated, so events/cache/logs keep the short handle.
Per-run cleanup wired into crew kickoff; TTL prune as the backstop.
Tests: unit coverage + both-executor integration (SHA-256 round-trip of a 30 KB payload).

Usage

from crewai.tools import FileArtifact

# In a tool's _run, instead of returning base64:
return FileArtifact(data=pptx_bytes, filename="deck.pptx", mime_type="application/vnd...")

The consuming connector tool needs no change — the executor resolves the handle into its content argument generically.

Note

Medium Risk
Touches all tool execution paths and in-memory artifact lifecycle; incorrect handle resolution or scope cleanup could break producer→consumer tool chains or leak memory, but changes are additive and well-tested.

Overview
Adds FileArtifact so tools can pass binary data without routing base64 through the LLM. When a tool returns a FileArtifact, executors store bytes in a process-local, crew-scoped store and surface a short crewai+file:// handle to the model; resolve_artifact_handles expands those handles to base64 only at tool invoke time, while events, cache keys, and logged args keep the handle unchanged.

Wiring spans legacy CrewAgentExecutor, experimental AgentExecutor, ReAct (agent_utils), and ToolUsage (sync/async invoke + _format_result). Crew kickoff (kickoff / akickoff) calls clear_artifact_scope alongside existing file cleanup. FileArtifact is exported from crewai.tools. New test_file_artifact.py covers store/resolve/scoping/TTL and both native executors end-to-end.

^{Reviewed by Cursor Bugbot for commit 0fba816. Bugbot is set up for automated code reviews on this repo. Configure here.}

Summary by CodeRabbit

New Features
- Binary file artifact support for tools with stable placeholder handles and preserved metadata (filename, MIME, size).
- Tools now resolve artifact handles in inputs and persist artifact outputs across native and experimental executions.
- Per-execution artifact scoping with automatic cleanup when a run finishes.
- FileArtifact is exposed in the public tools API.
Tests
- Comprehensive tests for artifact storage, resolution, scoping, TTL pruning, and round‑tripping.

Binary file data (PPTX, PDF, images) returned by one tool and echoed by the model as a base64 argument to another tool drifts by a few characters, invalidating the base64 and corrupting the resulting file. Tools can now return a FileArtifact instead of a base64 string. The agent executor stores the bytes out-of-band (execution-scoped, TTL-backed) and shows the model a short, namespaced `crewai+file://` handle, expanding it back to exact base64 just before the consuming tool runs. Wired into the native (default AgentExecutor + legacy) and ReAct execution paths with per-run cleanup.

linear · 2026-06-05T00:08:10Z

AGE-103

coderabbitai · 2026-06-05T00:08:29Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds FileArtifact-based out-of-band binary passing: in-memory scoped store with TTL, model-facing placeholders, resolver/store helpers, wiring into native and structured tool execution (including cached results), crew-scoped cleanup, and tests validating end-to-end behavior.

Changes

Binary Artifact Out-of-Band Passing for Tools

Layer / File(s)	Summary
Artifact Storage and Helper Functions `lib/crewai/src/crewai/tools/file_artifact.py`, `lib/crewai/src/crewai/tools/__init__.py`	`FileArtifact` dataclass for bytes+metadata; process-local `_ArtifactStore` mints `crewai+file://<uuid>` handles, stores bytes with optional scope and TTL, and prunes expired entries. Public helpers: `store_artifact()`, `resolve_artifact_handles()`, `store_if_artifact()`, `clear_artifact_scope()`, and `artifact_scope_id()`.
Native Tool Executor Artifact Integration `lib/crewai/src/crewai/agents/crew_agent_executor.py`, `lib/crewai/src/crewai/experimental/agent_executor.py`, `lib/crewai/src/crewai/utilities/agent_utils.py`	Both legacy and experimental executors and utility helpers compute `scope_id` from crew/task/agent, resolve handles in arguments before calling native functions, and pass cached and fresh native tool outputs through `store_if_artifact()` to persist/represent artifacts.
Structured Tool Artifact Integration `lib/crewai/src/crewai/tools/tool_usage.py`	ToolUsage resolves artifact handles in filtered tool arguments for both `invoke`/`ainvoke` paths and adds `_artifact_scope_id` so `_format_result()` calls `store_if_artifact()` with the proper scope.
Crew Execution Lifecycle Cleanup `lib/crewai/src/crewai/crew.py`	Adds `clear_artifact_scope(self.id)` to `kickoff()` and `akickoff()` `finally` blocks after `clear_files(self.id)` to remove crew-scoped artifacts at execution end.
Artifact System Test Suite `lib/crewai/tests/tools/test_file_artifact.py`	Comprehensive tests: base64 round-trip, metadata defaults and size, placeholder contents and escaping, scope-id preference/fallback, recursive resolve without mutation, conditional store behavior, scoped clearing, native executor producer/consumer wiring, TTL pruning, and test isolation fixture.

Sequence Diagram(s)

sequenceDiagram
  participant ProducerTool
  participant Executor
  participant ArtifactStore
  participant Model
  participant ConsumerTool
  ProducerTool->>Executor: return FileArtifact(bytes, filename, mime)
  Executor->>ArtifactStore: store_artifact(artifact, scope_id)
  ArtifactStore-->>Executor: placeholder (crewai+file://uuid + metadata)
  Executor->>Model: include placeholder string in model context
  Model-->>Executor: model-triggered consumer tool call with crewei+file://uuid
  Executor->>ArtifactStore: resolve_artifact_handles(arguments)
  ArtifactStore-->>Executor: arguments with base64 payload
  Executor->>ConsumerTool: invoke consumer tool with resolved bytes
  Note over Executor: At crew end
  Executor->>ArtifactStore: clear_artifact_scope(crew_id)
  ArtifactStore->>ArtifactStore: delete crew-scoped artifacts

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

size/M, enhancement

Suggested reviewers

greysonlalonde

Poem

🐇 I hide bytes in handles, neat and small,

crewi strings that hop from tool to call.
Producers store, consumers fetch with cheer,
Scopes swept clean when crew time disappears.
Hooray — bytes travel light and clear!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 30.43% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: introducing FileArtifact to keep binary tool I/O out of LLM context, which is the core objective of this PR.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch matcha/age-103-file-artifact-binary-bypass

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

The handle regex matches hex case-insensitively, but store keys are lowercase uuid4 strings. Normalize the captured uuid to lowercase before lookup so a handle echoed by the model with uppercase hex still resolves to its bytes instead of leaking the raw token to the downstream tool.

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

lib/crewai/src/crewai/crew.py (1)
1-1: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Fix formatting issues before merge.

The pipeline is failing because the code needs to be formatted with ruff. Run the following command to fix:
uv run ruff format lib/
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/src/crewai/crew.py` at line 1, The file
lib/crewai/src/crewai/crew.py (starts with "from __future__ import annotations")
has formatting issues flagged by ruff; run the formatter command (uv run ruff
format lib/) to auto-fix formatting across lib/, verify crew.py and any changed
files are properly formatted, then stage and commit the formatted changes before
merging.
Source: Pipeline failures

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/crewai/src/crewai/tools/file_artifact.py`:
- Around line 23-240: The file lib/crewai/src/crewai/tools/file_artifact.py
fails ruff formatting; run the project formatter (e.g., ruff format) on this
module and commit the resulting changes so CI passes. Focus on formatting the
whole file (classes FileArtifact, _ArtifactStore, functions store_artifact,
resolve_artifact_handles, store_if_artifact, clear_artifact_scope,
artifact_scope_id, and _human_size) and then re-run ruff --check to verify no
further edits are required.
- Around line 95-152: _Entry currently only records stored_at so TTL from the
most recent store() call is used for pruning and resolve() never enforces
expiry; add a ttl: int field to _Entry, set it in _ArtifactStore.store (store
the passed ttl per entry), update _prune_locked to compare each entry.stored_at
+ entry.ttl against the cutoff (or check per-entry expiry) rather than using the
incoming ttl, and make _ArtifactStore.resolve check the entry's ttl/stored_at
and return None (and optionally delete the entry) when it is expired; leave
clear_scope behavior unchanged.

In `@lib/crewai/src/crewai/tools/tool_usage.py`:
- Around line 331-337: The file has formatting issues causing ruff format
--check to fail; run ruff (or your project's formatter) to reformat this module,
focusing on the sections around resolve_artifact_handles, calling.arguments, and
tool.ainvoke usages (the artifact-handling blocks and the other indicated
regions) so the code meets ruff style rules and CI passes; after formatting,
re-run ruff format --check to verify the module is clean.

In `@lib/crewai/tests/tools/test_file_artifact.py`:
- Around line 217-223: Run the Ruff formatter over the repository (or at least
this file) to fix lint/formatting errors causing CI to fail; e.g., run `uv run
ruff format lib/crewai/tests/tools/test_file_artifact.py` (or `uv run ruff
format lib/`) and commit the changes so the block that constructs the
SimpleNamespace tool_call and the call to
executor._execute_single_native_tool_call is reformatted to satisfy Ruff. Ensure
the SimpleNamespace instantiation (tool_call, function, arguments) adheres to
the project's formatting rules after running Ruff.

---

Outside diff comments:
In `@lib/crewai/src/crewai/crew.py`:
- Line 1: The file lib/crewai/src/crewai/crew.py (starts with "from __future__
import annotations") has formatting issues flagged by ruff; run the formatter
command (uv run ruff format lib/) to auto-fix formatting across lib/, verify
crew.py and any changed files are properly formatted, then stage and commit the
formatted changes before merging.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d08f95f3-8469-4030-845e-9570b02a20cb

📥 Commits

Reviewing files that changed from the base of the PR and between 906cd97 and f300c1f.

📒 Files selected for processing (7)

lib/crewai/src/crewai/agents/crew_agent_executor.py
lib/crewai/src/crewai/crew.py
lib/crewai/src/crewai/experimental/agent_executor.py
lib/crewai/src/crewai/tools/__init__.py
lib/crewai/src/crewai/tools/file_artifact.py
lib/crewai/src/crewai/tools/tool_usage.py
lib/crewai/tests/tools/test_file_artifact.py

coderabbitai · 2026-06-05T00:15:55Z

+        tool_call = SimpleNamespace(
+            id="c",
+            function=SimpleNamespace(
+                name=func_name, arguments=args if isinstance(args, str) else json.dumps(args)
+            ),
+        )
+        return executor._execute_single_native_tool_call(tool_call)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Run Ruff formatting to unblock CI.

lint-run is currently failing on formatting; this block (notably Line 220) is a likely reformat target. Please run uv run ruff format lib/ (or at least this file) before merge.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/crewai/tests/tools/test_file_artifact.py` around lines 217 - 223, Run the Ruff formatter over the repository (or at least this file) to fix lint/formatting errors causing CI to fail; e.g., run `uv run ruff format lib/crewai/tests/tools/test_file_artifact.py` (or `uv run ruff format lib/`) and commit the changes so the block that constructs the SimpleNamespace tool_call and the call to executor._execute_single_native_tool_call is reformatted to satisfy Ruff. Ensure the SimpleNamespace instantiation (tool_call, function, arguments) adheres to the project's formatting rules after running Ruff.

Source: Pipeline failures

- Store each artifact's own expiry (expires_at) instead of a shared stored_at; prune by per-entry expiry on store and enforce it on resolve. Previously a short-TTL store could evict long-TTL entries and expired handles stayed resolvable until the next write. - Derive the artifact scope from the executor's crew or the agent's crew so the native and ReAct paths agree with the crew-id Crew passes to clear_artifact_scope, preventing crew-bound artifacts from lingering under a task scope until TTL.

- Wire resolve_artifact_handles/store_if_artifact into agent_utils.execute_single_native_tool_call, the native tool path used by StepExecutor (Plan-and-Execute mode); previously a FileArtifact produced or consumed there bypassed the bypass and leaked raw bytes. - Fold the crew/agent.crew fallback into artifact_scope_id(crew, task, agent) so all four execution paths derive the cleanup scope identically instead of repeating 'self.crew or getattr(self.agent, ...)'. - Sanitize ']' and newlines (not just quotes) in the placeholder so an odd filename can't break the bracketed attribute line; extend _human_size with TB/PB.

The tool-result cache stores the raw FileArtifact and hands back the same instance on every cache hit, where store_if_artifact ran again and minted a fresh handle plus another byte copy. Repeated identical cached calls therefore stacked duplicate copies of the same payload until scope/TTL cleanup. The store now reuses an existing handle when the same FileArtifact instance is stored again under the same scope, keying off object identity (the cache keeps the object alive, so the id is stable within a run).

store_if_artifact ran before the after_tool_call hooks, so a hook that replaced the result with a FileArtifact put raw bytes / a dataclass repr into the tool message and events. Re-run store_if_artifact on the final result after the hook loop in all three native tool paths (no-op for the normal string case).

Keying the object-identity dedup map on id(artifact) alone could orphan a mapping if the same instance were ever stored under two scopes (the second store overwrote the first's entry, and per-handle cleanup then skipped it). Key on (id(artifact), scope) so each scope keeps its own handle and cleanup is exact and unconditional. Also clear _handle_by_obj in the test fixture so stale id->handle mappings don't accumulate across the session.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit ad798af. Configure here.}

…ss runs

github-actions Bot added the size/XL label Jun 5, 2026