fix(files): prune dependency dirs in expandFileGlobs before fast-glob traversal #410

Draft
dcramer wants to merge 4 commits into main from fix/prune-vendor-dirs-in-glob-expansion

dcramer commented Jun 16, 2026

Member

What

BUILTIN_IGNORE_PATTERNS in scan-policy.ts already skips vendor/, node_modules/, dist/ etc. after enumeration (in createSyntheticFileChange via getPrePatchFileSkip), but fast-glob was still traversing those entire directory trees first.

For a new Laravel app the vendor/ tree can contain 10,000–50,000 PHP files. Running warden dieter/**/*.php caused fast-glob to enumerate that entire tree, creating severe memory pressure that likely triggered the reported segfault/crash.

Changes

  • BUILTIN_PRUNE_DIRECTORY_PATTERNS — new exported constant listing the directory patterns that are safe to cut at traversal time: **/vendor/**, **/node_modules/**, **/dist/**, **/build/**, **/.next/**, **/.nuxt/**, **/out/**, **/coverage/**, **/.cache/**
  • getEffectivePrunePatterns(userIgnorePaths?) — computes the effective fast-glob ignore list, dropping any prune entry where the user has supplied a negation override (e.g. !vendor/** in warden config)
  • ExpandGlobOptions.ignore — new optional field threads the user ignore config through to expandFileGlobs so negation overrides reach the traversal layer
  • expandFileGlobs — passes ['**/.git/**', ...prunePatterns] as the fast-glob ignore option instead of just ['**/.git/**']
  • expandAndCreateFileChanges — passes the ignore option through to expandFileGlobs
  • gitignore fallback scan — updated to use BUILTIN_PRUNE_DIRECTORY_PATTERNS (previously only skipped node_modules/, now consistent with the traversal prune list)
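As a rough illustration of the negation-override logic described in the list above, here is a minimal sketch. It is a reconstruction from the description only, not the PR's actual code: the helper `directoryName` and the matching rule (compare bare directory names) are assumptions.

```typescript
// Prune list as given in the PR description.
const BUILTIN_PRUNE_DIRECTORY_PATTERNS: readonly string[] = [
  "**/vendor/**",
  "**/node_modules/**",
  "**/dist/**",
  "**/build/**",
  "**/.next/**",
  "**/.nuxt/**",
  "**/out/**",
  "**/coverage/**",
  "**/.cache/**",
];

// Hypothetical helper: reduce "**/vendor/**", "!vendor/**" or "!**/vendor/"
// to the bare directory name "vendor" so the two forms can be compared.
function directoryName(pattern: string): string {
  return pattern
    .replace(/^!/, "")
    .replace(/^\*\*\//, "")
    .replace(/\/\*\*$/, "")
    .replace(/\/$/, "");
}

// Drop every built-in prune entry that the user re-included with a negation.
function getEffectivePrunePatterns(userIgnorePaths: readonly string[] = []): string[] {
  const reincluded = userIgnorePaths
    .filter((p) => p.startsWith("!"))
    .map(directoryName);
  return BUILTIN_PRUNE_DIRECTORY_PATTERNS.filter(
    (p) => !reincluded.includes(directoryName(p)),
  );
}
```

Per the description, the result would then be spread into the fast-glob `ignore` option alongside `**/.git/**`, so pruned directories are never descended into.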

Verified

  • pnpm --filter @sentry/warden exec vitest run src/cli/files.test.ts — 35 tests passed (25 pre-existing + 10 new)
  • pnpm --filter @sentry/warden typecheck passed

New test coverage:

  • getEffectivePrunePatterns unit tests (default, negation override, edge cases)
  • expandFileGlobs prunes vendor/ and node_modules/ by default, including in non-git repos
  • expandFileGlobs re-includes vendor/ when user config has !vendor/**
  • expandAndCreateFileChanges threads the ignore config override end-to-end

… traversal

BUILTIN_IGNORE_PATTERNS already skips vendor/, node_modules/, dist/ etc.
after enumeration (in createSyntheticFileChange via getPrePatchFileSkip),
but fast-glob was still traversing those trees before the skip could apply.

For a new Laravel app the vendor/ tree can contain 10,000–50,000 PHP files.
Running `warden dieter/**/*.php` caused fast-glob to enumerate that entire
tree, creating extreme memory pressure that likely triggered the reported
segfault/crash.

Fix: introduce BUILTIN_PRUNE_DIRECTORY_PATTERNS and getEffectivePrunePatterns()
so that directory-level ignores are applied to the fast-glob ignore list at
traversal time. User negation patterns (e.g. `!vendor/**` in warden config)
are respected and remove the corresponding prune entry, allowing advanced
users to re-include a dependency directory when needed.

Also updates the gitignore-fallback directory scan to use the same prune
list (previously only skipped node_modules/, now skips all built-in prune
dirs) so behaviour is consistent across both code paths.

expandAndCreateFileChanges now threads the ignore config through to
expandFileGlobs so user negation overrides reach the traversal layer.

Co-Authored-By: sentry-junior[bot] <264270552+sentry-junior[bot]@users.noreply.github.com>
Comment thread packages/warden/src/cli/files.ts Outdated
Comment thread packages/warden/src/cli/files.ts
…onError

If the built-in directory prune list is partially overridden (e.g. the user
re-includes vendor/ with !vendor/** in warden config) and a broad glob is run
against a tree with more than 10,000 files, warden now throws
WardenGlobExpansionError immediately with an actionable error message rather
than silently consuming memory until it crashes.

runFileMode catches the error and surfaces it via reporter.error so it renders
cleanly in both TTY and JSON output modes.

Message example:
  Glob pattern matched 15,432 files (limit is 10,000).
  This usually means a dependency directory (vendor/, node_modules/, ...) is
  being scanned.

  Try one of:
    • Quote the pattern to avoid shell expansion: warden 'dieter/**/*.php'
    • Narrow to your application code:           warden dieter/app/**/*.php
    • Keep dependency dirs explicitly excluded in warden.toml:
        [defaults.ignore]
        paths = ["**/vendor/**"]

Co-Authored-By: sentry-junior[bot] <264270552+sentry-junior[bot]@users.noreply.github.com>
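
A possible shape for the error type described in the commit message above, offered as a sketch only. The constructor argument order follows the call quoted later in the review (`new WardenGlobExpansionError(filteredFiles.length, MAX_GLOB_FILE_RESULTS)`); the field names `matched` and `limit` and the helper `formatCount` are assumptions, and the message is abbreviated to its first two sentences.

```typescript
// Hypothetical helper: insert thousands separators, e.g. 15432 -> "15,432".
function formatCount(n: number): string {
  return String(n).replace(/\B(?=(\d{3})+(?!\d))/g, ",");
}

// Sketch of the error class; field names are illustrative assumptions.
class WardenGlobExpansionError extends Error {
  readonly matched: number;
  readonly limit: number;

  constructor(matched: number, limit: number) {
    super(
      `Glob pattern matched ${formatCount(matched)} files (limit is ${formatCount(limit)}).\n` +
        "This usually means a dependency directory (vendor/, node_modules/, ...) is being scanned.",
    );
    this.name = "WardenGlobExpansionError";
    this.matched = matched;
    this.limit = limit;
  }
}
```

Carrying the counts as fields, rather than only in the message string, would let a JSON reporter emit them as structured data.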
Comment thread packages/warden/src/cli/files.ts
return 1;
}
reporter.error('Failed to build context');
return 1;

Context build hides error text

Low Severity

The new runFileMode try/catch reports a helpful message for WardenGlobExpansionError, but any other error from buildFileEventContext is reported only as 'Failed to build context', dropping the underlying Error message that would explain I/O or config failures.

Reviewed by Cursor Bugbot for commit 8c4d39e.
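
One way the concern above could be addressed is to append the underlying message in the generic branch. This is a sketch, not the PR's code: `describeContextFailure` is a hypothetical helper, and the `WardenGlobExpansionError` class here is a bare stand-in for the real one.

```typescript
// Stand-in for the real error class, defined only so the sketch is self-contained.
class WardenGlobExpansionError extends Error {}

// Hypothetical helper: turn whatever was thrown into the text passed to reporter.error.
function describeContextFailure(err: unknown): string {
  if (err instanceof WardenGlobExpansionError) {
    // The guardrail error already carries an actionable message.
    return err.message;
  }
  // Preserve the underlying detail instead of dropping it.
  const detail = err instanceof Error ? err.message : String(err);
  return `Failed to build context: ${detail}`;
}
```

The catch block in runFileMode could then call `reporter.error(describeContextFailure(err))` and return 1 in both cases.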

The MAX_GLOB_FILE_RESULTS guardrail in expandFileGlobs fires after fast-glob
returns, which is too late if the shell pre-expanded the glob before warden
ran (e.g. zsh's recursive ** globbing turning dieter/**/*.php into 15,000
explicit paths in argv). At that point the oversized array already exists in
memory.

Add an early guard at the top of runFileMode that checks filePatterns.length
against MAX_GLOB_FILE_RESULTS before any config load, file I/O, or context
build. This catches the shell-expansion case with negligible overhead.

Co-Authored-By: sentry-junior[bot] <264270552+sentry-junior[bot]@users.noreply.github.com>
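
The early guard described in the commit message above might look roughly like the following. This is a sketch under assumptions: `runFileModeSketch` and the `Reporter` interface are hypothetical stand-ins, the error wording is invented for illustration, and the `>=` comparison mirrors the excerpt quoted elsewhere in the review rather than confirmed code for this guard.

```typescript
const MAX_GLOB_FILE_RESULTS = 10000; // limit quoted in the example message

interface Reporter {
  error(message: string): void;
}

// Hypothetical stand-in for the top of runFileMode; returns a process exit code.
function runFileModeSketch(filePatterns: readonly string[], reporter: Reporter): number {
  // Cheap length check before any config load, file I/O, or context build.
  if (filePatterns.length >= MAX_GLOB_FILE_RESULTS) {
    reporter.error(
      `Received ${filePatterns.length} file arguments (limit is ${MAX_GLOB_FILE_RESULTS}). ` +
        "The shell may have expanded the glob; try quoting it, e.g. warden 'dieter/**/*.php'.",
    );
    return 1;
  }
  // Config load, glob expansion, and context build would follow here.
  return 0;
}
```

Because the check only reads an array length, it costs nothing measurable on the normal path where a handful of patterns are passed.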
…rdcoded prune list

Root cause: git ls-files with pathspecs (.gitignore **/.gitignore) does not
reliably recurse into brand-new untracked directories to find their .gitignore
files. A new Laravel app in dieter/ would have dieter/.gitignore (with vendor/)
undetected, so vendor/ was not gitignored and warden would traverse it.

Fix: drop the pathspecs from the git ls-files call and filter for .gitignore
files client-side. Without pathspecs git recurses into all untracked dirs and
applies each directory's own .gitignore rules via --exclude-standard, so
dieter/.gitignore is both discovered and applied correctly.

Remove BUILTIN_PRUNE_DIRECTORY_PATTERNS / getEffectivePrunePatterns: hardcoding
vendor/, node_modules/ etc in the fast-glob ignore list is the wrong layer.
The correct mechanism is each project's .gitignore and that now works.

Move the MAX_GLOB_FILE_RESULTS guardrail to AFTER gitignore filtering so that
properly gitignored dependency directories do not trigger false positives; the
limit now only fires when .gitignore is absent or misconfigured.

Removes ExpandGlobOptions.ignore and the expandAndCreateFileChanges ignore
pass-through that only existed to support the prune list.

Co-Authored-By: sentry-junior[bot] <264270552+sentry-junior[bot]@users.noreply.github.com>
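
The client-side filtering step described in the commit message above could be sketched as follows. Only `--exclude-standard` is named in the source; the other flags in the comment and the function name `findGitignoreFiles` are assumptions made for illustration.

```typescript
// Assumed invocation (only --exclude-standard is confirmed by the commit message):
//   git ls-files -z --cached --others --exclude-standard
// With -z, git separates paths with NUL bytes instead of newlines.
function findGitignoreFiles(lsFilesOutput: string): string[] {
  return lsFilesOutput
    .split("\0")
    .filter((path) => path.length > 0)
    // Keep the root .gitignore and any nested one, e.g. dieter/.gitignore.
    .filter((path) => path === ".gitignore" || path.endsWith("/.gitignore"));
}
```

Filtering after the fact trades a larger listing for correctness: git decides which untracked directories to descend into, and the caller only picks out the files it needs.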

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit 97d175d.

if (filteredFiles.length >= MAX_GLOB_FILE_RESULTS) {
  throw new WardenGlobExpansionError(filteredFiles.length, MAX_GLOB_FILE_RESULTS);
}

Dependency trees enumerated before filter

High Severity

expandFileGlobs runs fast-glob with only **/.git/** ignored, then applies gitignore to the full match list. Gitignored paths like vendor/ still get walked and collected first. MAX_GLOB_FILE_RESULTS runs only on the filtered array, so large dependency trees can still exhaust memory or crash before the guard runs.

Reviewed by Cursor Bugbot for commit 97d175d.
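
To illustrate the concern above, one possible direction (not something this PR does) is to filter and count entries as they arrive instead of collecting the full match list first. The sketch below is hypothetical throughout: `createLimitedCollector` and its `isIgnored` callback are invented names, and the caller is assumed to feed it paths one at a time, for example from fast-glob's stream API.

```typescript
interface LimitedCollector {
  add(path: string): void;
  result(): string[];
}

// Hypothetical helper: keep only non-ignored paths and fail fast once the
// number of kept paths would exceed the limit.
function createLimitedCollector(
  limit: number,
  isIgnored: (path: string) => boolean,
): LimitedCollector {
  const kept: string[] = [];
  return {
    add(path: string): void {
      if (isIgnored(path)) return; // dropped immediately, never stored
      if (kept.length >= limit) {
        throw new Error(`Glob expansion exceeded ${limit} files`);
      }
      kept.push(path);
    },
    result(): string[] {
      return kept;
    },
  };
}
```

This bounds what is retained in memory, but it does not stop the walker from descending into ignored directories; avoiding that walk would still require ignore patterns applied at traversal time.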
