
du: do not panic on a non-UTF-8 name in the --exclude check #12771

Open

leeewee wants to merge 1 commit into uutils:main from leeewee:du-fix-exclude-non-utf8


leeewee commented Jun 11, 2026


Fixes #12769

`du`'s regular (non-safe) traversal, which is used whenever `-L`/`--dereference` is given and always on non-Linux platforms, matched `--exclude` patterns against `entry.file_name().into_string().unwrap()`. `OsString::into_string()` returns `Err` for any entry name that is not valid UTF-8, so the `.unwrap()` aborted the process:

```console
$ mkdir -p duex/sub
$ python3 -c "import os; os.mkdir(b'duex/sub/\xff\xfedir')"
$ du -L --exclude='zzz' duex
thread 'main' panicked at src/uu/du/src/du.rs:656:89:
called `Result::unwrap()` on an `Err` value: "\xFF\xFEdir"
$ echo $?
134
```
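The panic does not depend on anything specific to `du`. A minimal standalone sketch (Unix-only, because it builds the name from raw bytes with `OsStringExt::from_vec`) produces the same `Err` that the old `.unwrap()` hit; `non_utf8_name` is a hypothetical helper, not part of uutils:

```rust
use std::ffi::OsString;
use std::os::unix::ffi::OsStringExt; // Unix-only: build an OsString from raw bytes

/// Hypothetical helper: the entry name from the reproduction, i.e. the bytes
/// 0xFF 0xFE followed by "dir", which is not valid UTF-8.
fn non_utf8_name() -> OsString {
    OsString::from_vec(vec![0xFF, 0xFE, b'd', b'i', b'r'])
}

fn main() {
    let name = non_utf8_name();
    // into_string() is fallible: on invalid UTF-8 it returns Err carrying the
    // original OsString back, so calling .unwrap() on it would panic.
    match name.into_string() {
        Ok(s) => println!("valid UTF-8: {s}"),
        Err(original) => println!("into_string() failed for {:?}", original),
    }
}
```

The `Err` branch formats the returned `OsString` with `{:?}`, the same `Debug` formatting that `unwrap()` used for the `"\xFF\xFEdir"` value in the panic message.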

GNU `du` lists the directory and exits 0. This PR uses `to_string_lossy()` instead, matching the sibling `this_stat.path.to_string_lossy()` call already on the line above (and the identical exclude check in the safe traversal), so a non-UTF-8 name is handled gracefully:

```console
$ du -L --exclude='zzz' duex ; echo $?
...
0
```
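The fix relies on `OsStr::to_string_lossy()`, which returns a `Cow<str>` and cannot fail. The sketch below assumes a simplified, hypothetical `is_excluded` helper that compares names by plain equality; `du`'s real check matches glob patterns, which this std-only sketch does not reproduce:

```rust
use std::ffi::{OsStr, OsString};
use std::os::unix::ffi::OsStringExt; // Unix-only: build an OsString from raw bytes

/// Hypothetical, simplified exclude check: du matches glob patterns, but
/// plain equality keeps this sketch free of external crates.
fn is_excluded(file_name: &OsStr, patterns: &[&str]) -> bool {
    // to_string_lossy() cannot fail: invalid UTF-8 is replaced with U+FFFD,
    // and a valid name is borrowed rather than copied.
    let name = file_name.to_string_lossy();
    patterns.iter().any(|&p| p == &*name)
}

fn main() {
    // Same bytes as the reproduction: 0xFF 0xFE followed by "dir".
    let bad = OsString::from_vec(vec![0xFF, 0xFE, b'd', b'i', b'r']);
    println!("lossy name: {}", bad.to_string_lossy());
    println!("excluded by 'zzz': {}", is_excluded(&bad, &["zzz"]));
}
```

One trade-off of the lossy conversion: each invalid byte becomes U+FFFD, so distinct non-UTF-8 names can map to the same string when matched against a pattern. For a check whose main job is not to crash, that is usually acceptable, and it is what the safe traversal's exclude check already does.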

Added a Unix-gated regression test (`test_du_exclude_non_utf8_name`).
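The actual `test_du_exclude_non_utf8_name` uses the uutils test harness and is not reproduced here. As a rough, std-only approximation of the scenario it covers, the sketch below mirrors the `duex/sub/\xff\xfedir` layout under a temporary directory and walks it with a lossy exclude check. `build_fixture` and `count_entries` are hypothetical helpers, and plain name equality stands in for glob matching:

```rust
use std::ffi::OsStr;
use std::fs;
use std::io;
use std::os::unix::ffi::OsStrExt; // Unix-only: OsStr::from_bytes
use std::path::{Path, PathBuf};

/// Hypothetical helper: recursively counts entries under `dir` whose names
/// are not in `exclude`. Names are compared via to_string_lossy(), so a
/// non-UTF-8 name cannot panic. Symlinks are not followed.
fn count_entries(dir: &Path, exclude: &[&str]) -> io::Result<usize> {
    let mut count = 0;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let name = entry.file_name();
        // Excluded entries are skipped entirely, including their subtrees.
        if exclude.iter().any(|&p| p == &*name.to_string_lossy()) {
            continue;
        }
        count += 1;
        if entry.file_type()?.is_dir() {
            count += count_entries(&entry.path(), exclude)?;
        }
    }
    Ok(count)
}

/// Hypothetical helper: creates `<tmp>/duex-sketch-<pid>/sub/\xff\xfedir`
/// and returns the root directory.
fn build_fixture() -> io::Result<PathBuf> {
    let root = std::env::temp_dir().join(format!("duex-sketch-{}", std::process::id()));
    // Start from a clean slate; ignore the error if the directory is absent.
    let _ = fs::remove_dir_all(&root);
    let sub = root.join("sub");
    fs::create_dir_all(&sub)?;
    // Same byte sequence as the Python reproduction.
    if let Err(e) = fs::create_dir(sub.join(OsStr::from_bytes(b"\xff\xfedir"))) {
        let _ = fs::remove_dir_all(&root);
        return Err(e);
    }
    Ok(root)
}

fn main() -> io::Result<()> {
    let root = match build_fixture() {
        Ok(root) => root,
        Err(e) => {
            eprintln!("skipping: cannot create a non-UTF-8 name here: {e}");
            return Ok(());
        }
    };
    let n = count_entries(&root, &["zzz"])?;
    println!("entries counted: {n}");
    fs::remove_dir_all(&root)?;
    Ok(())
}
```

Some filesystems (APFS on macOS, for example) may reject non-UTF-8 names; `build_fixture` then returns an error and `main` skips the check instead of failing.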

The regular (non-safe) traversal — used with `-L`/`--dereference`, and always on
non-Linux — matched `--exclude` patterns against
`entry.file_name().into_string().unwrap()`, which aborts on any non-UTF-8 entry
name. Use `to_string_lossy()` instead, matching the sibling path check on the
same line, so a non-UTF-8 name is handled gracefully like GNU.

codspeed-hq (bot) commented Jun 11, 2026


Merging this PR will improve performance by 3.45%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments, which may affect the accuracy of the results.


⚡ 1 improved benchmark
✅ 322 untouched benchmarks
⏩ 46 skipped benchmarks¹

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | `single_date_now` | 85.5 µs | 82.7 µs | +3.45% |



Comparing leeewee:du-fix-exclude-non-utf8 (c86ab87) with main (220a8ec)


Footnotes

  1. 46 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.


github-actions (bot) commented Jun 11, 2026


GNU testsuite comparison:

Skip an intermittent issue tests/tail/tail-n0f (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/timeout/timeout-group (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/cut/bounded-memory (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tail/retry (passes in this run but fails in the 'main' branch)

leeewee force-pushed the du-fix-exclude-non-utf8 branch from db2105c to c86ab87 on June 11, 2026 08:18


Development

Successfully merging this pull request may close these issues.

du panic (into_string().unwrap) on a non-UTF-8 filename in the dereferencing traversal
