test(e2e): stress native watcher with safeTime canary#1369
Draft
stormslowly wants to merge 55 commits into
Draft
test(e2e): stress native watcher with safeTime canary#1369stormslowly wants to merge 55 commits into
stormslowly wants to merge 55 commits into
Conversation
…acOS On macOS the FSEvents subsystem keeps a per-path ring buffer that can replay history events to watcher streams attached later. When prepareFixtures repeatedly cleans and re-copies into the same target directory, file watchers (the rstest config watcher, and the rspack NativeWatcher introduced in web-infra-dev#518) can observe a Delete+Create batch on the same path, yielding spurious change events. With nativeWatcher enabled this manifests as a flaky "restarting Rstest as rstest.config.mts changed" that erases the CLI output the e2e asserts on. Per-call unique directories produce a path that has no FSEvents history, so only Create events for the freshly copied fixture are delivered. A symlink at the hard-coded fixturesTargetPath keeps the call sites in test code unchanged. Linux/Windows paths fall through to the original behaviour.
Mirrors the one-line change from web-infra-dev#518 so the e2e matrix on this branch exercises the nativeWatcher code path while validating the fresh fixture directory strategy. Used to confirm the prepareFixtures change in the prior commit eliminates the flaky "restarting Rstest as rstest.config.mts changed" failure under nativeWatcher.
…tcher Under chokidar the 10 ms value was a safe watchpack event-merging window. With rspack experiments.nativeWatcher enabled in the previous commit, the same 10 ms becomes the gap between the FSEvent arriving and rspack's stat/read of the changed file. On macOS that gap is shorter than vnode cache invalidation latency, so rspack reads stale content, produces identical chunk hashes, and rstest then logs "No test files need re-run". Local sweep (5 runs per value, vanilla notify-rs latency=0.0, fresh fixture dir per call): aggregateTimeout Pass "No test files need re-run" lines 10 0/5 36 100 5/5 0 200 5/5 0 500 5/5 0 1000 5/5 0 100 ms is enough; no rspack or notify-rs changes are needed.
100 ms was enough on local M-series macs (5/5 PASS) but the
GitHub Actions macos-14 runner still triggered mode B
("No test files need re-run" -> stale 'Hello, INDEX!') in every
e2e (macos-14, node 20/24) run on the prior commit.
The Actions ARM64 runner appears to have a noticeably slower
vnode cache invalidation than local hardware, so we need a bigger
margin. 500 ms matches the upper half of the local sweep band that
was 5/5 PASS (200 / 500 / 1000 all passed) and is 2.5x the
watchpack default of 200 ms. Hot reload latency stays well under
human perception.
Temporary investigation commit. CI macos-14 still hits mode B with fixture aggregateTimeout=500 even though the local sweep with the same value was 5/5 PASS. Need to confirm what value the child rstest process ACTUALLY receives (fixture cwd lookup, rsbuild config merge order, nativeWatcher flag) before deciding the next move. Will be reverted after one CI run.
…mode" This reverts commit 8400f59.
The GitHub Actions macos-14 arm64 runner is roughly 6x slower than the local M-series Mac on this watch flow (each fs.update -> rerun cycle is 2-3 s in CI vs ~300 ms locally). aggregateTimeout=500 was still hitting mode B 100% of the time on CI, indicating that the vnode cache invalidation on the Actions runner has not finished by the time rspack performs stat/read. Going to 3000 ms gives the runner ~6x more wall-clock budget than the 500 value that already worked locally with full headroom. Last attempt in the timeout-only direction; if this still fails the root cause is not the aggregate window and we will revisit.
…xtures Temporary investigation. CI macos-14 e2e fails with mode B at every aggregateTimeout value (100/500/3000), but identical local runs pass. Suspecting that the cp() into the fresh sibling directory does not include rstest.config.mts under some Actions runner condition, so the child rstest falls back to the entry.ts default (5 ms). Print the freshly copied config content so the CI log shows the actual file seen by the child process. Will be reverted.
Cannot reproduce the macos-14 + node 24 watch e2e failure locally on the same Node version (24.15.0), same fixture, same binding, same concurrency. Last unknown is the GitHub Actions macos-14 runner's actual hardware/OS state. Open a tmate session on failure, limited to that one matrix slot, so we can inspect: - runner CPU/memory pressure - the fresh-dir contents after fs.update - whether rspack input fs cache holds the old content - whether vnode invalidation lag is observable Will be reverted after the investigation.
…lure Pin @rspack/core to a canary build of rspack worktree-rstest-native-watcher (commit 27bc1e20) so the binding writes RCA trace lines into /tmp/rca-binding-trace.log on the runner: - NativeWatcher::new aggregateTimeout/poll/symlink - JsEventHandler::on_event_handle aggregate changedFiles/deletedFiles - JsEventHandlerUndelayed::on_change every undelayed path Replace the prior tmate step with a `cat /tmp/rca-binding-trace.log` step on failure so the dump shows up directly in the macos-14 job log without needing an interactive SSH. Will be reverted after the investigation; not for merge.
The previous bump chain (10 -> 100 -> 500 -> 3000) was a workaround for the rspack_watcher mtime-gate race that has now been fixed in the canary (efffb233). With the race gone, 3000ms was unnecessary, but falling back to the original 10ms was still too aggressive: rspack fires stat/read inside the macOS vnode-cache settling window and reads stale content. 100ms is the smallest value that survives macOS-14 node20+node24 CI runs.
…e overrides rstest used to default rspack's `watchOptions.aggregateTimeout` to 5ms (see Watching.ts comment). That was safe under the chokidar watcher where the aggregate is just a watchpack-level event-merging window, but is too aggressive under the rspack native watcher on macOS: the runtime stats the changed file ~5ms after the FSEvent arrives, well inside the vnode-cache settling window, and reads stale content. The resulting "No test files need re-run" is the failure mode addressed by this PR. Set the default to 100ms in the rstest plugin so every watch project inherits a sane value without each fixture wiring its own override. Drop the per-fixture `tools.rspack.watchOptions.aggregateTimeout: 10` that was added long ago (chokidar era). Individual cases that need a different value can still set it via `tools.rspack.watchOptions` and the user value wins through rsbuild's regular merge order.
…ixtures
Removes the temporary investigation-only steps now that the underlying
rspack_watcher mtime-gate race is fixed upstream and the rstest fix is
covered by regular CI:
- .github/workflows/test.yml: "Debug — show resolved @rspack/* versions"
and "Dump rspack binding RCA trace" steps
- e2e/scripts/index.ts: __RCA_FIXTURE__ stdout dump in prepareFixtures
…fix) Switch from canary efffb233 (stateless arrival-vs-mtime) to canary 96980782 (watchpack-style incremental baseline). The new approach keeps the per-file mtime baseline intact across rspack rebuild cycles, which restores the original "filter stale FSEvent replays" property while still avoiding the baseline-snapshot-after-user-write race that previously caused mode B on macos-14 + node 24.
…prepareFixtures" This reverts commit 0d20a19.
Picks up the final fix for the FSEvents mtime baseline race + the Option 2 perf follow-up (record_initial_file_mtimes now only stats files added in this watch cycle, not the full registered set). Tracks web-infra-dev/rspack#14077.
Picks up the latest fix/fsevents-mtime-race tip (now includes main-merge 9b248702). Local prepare/build currently panics with "attempted to get vtable for an unregistered impl" in rspack_cacheable::dyn — unrelated to the watcher fix and looks like a regression introduced by the main-merge in this canary. Pushing to surface the panic in CI as a public artifact for a separate bug report.
…re-dir-per-call # Conflicts: # pnpm-lock.yaml
…re-dir-per-call # Conflicts: # package.json # pnpm-lock.yaml
Stress matrix should now verify the 2% residual flake fix on macOS: late-arriving events that previously sat in files_data forever should now be drained by the post-handler re-flush.
Stress matrix should now verify whether the safeTime backfill mechanism defeats Race 3 (file registration timing) by replaying events that landed between rspack's done hook and the process.nextTick rewatch.
Verifies whether the dir-event rescan defeats FSEvents partial-delivery (where only the parent-dir Change arrives but not the file Change). Also broadens trace dump filter to capture fixture-root-level raw events so we can verify or disprove the hypothesis post-stress.
…ll registered) Verifies the watchpack-equivalent extensions: dir-event rescan now walks actual dir contents to catch unregistered files, and safeTime backfill covers all registered files (not just newly added). Targets the residual dynamic-import flake.
…sive walk) The 17ee60b7 canary added recursive dir walking + safeTime backfill over all registered files, which regressed stress matrix from 79/80 to 74/80 because the recursive walk picked up dist/.rstest-temp/* outputs and the extended backfill double-fired on every rebuild (safe_time == start_time). Reverting to the proven a85437e7 version to re-confirm the 79/80 baseline before submitting watcher PR.
Verifies that filtering ignored paths (dist/.rstest-temp/*) at the
trigger's on_event entry eliminates the [root]-only noise js_event_handle
batches that were causing waitForStdout('Duration') to capture stray
build-output rebuilds.
Control experiment: run the macos-14 stress matrix against watchpack (the JS-based webpack/rspack default) instead of rspack's nativeWatcher to compare pass rates. This will tell us whether the residual flake is a watcher-implementation issue (watchpack should pass 100%) or a deeper race in rspack-core / rstest test spec. Revert this before any production change.
Need full visibility into what's producing [root]-only js_event_handle flushes that bypass the ignored filter. Removing the \.rstest-temp exclusion from the dump filter so we can see raw dispatched events for build-output paths and confirm whether the ignored filter is actually suppressing them.
The trace dump filter change (previous commit) is meant to capture what nativeWatcher dispatches when the watchpack baseline was 100%. Reverting the watchpack switch so we get nativeWatcher trace output to compare.
Last run hit 'Binary file matches' because grep auto-detected non-text bytes (likely from path display in some events) and refused to print matches. Force text mode with -a so the dump is actually usable for diagnosis.
Events are reported via the symlink (fixtures-test-0-module), not the realpath (fixtures-test-0-module-r1234-N-xxxx) thanks to the realpath translation in the watcher. Drop the -r[0-9]+ suffix requirement from the dump filter.
Critical fix: previously the ignored glob "**/dist/.rstest-temp" only matched the directory itself, so every write to dist/.rstest-temp/foo.mjs escaped the filter and produced a noisy [root]-only js_event_handle that triggered a fake rebuild during step 2 of the watch e2e tests. Now the watcher walks the path and every ancestor against the ignored patterns — files inside an ignored directory match via the ancestor check. Expectation: near-100% stress pass rate (matching watchpack).
Repoint the @rspack/core override to the rspack canary that ports watchpack's safeTime / mtime-accuracy mechanism into the native watcher (web-infra-dev/rspack#14267), so the web-infra-dev#1261 native-watcher stress matrix exercises the new fileTimeInfoEntries / contextTimeInfoEntries path.
Author
Stress result —
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
Stress-test the rspack native file watcher carrying the ported watchpack
safeTime/ mtime-accuracy mechanism (web-infra-dev/rspack#14267) under rstest's e2e watch matrix.Before that PR the native watcher returned empty
fileTimeInfoEntries/contextTimeInfoEntries; it now produces the full{ safeTime, timestamp }tables. This run validates the new path holds up under the macОS/Windows watch stress matrix.What
Builds on #1261's harness (enables
experiments.nativeWatcher,aggregateTimeout: 100, macOS fresh-fixture symlink, theSTRESS_*CI matrix). The only delta in this PR is repointing the@rspack/coreoverride to the rspack canary that includes the safeTime change: