perf(core): keep the worker pool busy while collecting coverage #1348
Draft
fi3ework wants to merge 2 commits into
When coverage is enabled, the host process decoded and merged every test file's coverage on its single event loop during the run, the same loop that dispatches the next file to each free worker. Under load that decode work starved dispatch, leaving workers idle and CPU under-utilized (worse with more files or larger coverage).
Workers now write coverage to a temp file and hand back only the path, so the host never decodes coverage mid-run. After the run, the files are read and merged in parallel in a worker_threads pool, then combined. Coverage output is byte-identical; on a 500-file project utilization rose from ~56% to ~80% and wall time dropped ~30%.
Each provider (@rstest/coverage-istanbul, @rstest/coverage-v8) ships a self-contained merge worker that uses its own istanbul merge; the core pool orchestrates the threads and falls back to a host-side parse if needed.
DEBUG=rstest prints pool env plus event-loop and ingest diagnostics.
Closes #1326
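The hand-off described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `CoverageMap` is a deliberately simplified, hypothetical shape (real istanbul and v8 coverage objects are richer), `writeCoverageToTemp`, `mergeInto` and `mergeCoverageFiles` are made-up names, and the merge runs inline here rather than in a worker_threads pool.

```typescript
import { mkdtempSync, readFileSync, rmSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical, simplified shape: source file -> statement id -> hit count.
type CoverageMap = Record<string, Record<string, number>>;

// Worker side: persist the coverage and return only its path,
// so the host receives a short string instead of a large payload.
function writeCoverageToTemp(dir: string, testFile: string, coverage: CoverageMap): string {
  const safeName = testFile.replace(/[^a-zA-Z0-9]/g, '_');
  const filePath = join(dir, `${safeName}.json`);
  writeFileSync(filePath, JSON.stringify(coverage));
  return filePath;
}

// Sum the hit counts from `source` into `target`.
function mergeInto(target: CoverageMap, source: CoverageMap): void {
  for (const [file, hits] of Object.entries(source)) {
    const merged = target[file] ?? (target[file] = {});
    for (const [id, count] of Object.entries(hits)) {
      merged[id] = (merged[id] ?? 0) + count;
    }
  }
}

// Host side, after the run: only now is any coverage parsed.
function mergeCoverageFiles(paths: string[]): CoverageMap {
  const total: CoverageMap = {};
  for (const p of paths) {
    mergeInto(total, JSON.parse(readFileSync(p, 'utf8')) as CoverageMap);
  }
  return total;
}

// Usage: two "workers" report paths; the host merges them at the end.
const dir = mkdtempSync(join(tmpdir(), 'cov-sketch-'));
const paths = [
  writeCoverageToTemp(dir, 'a.test.ts', { 'src/x.ts': { s0: 1, s1: 0 } }),
  writeCoverageToTemp(dir, 'b.test.ts', { 'src/x.ts': { s0: 2 }, 'src/y.ts': { s0: 1 } }),
];
const total = mergeCoverageFiles(paths);
rmSync(dir, { recursive: true, force: true });
console.log(total);
```

The point of the design is that the per-file message crossing the process boundary stays tiny, so handling it costs the host event loop almost nothing during the run.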
Rsdoctor Bundle Diff Analysis: found 13 projects in the monorepo, 3 with changes (coverage-istanbul, coverage-v8, core/main). Generated by Rsdoctor GitHub Action.
The end-of-run fan-out (b3a5a97) materializes the full un-deduped coverage corpus on tmpdir for the whole run, then amplifies it ~Nx across merge threads at the very end. On a RAM-backed /tmp that peak competes with the V8 heap and swap-thrashes into a hang; at large per-file coverage it OOM-crashes (SIGABRT) even on a healthy box.
Replace the batch fan-out with a single long-lived merge worker_thread that consumes each per-file coverage path as it arrives, merges it incrementally, and unlinks the temp file immediately. The on-disk corpus therefore stays bounded and only one deduped map is ever resident (no Nx amplification). On istanbul this also keeps the host event loop free (event-loop delay p99 roughly halved) and removes the end-of-run merge tail.
Gated on a provider capability flag (`coverageMergeWorkerStreaming`) so the batch-only v8 worker is never mis-driven into the streaming protocol (which would silently drop coverage); `RSTEST_COV_INGEST=batch` forces the end-of-run fan-out, `RSTEST_COV_INGEST=stream` is explicit opt-in.
Refs #1326
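The consume, merge, unlink cycle from this commit message might look roughly like the sketch below. It is an assumption-laden illustration: `StreamingMerger` and `CoverageMap` are hypothetical names, the coverage shape is simplified, and the threading is left out (in the PR this loop is said to live in a long-lived worker_thread fed with paths as they arrive).

```typescript
import { existsSync, mkdtempSync, readFileSync, rmSync, unlinkSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical, simplified shape: source file -> statement id -> hit count.
type CoverageMap = Record<string, Record<string, number>>;

class StreamingMerger {
  // The single deduped map that stays resident for the whole run.
  private readonly merged: CoverageMap = {};

  // Consume one per-file coverage path: read it, fold it in, delete it.
  ingest(coveragePath: string): void {
    const incoming = JSON.parse(readFileSync(coveragePath, 'utf8')) as CoverageMap;
    for (const [file, hits] of Object.entries(incoming)) {
      const target = this.merged[file] ?? (this.merged[file] = {});
      for (const [id, count] of Object.entries(hits)) {
        target[id] = (target[id] ?? 0) + count;
      }
    }
    // Unlink right away so the on-disk corpus does not grow with the run.
    unlinkSync(coveragePath);
  }

  result(): CoverageMap {
    return this.merged;
  }
}

// Usage: each path is merged and removed as soon as it "arrives".
const workDir = mkdtempSync(join(tmpdir(), 'cov-stream-'));
const merger = new StreamingMerger();
const leftover: boolean[] = [];
for (const [name, hits] of [['a', 1], ['b', 4]] as const) {
  const p = join(workDir, `${name}.json`);
  writeFileSync(p, JSON.stringify({ 'src/x.ts': { s0: hits } }));
  merger.ingest(p);
  leftover.push(existsSync(p)); // false: the temp file is already gone
}
rmSync(workDir, { recursive: true, force: true });
console.log(merger.result(), leftover);
```

Compared with a batch merge at the end, at most one unmerged temp file exists at a time in this sketch, which is the bounded-corpus property the commit message is after.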
Background
When coverage is enabled (with the default forks pool), the worker pool doesn't stay fully busy. On CI, CPU usage plateaus well below the available cores even with many test files still waiting to run, whereas the same suite without coverage uses all cores. Reported in #1326.
Why it happens
Each test file runs in its own worker and sends its coverage back to the main process when it finishes. The main process runs on a single event loop, and that one loop does two jobs at the same time: decoding and merging incoming coverage, and handing the next test file to each free worker.
As coverage piles up, decoding it crowds out the dispatch work — so workers that just finished sit idle waiting for their next file. The more files there are (and the larger the coverage), the worse the stall. That's why it shows up specifically when coverage is on, and only at scale.
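To make the stall concrete, here is a small back-of-the-envelope model. Everything in it is an assumption for illustration: the numbers are made up rather than measured on this PR, `simulateUtilization` is a hypothetical helper, and the host is modelled as handling one "file finished" message at a time, decoding that file's coverage before it can dispatch the next file.

```typescript
// Illustrative model with made-up numbers, not a measurement from this PR.
interface RunModel {
  workers: number;  // parallel workers
  files: number;    // total test files
  testMs: number;   // time a worker spends running one file
  decodeMs: number; // host time to decode and merge one file's coverage
}

// Returns the fraction of worker time spent actually running tests.
function simulateUtilization({ workers, files, testMs, decodeMs }: RunModel): number {
  if (files === 0) return 0;
  const active = Math.min(workers, files);
  // finishAt[i] = when worker i finishes its current file (Infinity = no work).
  const finishAt: number[] = Array.from({ length: workers }, (_, i) =>
    i < active ? testMs : Infinity,
  );
  let dispatched = active;
  let hostFreeAt = 0;

  for (let handled = 0; handled < files; handled++) {
    // The single event loop handles finished files one at a time, earliest first.
    let w = 0;
    for (let i = 1; i < workers; i++) {
      if (finishAt[i]! < finishAt[w]!) w = i;
    }
    // Decoding starts once the message has arrived and the loop is free.
    hostFreeAt = Math.max(hostFreeAt, finishAt[w]!) + decodeMs;
    // Only after decoding does the loop get around to dispatching the next file.
    if (dispatched < files) {
      finishAt[w] = hostFreeAt + testMs;
      dispatched++;
    } else {
      finishAt[w] = Infinity;
    }
  }
  return (files * testMs) / (workers * hostFreeAt);
}

const base = { workers: 4, files: 40, testMs: 100 };
const withoutDecode = simulateUtilization({ ...base, decodeMs: 0 });
const withDecode = simulateUtilization({ ...base, decodeMs: 50 });
console.log(withoutDecode, withDecode);
```

In this toy setup the run without host-side decoding keeps all four workers fully busy, while 50 ms of decoding per file drops utilization to just under one half, because the host can only release one worker every 50 ms. Adding workers makes the ratio worse, not better, which matches the "only at scale" observation.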
The fix
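Take coverage off the main process's critical path: workers write their coverage to a temp file and hand back only the path, and the merge runs in worker_threads instead of on the main process's event loop.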
The coverage result is exactly the same as before — only when and where it gets computed changes. On a 500-file project this brought CPU utilization back from ~56% to ~80% and cut wall-clock time by ~30%.
Diagnostics
Running with `DEBUG=rstest` now prints the run's environment (cores, load, pool and coverage settings) plus event-loop and coverage-ingest timings, so a utilization problem can be pinned down from a single run.
Closes #1326
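For readers who want to reproduce event-loop timings like those mentioned under Diagnostics, one possible way to collect them is Node's `monitorEventLoopDelay` from `node:perf_hooks`. The sketch below is not rstest's `DEBUG=rstest` output or implementation; `startLoopDelayProbe` and `LoopDelayReport` are hypothetical names chosen for illustration.

```typescript
import { cpus, loadavg } from 'node:os';
import { monitorEventLoopDelay } from 'node:perf_hooks';

// Hypothetical report shape, not the format rstest prints.
interface LoopDelayReport {
  cores: number;
  load1m: number;
  meanMs: number;
  p99Ms: number;
  maxMs: number;
}

// Starts sampling event-loop delay and returns a function that stops
// sampling and summarizes what was observed.
function startLoopDelayProbe(resolutionMs = 10): () => LoopDelayReport {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();
  return () => {
    histogram.disable();
    const toMs = (ns: number): number => ns / 1e6; // histogram values are nanoseconds
    const [load1m = 0] = loadavg();
    return {
      cores: cpus().length,
      load1m,
      meanMs: toMs(histogram.mean),
      p99Ms: toMs(histogram.percentile(99)),
      maxMs: toMs(histogram.max),
    };
  };
}

// Usage: wrap the part of the run you want to inspect.
const stopProbe = startLoopDelayProbe();
// ... run the test files here ...
const report = stopProbe();
console.log(report);
```

A high p99 delay while workers sit idle would point at the host event loop as the bottleneck, which is the situation this PR targets. If the probe is stopped before any sample is taken, the mean is not meaningful.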