Harden walkthrough CI tests: snapshots + fault-tolerant harness #79
The nightly tests-walkthroughs job had never passed (42/42 red). It ran the CANlab_help_examples tutorials end-to-end via evalc() with no hardening, so the first orthviews/surface/prompt/missing-data error on the headless runner failed the whole test.
Decouple the tests from the live tutorials and harden them:
- Add verbatim snapshots of the 10 walkthroughs under walkthroughs/private/ (genpath-excluded, so they never shadow the real tutorials on CI). Refresh by overwriting from example_help_files/.
- Add helpers/canlab_run_walkthrough_snapshot.m: runs a snapshot %%-cell by cell, headless, each cell in its own try/catch in a shared workspace, so a graphics-only section that fails on a headless runner does not abort the compute sections.
- Add helpers/canlab_classify_environment_error.m: buckets caught errors into graphics / input / data / cascade / genuine (centralizes the heuristics previously inlined in canlab_test_help_examples). Environment buckets are skipped (Incomplete); only genuine errors fail, with an informative report naming the section, offending line, and error id.
- Rewrite the 10 canlab_test_walkthrough_*.m wrappers to run their snapshot through the harness instead of evalc'ing the external script.
- Add walkthroughs/README.md documenting the design, refresh process, and why these stay a separate nightly tier (~5 min local / ~15-20 min CI) rather than folding into the fast per-push suite.
Full suite now: 10 passed, 0 failed, 0 incomplete. Two genuine tutorial bugs this surfaced (write-without-overwrite in walkthrough 3; dat_descrip -> metadata_table in 4b) are fixed in CANlab_help_examples and the snapshots refreshed accordingly.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
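The cell-by-cell idea in the commit above can be illustrated with a minimal sketch. This is not the code in helpers/canlab_run_walkthrough_snapshot.m; the function name run_cells_sketch, the result fields, and the trailing-underscore naming are assumptions made for illustration. It splits script text at lines beginning with %%, evaluates each cell with evalc inside its own try/catch, and relies on evalc running in the calling function's workspace so that later cells can see variables created by earlier ones.

```matlab
% Hypothetical sketch; run_cells_sketch is an illustrative name, not the
% actual helpers/canlab_run_walkthrough_snapshot.m.
demo = sprintf('x = 1;\n%%%% Plot\nerror(''demo:fail'', ''boom'');\n%%%% Compute\ny = x + 1;');
r = run_cells_sketch(demo);
for k = 1:numel(r)
    fprintf('cell %d ok=%d\n', r(k).section, r(k).ok);
end

function results_ = run_cells_sketch(script_text_)
% Harness variables end in "_" to reduce the chance of clashing with
% variables that the evaluated cells create in this same workspace.
old_vis_ = get(groot, 'DefaultFigureVisible');
restore_ = onCleanup(@() set(groot, 'DefaultFigureVisible', old_vis_));
set(groot, 'DefaultFigureVisible', 'off');             % keep figures off-screen

lines_ = regexp(script_text_, '\r?\n', 'split');
cell_id_ = cumsum(startsWith(strtrim(lines_), '%%'));  % 0 = code before first %%
ids_ = unique(cell_id_);
results_ = struct('section', num2cell(ids_), 'ok', true, 'err', []);

for i_ = 1:numel(ids_)
    code_ = strjoin(lines_(cell_id_ == ids_(i_)), newline);
    try
        evalc(code_);              % runs in this function's workspace
    catch err_
        results_(i_).ok = false;   % record the failure, continue with next cell
        results_(i_).err = err_;
    end
end
end
```

In the demo, the middle cell throws, yet the final cell still runs and can read x from the first cell. A real harness would also need to cope with tutorial cells that call clear, which this sketch does not handle.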
First CI run of the hardened nightly surfaced 3 failures, all the same environment gap: read_nifti_volume falls back to niftiinfo (Image Processing Toolbox), which the runner did not provision (only Statistics + Signal were installed), so plot(obj) failed with Undefined function 'niftiinfo'.
- Provision Image_Processing_Toolbox in tests-walkthroughs.yml so plot() and NIfTI I/O actually run on CI.
- Safety net: canlab_classify_environment_error now buckets an UndefinedFunction error for a known optional-toolbox function (niftiinfo, niftiread, niftiwrite, cfg_getfile) as 'capability' -> skipped, not failed, so a missing toolbox can never spuriously redden the nightly. Genuine missing-function bugs still classify as 'genuine'.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
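The 'capability' safety net described above might look roughly like the following. This is a sketch under stated assumptions, not the contents of canlab_classify_environment_error.m: the function name classify_sketch is hypothetical, the function name is recovered by matching the first quoted word in the error message, and only the capability and genuine buckets are shown because the heuristics for the other buckets are not given here.

```matlab
% Hypothetical sketch; classify_sketch is an illustrative name and covers
% only the 'capability' vs 'genuine' decision.
e1 = MException('MATLAB:UndefinedFunction', ...
    'Undefined function ''niftiinfo'' for input arguments of type ''char''.');
e2 = MException('MATLAB:UndefinedFunction', ...
    'Undefined function ''my_typo_fn'' for input arguments of type ''double''.');
b1 = classify_sketch(e1);
b2 = classify_sketch(e2);
fprintf('%s / %s\n', b1, b2);

function bucket = classify_sketch(err)
% Functions named in the commit message as optional-toolbox dependencies.
optional_fns = {'niftiinfo', 'niftiread', 'niftiwrite', 'cfg_getfile'};
bucket = 'genuine';                                   % default: fail the test
if strcmp(err.identifier, 'MATLAB:UndefinedFunction')
    % Assumption: the missing name is the first single-quoted word in the message.
    tok = regexp(err.message, '''(\w+)''', 'tokens', 'once');
    if ~isempty(tok) && ismember(tok{1}, optional_fns)
        bucket = 'capability';                        % skip instead of fail
    end
end
end
```

Keeping the list of optional functions explicit means an undefined function outside that list, such as a typo in a tutorial, still falls through to 'genuine' and fails the run.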
1b0c935 to 0567b10
Rebased this branch onto current
Why: the glm_map files here were an early snapshot of the same lineage that landed on
Result: 24 files, purely the walkthrough CI hardening;
🤖 Generated with Claude Code
Problem
The nightly tests-walkthroughs job has never passed (42/42 red). It ran the CANlab_help_examples tutorials end-to-end via evalc(script) with no hardening, so the first orthviews/surface/interactive-prompt/missing-data error on the headless runner failed the entire test. The walkthroughs were doubling as unit tests without the headless-CI hardening the per-push canlab_test_help_examples suite already has.
Approach
Decouple the tests from the live tutorials and harden them:
- Verbatim snapshots of the 10 walkthroughs under walkthroughs/private/ (genpath-excluded, so they never shadow the real tutorials when both repos are checked out on CI). Refresh by overwriting from example_help_files/.
- helpers/canlab_run_walkthrough_snapshot.m: runs a snapshot %%-cell by cell, headless, each cell in its own try/catch in a shared workspace, so a graphics-only section that fails on a headless runner does not abort the compute sections.
- helpers/canlab_classify_environment_error.m: buckets caught errors into graphics / input / data / cascade / genuine. Environment buckets are skipped (Incomplete); only genuine errors fail, with an informative report (section number, offending line, error id).
- The 10 canlab_test_walkthrough_*.m wrappers are rewritten to run their snapshot through the harness.
- walkthroughs/README.md documents the design, refresh process, and cadence rationale.
Cadence decision: keep as a separate nightly tier
Measured full-suite wall time is ~6.8 min on a fast workstation, which puts the estimate at ~15–20 min on the GitHub Linux runner. Folding that (plus graphics/data-dependent flakiness) into the fast per-push gate would slow every PR and make the required check unreliable. These stay nightly; the per-push tests suite is unaffected (the is_walkthrough filter excludes them by default).
Result
Full suite: 10 passed, 0 failed, 0 incomplete. Two genuine tutorial bugs this surfaced are fixed in canlab/CANlab_help_examples#3 and the snapshots here refreshed accordingly:
- Walkthrough 3: write() without 'overwrite'
- Walkthrough 4b: .dat_descrip → .metadata_table
🤖 Generated with Claude Code