Harden walkthrough CI tests: snapshots + fault-tolerant harness#79

Merged
torwager merged 2 commits into master from fix/ci-walkthroughs on Jun 26, 2026

@torwager torwager commented Jun 19, 2026

Problem

The nightly tests-walkthroughs job has never passed (42/42 red). It ran the CANlab_help_examples tutorials end-to-end via evalc(script) with no hardening, so the first orthviews / surface / interactive-prompt / missing-data error on the headless runner failed the entire test. The walkthroughs were doubling as unit tests without the headless-CI hardening the per-push canlab_test_help_examples suite already has.

Approach

Decouple the tests from the live tutorials and harden them:

  • Verbatim snapshots of the 10 walkthroughs under walkthroughs/private/ (genpath-excluded, so they never shadow the real tutorials when both repos are checked out on CI). Refresh by overwriting from example_help_files/.
  • helpers/canlab_run_walkthrough_snapshot.m — runs a snapshot %%-cell by cell, headless, each cell in its own try/catch in a shared workspace, so a graphics-only section that fails on a headless runner does not abort the compute sections.
  • helpers/canlab_classify_environment_error.m — buckets caught errors into graphics / input / data / cascade / genuine. Environment buckets are skipped (Incomplete); only genuine errors fail, with an informative report (section #, offending line, error id).
  • Rewrote the 10 canlab_test_walkthrough_*.m wrappers to run their snapshot through the harness.
  • walkthroughs/README.md documents the design, refresh process, and cadence rationale.
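
The harness and classifier described above might be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names `run_cells_sketch` and `classify_error_sketch`, the report fields, and every message pattern are assumptions made for the example. Only the per-cell try/catch in a shared workspace and the five bucket names come from the description.

```matlab
function report = run_cells_sketch(scriptText__)
% RUN_CELLS_SKETCH  Hypothetical sketch of a fault-tolerant cell runner.
% Splits script text at %%-cell markers and runs each cell in its own
% try/catch. All cells share this function's workspace, so variables that
% a compute cell creates remain visible to later cells.
% Harness variables end in "__" to reduce clashes with tutorial variables.

% Hide figures while the cells run; restore the previous default on exit.
oldVis__ = get(groot, 'DefaultFigureVisible');
restore__ = onCleanup(@() set(groot, 'DefaultFigureVisible', oldVis__));
set(groot, 'DefaultFigureVisible', 'off');

% Split at lines that begin with %%; the marker line itself is dropped.
% Index 1 holds any text that precedes the first marker.
cells__ = regexp(scriptText__, '^%%[^\n]*', 'split', 'lineanchors');
report__ = struct('section', {}, 'bucket', {}, 'id', {}, 'message', {});
skipped__ = false;

for k__ = 1:numel(cells__)
    if isempty(strtrim(cells__{k__})), continue; end
    try
        evalc(cells__{k__});            % run the cell, capturing its output
    catch ME__
        b__ = classify_error_sketch(ME__, skipped__);
        report__(end+1) = struct('section', k__, 'bucket', b__, ...
            'id', ME__.identifier, 'message', ME__.message); %#ok<AGROW>
        skipped__ = true;               % later undefined names may cascade
    end
end
report = report__;
end

function bucket = classify_error_sketch(ME, priorSkip)
% Illustrative heuristics only; the real patterns are not shown in this PR.
msg = lower(ME.message);
undefinedName = strcmp(ME.identifier, 'MATLAB:UndefinedFunction');
if contains(msg, ["display", "opengl", "orthviews", "surface"])
    bucket = 'graphics';
elseif contains(msg, ["waiting for input", "uigetfile", "keyboard"])
    bucket = 'input';
elseif contains(msg, ["no such file", "file not found", "cannot find"])
    bucket = 'data';
elseif undefinedName && priorSkip
    bucket = 'cascade';   % likely fallout from an earlier failed section
else
    bucket = 'genuine';
end
end
```

Under these assumptions, a wrapper would read a snapshot with `fileread`, pass the text to `run_cells_sketch`, and fail only if some report entry has the bucket `'genuine'`. A snapshot that calls `clear all` would wipe the harness variables in this sketch, so a real harness needs more care there.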

Cadence decision: keep as a separate nightly tier

The full suite measured ~6.8 min of wall time on a fast workstation, which suggests roughly 15–20 min on the GitHub Linux runner. Folding that (plus graphics- and data-dependent flakiness) into the fast per-push gate would slow every PR and make the required check unreliable. These tests stay nightly; the per-push tests suite is unaffected (the is_walkthrough filter excludes them by default).

Result

Full suite: 10 passed, 0 failed, 0 incomplete. This work surfaced two genuine tutorial bugs, which are fixed in canlab/CANlab_help_examples#3; the snapshots here were refreshed accordingly:

  • walkthrough 3: write() without 'overwrite'
  • walkthrough 4b: .dat_descrip → .metadata_table

🤖 Generated with Claude Code

@torwager torwager left a comment

approved

torwager and others added 2 commits June 25, 2026 22:22
The nightly tests-walkthroughs job had never passed (42/42 red). It ran the
CANlab_help_examples tutorials end-to-end via evalc() with no hardening, so
the first orthviews/surface/prompt/missing-data error on the headless runner
failed the whole test.

Decouple the tests from the live tutorials and harden them:

- Add verbatim snapshots of the 10 walkthroughs under walkthroughs/private/
  (genpath-excluded, so they never shadow the real tutorials on CI). Refresh
  by overwriting from example_help_files/.
- Add helpers/canlab_run_walkthrough_snapshot.m: runs a snapshot %%-cell by
  cell, headless, each cell in its own try/catch in a shared workspace, so a
  graphics-only section that fails on a headless runner does not abort the
  compute sections.
- Add helpers/canlab_classify_environment_error.m: buckets caught errors into
  graphics / input / data / cascade / genuine (centralizes the heuristics
  previously inlined in canlab_test_help_examples). Environment buckets are
  skipped (Incomplete); only genuine errors fail, with an informative report
  naming the section, offending line, and error id.
- Rewrite the 10 canlab_test_walkthrough_*.m wrappers to run their snapshot
  through the harness instead of evalc'ing the external script.
- Add walkthroughs/README.md documenting the design, refresh process, and why
  these stay a separate nightly tier (~5 min local / ~15-20 min CI) rather
  than folding into the fast per-push suite.

Full suite now: 10 passed, 0 failed, 0 incomplete. Two genuine tutorial bugs
this surfaced (write-without-overwrite in walkthrough 3; dat_descrip ->
metadata_table in 4b) are fixed in CANlab_help_examples and the snapshots
refreshed accordingly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
First CI run of the hardened nightly surfaced 3 failures, all the same
environment gap: read_nifti_volume falls back to niftiinfo (Image Processing
Toolbox), which the runner did not provision (only Statistics + Signal were
installed), so plot(obj) failed with Undefined function 'niftiinfo'.

- Provision Image_Processing_Toolbox in tests-walkthroughs.yml so plot() and
  NIfTI I/O actually run on CI.
- Safety net: canlab_classify_environment_error now buckets an
  UndefinedFunction error for a known optional-toolbox function (niftiinfo,
  niftiread, niftiwrite, cfg_getfile) as 'capability' -> skipped, not failed,
  so a missing toolbox can never spuriously redden the nightly. Genuine
  missing-function bugs still classify as 'genuine'.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
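
The 'capability' safety net described in the second commit could look something like the sketch below. It is an assumption-laden illustration: the function name `capability_bucket_sketch` is hypothetical, and extracting the function name from the first quoted word of the message is one possible approach, not necessarily how canlab_classify_environment_error does it. The list of optional-toolbox functions is taken from the commit message.

```matlab
function bucket = capability_bucket_sketch(ME)
% CAPABILITY_BUCKET_SKETCH  Hypothetical sketch of the missing-toolbox check.
% An UndefinedFunction error that names a known optional-toolbox function
% is treated as a missing capability (skip) rather than a genuine bug.
optionalFns = {'niftiinfo', 'niftiread', 'niftiwrite', 'cfg_getfile'};
bucket = 'genuine';
if strcmp(ME.identifier, 'MATLAB:UndefinedFunction')
    % Pull the first quoted name out of the message, e.g.
    % "Undefined function 'niftiinfo' for input arguments of type 'char'."
    tok = regexp(ME.message, '''(\w+)''', 'tokens', 'once');
    if ~isempty(tok) && ismember(tok{1}, optionalFns)
        bucket = 'capability';
    end
end
end
```

With this shape, an undefined call to any function outside the allow-list still comes back as 'genuine', which matches the stated intent that real missing-function bugs keep failing the nightly.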
@torwager torwager force-pushed the fix/ci-walkthroughs branch from 1b0c935 to 0567b10 Compare June 26, 2026 13:55
@torwager

Rebased this branch onto current master and dropped the @glm_map/ scaffold commits, leaving a focused CI-only PR.

Why: the glm_map files here were an early snapshot of the same lineage that landed on master via #77 (master's glm_map.m is 740 lines vs 446 here, and master has run_diagnostics, create_orthogonal_contrast_set, import_onsets, replace_basis_set, validate_object, plus docs/glm_map_methods.md). Keeping them would have re-introduced the superseded diagnostics.m and clashed with master. The walkthrough/CI files have no glm_map dependency, so they separate cleanly.

Result: 24 files, purely the walkthrough CI hardening; @glm_map/ is now byte-identical to master. Verified locally (R2026a + SPM25): all 10 walkthrough tests pass, and the fault-tolerant harness correctly degrades 4 graphics-only sections to skips instead of failing.

🤖 Generated with Claude Code

@torwager torwager merged commit 48cf89b into master Jun 26, 2026
1 check passed
@torwager torwager deleted the fix/ci-walkthroughs branch June 26, 2026 18:24