
fix(vps): complete remaining #451 hardening (hermes-claw swap, autoUpgrade=boot, deploy docs) #463

Merged
alexandru-savinov merged 3 commits into main from fix/451-remaining-vps-hardening
Jun 5, 2026


@alexandru-savinov

Owner

Complete the remaining #451 remote-VPS hardening

#462 already shipped part of #451 (the --max-jobs 1 --cores 1 throttle on deploy.sh/install.sh and the zero-kuzea swapfile). This PR finishes the rest:

| Commit | #451 item | Change |
|---|---|---|
| fix(hermes-claw) | P1 | Add a 4GB /swapfile. hermes-claw (CX33, GRUB) was the last VPS with no disk swap (the <8GB-GRUB build-OOM-brick profile). Mirrors sancta-choir. |
| fix(sancta-claw) | P2 | system.autoUpgrade.operation = "boot". The nightly unattended upgrade ran the default GRUB-mutating switch; boot builds the new generation and sets it as the boot default without activating it, so a bad generation only needs a reboot. |
| docs | P3 | CLAUDE.md gains the remote-VPS non-atomic-switch warning (build-gate + --max-jobs + boot, citing #252). sancta-choir's 6.6 kernel pin gets a documented rationale + exit criteria. |

On the kernel-pin "audit/expire" item

I documented the pin's exit criteria rather than unpinning. The corruption was a one-time build-OOM artifact (the 6.12 kernel itself isn't broken), but I can't verify the VPS store state or test a 6.12 boot from this darwin host — and unpinning untested risks an unbootable headless box, which is exactly the failure #451 is about. The comment now records why it's pinned and the precise steps to safely remove it (GC corrupt paths → clean 6.12 build on-host → nixos-rebuild boot + reboot test).

Validation (darwin host — eval only)

  • nix eval …{hermes-claw,sancta-claw,sancta-choir}…toplevel.drvPath — all evaluate cleanly.
  • sancta-claw.config.system.autoUpgrade.operation == "boot"; hermes-claw swapfile present.
  • nixpkgs-fmt --check — clean on all 3 changed nix files.

Closes #451. (No code/secret changes beyond config + docs; CI's Trivy scan + aarch64/x86_64 eval are the gates.)

🤖 Generated with Claude Code

alexandru-savinov and others added 3 commits June 3, 2026 21:34
hermes-claw (CX33, GRUB) had no disk swap — the <8GB-GRUB profile #451 flags. Mirrors sancta-choir; no kernel pin (no corrupted-store history).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ch (#451)

The nightly unattended upgrade ran the default GRUB-mutating switch; operation=boot builds + sets the boot default without activating, so a bad generation only needs a reboot (#252).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…exit criteria (#451)

CLAUDE.md: add the remote-VPS non-atomic-switch warning (build-gate + --max-jobs + boot, cite #252). sancta-choir: document why the 6.6 pin exists and the exit criteria to unpin, instead of silent debt.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexandru-savinov added labels claude (Created/proposed by Claude Code), area:system (Base OS / NixOS configuration), and reliability (Reliability improvements) on Jun 3, 2026

github-actions Bot commented Jun 3, 2026

Contributor

PR #463 Review — VPS Hardening (hermes-claw swap, autoUpgrade=boot, deploy docs)

Summary

This PR finishes the remaining #451 remote-VPS OOM-hardening items: 4 GB swapfile on hermes-claw, autoUpgrade.operation = "boot" on sancta-claw, and documentation updates (CLAUDE.md warning + sancta-choir kernel-pin exit criteria). All changes are defensive and well-reasoned.


Findings

| # | Severity | File | Finding |
|---|---|---|---|
| 1 | LOW | hosts/sancta-claw/configuration.nix | operation = "boot" + allowReboot = false means updates never self-activate. Acceptable as documented, but security patches will silently queue until a manual reboot |
| 2 | LOW | hosts/hermes-claw/configuration.nix | Swapfile has no priority or options override. Defaults are fine, just noting it |
| 3 | LOW | hosts/sancta-choir/configuration.nix | sancta-choir has no autoUpgrade.operation = "boot" equivalent. Its upgrade path is manual, so this is intentional, but worth confirming |
| 4 | INFO | CLAUDE.md | Good addition. The deploy warning is accurate and directly actionable |

Detailed Analysis

hermes-claw swapfile

Correct NixOS pattern. swapDevices = [{ device = "/swapfile"; size = 4096; }] instructs NixOS to create and activate the swapfile on first boot. 4 GB is appropriate for a CX33 box (4 GB RAM) and mirrors sancta-choir. Comment explains the rationale and explicitly opts out of a kernel pin with clear reasoning. No issues.
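
For reference, the swapfile declaration discussed above might look like the following when written out as a module. This is a minimal sketch, not the actual diff: the file path in the comment and the surrounding module skeleton are assumptions, and only the `swapDevices` entry itself is taken from the PR.

```nix
# Sketch of hosts/hermes-claw/configuration.nix (surrounding contents assumed).
{ ... }:
{
  swapDevices = [
    {
      # Path of the swapfile; NixOS creates it if it does not exist yet.
      device = "/swapfile";
      # Size in MiB, so 4096 is the 4 GB swapfile described in the PR.
      size = 4096;
    }
  ];
}
```

Regarding finding 2, each `swapDevices` entry also accepts optional `priority` and `options` attributes; leaving them unset, as here, keeps the kernel and module defaults.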

sancta-claw autoUpgrade operation

operation = "boot" is the correct mitigation for the GRUB-mutation risk on unattended remote boxes. The combination with allowReboot = false is intentional and documented.

LOW: The practical effect is that the nightly build sets a new boot default that is never activated until a human reboots. The running generation can drift indefinitely from the boot generation. This is the safer trade-off on a headless VPS, but a manual reboot cadence is now needed to actually ship updates (including security patches).
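
The combination described above can be sketched as follows. Only `operation = "boot"` and `allowReboot = false` come from the PR and this review; `enable = true` is an assumption inferred from the nightly upgrade being active, and other settings the host may carry (schedule, flake reference) are omitted.

```nix
# Sketch of the autoUpgrade block in hosts/sancta-claw/configuration.nix.
{ ... }:
{
  system.autoUpgrade = {
    # Assumed: the nightly unattended upgrade is enabled on this host.
    enable = true;
    # Build the new generation and make it the boot default,
    # but do not activate it on the running system.
    operation = "boot";
    # Never reboot automatically; a human reboot activates the queued generation.
    allowReboot = false;
  };
}
```

With these two settings together, the running generation only changes when someone reboots the box, which is the drift noted in the LOW finding.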

sancta-choir kernel pin comment

Pure documentation improvement. The three-condition exit criteria (GC corrupt paths → clean 6.12 build on-host → nixos-rebuild boot + reboot) are precise and correct. The rationale correctly scopes the corruption to a build-time OOM artifact, not a 6.12 kernel defect. No issues.
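
A documented pin of this kind could look roughly like the sketch below. The exact expression used in sancta-choir's configuration is not shown in this PR page, so `pkgs.linuxPackages_6_6` is an assumption about how the 6.6 pin is expressed; the comment text paraphrases the rationale and exit criteria quoted above.

```nix
# Sketch of a commented kernel pin (the pin expression itself is assumed).
{ pkgs, ... }:
{
  # Pinned to 6.6 because a build-time OOM left corrupt store paths;
  # the 6.12 kernel itself is not considered broken.
  # Exit criteria before removing this line:
  #   1. Garbage-collect the corrupt store paths.
  #   2. Complete a clean 6.12 build on the host.
  #   3. Run nixos-rebuild boot, then do a reboot test.
  boot.kernelPackages = pkgs.linuxPackages_6_6;
}
```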

CLAUDE.md

Warning is accurate, cites the right scripts (deploy.sh / install.sh), and gives a concrete safe-deploy recipe. Recommends nixos-rebuild boot for risky changes — consistent with what this PR does for sancta-claw. No issues.


Verdict

PASS — No critical or high severity issues. All changes are defensive hardening with clear operational rationale. The LOW findings are known trade-offs, not bugs.

@alexandru-savinov alexandru-savinov merged commit f549c19 into main Jun 5, 2026
14 checks passed
@alexandru-savinov alexandru-savinov deleted the fix/451-remaining-vps-hardening branch June 5, 2026 17:21


Development

Successfully merging this pull request may close these issues.

fix(reliability): remote VPS nixos-rebuild switch is non-atomic — hermes-claw unprotected, --max-jobs throttle enforced nowhere
