IxVM kernel: unblock UTF-8 decode/encode proof + Nat-layer FFT cuts by arthurpaulino · Pull Request #450 · argumentcomputer/ix

arthurpaulino · 2026-06-19T12:30:15Z

Headline

Aiur kernel previously OOM'd on
_private.Init.Data.String.Decode.0.ByteArray.utf8DecodeChar?.assemble₄_eq_some_of_toBitVec._proof_1_8.
This constant is a prerequisite of ByteArray.utf8DecodeChar?_utf8EncodeChar_append
(part of the UTF-8 round-trip lemma chain).

Now it typechecks at 37.31B FFT (down from the OOM baseline). The
two-direction lemma …_utf8EncodeChar_append itself remains
out-of-reach (still OOMs further along the same dependency tree under
the user's memory cap), but the prerequisite that was the immediate
blocker is unstuck.

The same kernel changes also deliver large FFT cuts on previously-
expensive targets — Tier 1d's structural short-circuit replaces full
delta-whnf cascades wherever it fires:

Constant	Before	After	Δ
`Vector.append`	4.02B	3.16B	−21.4%
`Array.append_assoc`	3.94B	3.08B	−21.8%

Commits

1. `a6cd34a` — Tier 1d def-eq short-circuit (the unblock)

Aiur's k_is_def_eq_core jumped from Tier 1.5 straight to full delta
WHNF (Tier 2). The cascading Nat.rec / Nat.succ iota expansions then
drove whnf_const_head past 1M unique entries before either side
reached a comparable canonical form — OOM. Rust's def-eq settles the
same pair via no-delta whnf + quick structural recursion before any
of that fires.

Three minimum-necessary pieces ported. No KStore. No FVar. No Subst /
KernelTypes change.

whnf_nd family (Whnf.lean, mirror Rust
whnf_no_delta_for_def_eq). Same dispatch tree as whnf, but
whnf_nd_const_head's Defn arm falls through to a stuck
apply_spine instead of delta-unfolding. Iota / proj / quot /
primitives still fire.
k_infer_only family (Infer.lean, mirror Rust
with_infer_only). App drops k_check(a, dom); Lam drops
k_ensure_sort(ty); Let drops val/ty validation. Used at
try_proof_irrel, is_prop_type, try_unit_like — the def-eq
tactics that only need the synthesized type.
k_is_def_eq_struct_safe + Tier 1d wiring (DefEq.lean,
mirror Rust quick_def_eq + post-try_def_eq_app). Sort-Sort via
level_equal; Lam-Lam / All-All via recursive k_is_def_eq on the
type and on the body under Cons(ty_a, types) (types-cons, NOT
FVar opening). Inserted between Tier 1c (string lit) and Tier 2
(full whnf):
```
aw_nd = whnf_nd(a); bw_nd = whnf_nd(b)
ptr_eq(aw_nd, bw_nd) → 1
k_is_def_eq_struct_safe(aw_nd, bw_nd) → 1
try_lazy_delta_app(aw_nd, bw_nd) → 1     (rerun: spine args may have
  reduced past what Tier 1.5's pre-whnf attempt could see)
```

Each piece independently validated necessary. The previously-tried
KStore explicit caches, FVar variant + opens, FVar-based binder
opening — all confirmed NOT necessary for this unblock and left out.

3 files changed, +296/−6 lines.

2. `36d6c3a` — drop `g_or` from `u64_sub_with_borrow`

u64_sub_with_borrow combined two per-byte borrow bits with g_or.
The two bits are mutually exclusive: u_t = 1 ⇒ intermediate t_i ≥ 1 ⇒ subtracting br_in ∈ {0,1} cannot underflow ⇒ u_r = 0. Field
+ substitutes for g_or directly. Per Aiur cost model g_or adds
+1 aux + 1 lookup per call (≈ 5 width); field + is free. 7 g_ors ×
2.23M rows.

UTF-8 _proof_1_8: 39.12B → 38.14B (−2.6%).

3. `9e787da` — drop `g_or` from `klimbs_add_carry` / `klimbs_sub_borrow`

Same mutually-exclusive-carry pattern. Two limb-level borrows /
carries from sequential u64 ops cannot both be 1.

UTF-8 _proof_1_8: 38.14B → 38.07B (−0.18%).

4. `4e379e7` — hot/cold split `try_nat_dispatch`, extract binop arm

try_nat_dispatch's width 90 was floored by the binop arm (2× whnf

2× try_extract_nat + try_nat_binop_addr + apply_spine), charged on
every Nat.succ / Nat.pred row. Factor binop dispatch into
try_nat_binop_dispatch. Main narrows to the max of succ / pred
arms.

UTF-8 _proof_1_8: 38.07B → 37.80B (−0.7%).

5. `80ce3d2` — hot/cold split `expr_lbr`, extract Let arm

expr_lbr's width was floored by the Let arm (3 recursive expr_lbr
calls + 2 lbr_max + 1 lbr_dec), charged on every row even though Let
is rare. Factor into expr_lbr_let.

Nat.add_comm: 55.63M → 55.50M (−0.2%). UTF-8 _proof_1_8: 37.80B
→ 37.62B (−0.5%).

6. `1f8effd` — hot/cold split `try_extract_nat`, extract App arm

try_extract_nat's width was floored by the App arm (list_lookup +
address_eq + recursive try_extract_nat + klimbs_succ). Factor into
try_extract_nat_app. Main narrows to leaf-arm width.

UTF-8 _proof_1_8: 37.62B → 37.31B (−0.8%).

7. `039e9cf` — re-pin IxVM FFT costs

41 pins in Tests/Ix/IxVM.lean::kernelCheckEntries updated. Every
constant got cheaper; none regressed. Largest reductions:

Vector.append: 4.02B → 3.16B (−21.4%)
Array.append_assoc: 3.94B → 3.08B (−21.8%)

lake test -- --ignored ixvm passes with 0 FFT mismatches.

Cumulative on UTF-8 `_proof_1_8`

Commit	FFT
baseline (`main`)	OOM
`a6cd34a` Tier 1d	39.12B
`36d6c3a` g_or → + in `u64_sub_with_borrow`	38.14B (−2.6%)
`9e787da` g_or → + in `klimbs_add_carry` / `klimbs_sub_borrow`	38.07B (−0.18%)
`4e379e7` hot/cold `try_nat_dispatch`	37.80B (−0.7%)
`80ce3d2` hot/cold `expr_lbr`	37.62B (−0.5%)
`1f8effd` hot/cold `try_extract_nat`	37.31B (−0.8%)

Post-unlock optimization: −4.6% (39.12B → 37.31B).

Cost on small targets

Main baseline Nat.add_comm: 56.08M FFT.
This branch Nat.add_comm: 55.50M FFT.

Tier 1d itself adds no overhead on the common case (whnf_nd +
struct_safe + try_lazy_delta_app are themselves Aiur-memoized); the
follow-up optimizations are net wins.

Test plan

lake exe check Nat.add_comm passes (55.50M FFT).
lake exe check Vector.extract_append passes.
lake exe check "_private.…assemble₄_eq_some_of_toBitVec._proof_1_8"
passes (37.31B FFT, previously OOM).
lake test -- --ignored ixvm — all 41 FFT pins updated, suite
passes with 0 mismatches.

…ost-spine-congruence) UTF-8 `_private.Init.Data.String.Decode.0.ByteArray.utf8DecodeChar? .assemble₄_eq_some_of_toBitVec._proof_1_8` OOMs on the previous pipeline because `k_is_def_eq_core` jumps from Tier 1.5 straight to full delta WHNF (Tier 2); the cascading Nat.rec / Nat.succ iota expansions then drive `whnf_const_head` past 1M unique entries before either side reaches a comparable canonical form. Rust's def-eq settles the same pair via the no-delta whnf + quick structural recursion before any of that fires. This patch ports the three pieces of that short-circuit and nothing else — no FVar variant, no KStore, no Subst changes, no signature sweep. * `whnf_nd` family in `Whnf.lean` (mirror Rust `whnf_no_delta_for_def_eq`). Same dispatch tree as `whnf` (beta / let zeta / iota / proj / quot / primitives all fire), except `whnf_nd_const_head`'s Defn arm falls through to a stuck `apply_spine` instead of delta-unfolding. * `k_infer_only` family in `Infer.lean` (mirror Rust `with_infer_only`). App drops `k_check(a, dom)`; Lam drops `k_ensure_sort(ty)`; Let drops the val/ty validations. Distinct Aiur memo from `k_infer`, parity with Rust's separate `infer_cache` / `infer_only_cache`. * `k_is_def_eq_struct_safe` in `DefEq.lean` (mirror Rust `quick_def_eq`). Sort-Sort via `level_equal`; Lam-Lam / All-All recurse on type and on body under `Cons(ty_a, types)`. Returns 1 only when DEFINITELY def-eq; 0 means fall through. Sound on partially-whnf'd (no-delta) inputs because the handled shapes don't depend on further reductions. * `k_is_def_eq_core` Tier 1d wiring inserted between Tier 1c (string lit) and Tier 2 (full whnf): aw_nd = whnf_nd(a); bw_nd = whnf_nd(b) ptr_eq(aw_nd, bw_nd) → 1 k_is_def_eq_struct_safe(aw_nd, bw_nd) → 1 if 1 try_lazy_delta_app(aw_nd, bw_nd) → 1 if 1 (rerun post-whnf_nd: spine args may have reduced past what Tier 1.5's pre-whnf attempt could see, exposing Const-Const congruence that was hidden) * `try_proof_irrel`, `is_prop_type`, `try_unit_like` switch from `k_infer` to `k_infer_only` — these helpers only need the synthesized type, not the full re-validation work that `k_infer` does for each recursive App/Let/Lam. Each piece individually validated necessary (removing it puts UTF-8 back into the OOM regime). FVar variant + opens, KStore explicit caches, infer_only's FVar-based binder opening — all confirmed NOT necessary for the UTF-8 unblock and left out (see PLAN.md for future experiments). Measured (FFT cost): Nat.add_comm: 56.08M → 55.63M (~stable; new code paths add no overhead on the common case because Tier 1d's whnf_nd + struct_safe are themselves Aiur-memoized). _private.Init.Data.String.Decode.0.ByteArray.utf8DecodeChar? .assemble₄_eq_some_of_toBitVec._proof_1_8: OOM → 39.12B FFT, passes. 3 files, +296/-6 lines.

`u64_sub_with_borrow` combines two per-byte borrow bits with `g_or`. The two bits are MUTUALLY EXCLUSIVE: `u_t = borrow(a_i - b_i)` and `u_r = borrow((a_i + 256 - b_i) - br_in)`. If `u_t = 1` the intermediate `t_i ≥ 1`, so subtracting `br_in ∈ {0,1}` cannot underflow ⇒ `u_r = 0`. Field `+` substitutes for `g_or` directly (per the same pattern as `u64_add` in `ByteStream.lean`). Per Aiur cost model, `g_or` adds +1 aux + 1 lookup per call; field `+` is free. 8 g_or call sites in `u64_sub_with_borrow` each charged on every one of the function's 2.23M rows. Measured (FFT cost) on UTF-8 `_proof_1_8`: 39.12B → 38.14B (-2.6%) Nat.add_comm unchanged (55.63M). See [[reference_aiur_carry_add]].

Same mutually-exclusive-carry pattern as `u64_sub_with_borrow`: * `klimbs_add_carry`: u64_add of (la, lb) yields carry1; u64_add of (sum1, carry_in) yields carry2. carry1=1 ⇒ sum1 ≤ 2^64-2 ⇒ carry2=0. * `klimbs_sub_borrow`: symmetric for borrows. Replace `g_or(c1, c2)` with `c1 + c2` (field +). Both helpers run on hot Nat-primitive paths. Measured on UTF-8 `_proof_1_8`: 38.14B → 38.07B (-0.18%) Nat.add_comm unchanged. See [[reference_aiur_carry_add]].

`try_nat_dispatch` ran 1.12M rows in UTF-8 `_proof_1_8` at width 90, charging 5.16% of total FFT. Width was floored by its widest match arm (the binop branch with 2× whnf + 2× try_extract_nat + try_nat_binop_addr + apply_spine), even on Nat.succ / Nat.pred rows that never touched it. Factor binop dispatch into its own `try_nat_binop_dispatch` fn. Main dispatcher narrows to the max of succ / pred arms (single whnf + try_extract_nat + klimbs_succ/dec + apply_spine). The cold fn's width only charges the rows that actually dispatch a binop. Measured on UTF-8 `_proof_1_8`: 38.07B → 37.80B (-0.7%) Nat.add_comm unchanged.

`expr_lbr` ran 1.47M rows in UTF-8 `_proof_1_8` at width 39, charging 3.01% of total FFT. The Let arm (3 recursive expr_lbr calls + 2 lbr_max + 1 lbr_dec) is the widest match arm, charged on every row of expr_lbr even though Let is rare in most expressions encountered. Factor the Let arm into `expr_lbr_let(ty, val, body)`. Main expr_lbr narrows to max of the 2-recursion arms (App / Lam / Forall). Cold fn only charges Let-arm rows. Measured: Nat.add_comm: 55.63M → 55.50M (-0.2%) UTF-8 `_proof_1_8`: 37.80B → 37.62B (-0.5%)

`try_extract_nat` ran 1.12M rows at width 45, charging 2.68% of UTF-8 `_proof_1_8` total FFT. The App arm (list_lookup + address_eq + recursive try_extract_nat + klimbs_succ) is the widest match arm; the Lit / Const / default arms are leaf compares. Factor App into `try_extract_nat_app(f, a, addrs)`. Main extractor narrows to leaf-arm width. Cold fn only charges App-arm rows. Measured on UTF-8 `_proof_1_8`: 37.62B → 37.31B (-0.8%) Nat.add_comm unchanged.

Updates 41 pinned FFT costs in `Tests/Ix/IxVM.lean::kernelCheckEntries` to match the new kernel's output. All pins moved DOWN — every constant got cheaper, none regressed. Largest reductions (% change): Vector.append: 4_023_268_168 → 3_160_970_390 (-21.4%) Array.append_assoc: 3_938_574_533 → 3_079_334_815 (-21.8%) String.Internal.append: 793_580_333 → 775_968_134 ( -2.2%) bv_to_nat_lit: 635_780_327 → 619_870_154 ( -2.5%) nat_gcd_lit: 665_518_356 → 649_859_784 ( -2.4%) Nat.sub_le_of_le_add: 567_575_653 → 557_867_526 ( -1.7%) IxVMPrim.nat_mod_lit: 414_695_549 → 407_517_834 ( -1.7%) IxVMPrim.nat_div_lit: 405_607_545 → 398_641_590 ( -1.7%) IxVMPrim.nat_shr_lit: 411_128_901 → 404_158_486 ( -1.7%) Nat.decLe: 209_641_496 → 206_196_563 ( -1.6%) Nat.add_comm: 56_084_908 → 55_504_714 ( -1.0%) `lake test -- --ignored ixvm` passes with 0 FFT mismatches.

arthurpaulino added 7 commits June 19, 2026 12:19

arthurpaulino force-pushed the ap/utf8-tier-1d branch from 039e9cf to 07712ff Compare June 19, 2026 15:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IxVM kernel: unblock UTF-8 decode/encode proof + Nat-layer FFT cuts#450

IxVM kernel: unblock UTF-8 decode/encode proof + Nat-layer FFT cuts#450
arthurpaulino wants to merge 7 commits into
mainfrom
ap/utf8-tier-1d

arthurpaulino commented Jun 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

arthurpaulino commented Jun 19, 2026

Headline

Commits

1. a6cd34a — Tier 1d def-eq short-circuit (the unblock)

2. 36d6c3a — drop g_or from u64_sub_with_borrow

3. 9e787da — drop g_or from klimbs_add_carry / klimbs_sub_borrow

4. 4e379e7 — hot/cold split try_nat_dispatch, extract binop arm

5. 80ce3d2 — hot/cold split expr_lbr, extract Let arm

6. 1f8effd — hot/cold split try_extract_nat, extract App arm

7. 039e9cf — re-pin IxVM FFT costs

Cumulative on UTF-8 _proof_1_8

Cost on small targets

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

1. `a6cd34a` — Tier 1d def-eq short-circuit (the unblock)

2. `36d6c3a` — drop `g_or` from `u64_sub_with_borrow`

3. `9e787da` — drop `g_or` from `klimbs_add_carry` / `klimbs_sub_borrow`

4. `4e379e7` — hot/cold split `try_nat_dispatch`, extract binop arm

5. `80ce3d2` — hot/cold split `expr_lbr`, extract Let arm

6. `1f8effd` — hot/cold split `try_extract_nat`, extract App arm

7. `039e9cf` — re-pin IxVM FFT costs

Cumulative on UTF-8 `_proof_1_8`