STOK-Core implements a reward-free reinforcement learning framework where policies optimize State-Time Option Kernels (STOKs)—full probability distributions over goal-success and constraint-violation events—instead of scalar value functions.
A working reference implementation of the State-Time Option Kernel core from
Ringstrom & Schrater (2025), built to demonstrate that the Option Kernel
Bellman Equation (OKBE) formulation works as a reward-free learning primitive
on general well-formed inputs. Per-item paper-parity verdicts in
docs/assessments/LOCAL_PDF_PARITY_REPORT.md;
active scope in
docs/dev/PAPER_PDF_TASKBOARD.md.
Ringstrom, T., & Schrater, P. (2025). A Unified Theory of Compositionality, Modularity, and Interpretability in Markov Decision Processes. arXiv:2506.09499 [cs.LG]
STOK-Core is the kernel layer: the OKBE / STOK / SOK / Goal Kernel / Plan Kernel mathematics, plus the Algorithm 1 (feasibility iteration) and Algorithm 2 (BFS option-sequence search) procedures that compose them.
It is not an agent, and it is not the broader compositional architecture that turns these kernels into a self-sustaining entity (motivation construction, value-of-key, the autopoietic / homeostatic content from Ringstrom's 2023 thesis). The paper this implementation tracks is explicit about that: it formalizes the kernel-layer compositionality, factorization, and verifiability results; the agent-layer thesis content is referenced but not contributed by the paper itself.
What you can do with this code today:
- solve a Task MDP for
(κ*, π**, η⁺, η⁻)per Algorithm 1, - compose STOKs / SOKs via Chapman-Kolmogorov,
- factorize a high-dimensional product-space STOK per Theorem 2.1,
- compute sublimated feasibility bounds per Theorem 2.4 for tree-search pruning,
- assemble a paper-faithful Goal Kernel / Plan Kernel and run Algorithm 2 over composite states.
What is out of scope (post-paper, separate document
docs/EXTENSION_DISCUSSION.md):
- empowerment as motivation; preference construction; intrinsic value,
- real-time / online runtime (STOK-RT),
- continuous state-spaces, sparse tensors, neural-net integration,
- agent identity / homeostasis / the rest of the thesis.
STOK-Core replaces scalar value functions with State-Time Option Kernels — full probability distributions over goal-success and constraint-violation events. This is what makes the framework reward-free: policies optimize the probability of named events occurring at named times, not the expectation of an unnamed scalar.
5357322 (feat: paper-parity audit M0-M7; general factorization + Algorithm 2)
Compositional: STOKs compose exactly via Chapman-Kolmogorov equations, enabling modular planning.
Verifiable: STOKs record probabilities of semantically interpretable events (goal satisfaction, constraint violation) needed for formal verification.
Scalable: Product-space STOKs factorize into low-dimensional components, avoiding the curse of dimensionality of high-dimensional Bellman backups.
<<<<<<< HEAD
-
✅ All 4 theorems validated with working examples
-
✅ 220+ tests passing (100% success rate, zero failures)
-
✅ GPU-accelerated via Burn ML framework ======= The audit's purpose was to make the kernel-layer demonstration credible — to ensure that the OKBE / STOK mathematics matches the paper item-by-item on general well-formed inputs.
-
✅ All 4 theorems implemented as first-class APIs (
build_state_option_set/build_affordance_option_setfor Theorems 2.2/2.3; full Eq [23] factorization with TEF for Theorem 2.1; sublimation generic across HL cardinalities for Theorem 2.4). -
✅ Algorithm 1 + Algorithm 2 paper-faithful with composite search nodes and sublimation pruning.
-
✅ 280+ tests passing (0 failing) across inline + 6 integration test files, including direct-DP cross-checks against the factorization on small problems. See
docs/assessments/VERIFICATION_RECORD.md. -
✅ 86–5,070× memory reductions demonstrated in examples (Theorem 2.1 in action on real product spaces).
-
✅ GPU-accelerated via Burn ML framework.
For "paper-faithful per item, with what caveats," walk
docs/assessments/LOCAL_PDF_PARITY_REPORT.md
against the PDF.
5357322 (feat: paper-parity audit M0-M7; general factorization + Algorithm 2)
[dependencies]
stok-core = { git = "https://github.com/Hyphaeic/stok-core" }
burn = { version = "0.19", features = ["wgpu"] }use stok_core::prelude::*;
fn main() -> Result<(), StokError> {
let device = default_device();
// Define a simple navigation task
let mdp = TaskMDP::simple_chain(10, 20, &device);
// Solve for state-time option kernel
let stok = solve_task_mdp(&mdp)?;
// Query cumulative feasibility
println!("κ(state_0) = {:.3}", stok.kappa[0]);
// All termination events sum to 1
assert!(stok.validate().is_ok());
Ok(())
}Run these to see the framework in action:
cargo run --release --example honey_badger_2spaceDemonstrates: 2-space product (Grid × Hydration), 86× memory reduction
Output:
Product space: 25 × 10 = 250 states
Factorized: 25² × T + 10² × T = 725×T values
Full product: 250² × T = 62,500×T values
Reduction: 86× ✅
cargo run --release --example temperature_regulationDemonstrates: Region-based default dynamics, homeostatic planning, 93× memory reduction
Output:
Hot/Cold regions with temperature regulation
Agent must alternate regions to avoid death
All trajectories: SUCCESS (reaches goal with safe temperature)
cargo run --release --example logic_task_sublimationDemonstrates: Precedence constraints (D ≺ E, D ≺ F), abstract feasibility bounds
Output:
Sublimated feasibility κ*_sub,σ computed
Invalid states (honey before key): κ*_sub = 0.000 🔴 PRUNE
Valid states: κ*_sub = 1.000 ✅ FEASIBLE
Pruning efficiency: 37.5% of branches eliminated
cargo run --release --example honey_badger_3spaceDemonstrates: Grid × Hydration × Logic (2,000 states), 5,070× memory reduction!
Output:
3-space product: 25 × 10 × 8 = 2,000 states
Factorized representation: (25² + 10² + 8²) × T
Memory reduction: 5,070× 🚀
✅ ALL 4 THEOREMS VALIDATED
Four coupled equations that optimize cumulative feasibility instead of cumulative reward:
κ-OKBE (Equation 7): Maximize goal-completion probability
κ*(x) = max_a [f₁(x,a) + f₂(x,a) Σ P(x'|x,a) κ*(x')]
π-OKBE (Equation 8): Minimize expected time among κ-optimal actions
η-OKBEs (Equations 9-12): Construct termination distributions
η**(x_f, t_f | x_i) = P(terminate at x_f, time t_f | start at x_i)
Result: A STOK kernel with properties:
- ✅ Σ_{x_f, t_f} η**(x_f, t_f | x_i) = 1 (proper distribution)
- ✅ κ(x) = Σ_{x_f, t_f} η⁺(x_f, t_f | x) (consistency)
- ✅ Composes exactly via Chapman-Kolmogorov
Chain options o₁ ∘ o₂ by convolving their STOKs:
let composed = compose_stoks(&stok1, &stok2)?;
// Time convolution: t_μ = t_1 + t_2
// State marginalization: Σ_{x_f1} η₁(x_f1, t_1) · η₂(x_μ, t_μ - t_1 | x_f1)Validated: Composition preserves normalization, satisfies associativity
Avoid curse of dimensionality in product spaces S = X × Z₁ × ... × Zₙ:
// Instead of computing η̃ on |S|² × T (intractable)
// Factor into base + high-level components:
η̃(z_f, x_f, t_f | z, x) = ξ(t_f|z,x) · ρ_π(x_f|x,t_f) · ∏_k ρ_k(z_{k,f}|z_k,t_f)
let factorized = assemble_factorized_stok(
base_stok, // Solve on X only
hl_dynamics, // Each Z_k kernel
default_actions, // Default α^ℓ per space
product_dims,
)?;Memory: O(|X|² + Σ_k |Z_k|²) instead of O(∏_k |Z_k|² × |X|²)
Validated: 86-5,070× reductions in practice
Solve high-level-only problems to prune tree search:
let sublimated = SublimatedTMDP::new(P_logic, goal_logic, constraints_logic, 30)?;
let kappa_sub = sublimated.solve()?;
// Bound: κ̃*(σ,z,x) ≤ κ*_sub,σ(σ)
// If κ*_sub,σ(σ) = 0 → prune all (z,x) with that σValidated: Bounds hold, pruning eliminates 37.5% of search nodes in examples
Environment dynamics change based on high-level state:
// Key unlocks mountain door
let door_mode = KeyDoorMode::new(logic_space_id, key_bit_index);
// Before key: P_x(·|·,·,e_closed) blocks passage
// After key: P_x(·|·,·,e_open) allows passageValidated: Temperature regulation and honey_badger_3space demonstrate mode switching
| Paper Eq | Description | Implementation | Tests |
|---|---|---|---|
| [7] | κ-OKBE (cumulative feasibility) | solver/bellman.rs |
✅ 20 |
| [8] | π-OKBE (time-minimizing policy) | solver/feasibility_iteration.rs |
✅ 10 |
| [9-12] | η-OKBEs (STOK construction) | solver/stok_construction.rs |
✅ 15 |
| [15-17] | κ-η consistency relations | stok/kernel.rs |
✅ 8 |
| [18] | STOK composition | composition/chapman_kolmogorov.rs |
✅ 10 |
| [19] | SOK composition | composition/chapman_kolmogorov.rs |
✅ 9 |
| [23] | STOK factorization | hierarchy/factorization.rs |
✅ 27 |
| [24] | Goal kernel | planning/goal_kernel.rs |
✅ 26 |
| Theorem | Statement | Example | Status |
|---|---|---|---|
| 2.1 | STOK Decomposition | honey_badger_{2,3}space | ✅ Validated |
| 2.2 | State-Action Option Set | tree_search.rs | ✅ Implemented |
| 2.3 | Affordance Option Set | affordance.rs | ✅ Implemented |
| 2.4 | Sublimation Bound: κ̃* ≤ κ*_sub | logic_task_sublimation | ✅ Validated |
Generate complete API documentation:
cargo doc --openThe rustdoc includes:
- Detailed module documentation
- Equation references from the paper
- Usage examples
- Mathematical background
cargo testCurrent Status: 220+ tests, 100% passing, 0 failures
Coverage:
- ✅ All equation implementations
- ✅ All mathematical invariants (normalization, consistency, bounds)
- ✅ Composition properties (associativity, identity)
- ✅ Product-space factorization
- ✅ Mode switching
- ✅ Sublimation bounds
cargo benchPerformance Targets (100 states):
- Bellman backup: <1ms ✅
- Full convergence: <100ms ✅
- STOK composition: <10ms ✅
stok-core/
├── mdp/ # Task MDP definition
├── solver/ # OKBEs and feasibility iteration
├── stok/ # STOK kernel data structures
├── composition/ # Chapman-Kolmogorov composition
├── planning/ # Goal kernel and tree search
├── prediction/ # SPK, TEF, CEF prediction kernels
└── hierarchy/ # Product-space, affordances, factorization
├── product_space.rs # ProductState representation
├── affordance.rs # Affordance functions (Def. 0.1)
├── factorization.rs # STOK factorization (Thm 2.1)
├── modes.rs # Mode functions ζ: Z → E
└── sublimation.rs # Sublimation (Thm 2.4)
TaskMDP → solve_task_mdp() → STOKKernel
↓
compose_stoks() → ComposedSTOK
↓
GoalKernel → tree_search() → Plan
↓
simulate_plan() → Trajectory
For product spaces (S = X × Z):
Base STOK + HL SPKs → assemble_factorized_stok() → FactorizedSTOK
↓
evaluate()
sample()
STOKs compose exactly via Chapman-Kolmogorov:
- Chain options:
μ = o₁ ∘ o₂ ∘ ... ∘ oₙ - Exact predictions: Time convolution + state marginalization
- Normalization preserved: Σ η** = 1
Goals (f_g) and constraints (f_c) are separated:
- Goal: Where to go
- Constraint: Where NOT to go
- Achievement:
f₁ = f_g · f_c - Continuation:
f₂ = (1-f_g) · f_c
No reward shaping needed!
Avoid exponential blow-up in product spaces:
- Full product: O(|X|² × ∏ |Z_k|² × T) — intractable
- Factorized: O(|X|² × T + Σ |Z_k|² × T) — tractable
- Reductions: 86-5,070× demonstrated
Query specific events:
// Probability of success at state 10, time 5
let prob = stok.eta_plus.slice([initial, 10, 5]).elem();
// Expected termination time
let expected_t = compute_expected_time(&stok, initial);
// Risk analysis
let failure_prob = stok.infeasibility(state);Backend-agnostic via Burn framework:
- WGPU (default): Cross-platform GPU support
- CPU: Fallback for testing
- CUDA (optional): NVIDIA GPU acceleration
Navigate a gridworld while managing hydration:
- Product space: Grid (25 states) × Hydration (10 levels) = 250 states
- Affordance: Drinking at lake → restore hydration
- Challenge: Reach friend before dying of thirst
Key Result: 86× memory reduction via factorization
cargo run --release --example honey_badger_2spaceTravel between hot/cold regions to maintain safe temperature:
- Product space: Grid (20 states) × Temperature (11 levels) = 220 states
- Mode switching: Region determines warming/cooling dynamics
- Challenge: Avoid freezing or overheating
Key Result: 93× memory reduction, homeostatic control validated
cargo run --release --example temperature_regulationComplete tasks with precedence constraints:
- Logic space: 3 binary flags (D, E, F) = 8 states
- Constraints: Must complete D before E or F
- Sublimation: Solve logic-only problem first, use for pruning
Key Result: 37.5% tree search pruning via abstract feasibility
cargo run --release --example logic_task_sublimationUltimate integration: Navigate, manage hydration, complete tasks, unlock doors:
- Product space: Grid (25) × Hydration (10) × Logic (8) = 2,000 states
- All features: Factorization, modes, sublimation, affordances
- All theorems: 2.1, 2.2, 2.3, 2.4 demonstrated together
Key Result: 5,070× memory reduction!
cargo run --release --example honey_badger_3space| Operation | Target | Achieved | Scale |
|---|---|---|---|
| Bellman backup | <1ms | ✅ ~0.5ms | 100 states |
| Full convergence | <100ms | ✅ ~50ms | 100 states |
| SOK composition | <10ms | ✅ ~5ms | 100 states |
| Problem | Product Size | Full Tensor | Factorized | Reduction |
|---|---|---|---|---|
| 2-space | 250 | 62,500×T | 725×T | 86× |
| Temperature | 220 | 48,400×T | 521×T | 93× |
| 3-space | 2,000 | 6.4M×T | 12,814×T | 500× |
| honey_badger (full) | 8,000 | ~100M×T | ~20K×T | 5,000× |
Theorem 2.1 validated: Factorization scales exponentially better!
Section 1 (Core Theory):
- ✅ Equations [7-12]: OKBEs
- ✅ Equations [15-17]: κ-η relationships
- ✅ Equations [18-19]: STOK/SOK composition
Section 2 (Compositional TMDPs):
- ✅ Equation [20]: Composition function λ
- ✅ Equation [23]: STOK factorization
- ✅ Equation [24]: Goal kernel
- ✅ Theorem 2.1: STOK Decomposition (+ Corollary 6.1)
- ✅ Theorem 2.2: State-Action Option Set
- ✅ Theorem 2.3: Affordance Option Set
- ✅ Theorem 2.4: Sublimation bound
- ✅ Algorithm 1: Feasibility Iteration (Appendix 10)
- ✅ Algorithm 2: Tree Search (Appendix 10)
test result: ok. 282 passed; 0 failed; 0 ignoredCoverage by module:
- Core theory: ~140 tests (TaskMDP / OKBE / STOK construction / composition).
- Product-space: ~60 tests (factorization, mode functions, affordances, SPK / CEF / TEF).
- Sublimation: ~14 tests including non-binary HL coverage and direct-DP bound cross-check.
- Integration: 32 tests across 6 files (
tests/stok_tests.rs,tests/composition_tests.rs,tests/factorization_tests.rs,tests/sublimation_tests.rs,tests/goal_kernel_tests.rs,tests/algorithm_2_tests.rs).
Paper-parity status: all M0–M7 milestones complete. Two named follow-ups
remain (PP-106 ν-uniform corner case in extract_pi_time_minimizing;
success/failure CEF split for exact feasibility_approx under the general
Theorem 2.1 path) — neither affects the kernel-layer demonstration on
well-formed inputs. See
docs/dev/PAPER_PDF_TASKBOARD.md and
docs/assessments/LOCAL_PDF_PARITY_REPORT.md.
If you use this code in your research, please cite:
@article{ringstrom2025stok,
title={A Unified Theory of Compositionality, Modularity, and Interpretability in Markov Decision Processes},
author={Ringstrom, Thomas J. and Schrater, Paul R.},
journal={arXiv preprint arXiv:2506.09499},
year={2025}
}And optionally cite the implementation:
@software{stokcore2026,
title={STOK-Core: Complete Reference Implementation of Option Kernel Bellman Equations},
author={[BillyHDP]},
year={2026},
url={https://github.com/Hyphaeic/stok-core}
}- Rust 1.75+ (2021 edition)
- GPU with Vulkan/Metal/DX12 support (for WGPU backend)
- OR CUDA 11+ (optional, for CUDA backend)
# Standard build
cargo build --release
# With CUDA backend
cargo build --release --features cuda
# Run tests
cargo test
# Run benchmarks
cargo bench
# Generate documentation
cargo doc --openThis implementation covers the core theory from Ringstrom & Schrater (2025). Possible extensions include:
- Sparse tensors: Scale to millions of states (paper mentions on page 7)
- Risk-calibrated planning: Adjust feasibility under uncertainty
- Empowerment: Intrinsic motivation (conceptual discussion in Section 3.C)
- Real-time runtime (STOK-RT): <10ms decision cycles, anytime sampling
- Continuous state-spaces: Extend from discrete to continuous
- Neural network integration: Learn world models
- Transfer learning: Leverage sublimation and modularity
We welcome contributions! Potential areas:
- Additional examples (robotics, games, control)
- Sparse tensor backends
- Visualization tools
- Performance optimizations
- Documentation improvements
Please open an issue to discuss before starting major work.
MIT License - See LICENSE for details.
This implementation is based on the theoretical framework developed by:
- Thomas J. Ringstrom (PhD Thesis, University of Minnesota, 2023)
- Thomas J. Ringstrom & Paul R. Schrater (arXiv:2506.09499, 2025)
Special thanks to the Burn team for the excellent ML framework.
- Paper: arXiv:2506.09499
- Issues: GitHub Issues
- Email: rings034@gmail.com (theory questions → paper authors)
stok-core/
├── src/
│ ├── mdp/ # Task MDP definitions
│ ├── solver/ # OKBE solvers
│ ├── stok/ # STOK kernel structures
│ ├── composition/ # Chapman-Kolmogorov composition
│ ├── planning/ # Goal kernel, tree search
│ ├── prediction/ # SPK, TEF, CEF kernels
│ ├── hierarchy/ # Product-space, factorization
│ ├── backend/ # GPU device management
│ ├── types.rs # Core types and errors
│ └── lib.rs # Public API
├── examples/
│ ├── honey_badger_2space.rs # Theorem 2.1 (86× reduction)
│ ├── temperature_regulation.rs # Mode switching (93× reduction)
│ ├── logic_task_sublimation.rs # Theorem 2.4 pruning
│ └── honey_badger_3space.rs # All theorems (5,070× reduction!)
├── tests/ # Integration test suite
│ ├── stok_tests.rs # PP-105 invariant tests (Σ η** = 1, κ = Σ η+)
│ ├── composition_tests.rs # PP-203 Eq [18-19] theorem-style
│ ├── factorization_tests.rs # PP-305 CHK-CO6.1 / CHK-T2.1 cross-checks
│ ├── sublimation_tests.rs # PP-403/404 Theorem 2.4 non-binary HL
│ ├── goal_kernel_tests.rs # PP-505 FactorizedGoalKernel + PlanKernel
│ └── algorithm_2_tests.rs # PP-605 Algorithm 2 regression tests
├── benches/
│ └── feasibility_bench.rs # Performance benchmarks
└── docs/ # Project documentation
├── README.md # Doc index
├── dev/ # Authoritative development docs
│ ├── PAPER_PDF_TASKBOARD.md # PP-XXX dev scope
│ ├── PAPER_TRACEABILITY_MATRIX.md # paper item → code mapping
│ ├── PAPER_ASSUMPTIONS.md # paper-backed vs repo-local
│ ├── PAPER_ACCEPTANCE_CHECKLIST.md# CHK-* pass conditions
│ └── PAPER_API_AUDIT.md # public-symbol classification
├── assessments/
│ ├── LOCAL_PDF_PARITY_REPORT.md # per-item parity verdicts
│ └── VERIFICATION_RECORD.md # exact test counts at parity-freeze
├── architectures/thesis_breakdown.md# Ringstrom 2023 thesis content note
├── EXTENSION_DISCUSSION.md # post-paper research (out of scope)
├── references/ # paper + thesis PDFs
└── archive/legacy_status_2026/ # pre-audit "100/100" docs (NOT authoritative)
- 📄 Paper: arXiv:2506.09499
- 📚 Thesis: Ringstrom (2023), University of Minnesota
- 🔧 Burn Framework: tracel-ai/burn
- 📖 API Docs: Run
cargo doc --open - 🎯 Examples: See
examples/directory - ✅ Tests: Run
cargo test
STOK-Core: a working reference for the kernel layer of a reward-free learning system. It demonstrates that the OKBE / STOK formulation works on general well-formed inputs — the specific kernel-layer compositionality, factorization, and verifiability results from Ringstrom & Schrater (2025). Agent-level concerns (motivation, value-of-key, the self-sustaining compositional identity discussed in Ringstrom's 2023 thesis) are out of scope for this repository.
Implementing compositional, verifiable planning for high-dimensional MDPs without rewards.