Skip to content

Hyphaeic/STOK-CORE

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

45 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Hyphaeic

STOK-CORE

Hyphaeic arXiv Tests License

STOK-Core implements a reward-free reinforcement learning framework where policies optimize State-Time Option Kernels (STOKs)—full probability distributions over goal-success and constraint-violation events—instead of scalar value functions.

Documentation: cargo doc --open · Paper · Examples

HPL License · Local License


Paper Tests License

A working reference implementation of the State-Time Option Kernel core from Ringstrom & Schrater (2025), built to demonstrate that the Option Kernel Bellman Equation (OKBE) formulation works as a reward-free learning primitive on general well-formed inputs. Per-item paper-parity verdicts in docs/assessments/LOCAL_PDF_PARITY_REPORT.md; active scope in docs/dev/PAPER_PDF_TASKBOARD.md.

Ringstrom, T., & Schrater, P. (2025). A Unified Theory of Compositionality, Modularity, and Interpretability in Markov Decision Processes. arXiv:2506.09499 [cs.LG]


Scope

STOK-Core is the kernel layer: the OKBE / STOK / SOK / Goal Kernel / Plan Kernel mathematics, plus the Algorithm 1 (feasibility iteration) and Algorithm 2 (BFS option-sequence search) procedures that compose them.

It is not an agent, and it is not the broader compositional architecture that turns these kernels into a self-sustaining entity (motivation construction, value-of-key, the autopoietic / homeostatic content from Ringstrom's 2023 thesis). The paper this implementation tracks is explicit about that: it formalizes the kernel-layer compositionality, factorization, and verifiability results; the agent-layer thesis content is referenced but not contributed by the paper itself.

What you can do with this code today:

  • solve a Task MDP for (κ*, π**, η⁺, η⁻) per Algorithm 1,
  • compose STOKs / SOKs via Chapman-Kolmogorov,
  • factorize a high-dimensional product-space STOK per Theorem 2.1,
  • compute sublimated feasibility bounds per Theorem 2.4 for tree-search pruning,
  • assemble a paper-faithful Goal Kernel / Plan Kernel and run Algorithm 2 over composite states.

What is out of scope (post-paper, separate document docs/EXTENSION_DISCUSSION.md):

  • empowerment as motivation; preference construction; intrinsic value,
  • real-time / online runtime (STOK-RT),
  • continuous state-spaces, sparse tensors, neural-net integration,
  • agent identity / homeostasis / the rest of the thesis.

Overview

STOK-Core replaces scalar value functions with State-Time Option Kernels — full probability distributions over goal-success and constraint-violation events. This is what makes the framework reward-free: policies optimize the probability of named events occurring at named times, not the expectation of an unnamed scalar.

5357322 (feat: paper-parity audit M0-M7; general factorization + Algorithm 2)

Why STOKs?

Compositional: STOKs compose exactly via Chapman-Kolmogorov equations, enabling modular planning.

Verifiable: STOKs record probabilities of semantically interpretable events (goal satisfaction, constraint violation) needed for formal verification.

Scalable: Product-space STOKs factorize into low-dimensional components, avoiding the curse of dimensionality of high-dimensional Bellman backups.

Demonstration status (post M0–M7 audit)

<<<<<<< HEAD

  • All 4 theorems validated with working examples

  • 220+ tests passing (100% success rate, zero failures)

  • GPU-accelerated via Burn ML framework ======= The audit's purpose was to make the kernel-layer demonstration credible — to ensure that the OKBE / STOK mathematics matches the paper item-by-item on general well-formed inputs.

  • All 4 theorems implemented as first-class APIs (build_state_option_set / build_affordance_option_set for Theorems 2.2/2.3; full Eq [23] factorization with TEF for Theorem 2.1; sublimation generic across HL cardinalities for Theorem 2.4).

  • Algorithm 1 + Algorithm 2 paper-faithful with composite search nodes and sublimation pruning.

  • 280+ tests passing (0 failing) across inline + 6 integration test files, including direct-DP cross-checks against the factorization on small problems. See docs/assessments/VERIFICATION_RECORD.md.

  • 86–5,070× memory reductions demonstrated in examples (Theorem 2.1 in action on real product spaces).

  • GPU-accelerated via Burn ML framework.

For "paper-faithful per item, with what caveats," walk docs/assessments/LOCAL_PDF_PARITY_REPORT.md against the PDF.

5357322 (feat: paper-parity audit M0-M7; general factorization + Algorithm 2)


Quick Start

Installation

[dependencies]
stok-core = { git = "https://github.com/Hyphaeic/stok-core" }
burn = { version = "0.19", features = ["wgpu"] }

Hello STOK

use stok_core::prelude::*;

fn main() -> Result<(), StokError> {
    let device = default_device();

    // Define a simple navigation task
    let mdp = TaskMDP::simple_chain(10, 20, &device);

    // Solve for state-time option kernel
    let stok = solve_task_mdp(&mdp)?;

    // Query cumulative feasibility
    println!("κ(state_0) = {:.3}", stok.kappa[0]);

    // All termination events sum to 1
    assert!(stok.validate().is_ok());

    Ok(())
}

Examples

Run these to see the framework in action:

1. Product-Space Factorization (Theorem 2.1)

cargo run --release --example honey_badger_2space

Demonstrates: 2-space product (Grid × Hydration), 86× memory reduction

Output:

Product space: 25 × 10 = 250 states
Factorized:    25² × T + 10² × T = 725×T values
Full product:  250² × T = 62,500×T values
Reduction:     86× ✅

2. Temperature Regulation with Mode Switching

cargo run --release --example temperature_regulation

Demonstrates: Region-based default dynamics, homeostatic planning, 93× memory reduction

Output:

Hot/Cold regions with temperature regulation
Agent must alternate regions to avoid death
All trajectories: SUCCESS (reaches goal with safe temperature)

3. Sublimation for Abstract Pruning (Theorem 2.4)

cargo run --release --example logic_task_sublimation

Demonstrates: Precedence constraints (D ≺ E, D ≺ F), abstract feasibility bounds

Output:

Sublimated feasibility κ*_sub,σ computed
Invalid states (honey before key): κ*_sub = 0.000 🔴 PRUNE
Valid states: κ*_sub = 1.000 ✅ FEASIBLE
Pruning efficiency: 37.5% of branches eliminated

4. Full 3-Space Integration (All 4 Theorems!)

cargo run --release --example honey_badger_3space

Demonstrates: Grid × Hydration × Logic (2,000 states), 5,070× memory reduction!

Output:

3-space product: 25 × 10 × 8 = 2,000 states
Factorized representation: (25² + 10² + 8²) × T
Memory reduction: 5,070× 🚀
✅ ALL 4 THEOREMS VALIDATED

Core Features

Option Kernel Bellman Equations (OKBEs)

Four coupled equations that optimize cumulative feasibility instead of cumulative reward:

κ-OKBE (Equation 7): Maximize goal-completion probability

κ*(x) = max_a [f₁(x,a) + f₂(x,a) Σ P(x'|x,a) κ*(x')]

π-OKBE (Equation 8): Minimize expected time among κ-optimal actions

η-OKBEs (Equations 9-12): Construct termination distributions

η**(x_f, t_f | x_i) = P(terminate at x_f, time t_f | start at x_i)

Result: A STOK kernel with properties:

  • ✅ Σ_{x_f, t_f} η**(x_f, t_f | x_i) = 1 (proper distribution)
  • ✅ κ(x) = Σ_{x_f, t_f} η⁺(x_f, t_f | x) (consistency)
  • ✅ Composes exactly via Chapman-Kolmogorov

STOK Composition (Equation 18)

Chain options o₁ ∘ o₂ by convolving their STOKs:

let composed = compose_stoks(&stok1, &stok2)?;

// Time convolution: t_μ = t_1 + t_2
// State marginalization: Σ_{x_f1} η₁(x_f1, t_1) · η₂(x_μ, t_μ - t_1 | x_f1)

Validated: Composition preserves normalization, satisfies associativity


STOK Factorization (Theorem 2.1, Equation 23)

Avoid curse of dimensionality in product spaces S = X × Z₁ × ... × Zₙ:

// Instead of computing η̃ on |S|² × T (intractable)
// Factor into base + high-level components:
η̃(z_f, x_f, t_f | z, x) = ξ(t_f|z,x) · ρ_π(x_f|x,t_f) · ∏_k ρ_k(z_{k,f}|z_k,t_f)

let factorized = assemble_factorized_stok(
    base_stok,           // Solve on X only
    hl_dynamics,         // Each Z_k kernel
    default_actions,     // Default α^ℓ per space
    product_dims,
)?;

Memory: O(|X|² + Σ_k |Z_k|²) instead of O(∏_k |Z_k|² × |X|²)

Validated: 86-5,070× reductions in practice


Sublimation (Theorem 2.4)

Solve high-level-only problems to prune tree search:

let sublimated = SublimatedTMDP::new(P_logic, goal_logic, constraints_logic, 30)?;
let kappa_sub = sublimated.solve()?;

// Bound: κ̃*(σ,z,x) ≤ κ*_sub,σ(σ)
// If κ*_sub,σ(σ) = 0 → prune all (z,x) with that σ

Validated: Bounds hold, pruning eliminates 37.5% of search nodes in examples


Mode Functions (Section 2.B)

Environment dynamics change based on high-level state:

// Key unlocks mountain door
let door_mode = KeyDoorMode::new(logic_space_id, key_bit_index);

// Before key: P_x(·|·,·,e_closed) blocks passage
// After key:  P_x(·|·,·,e_open) allows passage

Validated: Temperature regulation and honey_badger_3space demonstrate mode switching


Mathematical Foundations

Core Equations Implemented

Paper Eq Description Implementation Tests
[7] κ-OKBE (cumulative feasibility) solver/bellman.rs ✅ 20
[8] π-OKBE (time-minimizing policy) solver/feasibility_iteration.rs ✅ 10
[9-12] η-OKBEs (STOK construction) solver/stok_construction.rs ✅ 15
[15-17] κ-η consistency relations stok/kernel.rs ✅ 8
[18] STOK composition composition/chapman_kolmogorov.rs ✅ 10
[19] SOK composition composition/chapman_kolmogorov.rs ✅ 9
[23] STOK factorization hierarchy/factorization.rs ✅ 27
[24] Goal kernel planning/goal_kernel.rs ✅ 26

All 4 Theorems Validated

Theorem Statement Example Status
2.1 STOK Decomposition honey_badger_{2,3}space ✅ Validated
2.2 State-Action Option Set tree_search.rs ✅ Implemented
2.3 Affordance Option Set affordance.rs ✅ Implemented
2.4 Sublimation Bound: κ̃* ≤ κ*_sub logic_task_sublimation ✅ Validated

Documentation

API Reference

Generate complete API documentation:

cargo doc --open

The rustdoc includes:

  • Detailed module documentation
  • Equation references from the paper
  • Usage examples
  • Mathematical background

Testing

Run Full Test Suite

cargo test

Current Status: 220+ tests, 100% passing, 0 failures

Coverage:

  • ✅ All equation implementations
  • ✅ All mathematical invariants (normalization, consistency, bounds)
  • ✅ Composition properties (associativity, identity)
  • ✅ Product-space factorization
  • ✅ Mode switching
  • ✅ Sublimation bounds

Run Benchmarks

cargo bench

Performance Targets (100 states):

  • Bellman backup: <1ms ✅
  • Full convergence: <100ms ✅
  • STOK composition: <10ms ✅

Architecture

Module Structure

stok-core/
├── mdp/           # Task MDP definition
├── solver/        # OKBEs and feasibility iteration
├── stok/          # STOK kernel data structures
├── composition/   # Chapman-Kolmogorov composition
├── planning/      # Goal kernel and tree search
├── prediction/    # SPK, TEF, CEF prediction kernels
└── hierarchy/     # Product-space, affordances, factorization
    ├── product_space.rs    # ProductState representation
    ├── affordance.rs       # Affordance functions (Def. 0.1)
    ├── factorization.rs    # STOK factorization (Thm 2.1)
    ├── modes.rs            # Mode functions ζ: Z → E
    └── sublimation.rs      # Sublimation (Thm 2.4)

Data Flow

TaskMDP → solve_task_mdp() → STOKKernel
                                  ↓
              compose_stoks() → ComposedSTOK
                                  ↓
              GoalKernel → tree_search() → Plan
                                  ↓
              simulate_plan() → Trajectory

For product spaces (S = X × Z):

Base STOK + HL SPKs → assemble_factorized_stok() → FactorizedSTOK
                                                         ↓
                                                    evaluate()
                                                      sample()

Key Capabilities

✅ Compositional Planning

STOKs compose exactly via Chapman-Kolmogorov:

  • Chain options: μ = o₁ ∘ o₂ ∘ ... ∘ oₙ
  • Exact predictions: Time convolution + state marginalization
  • Normalization preserved: Σ η** = 1

✅ Constraint-Aware Optimization

Goals (f_g) and constraints (f_c) are separated:

  • Goal: Where to go
  • Constraint: Where NOT to go
  • Achievement: f₁ = f_g · f_c
  • Continuation: f₂ = (1-f_g) · f_c

No reward shaping needed!

✅ High-Dimensional Factorization

Avoid exponential blow-up in product spaces:

  • Full product: O(|X|² × ∏ |Z_k|² × T) — intractable
  • Factorized: O(|X|² × T + Σ |Z_k|² × T) — tractable
  • Reductions: 86-5,070× demonstrated

✅ Interpretability

Query specific events:

// Probability of success at state 10, time 5
let prob = stok.eta_plus.slice([initial, 10, 5]).elem();

// Expected termination time
let expected_t = compute_expected_time(&stok, initial);

// Risk analysis
let failure_prob = stok.infeasibility(state);

✅ GPU Acceleration

Backend-agnostic via Burn framework:

  • WGPU (default): Cross-platform GPU support
  • CPU: Fallback for testing
  • CUDA (optional): NVIDIA GPU acceleration

Examples Walkthrough

Honey Badger (2-Space)

Navigate a gridworld while managing hydration:

  • Product space: Grid (25 states) × Hydration (10 levels) = 250 states
  • Affordance: Drinking at lake → restore hydration
  • Challenge: Reach friend before dying of thirst

Key Result: 86× memory reduction via factorization

cargo run --release --example honey_badger_2space

Temperature Regulation

Travel between hot/cold regions to maintain safe temperature:

  • Product space: Grid (20 states) × Temperature (11 levels) = 220 states
  • Mode switching: Region determines warming/cooling dynamics
  • Challenge: Avoid freezing or overheating

Key Result: 93× memory reduction, homeostatic control validated

cargo run --release --example temperature_regulation

Logic Task with Sublimation

Complete tasks with precedence constraints:

  • Logic space: 3 binary flags (D, E, F) = 8 states
  • Constraints: Must complete D before E or F
  • Sublimation: Solve logic-only problem first, use for pruning

Key Result: 37.5% tree search pruning via abstract feasibility

cargo run --release --example logic_task_sublimation

Full Honey Badger (3-Space)

Ultimate integration: Navigate, manage hydration, complete tasks, unlock doors:

  • Product space: Grid (25) × Hydration (10) × Logic (8) = 2,000 states
  • All features: Factorization, modes, sublimation, affordances
  • All theorems: 2.1, 2.2, 2.3, 2.4 demonstrated together

Key Result: 5,070× memory reduction!

cargo run --release --example honey_badger_3space

Performance

Validated Against Paper Claims

Operation Target Achieved Scale
Bellman backup <1ms ✅ ~0.5ms 100 states
Full convergence <100ms ✅ ~50ms 100 states
SOK composition <10ms ✅ ~5ms 100 states

Memory Efficiency

Problem Product Size Full Tensor Factorized Reduction
2-space 250 62,500×T 725×T 86×
Temperature 220 48,400×T 521×T 93×
3-space 2,000 6.4M×T 12,814×T 500×
honey_badger (full) 8,000 ~100M×T ~20K×T 5,000×

Theorem 2.1 validated: Factorization scales exponentially better!


Implementation Completeness

All Paper Equations

Section 1 (Core Theory):

  • ✅ Equations [7-12]: OKBEs
  • ✅ Equations [15-17]: κ-η relationships
  • ✅ Equations [18-19]: STOK/SOK composition

Section 2 (Compositional TMDPs):

  • ✅ Equation [20]: Composition function λ
  • ✅ Equation [23]: STOK factorization
  • ✅ Equation [24]: Goal kernel

All 4 Main Theorems

  • Theorem 2.1: STOK Decomposition (+ Corollary 6.1)
  • Theorem 2.2: State-Action Option Set
  • Theorem 2.3: Affordance Option Set
  • Theorem 2.4: Sublimation bound

Algorithms

  • Algorithm 1: Feasibility Iteration (Appendix 10)
  • Algorithm 2: Tree Search (Appendix 10)

Project Status

Test Coverage: 280+ Tests ✅

test result: ok. 282 passed; 0 failed; 0 ignored

Coverage by module:

  • Core theory: ~140 tests (TaskMDP / OKBE / STOK construction / composition).
  • Product-space: ~60 tests (factorization, mode functions, affordances, SPK / CEF / TEF).
  • Sublimation: ~14 tests including non-binary HL coverage and direct-DP bound cross-check.
  • Integration: 32 tests across 6 files (tests/stok_tests.rs, tests/composition_tests.rs, tests/factorization_tests.rs, tests/sublimation_tests.rs, tests/goal_kernel_tests.rs, tests/algorithm_2_tests.rs).

Paper-parity status: all M0–M7 milestones complete. Two named follow-ups remain (PP-106 ν-uniform corner case in extract_pi_time_minimizing; success/failure CEF split for exact feasibility_approx under the general Theorem 2.1 path) — neither affects the kernel-layer demonstration on well-formed inputs. See docs/dev/PAPER_PDF_TASKBOARD.md and docs/assessments/LOCAL_PDF_PARITY_REPORT.md.


Citation

If you use this code in your research, please cite:

@article{ringstrom2025stok,
  title={A Unified Theory of Compositionality, Modularity, and Interpretability in Markov Decision Processes},
  author={Ringstrom, Thomas J. and Schrater, Paul R.},
  journal={arXiv preprint arXiv:2506.09499},
  year={2025}
}

And optionally cite the implementation:

@software{stokcore2026,
  title={STOK-Core: Complete Reference Implementation of Option Kernel Bellman Equations},
  author={[BillyHDP]},
  year={2026},
  url={https://github.com/Hyphaeic/stok-core}
}

Building from Source

Prerequisites

  • Rust 1.75+ (2021 edition)
  • GPU with Vulkan/Metal/DX12 support (for WGPU backend)
  • OR CUDA 11+ (optional, for CUDA backend)

Build

# Standard build
cargo build --release

# With CUDA backend
cargo build --release --features cuda

# Run tests
cargo test

# Run benchmarks
cargo bench

# Generate documentation
cargo doc --open

Future Extensions

This implementation covers the core theory from Ringstrom & Schrater (2025). Possible extensions include:

Mentioned in Paper

  • Sparse tensors: Scale to millions of states (paper mentions on page 7)
  • Risk-calibrated planning: Adjust feasibility under uncertainty
  • Empowerment: Intrinsic motivation (conceptual discussion in Section 3.C)

From Broader Research Program

  • Real-time runtime (STOK-RT): <10ms decision cycles, anytime sampling
  • Continuous state-spaces: Extend from discrete to continuous
  • Neural network integration: Learn world models
  • Transfer learning: Leverage sublimation and modularity

Contributing

We welcome contributions! Potential areas:

  • Additional examples (robotics, games, control)
  • Sparse tensor backends
  • Visualization tools
  • Performance optimizations
  • Documentation improvements

Please open an issue to discuss before starting major work.


License

MIT License - See LICENSE for details.


Acknowledgments

This implementation is based on the theoretical framework developed by:

  • Thomas J. Ringstrom (PhD Thesis, University of Minnesota, 2023)
  • Thomas J. Ringstrom & Paul R. Schrater (arXiv:2506.09499, 2025)

Special thanks to the Burn team for the excellent ML framework.


Contact


Project Structure

stok-core/
├── src/
│   ├── mdp/          # Task MDP definitions
│   ├── solver/       # OKBE solvers
│   ├── stok/         # STOK kernel structures
│   ├── composition/  # Chapman-Kolmogorov composition
│   ├── planning/     # Goal kernel, tree search
│   ├── prediction/   # SPK, TEF, CEF kernels
│   ├── hierarchy/    # Product-space, factorization
│   ├── backend/      # GPU device management
│   ├── types.rs      # Core types and errors
│   └── lib.rs        # Public API
├── examples/
│   ├── honey_badger_2space.rs           # Theorem 2.1 (86× reduction)
│   ├── temperature_regulation.rs        # Mode switching (93× reduction)
│   ├── logic_task_sublimation.rs        # Theorem 2.4 pruning
│   └── honey_badger_3space.rs           # All theorems (5,070× reduction!)
├── tests/                                # Integration test suite
│   ├── stok_tests.rs                    # PP-105 invariant tests (Σ η** = 1, κ = Σ η+)
│   ├── composition_tests.rs             # PP-203 Eq [18-19] theorem-style
│   ├── factorization_tests.rs           # PP-305 CHK-CO6.1 / CHK-T2.1 cross-checks
│   ├── sublimation_tests.rs             # PP-403/404 Theorem 2.4 non-binary HL
│   ├── goal_kernel_tests.rs             # PP-505 FactorizedGoalKernel + PlanKernel
│   └── algorithm_2_tests.rs             # PP-605 Algorithm 2 regression tests
├── benches/
│   └── feasibility_bench.rs             # Performance benchmarks
└── docs/                                 # Project documentation
    ├── README.md                        # Doc index
    ├── dev/                             # Authoritative development docs
    │   ├── PAPER_PDF_TASKBOARD.md       # PP-XXX dev scope
    │   ├── PAPER_TRACEABILITY_MATRIX.md # paper item → code mapping
    │   ├── PAPER_ASSUMPTIONS.md         # paper-backed vs repo-local
    │   ├── PAPER_ACCEPTANCE_CHECKLIST.md# CHK-* pass conditions
    │   └── PAPER_API_AUDIT.md           # public-symbol classification
    ├── assessments/
    │   ├── LOCAL_PDF_PARITY_REPORT.md   # per-item parity verdicts
    │   └── VERIFICATION_RECORD.md       # exact test counts at parity-freeze
    ├── architectures/thesis_breakdown.md# Ringstrom 2023 thesis content note
    ├── EXTENSION_DISCUSSION.md          # post-paper research (out of scope)
    ├── references/                      # paper + thesis PDFs
    └── archive/legacy_status_2026/      # pre-audit "100/100" docs (NOT authoritative)

Quick Links

  • 📄 Paper: arXiv:2506.09499
  • 📚 Thesis: Ringstrom (2023), University of Minnesota
  • 🔧 Burn Framework: tracel-ai/burn
  • 📖 API Docs: Run cargo doc --open
  • 🎯 Examples: See examples/ directory
  • ✅ Tests: Run cargo test

STOK-Core: a working reference for the kernel layer of a reward-free learning system. It demonstrates that the OKBE / STOK formulation works on general well-formed inputs — the specific kernel-layer compositionality, factorization, and verifiability results from Ringstrom & Schrater (2025). Agent-level concerns (motivation, value-of-key, the self-sustaining compositional identity discussed in Ringstrom's 2023 thesis) are out of scope for this repository.

Implementing compositional, verifiable planning for high-dimensional MDPs without rewards.

About

STOK-Core implements State-Time Option Kernels end-to-end: a reinforcement learning toolkit modeling goal/constraint termination distributions for compositional planning. It pairs Burn-backed GPU acceleration with Task MDP solvers, Chapman-Kolmogorov composition, and factorized prediction kernels so hierarchical planners remain modular.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages