diff-buf write-buffering + index-root fusion (PSS-backed indexes) #836

Open
whilo wants to merge 23 commits into main from feat/fuse-index-roots

@whilo whilo commented Jun 1, 2026

diff-buf write-buffering + index-root fusion (PSS-backed indexes)

Two related write-amplification reductions for content-addressed (konserve) persistence, both landing on the persistent-sorted-set index.

1. diff-buf (per-child diff buffering at the serialization boundary)

Backed by persistent-sorted-set PR replikativ/persistent-sorted-set#6. A normal commit rewrites the whole root→leaf spine (~depth+1 object PUTs). With diff-buf, a rewritten branch buffers each content-only child's diff into a per-child slot and re-points the child to its durable anchor, so a content-only commit costs ~1 PUT instead of ~depth+1. Reads project the buffered diff back lazily on descent (baseline read cost). datahike wiring:

  • :slots round-trip in the fressian Branch read/write handlers (clj + cljs).
  • :diff-buf-size / :branching-factor are create-time-fixed via :index-config (round-trips with the store; connect adopts the stored value). Default ON (256) for new stores.
  • crypto-hash / merkle: a branch's content UUID folds its slots; GC/markFreed defer freeing to commit so a re-pointed anchor is never freed.
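As a rough illustration of the create-time settings listed above, a store config might carry them under `:index-config`. This is a sketch, not the canonical config: the store map and path are assumptions, while the key names, the 256 default, and the 512 branching factor are the values quoted in this PR.

```clojure
;; Hypothetical datahike config; only :index-config is the point here.
(def cfg
  {:store        {:backend :file
                  :path    "/tmp/diff-buf-example"} ; assumed example path
   :index        :datahike.index/persistent-set
   ;; Both keys are fixed at create time and stored with the database.
   :index-config {:diff-buf-size    256   ; 0 disables diff-buf
                  :branching-factor 512}})
```

Because connect adopts the stored values, a later reconnect should not need to repeat `:index-config`.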

2. index-root fusion

Inlines each index's root node into the db-record, so commit! skips writing those roots as separate objects (one fewer PUT/commit, and it composes with diff-buf toward ~2 PUTs/commit). Opt-in, default OFF (*default-fuse-index-roots?*) for now — flipping the default churns object-count assertions across the suite; the SaaS template opts in, and connect adopts the stored value so fused and non-fused stores both reconnect. Design note: doc/index-root-fusion.md.

Comparator on the tree (storage stays comparator-agnostic)

The per-index comparator now lives on the PSS and propagates to its Branch nodes (Branch._projCmp), instead of being carried on storage. This PR removes CachedStorage's cmp field, its comparator impl, and with-comparator. (Mirrors the upstream PSS change; resolves the earlier review note about a storage-carried comparator.)

Dependencies (no local checkouts)

  • persistent-sorted-set → git a36ecbe… (branch feature/op-buf-v5). Its Java is compiled by tools.deps prep — run clojure -X:deps prep once after fetching (CI step).
  • konserve → released 0.9.349 (includes the cljs cross-host header meta-size fix, #143), replacing the former local dev checkout.

Validation

  • clj-pss (default index = persistent-set, diff-buf on): 522 tests / 2475 assertions / 0 failures (-Xmx4g).
  • Crash-safety: crash-injection at both sides of the commit HEAD-flip (file backend, diff-buf on) recovers to the old-or-new consistent state — indices agree, no torn/duplicate datoms.
  • New regression: store-test/test-diff-buf-upsert-reopen (value-changing upserts survive store→reopen with no stale/duplicate datoms — guards the comparator-agnostic {:absent :present} leaf-diff serialization).

Known, non-critical (deferred — not addressed here)

A fatal Error (e.g. OOM) thrown inside the async write/commit pipeline can hang a synchronous transact (the no-timeout deref never completes, because superv.async go-try catches Exception, not Throwable). It does not corrupt data (commit is copy-on-write + atomic HEAD flip + free-after-flip). Pre-existing, orthogonal to this PR; the clean fix is a deref timeout. Tracked internally.
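The hang and the proposed deref timeout can be reproduced in miniature with plain Clojure. This is a minimal sketch under stated assumptions: `run-async` and `await-result` are hypothetical helpers that stand in for the go-try pipeline and the synchronous transact; they are not datahike or superv.async APIs.

```clojure
(defn run-async
  "Runs f on a daemon thread and returns a promise for its result.
  Like the pipeline described above, only Exception is caught, so a
  thrown Error escapes and the promise is never delivered."
  [f]
  (let [p (promise)]
    (doto (Thread. (fn []
                     (try
                       (deliver p (f))
                       (catch Exception e
                         (deliver p e)))))
      (.setDaemon true)
      ;; Suppress the default stack-trace print for the escaped Error.
      (.setUncaughtExceptionHandler
       (reify java.lang.Thread$UncaughtExceptionHandler
         (uncaughtException [_ _ _] nil)))
      (.start))
    p))

(defn await-result
  "Bounded wait: returns :timeout instead of blocking forever."
  [p timeout-ms]
  (deref p timeout-ms :timeout))
```

With a plain `(deref p)` the second case below would block indefinitely; the three-argument `deref` turns it into a reportable timeout.

```clojure
(await-result (run-async (fn [] 42)) 1000)
(await-result (run-async (fn [] (throw (Error. "simulated fatal")))) 200)
```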

Notes

  • PSS naming was op-buf (hitchhiker/Bε operation buffer); renamed diff-buf since this buffers a per-child diff at the serialization boundary. History can be squashed on merge.

whilo added 19 commits May 29, 2026 20:00
Opt-in via :fuse-index-roots? (default false). When enabled, db->stored
inlines each flushed index's in-memory root node into the db-record
(:eavt-root/:aevt-root/:avet-root + temporal) and commit! excludes those
root addresses from the separate-object writes (pending-writes drain).
stored->db seeds the inlined root back via di/-seed-root!, so root()
returns it with no storage round-trip; deeper children stay lazy.

Saves one object write per index root per commit (and one cold-open GET
per index); for a single-leaf index the whole index inlines. History is
preserved (per-commit cid records). Read is presence-based (:eavt-root),
so fused and legacy records both restore. Gated off under :crypto-hash?
for now (the audit walk reads the root from storage by address).

New index protocol methods -root-node / -seed-root! (PSS impl; clj).
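The presence-based read can be pictured as follows. This is a sketch only: `restore-root` is a hypothetical helper, and `:eavt-key` as the name of the stored root address is an assumption; `:eavt-root` is the key named in this commit.

```clojure
(defn restore-root
  "Returns [:inlined root] when the stored record carries a fused root,
  else [:by-address address] so the caller loads the root from storage."
  [stored root-key address-key]
  (if (contains? stored root-key)
    [:inlined (get stored root-key)]
    [:by-address (get stored address-key)]))

;; A fused record carries the root node itself; a legacy record only its address.
(def fused-record  {:eavt-key :addr-1 :eavt-root {:keys [1 2 3]}})
(def legacy-record {:eavt-key :addr-1})
```

Checking for the key rather than a config flag is what lets fused and legacy records restore through the same path.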

Validated: cold-restart (separate JVM) roundtrips correctly at
:keep-history? true with indexed attrs, retraction, and range slices;
write count drops (e.g. 5->3 objects/commit with two active indexes);
store-test green with fusion off.

See doc/index-root-fusion.md.

Wires the persistent-sorted-set op-buf write-optimization into the index adapter
so a commit buffers content-only child diffs into the rewritten ancestor instead
of rewriting the full spine (~1 PUT/commit for small commits). Composes with
index-root fusion: the buffered diffs ride in the fused db-record.

- Branch fressian handlers round-trip :slots (.slotsForStorage / reconstruct
  _slots on read); emitted only when present ⇒ opBufSize=0 / legacy DBs are
  byte-identical (back-compat).
- op-buf-size threaded into fresh-set Settings; single knob = JVM sysprop
  pss.opBufSize (TODO: promote to a config key). Shared node-deserialization
  Settings already honors it via Settings.defaultOpBufSize.
- Per-index storage view (with-comparator) carries the index comparator so
  buffered-leaf projection routes by value on cold restore (CachedStorage gains
  a cmp field + IStorage.comparator()).
- deps.edn: PSS -> :local/root (dev) for the op-buf-v5 build.

Validated: file-backed DB, build over 60 commits, fresh cold reopen, full query
equality (count/sum/lookup across eavt/aevt/avet) vs baseline at B=0/64/256/1024,
with fusion on. JVM-only; cljs falls back to baseline. Crypto-hash + op-buf and
GC/markFreed tracking remain (tracked as debt).

(:storage store) is nil for backends without a CachedStorage (e.g. :mem); (assoc
nil :cmp) produced a plain map that then failed to cast to IStorage. Guard with
instance? CachedStorage so nil/other storages pass through unchanged. Restores I0
(datahike index/ident/db tests green at opBufSize=0).
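The nil case in this fix is easy to reproduce in isolation. The sketch below uses a stand-in record, `CachedStorageStub`, and a hypothetical `with-comparator-guarded`; it illustrates the guard described in this commit and is not the code that was changed (the PR later removes `with-comparator` altogether).

```clojure
;; Stand-in for the real CachedStorage record (assumption for illustration).
(defrecord CachedStorageStub [store cmp])

;; assoc on nil silently builds a plain map, which is not a storage.
(def accidental (assoc nil :cmp compare))

(defn with-comparator-guarded
  "Attaches cmp only to a real storage; nil and other values pass through."
  [storage cmp]
  (if (instance? CachedStorageStub storage)
    (assoc storage :cmp cmp)
    storage))
```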
Both are create-time-fixed PSS-index settings, now sourced from the persisted
:index-config (defaults 0 and 512 — existing stores, built at 512 with no op-buf,
are unaffected). Threaded into fresh-set creation (empty-index/init-index) AND the
node-deserialization Settings (previously hardcoded 512 — the spot that would have
corrupted a non-512 store on restore). op-buf-size keeps the pss.opBufSize sysprop
as an experiment-only fallback. Settings built via the 5-arg normalizing ctor
(defaults refType=SOFT). I0 spot-check (index/db tests) green.

NOTE: connect-time reconcile (adopt stored value so reconnect needn't re-specify,
+ fuse default flip) is the next, separate step.
…pt-in

adopt-stored-fixed: at connect, source :fuse-index-roots? and :index-config
{:branching-factor :op-buf-size} from the STORED config (adopt, or drop when the
store predates the key). Existing stores connect unchanged; new stores that set
these reconnect without re-specifying; the strict consistency check still guards
every other key. Explicit create-time-fixed-keys set documents the immutable set.
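The adopt-or-drop rule can be sketched with plain maps. All names here (`fixed-paths`, `dissoc-in`, `adopt-fixed`) are hypothetical and this is not the actual adopt-stored-fixed code; the buffer-size key is written as `:diff-buf-size`, the name it is renamed to in a later commit of this PR.

```clojure
;; Paths of the create-time-fixed settings (assumed shape).
(def fixed-paths
  [[:fuse-index-roots?]
   [:index-config :branching-factor]
   [:index-config :diff-buf-size]])

(defn dissoc-in
  "Removes the value at path, pruning maps that become empty."
  [m [k & ks]]
  (if ks
    (let [sub (dissoc-in (get m k) ks)]
      (if (seq sub) (assoc m k sub) (dissoc m k)))
    (dissoc m k)))

(defn adopt-fixed
  "For each fixed path: take the stored value, or drop the key from the
  user config when the store predates it."
  [user-config stored-config]
  (reduce (fn [cfg path]
            (let [v (get-in stored-config path ::absent)]
              (if (= v ::absent)
                (dissoc-in cfg path)
                (assoc-in cfg path v))))
          user-config
          fixed-paths))
```

For example, a store created before fusion and diff-buf existed overrides whatever the connecting config asks for:

```clojure
(adopt-fixed {:fuse-index-roots? true :index-config {:diff-buf-size 64}}
             {:index-config {:branching-factor 512}})
```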

Kept *default-fuse-index-roots?* FALSE: flipping it globally breaks the merkle-audit
walk and online GC, which read index roots as separate konserve objects — fusion
inlines them into the db-record (verified: audit-verify-test + gc errored with
:audit/node-missing on all roots; green again once reverted). Fusion stays opt-in
until audit/GC are made fusion-aware. Reconcile validated: new/existing/op-buf/bf
stores all create→release→reconnect cleanly; core/api/db/index/audit/gc green
(1 pre-existing config-test default-assertion failure, unrelated).
…to-hash

Index-root fusion inlines each index root into the db-record, so the root is NOT a
separate konserve object. Previously the audit walk and online GC read roots by
address from konserve → :audit/node-missing for every root when fusion was on.

- GC (reachable-in-branch): seed inlined roots into their indexes before -mark
  (mirrors stored->db), so walk-addresses uses the inlined root and only fetches
  its children.
- Audit (-recompute-merkle-root): add walk-pss-node! — when the root address has no
  separate object (fused), verify the seeded in-memory root's content hash (still
  detects db-record root tampering) and recurse children (separate objects) as usual.
- writing.cljc: drop the fusion×crypto-hash mutual-exclusion gate — fusion+crypto now
  compose (root address is still its content hash; audit verifies the inlined root).
- config-test: expect :fuse-index-roots? in the default config (load-config has always
  added it — pre-existing assertion gap).

Validated: crypto-hash + fusion → verify-chain :ok (0 mismatch/missing); fusion + GC
walk completes, data intact. Global default kept false pending a suite-wide object-count
test update; fusion opt-in per store. Focused suite (config/audit/gc/core/api/db/index)
62 tests, 295 assertions, 0 failures. Resolves the audit/GC half of #57.
Under crypto-hash a Branch address is uuid(child-addresses). With op-buf a buffered
child's stored address is its ANCHOR (old content hash) and the diff lives in the
parent's slots — so the branch address ignored the diff → two logically-different
trees with the same anchors collided.

Fix: branch-crypto-uuid folds the slots into the hash — uuid(canon [addresses slots])
— so the address reflects the durable representation (anchors + diff); the audit walks
(walk-pss-address!/walk-pss-node!) recompute the same from the stored node.
canon normalizes Datoms→vectors so the diff hashes identically whether it's a live
PersistentTreeMap (store) or a deserialized plain map (restore). Back-compat: when
there are no slots (baseline / existing crypto stores) the hash is UNCHANGED
(uuid(addresses)). Consistent with the merkle already being representation-dependent;
op-buf-size is create-time-fixed per store so the root stays deterministic.

Validated: crypto+op-buf, crypto+op-buf+fusion, and baseline crypto all verify-chain
:ok on cold reopen (count 3000); audit/index/gc suites 25 tests 0 failures. Resolves #54.
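The collision and its fix can be shown with a toy hash. This sketch makes several assumptions: `content-uuid` is a stand-in (a name-based UUID over the printed form, not the hash datahike uses), the slot shape is illustrative, and `canon` here only imitates the normalization described above.

```clojure
(import 'java.util.UUID)

(defn canon
  "Normalizes nested data so equal content prints identically:
  maps become key-sorted vectors of pairs, sequences become vectors."
  [x]
  (cond
    (map? x)        (mapv (fn [[k v]] [(canon k) (canon v)])
                          (sort-by (comp pr-str key) x))
    (sequential? x) (mapv canon x)
    :else           x))

(defn content-uuid
  "Stand-in content hash over the canonical printed form."
  [x]
  (UUID/nameUUIDFromBytes (.getBytes ^String (pr-str (canon x)) "UTF-8")))

(defn branch-uuid
  "Folds slots into the address only when present, so a branch without
  slots hashes exactly as it did before."
  [addresses slots]
  (if (seq slots)
    (content-uuid [addresses slots])
    (content-uuid addresses)))
```

Two branches with the same anchors but different buffered diffs now get different addresses, while a sorted map and a plain map with the same diff content hash alike.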
…change tests

- op-buf-size made cross-platform (cljs returns 0 fallback, no sysprop).
- cljs empty-index/init-index thread op-buf-size + with-comparator; CachedStorage
  comparator() cross-platform.
- cljs Branch read handler reconstructs _slots (anchor = child address) + 9-arg ctor;
  cljs BTSet read handler threads with-comparator; cljs write handler emits :slots
  via branch/slots-for-storage.
- nodejs_test: cljs-opbuf-write-roundtrip-test (validated: 30 buffered blobs, cold
  reproject exact) + jvm-opbuf-exchange-test (skips if artifact absent).
…survive)

Validates the cljs $remove slot-carry through structural merge/borrow: insert 2000,
retract even :n in small commits, cold reopen → exactly odds survive (count 1000, sum
1000000). 57 buffered-slot blobs written by cljs. 18 tests/102 assertions/0 failures.
… → replace)

Insert 1000 ids with :n 0, update each :n to its id in small commits (upsert routes to
psset/replace → Branch.$replace for eavt/aevt), cold reopen → every :n == its :id, sum
499500. 30 buffered-slot blobs. 19 tests/107 assertions/0 failures.
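The expected totals in these two tests follow from simple sums. The check below assumes the values are the integers 0..1999 for the retraction test and ids 0..999 for the upsert test; the source gives only the counts and sums.

```clojure
;; Retraction test: the odd values among 0..1999 survive.
(def odds (filter odd? (range 2000)))
(count odds)      ; => 1000
(reduce + odds)   ; => 1000000 (sum of the first 1000 odd numbers = 1000^2)

;; Upsert test: every :n equals its id, ids 0..999.
(reduce + (range 1000)) ; => 499500 (999 * 1000 / 2)
```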
…s reference set)

Seeded-LCG randomized insert/retract churn over a >bf (branch-node) tree under op-buf-size
64 (frequent merge/borrow/split + buffer/write decisions), periodic + final cold reopens
compared to a reference id-set. Bulk-seeds 2000 then 40 churn rounds, 7 cold checks; 75
buffered-slot blobs confirm op-buf actually engaged. 20 tests/113 assertions/0 failures.
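The shape of such a seeded churn test can be sketched as below. This is not the test in the suite: a `sorted-set` stands in for the index under test, the LCG constants are the common Numerical Recipes ones (an assumption, the commit does not name its generator), and `churn` is a hypothetical helper.

```clojure
(defn lcg-next
  "One step of a 32-bit linear congruential generator."
  [s]
  (mod (+ (* 1664525 s) 1013904223) 4294967296))

(defn churn
  "Applies n seeded insert/retract ops to a stand-in index and to a
  reference set, returning both for comparison."
  [seed n id-space]
  (loop [s seed, i 0, sut (sorted-set), model #{}]
    (if (= i n)
      {:sut sut :model model}
      (let [s1      (lcg-next s)
            s2      (lcg-next s1)
            ;; Use the high bits; the low bits of this LCG are weak.
            id      (mod (quot s1 65536) id-space)
            insert? (even? (quot s2 65536))]
        (recur s2
               (inc i)
               (if insert? (conj sut id) (disj sut id))
               (if insert? (conj model id) (disj model id)))))))
```

Because the generator is seeded, a failing run can be replayed from `(seed, n, id-space)` alone; in the real test the comparison happens after a cold reopen rather than in memory.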
…compute-merkle-root

cljs merkle auditing never worked before: -recompute-merkle-root was :cljs-not-implemented,
gen-address (cljs) hashed only addresses (not op-buf slots ⇒ mismatch vs clj), and
-merkle-root read .-_address (clj field) instead of cljs .-address. Now canon, branch-
crypto-uuid (folds slots via branch/slots-for-storage), gen-address, walk-pss-address!,
walk-pss-node!, node-class-name, -merkle-root, -recompute-merkle-root are all cross-platform.
Gate: cljs-merkle-audit-test re-derives every node hash from storage for crypto baseline +
crypto+op-buf, warm + cold reopen, all :ok. 21 tests/117 assertions/0 failures.

NOTE: datahike.audit/verify-chain does NOT cljs-compile yet (separate core.async go-try-
macroexpansion bug at audit.cljc:54); test calls index-level -recompute-merkle-root directly.
…tests

The konserve cljs header meta-size bug (single-byte vs JVM 4-byte big-endian) broke
JVM<->cljs konserve exchange, blocking cljs connect to JVM-written datahike stores. Point
konserve at ../konserve (dev) for the fix. Tests: jvm-opbuf-exchange-test now PASSES
(cljs connects to a JVM-written op-buf store, reads identical datoms — buffered slots
reproject cross-host); xhost-fress-probe-test reads JVM-konserve-written namespaced
keywords cross-host. 22 tests/125 assertions/0 failures; JVM clj unbroken.
…e.async go macro)

superv.async/go-try- expands to clojure.core.async/go; without requiring that macro in the
ns, the cljs build fell back to the JVM go macro and failed (vary-meta on keyword in
go-impl). audit.cljc never required core.async (it was never cljs-compiled before). Mirror
datahike.versioning: require [clojure.core.async :refer [<!]] + (:require-macros
[clojure.core.async :refer [go]]) for cljs. cljs-merkle-audit-test now exercises the real
verify-chain :deep? API (crypto baseline + op-buf, warm + cold) — all :ok. 22 tests/129
assertions/0 failures; JVM clj verify-chain unbroken.
…f-buf-size, pss.diffBufSize)

Mechanical rename to match persistent-sorted-set (op-buf was the hitchhiker/Bε term; we
buffer a per-child DIFF at the serialization boundary). Config key :op-buf-size →
:diff-buf-size (safe: not released, only dev stores); create-time-fixed key + adopt-stored-fixed
updated; sysprop pss.opBufSize → pss.diffBufSize. Validated: cljs 22 tests/122 assertions/0;
clj crypto+diff-buf+fusion audit :ok. (Datahike default-on flip deferred — it churns the
suite's object-count assertions, same as the fuse-default flip.)
default-index-config :datahike.index/persistent-set → {:diff-buf-size 256}, baked into the
stored config at create time so EXISTING stores keep their value (adopt-stored-fixed sources
it from the store; diff-buf-size fn defaults 0 when absent ⇒ pre-diff-buf stores stay
baseline). Set {:diff-buf-size 0} to disable. Fixes: config-test expected default
(:index-config {:diff-buf-size 256}); reverted an over-eager rename in upsert_impl_test
(the hitchhiker-tree's :op-buf is the real Bε operation buffer, NOT our diff-buf — must not
rename). clj-pss 521 tests/2473 assertions/0; cljs 22/122/0.

Drops the development-trajectory version label in favor of the shipping
name; refers to the PSS doc/diff-buffering.md design. Comment-only change.

- index/persistent_set: remove CachedStorage `cmp` field, its `comparator`
  impl, and `with-comparator`. The per-index comparator now lives on the PSS
  and propagates to Branch nodes (Branch._projCmp); storage stays
  comparator-agnostic. Matches persistent-sorted-set a36ecbe.
- deps.edn: konserve -> 0.9.349 (released; includes cljs cross-host header
  meta-size fix #143); persistent-sorted-set -> git a36ecbe (diff-buf;
  `clojure -X:deps prep` compiles its Java). No more :local/root deps.
- test/store_test: add diff-buf upsert+reopen regression — value-changing
  upserts survive store->reopen with no stale/duplicate datoms (guards the
  comparator-agnostic {:absent :present} leaf-diff serialization).
- nodejs_test: cljfmt (whitespace only).

clj-pss: 522 tests / 2475 assertions / 0 failures (at -Xmx4g).
@whilo whilo force-pushed the feat/fuse-index-roots branch from 63aeb0b to 997d4cf Compare June 1, 2026 21:07
whilo added 4 commits June 1, 2026 18:59
diff-buf write-buffering trades in-memory insert throughput for fewer durable
object PUTs (~7->1/commit) — it only pays off on a request-priced object store.
An in-memory (:memory/:mem) store has no PUTs to fold, so the buffering is pure
overhead: measured ~1.5-1.8x slower pure-insert throughput for zero benefit.

Make the default :diff-buf-size backend-aware via default-index-config-for-backend:
0 for the in-memory backend, 256 (unchanged) for durable stores. Index-agnostic
(only touches the key when the index default carries it, i.e. PSS) and an explicit
user :index-config still wins (deep-merged over the default). storeless-config is
inherently in-memory, so it defaults off too. Update config-test expectations.
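The backend-aware default amounts to a small lookup plus a merge. The functions below are a hypothetical sketch of that rule, not the signature of default-index-config-for-backend; a shallow `merge` is enough here because the illustrated config is flat, whereas the commit describes a deep merge.

```clojure
(defn default-diff-buf-size
  "0 for in-memory backends (no PUTs to save), 256 for durable ones."
  [backend]
  (if (contains? #{:mem :memory} backend) 0 256))

(defn effective-index-config
  "An explicit user :index-config wins over the backend default."
  [backend user-index-config]
  (merge {:diff-buf-size (default-diff-buf-size backend)}
         user-index-config))
```

So `(effective-index-config :mem nil)` leaves buffering off, while `(effective-index-config :mem {:diff-buf-size 64})` keeps the explicit choice.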
Random transact / value-upsert / retractEntity vs a Clojure model, with a release+
reconnect each cycle (cold fressian reload). Exercises the full stack together that the
PSS-level (edn) harness can't reach: PSS diff-buf + fressian :slots handlers + commit-log
+ HEAD + crypto-hash. Deterministic (java.util.Random seed) — failures reproduce from
(seed,params). Swept {diff-buf 0/256} × {crypto-hash off/on}. Bounded deftest in the suite;
run drives larger sweeps. Validated against local PSS: 12 trials, 0 divergences.
… harness)

PSS feature/op-buf-v5 a36ecbe -> 2063823: anchorless-deposit skip (bulk-load throughput),
the in-memory subtreeCount-drift fix (count after restore+mutate), and the seeded stress
harness (content/count/measure/GC/address-determinism). Verified: clojure -X:deps prep
compiles the git PSS Java cleanly and datahike loads + round-trips against it.

Since #759, build.clj's javadoc fn called (b/javadoc ...), but clojure.tools.build.api
has NO javadoc (only javac). A qualified ref to a missing var fails at COMPILE time, so
the whole build ns failed to load — breaking *every* -T:build task (compile-java included)
and, critically, :deps/prep-lib: a git dep on datahike couldn't compile its Java API, so
downstream projects were forced onto :local/root. (Local checkouts only worked via a stale
target/classes built before #759.)

Reimplement javadoc via b/process shelling to the JDK javadoc tool with the project
classpath. build.clj now loads; clj -T:build compile-java + javadoc both run (javadoc
exits 1 on undocumented-element warnings — non-fatal, docs still generated). This lets
consumers use datahike as a git dep with 'clojure -X:deps prep' again.
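A shell-out of this kind mostly comes down to assembling the javadoc argument vector. The sketch below builds only that vector; `javadoc-command`, its option names, and the example paths and package are assumptions and not the code in build.clj.

```clojure
(require '[clojure.string :as str])
(import 'java.io.File)

(defn javadoc-command
  "Builds the argument vector for the JDK javadoc tool."
  [{:keys [classpath-roots source-path subpackages out-dir]}]
  ["javadoc"
   "-d" out-dir
   "-classpath" (str/join File/pathSeparator classpath-roots)
   "-sourcepath" source-path
   "-subpackages" subpackages])

;; Hypothetical inputs for illustration.
(def example-args
  (javadoc-command {:classpath-roots ["target/classes" "lib/dep.jar"]
                    :source-path     "java/src"
                    :subpackages     "datahike.java"
                    :out-dir         "target/javadoc"}))

;; In a build script the vector would then be passed along the lines of
;; (b/process {:command-args example-args}), with b bound to
;; clojure.tools.build.api, treating a non-zero :exit as a warning.
```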