diff-buf write-buffering + index-root fusion (PSS-backed indexes)#836
Open
whilo wants to merge 23 commits into
Opt-in via :fuse-index-roots? (default false). When enabled, db->stored inlines each flushed index's in-memory root node into the db-record (:eavt-root/:aevt-root/:avet-root + temporal) and commit! excludes those root addresses from the separate-object writes (pending-writes drain). stored->db seeds the inlined root back via di/-seed-root!, so root() returns it with no storage round-trip; deeper children stay lazy. Saves one object write per index root per commit (and one cold-open GET per index); for a single-leaf index the whole index inlines. History is preserved (per-commit cid records). Read is presence-based (:eavt-root), so fused and legacy records both restore. Gated off under :crypto-hash? for now (the audit walk reads the root from storage by address). New index protocol methods -root-node / -seed-root! (PSS impl; clj). Validated: cold-restart (separate JVM) roundtrips correctly at :keep-history? true with indexed attrs, retraction, and range slices; write count drops (e.g. 5->3 objects/commit with two active indexes); store-test green with fusion off. See doc/index-root-fusion.md.
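The presence-based read described above can be pictured with a small model. This is only a sketch in plain Clojure, not the datahike implementation: `restore-root`, the `:eavt-address` key, and the `fetch` lookup are hypothetical names introduced for illustration; only `:eavt-root` comes from the commit message.

```clojure
;; Hypothetical model of a presence-based root restore.
;; A fused db-record carries the root node inline under `root-key`;
;; a legacy record only carries the root's address.
(defn restore-root
  "Returns [root source]. `fetch` is a hypothetical storage lookup (address -> node)."
  [db-record root-key addr-key fetch]
  (if (contains? db-record root-key)
    [(get db-record root-key) :inlined]            ; fused: no storage round-trip
    [(fetch (get db-record addr-key)) :fetched]))  ; legacy: one GET by address

;; A plain map stands in for the object store (maps are functions of their keys).
(def storage {"addr-1" {:keys [1 2 3]}})

(def legacy-record {:eavt-address "addr-1"})
(def fused-record  {:eavt-address "addr-1" :eavt-root {:keys [1 2 3]}})

(restore-root legacy-record :eavt-root :eavt-address storage)
(restore-root fused-record  :eavt-root :eavt-address storage)
```

Both records yield the same root node; only the fused one avoids the lookup, which is why fused and legacy records can coexist in the same commit history.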
Wires the persistent-sorted-set op-buf write-optimization into the index adapter so a commit buffers content-only child diffs into the rewritten ancestor instead of rewriting the full spine (~1 PUT/commit for small commits). Composes with index-root fusion: the buffered diffs ride in the fused db-record.
- Branch fressian handlers round-trip :slots (.slotsForStorage / reconstruct _slots on read); emitted only when present ⇒ opBufSize=0 / legacy DBs are byte-identical (back-compat).
- op-buf-size threaded into fresh-set Settings; single knob = JVM sysprop pss.opBufSize (TODO: promote to a config key). Shared node-deserialization Settings already honors it via Settings.defaultOpBufSize.
- Per-index storage view (with-comparator) carries the index comparator so buffered-leaf projection routes by value on cold restore (CachedStorage gains a cmp field + IStorage.comparator()).
- deps.edn: PSS -> :local/root (dev) for the op-buf-v5 build.
Validated: file-backed DB, build over 60 commits, fresh cold reopen, full query equality (count/sum/lookup across eavt/aevt/avet) vs baseline at B=0/64/256/1024, with fusion on. JVM-only; cljs falls back to baseline. Crypto-hash + op-buf and GC/markFreed tracking remain (tracked as debt).
(:storage store) is nil for backends without a CachedStorage (e.g. :mem); (assoc nil :cmp) produced a plain map that then failed to cast to IStorage. Guard with instance? CachedStorage so nil/other storages pass through unchanged. Restores I0 (datahike index/ident/db tests green at opBufSize=0).
Both are create-time-fixed PSS-index settings, now sourced from the persisted :index-config (defaults 0 and 512 — existing stores, built at 512 with no op-buf, are unaffected). Threaded into fresh-set creation (empty-index/init-index) AND the node-deserialization Settings (previously hardcoded 512 — the spot that would have corrupted a non-512 store on restore). op-buf-size keeps the pss.opBufSize sysprop as an experiment-only fallback. Settings built via the 5-arg normalizing ctor (defaults refType=SOFT). I0 spot-check (index/db tests) green. NOTE: connect-time reconcile (adopt stored value so reconnect needn't re-specify, + fuse default flip) is the next, separate step.
…pt-in
adopt-stored-fixed: at connect, source :fuse-index-roots? and :index-config
{:branching-factor :op-buf-size} from the STORED config (adopt, or drop when the
store predates the key). Existing stores connect unchanged; new stores that set
these reconnect without re-specifying; the strict consistency check still guards
every other key. Explicit create-time-fixed-keys set documents the immutable set.
Kept *default-fuse-index-roots?* FALSE: flipping it globally breaks the merkle-audit
walk and online GC, which read index roots as separate konserve objects — fusion
inlines them into the db-record (verified: audit-verify-test + gc errored with
:audit/node-missing on all roots; green again once reverted). Fusion stays opt-in
until audit/GC are made fusion-aware. Reconcile validated: new/existing/op-buf/bf
stores all create→release→reconnect cleanly; core/api/db/index/audit/gc green
(1 pre-existing config-test default-assertion failure, unrelated).
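The adopt-or-drop rule for create-time-fixed keys can be sketched as a pure function over two config maps. This is a hedged illustration, not the actual `adopt-stored-fixed`: the `fixed-paths` vector, the `dissoc-in` helper, and the function signature are assumptions; the key names are the ones listed in the commit message.

```clojure
;; Create-time-fixed keys, as paths into the config map (illustrative list).
(def fixed-paths
  [[:fuse-index-roots?]
   [:index-config :branching-factor]
   [:index-config :op-buf-size]])

(defn dissoc-in
  "Removes the value at `path`, pruning a parent map that becomes empty."
  [m [k & ks]]
  (if ks
    (if (map? (get m k))
      (let [child (dissoc-in (get m k) ks)]
        (if (empty? child) (dissoc m k) (assoc m k child)))
      m)
    (dissoc m k)))

(defn adopt-stored-fixed
  "For each fixed path: adopt the stored value when the store has one,
  otherwise drop the key from the user's config (the store predates it)."
  [user-config stored-config]
  (reduce (fn [cfg path]
            (let [v (get-in stored-config path ::absent)]
              (if (= v ::absent)
                (dissoc-in cfg path)
                (assoc-in cfg path v))))
          user-config
          fixed-paths))

;; The store was created with op-buf 256 and bf 512, before :fuse-index-roots? existed.
(adopt-stored-fixed
 {:keep-history? true :fuse-index-roots? true :index-config {:op-buf-size 64}}
 {:keep-history? true :index-config {:op-buf-size 256 :branching-factor 512}})
```

In this model the user's conflicting `:op-buf-size 64` is overridden by the stored 256 and the unknown `:fuse-index-roots?` is dropped, so the remaining keys can still go through a strict equality check.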
…to-hash
Index-root fusion inlines each index root into the db-record, so the root is NOT a separate konserve object. Previously the audit walk and online GC read roots by address from konserve → :audit/node-missing for every root when fusion was on.
- GC (reachable-in-branch): seed inlined roots into their indexes before -mark (mirrors stored->db), so walk-addresses uses the inlined root and only fetches its children.
- Audit (-recompute-merkle-root): add walk-pss-node!: when the root address has no separate object (fused), verify the seeded in-memory root's content hash (still detects db-record root tampering) and recurse children (separate objects) as usual.
- writing.cljc: drop the fusion×crypto-hash mutual-exclusion gate; fusion+crypto now compose (root address is still its content hash; audit verifies the inlined root).
- config-test: expect :fuse-index-roots? in the default config (load-config has always added it; pre-existing assertion gap).
Validated: crypto-hash + fusion → verify-chain :ok (0 mismatch/missing); fusion + GC walk completes, data intact. Global default kept false pending a suite-wide object-count test update; fusion opt-in per store. Focused suite (config/audit/gc/core/api/db/index) 62 tests, 295 assertions, 0 failures. Resolves the audit/GC half of #57.
Under crypto-hash a Branch address is uuid(child-addresses). With op-buf a buffered child's stored address is its ANCHOR (old content hash) and the diff lives in the parent's slots, so the branch address ignored the diff → two logically-different trees with the same anchors collided. Fix: branch-crypto-uuid folds the slots into the hash, uuid(canon [addresses slots]), so the address reflects the durable representation (anchors + diff); the audit walks (walk-pss-address!/walk-pss-node!) recompute the same from the stored node. canon normalizes Datoms→vectors so the diff hashes identically whether it's a live PersistentTreeMap (store) or a deserialized plain map (restore). Back-compat: when there are no slots (baseline / existing crypto stores) the hash is UNCHANGED (uuid(addresses)). Consistent with the merkle already being representation-dependent; op-buf-size is create-time-fixed per store so the root stays deterministic. Validated: crypto+op-buf, crypto+op-buf+fusion, and baseline crypto all verify-chain :ok on cold reopen (count 3000); audit/index/gc suites 25 tests 0 failures. Resolves #54.
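The collision and its fix can be reproduced with a toy content-addressing scheme. This is a sketch under stated assumptions, not datahike's hashing: `uuid-of` (a name-based UUID over the printed form), this `canon`, and `branch-address` are hypothetical stand-ins, and a slot is modelled as a small map of datom vectors.

```clojure
(import 'java.util.UUID)

(defn uuid-of
  "Toy content hash: name-based UUID over the printed representation."
  [x]
  (UUID/nameUUIDFromBytes (.getBytes ^String (pr-str x) "UTF-8")))

(defn canon
  "Normalizes per-child slots so plain maps and sorted maps print identically."
  [slots]
  (mapv (fn [slot] (when slot (into (sorted-map) slot))) slots))

(defn branch-address
  "Without slots: uuid(addresses), unchanged from baseline.
  With slots: uuid([addresses slots]), so buffered diffs affect the address."
  [addresses slots]
  (if (every? nil? slots)
    (uuid-of (vec addresses))
    (uuid-of [(vec addresses) (canon slots)])))

(def anchors ["anchor-a" "anchor-b"])

;; Same anchors, different buffered diffs for the first child.
(def slots-1 [{:present [[1 :n 1]] :absent [[1 :n 0]]} nil])
(def slots-2 [{:present [[1 :n 2]] :absent [[1 :n 0]]} nil])

(branch-address anchors slots-1)
(branch-address anchors slots-2)
```

With the slots folded in, the two branches get distinct addresses even though their child anchors are identical, while a branch with no slots keeps the address it had before the change.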
…change tests
- op-buf-size made cross-platform (cljs returns 0 fallback, no sysprop).
- cljs empty-index/init-index thread op-buf-size + with-comparator; CachedStorage comparator() cross-platform.
- cljs Branch read handler reconstructs _slots (anchor = child address) + 9-arg ctor; cljs BTSet read handler threads with-comparator; cljs write handler emits :slots via branch/slots-for-storage.
- nodejs_test: cljs-opbuf-write-roundtrip-test (validated: 30 buffered blobs, cold reproject exact) + jvm-opbuf-exchange-test (skips if artifact absent).
…survive) Validates the cljs $remove slot-carry through structural merge/borrow: insert 2000, retract even :n in small commits, cold reopen → exactly odds survive (count 1000, sum 1000000). 57 buffered-slot blobs written by cljs. 18 tests/102 assertions/0 failures.
… → replace) Insert 1000 ids with :n 0, update each :n to its id in small commits (upsert routes to psset/replace → Branch.$replace for eavt/aevt), cold reopen → every :n == its :id, sum 499500. 30 buffered-slot blobs. 19 tests/107 assertions/0 failures.
…s reference set) Seeded-LCG randomized insert/retract churn over a >bf (branch-node) tree under op-buf-size 64 (frequent merge/borrow/split + buffer/write decisions), periodic + final cold reopens compared to a reference id-set. Bulk-seeds 2000 then 40 churn rounds, 7 cold checks; 75 buffered-slot blobs confirm op-buf actually engaged. 20 tests/113 assertions/0 failures.
…compute-merkle-root cljs merkle auditing never worked before: -recompute-merkle-root was :cljs-not-implemented, gen-address (cljs) hashed only addresses (not op-buf slots ⇒ mismatch vs clj), and -merkle-root read .-_address (clj field) instead of cljs .-address. Now canon, branch- crypto-uuid (folds slots via branch/slots-for-storage), gen-address, walk-pss-address!, walk-pss-node!, node-class-name, -merkle-root, -recompute-merkle-root are all cross-platform. Gate: cljs-merkle-audit-test re-derives every node hash from storage for crypto baseline + crypto+op-buf, warm + cold reopen, all :ok. 21 tests/117 assertions/0 failures. NOTE: datahike.audit/verify-chain does NOT cljs-compile yet (separate core.async go-try- macroexpansion bug at audit.cljc:54); test calls index-level -recompute-merkle-root directly.
…tests The konserve cljs header meta-size bug (single-byte vs JVM 4-byte big-endian) broke JVM<->cljs konserve exchange, blocking cljs connect to JVM-written datahike stores. Point konserve at ../konserve (dev) for the fix. Tests: jvm-opbuf-exchange-test now PASSES (cljs connects to a JVM-written op-buf store, reads identical datoms — buffered slots reproject cross-host); xhost-fress-probe-test reads JVM-konserve-written namespaced keywords cross-host. 22 tests/125 assertions/0 failures; JVM clj unbroken.
…e.async go macro) superv.async/go-try- expands to clojure.core.async/go; without requiring that macro in the ns, the cljs build fell back to the JVM go macro and failed (vary-meta on keyword in go-impl). audit.cljc never required core.async (it was never cljs-compiled before). Mirror datahike.versioning: require [clojure.core.async :refer [<!]] + (:require-macros [clojure.core.async :refer [go]]) for cljs. cljs-merkle-audit-test now exercises the real verify-chain :deep? API (crypto baseline + op-buf, warm + cold) — all :ok. 22 tests/129 assertions/0 failures; JVM clj verify-chain unbroken.
…f-buf-size, pss.diffBufSize) Mechanical rename to match persistent-sorted-set (op-buf was the hitchhiker/Bε term; we buffer a per-child DIFF at the serialization boundary). Config key :op-buf-size → :diff-buf-size (safe: not released, only dev stores); create-time-fixed key + adopt-stored-fixed updated; sysprop pss.opBufSize → pss.diffBufSize. Validated: cljs 22 tests/122 assertions/0; clj crypto+diff-buf+fusion audit :ok. (Datahike default-on flip deferred — it churns the suite's object-count assertions, same as the fuse-default flip.)
default-index-config :datahike.index/persistent-set → {:diff-buf-size 256}, baked into the
stored config at create time so EXISTING stores keep their value (adopt-stored-fixed sources
it from the store; diff-buf-size fn defaults 0 when absent ⇒ pre-diff-buf stores stay
baseline). Set {:diff-buf-size 0} to disable. Fixes: config-test expected default
(:index-config {:diff-buf-size 256}); reverted an over-eager rename in upsert_impl_test
(the hitchhiker-tree's :op-buf is the real Bε operation buffer, NOT our diff-buf — must not
rename). clj-pss 521 tests/2473 assertions/0; cljs 22/122/0.
Drops the development-trajectory version label in favor of the shipping name; refers to the PSS doc/diff-buffering.md design. Comment-only change.
- index/persistent_set: remove CachedStorage `cmp` field, its `comparator` impl, and `with-comparator`. The per-index comparator now lives on the PSS and propagates to Branch nodes (Branch._projCmp); storage stays comparator-agnostic. Matches persistent-sorted-set a36ecbe.
- deps.edn: konserve -> 0.9.349 (released; includes cljs cross-host header meta-size fix #143); persistent-sorted-set -> git a36ecbe (diff-buf; `clojure -X:deps prep` compiles its Java). No more :local/root deps.
- test/store_test: add diff-buf upsert+reopen regression: value-changing upserts survive store->reopen with no stale/duplicate datoms (guards the comparator-agnostic {:absent :present} leaf-diff serialization).
- nodejs_test: cljfmt (whitespace only).
clj-pss: 522 tests / 2475 assertions / 0 failures (at -Xmx4g).
63aeb0b to
997d4cf
diff-buf write-buffering trades in-memory insert throughput for fewer durable object PUTs (~7->1/commit) — it only pays off on a request-priced object store. An in-memory (:memory/:mem) store has no PUTs to fold, so the buffering is pure overhead: measured ~1.5-1.8x slower pure-insert throughput for zero benefit. Make the default :diff-buf-size backend-aware via default-index-config-for-backend: 0 for the in-memory backend, 256 (unchanged) for durable stores. Index-agnostic (only touches the key when the index default carries it, i.e. PSS) and an explicit user :index-config still wins (deep-merged over the default). storeless-config is inherently in-memory, so it defaults off too. Update config-test expectations.
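A minimal sketch of the backend-aware default follows. The function name `default-index-config-for-backend` and the values 0 / 256 come from the commit message; the argument list, the `deep-merge` helper, and `effective-index-config` are assumptions made for illustration and may differ from the real code.

```clojure
;; Per-index defaults; only the PSS index carries :diff-buf-size.
(def default-index-config
  {:datahike.index/persistent-set {:diff-buf-size 256}})

(defn deep-merge
  "Recursively merges maps; on conflict the right-most non-map value wins."
  [& ms]
  (apply merge-with
         (fn [a b] (if (and (map? a) (map? b)) (deep-merge a b) b))
         ms))

(defn default-index-config-for-backend
  "Turns diff-buf off for in-memory backends, but only when the index
  default carries the key at all (index-agnostic)."
  [index backend]
  (let [base (get default-index-config index {})]
    (if (and (contains? base :diff-buf-size)
             (contains? #{:memory :mem} backend))
      (assoc base :diff-buf-size 0)
      base)))

(defn effective-index-config
  "An explicit user :index-config is merged over the backend-aware default."
  [index backend user-index-config]
  (deep-merge (default-index-config-for-backend index backend)
              user-index-config))

(effective-index-config :datahike.index/persistent-set :mem nil)
(effective-index-config :datahike.index/persistent-set :file nil)
(effective-index-config :datahike.index/persistent-set :mem {:diff-buf-size 64})
```

The last call shows the override path: a user who explicitly wants buffering on an in-memory store still gets it, because the user map is merged last.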
Random transact / value-upsert / retractEntity vs a Clojure model, with a release+
reconnect each cycle (cold fressian reload). Exercises the full stack together that the
PSS-level (edn) harness can't reach: PSS diff-buf + fressian :slots handlers + commit-log
+ HEAD + crypto-hash. Deterministic (java.util.Random seed) — failures reproduce from
(seed,params). Swept {diff-buf 0/256} × {crypto-hash off/on}. Bounded deftest in the suite;
run drives larger sweeps. Validated against local PSS: 12 trials, 0 divergences.
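The shape of such a harness (seeded random operations, a plain Clojure model, and a cold reload each cycle) can be sketched without datahike. Everything here is a stand-in: the system under test is just a sorted map that is round-tripped through EDN to mimic release + reconnect, and `random-ops`, `cold-reload`, and `run-trial` are hypothetical names.

```clojure
(require '[clojure.edn :as edn])
(import 'java.util.Random)

(defn random-ops
  "Deterministic op sequence for a given seed: [:upsert id v] or [:retract id v]."
  [seed n id-range]
  (let [rng (Random. seed)]
    (vec (repeatedly n (fn [] [(if (zero? (.nextInt rng 3)) :retract :upsert)
                               (.nextInt rng id-range)
                               (.nextInt rng 1000)])))))

(defn apply-op [m [op id v]]
  (case op
    :upsert  (assoc m id v)
    :retract (dissoc m id)))

(defn cold-reload
  "Stand-in for release + reconnect: serialize, then rebuild from the text."
  [m]
  (into (sorted-map) (edn/read-string (pr-str m))))

(defn run-trial
  "Applies ops to model and system, reloading the system after each cycle.
  Returns {:status :ok ...} or the first divergence, keyed by seed for replay."
  [seed cycles ops-per-cycle]
  (let [ops (random-ops seed (* cycles ops-per-cycle) 50)]
    (loop [chunks (partition ops-per-cycle ops)
           model  {}
           sut    (sorted-map)
           cycle  0]
      (if-let [chunk (first chunks)]
        (let [model' (reduce apply-op model chunk)
              sut'   (cold-reload (reduce apply-op sut chunk))]
          (if (= model' sut')
            (recur (rest chunks) model' sut' (inc cycle))
            {:status :diverged :seed seed :cycle cycle}))
        {:status :ok :seed seed :cycles cycle}))))

(run-trial 42 5 20)
```

Because the op sequence is a pure function of the seed and parameters, any divergence report is enough to replay the exact failing run.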
… harness) PSS feature/op-buf-v5 a36ecbe -> 2063823: anchorless-deposit skip (bulk-load throughput), the in-memory subtreeCount-drift fix (count after restore+mutate), and the seeded stress harness (content/count/measure/GC/address-determinism). Verified: clojure -X:deps prep compiles the git PSS Java cleanly and datahike loads + round-trips against it.
Since #759, build.clj's javadoc fn called (b/javadoc ...), but clojure.tools.build.api has NO javadoc (only javac). A qualified ref to a missing var fails at COMPILE time, so the whole build ns failed to load — breaking *every* -T:build task (compile-java included) and, critically, :deps/prep-lib: a git dep on datahike couldn't compile its Java API, so downstream projects were forced onto :local/root. (Local checkouts only worked via a stale target/classes built before #759.) Reimplement javadoc via b/process shelling to the JDK javadoc tool with the project classpath. build.clj now loads; clj -T:build compile-java + javadoc both run (javadoc exits 1 on undocumented-element warnings — non-fatal, docs still generated). This lets consumers use datahike as a git dep with 'clojure -X:deps prep' again.
diff-buf write-buffering + index-root fusion (PSS-backed indexes)
Two related write-amplification reductions for content-addressed (konserve) persistence, both landing on the persistent-sorted-set index.
1. diff-buf (per-child diff buffering at the serialization boundary)
Backed by persistent-sorted-set PR replikativ/persistent-sorted-set#6. A normal commit rewrites the whole root→leaf spine (~`depth+1` object PUTs). With diff-buf, a rewritten branch buffers each content-only child's diff into a per-child slot and re-points the child to its durable anchor, so a content-only commit costs ~1 PUT instead of ~`depth+1`. Reads project the buffered diff back lazily on descent (baseline read cost).
datahike wiring:
- `:slots` round-trip in the fressian `Branch` read/write handlers (clj + cljs).
- `:diff-buf-size` / `:branching-factor` are create-time-fixed via `:index-config` (round-trips with the store; `connect` adopts the stored value). Default ON (256) for new durable stores; the in-memory backend defaults to 0.
- `markFreed`: freeing is deferred to commit so a re-pointed anchor is never freed.
2. index-root fusion
Inlines each index's root node into the db-record, so `commit!` skips writing those roots as separate objects (one fewer PUT/commit, and it composes with diff-buf toward ~2 PUTs/commit). Opt-in, default OFF (`*default-fuse-index-roots?*`) for now: flipping the default churns object-count assertions across the suite. The SaaS template opts in, and `connect` adopts the stored value so fused and non-fused stores both reconnect. Design note: `doc/index-root-fusion.md`.
Comparator on the tree (storage stays comparator-agnostic)
The per-index comparator now lives on the PSS and propagates to its Branch nodes (`Branch._projCmp`), instead of being carried on storage. This PR removes `CachedStorage`'s `cmp` field, its `comparator` impl, and `with-comparator`. (Mirrors the upstream PSS change; resolves the earlier review note about a storage-carried comparator.)
Dependencies (no local checkouts)
- `persistent-sorted-set` → git `a36ecbe…` (branch `feature/op-buf-v5`). Its Java is compiled by tools.deps prep: run `clojure -X:deps prep` once after fetching (CI step).
- `konserve` → released `0.9.349` (includes the cljs cross-host header meta-size fix #143), replacing the former local dev checkout.
Validation
- `clj-pss` (default index = persistent-set, diff-buf on): 522 tests / 2475 assertions / 0 failures (`-Xmx4g`).
- `store-test/test-diff-buf-upsert-reopen` (value-changing upserts survive store→reopen with no stale/duplicate datoms; guards the comparator-agnostic `{:absent :present}` leaf-diff serialization).
Known, non-critical (deferred; not addressed here)
A fatal `Error` (e.g. OOM) thrown inside the async write/commit pipeline can hang a synchronous `transact` (the no-timeout deref never completes, because `superv.async` `go-try` catches `Exception`, not `Throwable`). It does not corrupt data (commit is copy-on-write + atomic HEAD flip + free-after-flip). Pre-existing, orthogonal to this PR; the clean fix is a deref timeout. Tracked internally.
Notes
Originally named `op-buf` (the hitchhiker/Bε operation-buffer term); renamed `diff-buf` since this buffers a per-child diff at the serialization boundary. History can be squashed on merge.
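The deferred hang listed under "Known, non-critical" can be illustrated with plain Clojure primitives. This is a simplified stand-in, not the `superv.async` pipeline: `run-guarded` is a hypothetical helper that, like the `go-try` behaviour described there, catches `Exception` but not `Error`, so a fatal error leaves the result promise undelivered. The three-argument `deref` (timeout in ms plus a fallback value) is the standard clojure.core arity.

```clojure
(defn run-guarded
  "Runs `f` on another thread and returns a promise for its outcome.
  Only Exception is caught, so an Error never delivers the promise."
  [f]
  (let [result (promise)]
    (future
      (try
        (deliver result {:ok (f)})
        (catch Exception e
          (deliver result {:error (.getMessage e)}))))
    result))

;; Normal completion and ordinary exceptions both deliver a result.
(deref (run-guarded #(+ 1 2)) 1000 :timeout)
(deref (run-guarded #(throw (ex-info "boom" {}))) 1000 :timeout)

;; An Error escapes the catch; a plain (deref p) would block forever here.
;; The timeout arity returns the fallback instead.
(deref (run-guarded #(throw (AssertionError. "fatal"))) 200 :timeout)
```

A bounded deref turns the silent hang into an observable timeout value that the caller can report or retry on, which is the kind of fix the description has in mind.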