Leios: late-join support, rebased #2057
Closed
geo2a wants to merge 12 commits into
Parameterise runThreadNet over NodeJoinPlan and add a new property that starts node 3 at a random slot while nodes 0-2 run from slot 0. Demonstrates the crash in 'resolveLeiosBlock' when a late-joining node encounters a CertRB referencing an EB it never received. The fix lives later in the workstream; this commit just makes the gap visible.
Add 'hbIsCertRB' to 'HeaderBody' (and mirror on 'HeaderView') for the
CIP-0164 header bit signalling that this RB certifies a
previously-announced EB.
Thread the bit through 'mkHeader' (Praos and TPraos; the latter ignores
it) and the Shelley forge path.
Encode canonically: every header is len=12 carrying
@(Bool, Maybe EbAnnouncement)@. Decode still accepts len=10 (pre-Leios)
and len=11 (announcement-only) for back-compat with existing on-disk
data; new encodings never produce those shapes. Two valid encodings for
the same logical header would have made hashing / signature over the
encoded form non-canonical.
Per-header cost: one byte for the Bool plus one for the Maybe-Nothing
tag when no announcement is present.
Co-authored-by: Damian Nadales <damian.only@gmail.com>, Georgy Lukyanov <georgy.lukyanov@iohk.io>
Co-authored-by: Georgy Lukyanov <georgy.lukyanov@iohk.io>
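The "decode three shapes, encode one" rule above can be sketched in a few lines. This is a simplified model and not the real CBOR codec: the 'Trailing' type, the function names and the choice to read legacy headers as "not a CertRB" are assumptions made for illustration only.

```haskell
-- Hypothetical stand-in for the real announcement type.
newtype EbAnnouncement = EbAnnouncement String
  deriving (Eq, Show)

-- The Leios-relevant part of a header, independent of its on-disk shape.
data LeiosFields = LeiosFields
  { isCertRB       :: Bool
  , ebAnnouncement :: Maybe EbAnnouncement
  } deriving (Eq, Show)

-- What follows the ten pre-Leios fields in each accepted shape
-- (assumed layout; the real field positions may differ).
data Trailing
  = NoTrailing                              -- len=10, pre-Leios
  | AnnouncementOnly (Maybe EbAnnouncement) -- len=11, announcement-only
  | Canonical Bool (Maybe EbAnnouncement)   -- len=12, canonical

-- Total list length of a header with the given trailing fields.
headerLen :: Trailing -> Int
headerLen NoTrailing           = 10
headerLen (AnnouncementOnly _) = 11
headerLen (Canonical _ _)      = 12

-- Decoding accepts all three shapes. Assumption: legacy shapes carry
-- no certification bit, so they decode as "not a CertRB".
decodeLeiosFields :: Trailing -> LeiosFields
decodeLeiosFields NoTrailing           = LeiosFields False Nothing
decodeLeiosFields (AnnouncementOnly a) = LeiosFields False a
decodeLeiosFields (Canonical b a)      = LeiosFields b a

-- Encoding only ever produces the canonical shape.
encodeLeiosFields :: LeiosFields -> Trailing
encodeLeiosFields (LeiosFields b a) = Canonical b a
```

Because 'encodeLeiosFields' has a single output shape, each logical header has exactly one encoding in this model, which is the property the commit message relies on for hashing and signing.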
…ults
ChainSel needs two header-level queries to identify CertRBs whose
certified EB closure is not locally available: 'headerIsCertRB' on the
candidate header, and 'headerEbAnnouncement' on its parent.
Add both methods to 'ResolveLeiosBlock' without defaults: a silent
'False' / 'Nothing' default would let a future block-type author forget
to override, and ChainSel would silently degrade to "never filter a
CertRB" without a compile error.
Every instance now defines all three methods:
- The Praos Shelley instance reads 'hbIsCertRB' / 'hbMayEbAnnouncement'
off the body.
- Cardano dispatches to Conway for both methods.
- Every other instance (Byron, mock, test, single-era HFC wrappers)
spells out the "never a CertRB" stance explicitly.
Co-authored-by: Georgy Lukyanov <georgy.lukyanov@iohk.io>
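A minimal sketch of the "no defaults" design follows. The class name 'LeiosHeaderQueries' and both header types are hypothetical; the real methods live on 'ResolveLeiosBlock' and take the block's actual header type.

```haskell
-- Hypothetical stand-in for the real announcement type.
newtype EbAnnouncement = EbAnnouncement String
  deriving (Eq, Show)

-- Hypothetical class mirroring only the two header-level queries.
-- No default implementations: every instance must state its stance.
class LeiosHeaderQueries hdr where
  headerIsCertRB       :: hdr -> Bool
  headerEbAnnouncement :: hdr -> Maybe EbAnnouncement

-- A header type that carries the Leios fields (illustrative only).
data PraosLikeHeader = PraosLikeHeader
  { plIsCertRB     :: Bool
  , plAnnouncement :: Maybe EbAnnouncement
  }

instance LeiosHeaderQueries PraosLikeHeader where
  headerIsCertRB       = plIsCertRB
  headerEbAnnouncement = plAnnouncement

-- A pre-Leios header type spells out "never a CertRB" explicitly.
data LegacyHeader = LegacyHeader

instance LeiosHeaderQueries LegacyHeader where
  headerIsCertRB _       = False
  headerEbAnnouncement _ = Nothing
```

With no defaults, an instance that omits a method is reported by GHC through '-Wmissing-methods'; that diagnostic only becomes a hard failure when the build promotes warnings to errors, so the guarantee depends on the project's warning flags.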
Add 'readCompletedClosures :: m (Set EbHash)' to 'LeiosDbHandle'. The
handle owns a TVar; ChainSel will read it on the block-add hot path
(O(1) 'readTVarIO').
Seed at construction:
- SQLite: 'SELECT ebHashBytes FROM ebs WHERE missingTxCount IS NOT NULL
AND missingTxCount <= 0'. Covers both "just completed" (0) and
"completed and notified" (-1); both states mean the closure is in the
DB. Run on a short-lived connection that also guarantees schema
initialisation before any 'open'-ed connection later.
- In-memory: derive from 'imTxs' / 'imEbBodies' via the same predicate
the insert paths use.
Update inside the existing insert paths:
- Both SQLite insert paths share a 'findAndMarkCompletedEbs' helper
inside the BEGIN and a 'notifyAndCacheCompleted' helper after COMMIT.
The notify+cache step pushes the just-transitioned closures into the
cache.
- In-memory insert paths do the same update inside their STM
transaction, so the state mutation and the cache update are atomic.
'LeiosDemoLogic.msgLeiosBlock' now also emits
'TraceLeiosBlockTxsAcquired' for closures completed by a body insert
(not just tx inserts), matching the symmetry the cache update exposes.
Cache is unbounded for now; future work caps it to a k-window with DB
query on miss. See 'readCompletedClosures' TODO and the late-join plan.
(cherry picked from commit f3c2484)
Co-authored-by: Georgy Lukyanov <georgy.lukyanov@iohk.io>
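The seed-then-update cache described above can be sketched with a plain TVar. Everything except the name 'readCompletedClosures' and the 'missingTxCount <= 0' predicate is an assumption: 'ClosureCache', 'newClosureCache' and 'markCompleted' are hypothetical, and the rows are modelled as a list rather than read from SQLite.

```haskell
import Control.Concurrent.STM (TVar, atomically, modifyTVar', newTVarIO, readTVarIO)
import Data.Set (Set)
import qualified Data.Set as Set

-- Hypothetical stand-in for the real hash type.
newtype EbHash = EbHash String
  deriving (Eq, Ord, Show)

-- Mirrors the SQL predicate: the count must be present and <= 0.
-- 0 models "just completed", -1 models "completed and notified".
isCompleted :: Maybe Int -> Bool
isCompleted = maybe False (<= 0)

-- The handle-owned cache (hypothetical wrapper around the TVar).
newtype ClosureCache = ClosureCache (TVar (Set EbHash))

-- Seed at construction from (hash, missingTxCount) rows.
newClosureCache :: [(EbHash, Maybe Int)] -> IO ClosureCache
newClosureCache rows =
  ClosureCache <$> newTVarIO (Set.fromList [h | (h, c) <- rows, isCompleted c])

-- O(1) snapshot read, suitable for a hot path.
readCompletedClosures :: ClosureCache -> IO (Set EbHash)
readCompletedClosures (ClosureCache var) = readTVarIO var

-- Called after an insert has completed some closures.
markCompleted :: ClosureCache -> [EbHash] -> IO ()
markCompleted (ClosureCache var) hs =
  atomically (modifyTVar' var (Set.union (Set.fromList hs)))
```

In this sketch the cache only ever grows, which matches the "unbounded for now" note; a capped variant would need a fallback query on a miss.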
ChainSel must not select a chain that includes a CertRB whose certified
EB closure is not locally available; otherwise 'resolveLeiosBlock'
crashes when the block-add path tries to recover the closure.
Wire-up:
- 'CDB' carries 'cdbLeiosDbHandle :: LeiosDbHandle m' so ChainSel can
read the closure cache on the block-add hot path without threading
the snapshot through every caller.
- 'chainSelectionForBlock' reads 'readCompletedClosures' on each
iteration; the read is O(1) (TVar) so this is cheap.
- New 'ignorePendingCertRBs' wrapper around 'lookupBlockInfo'. Mirrors
the existing 'ignoreInvalid' wrapper. Both lookup paths
('lookupBlockInfo'' and 'succsOf'') filter against the same set.
Filter body itself is a stub:
'computeCertRBsWithPendingEbClosures' returns 'Set.empty'. The real
implementation walks the VolatileDB forward from the immutable tip and,
for each header satisfying 'headerIsCertRB', extracts the certified
'EbHash' from the parent's 'headerEbAnnouncement' and checks it against
'readCompletedClosures'. Lands in the next step of the late-join
workstream.
'Node.hs' / 'Test/ThreadNet/Network.hs' pass the handle to
'openChainDB'; no more 'LeiosOutstanding' MVar lifting.
(cherry picked from commit a62689d)
Co-authored-by: Georgy Lukyanov <georgy.lukyanov@iohk.io>
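The lookup wrappers can be pictured as below. This is a sketch under assumptions: the real 'ignorePendingCertRBs' works on ChainDB's own lookup types, whereas here both lookups are plain functions and the names 'ignorePending' / 'ignorePendingSuccs' are hypothetical.

```haskell
import Data.Set (Set)
import qualified Data.Set as Set

-- Hide pending blocks from a point lookup: a pending hash behaves as
-- if the block were not in the database at all.
ignorePending
  :: Ord hash
  => Set hash               -- CertRBs with pending EB closures
  -> (hash -> Maybe info)   -- underlying lookup
  -> hash -> Maybe info
ignorePending pending lookupInfo h
  | h `Set.member` pending = Nothing
  | otherwise              = lookupInfo h

-- Hide pending blocks from a successor lookup, so both paths filter
-- against the same set.
ignorePendingSuccs
  :: Ord hash
  => Set hash               -- CertRBs with pending EB closures
  -> (hash -> Set hash)     -- underlying successors function
  -> hash -> Set hash
ignorePendingSuccs pending succsOf h = succsOf h `Set.difference` pending
```

With the stub filter returning 'Set.empty', both wrappers reduce to the underlying lookups, so wiring them in at this step does not change behaviour.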
…names
'LeiosOfferBlock' / 'LeiosOfferBlockTxs' are constructor names from an
earlier iteration of the notification ADT. The current type
('LeiosEbNotification' in 'LeiosDemoDb.Common') has 'AcquiredEb' /
'AcquiredEbTxs'; the old names linger only in stale comments.
Four sites updated: the 'leiosDbInsertTxs' haddock in
'LeiosDemoDb.Common' and three test comments in
'Test.LeiosDemoDb'.
Comment-only change; no behaviour delta.
(cherry picked from commit a080a5a)
Co-authored-by: Georgy Lukyanov <georgy.lukyanov@iohk.io>
Add an STM accessor on the VolatileDB API that returns the
'(IsCertRB, Maybe EbAnnouncement)' pair extracted from each block's
header at parse time.
Storage:
- 'ParsedBlockInfo' gains 'pbiLeiosFields'.
- 'InternalBlockInfo' gains 'ibiLeiosFields'.
- Kept separate from 'BlockInfo' so the public record stays Leios-free.
Population:
- A new 'extractLeiosFields' helper in 'Impl.Parser' applies
'headerIsCertRB' and 'headerEbAnnouncement' to the block's header.
- The parse loop snapshots the fields alongside 'pbiBlockInfo'.
- 'putBlockImpl' snapshots them when a fresh block enters the in-memory
index.
Constraints:
- 'ResolveLeiosBlock blk' is threaded through 'parseBlockFile',
'mkOpenState', 'mkOpenStateHelper', 'openDB', and 'putBlockImpl'.
- The ChainDB open path already carries this constraint, so the only
external caller that needs touching is 'DBImmutaliser.withDBs'.
The mock VolatileDB returns 'const Nothing' to keep the tests compiling;
a later change can refine it.
No consumer reads 'getLeiosFields' yet. The accessor is wiring for the
upcoming pending-CertRB cache.
(cherry picked from commit 281bfab)
Co-authored-by: Georgy Lukyanov <georgy.lukyanov@iohk.io>
Fill in 'computeCertRBsWithPendingEbClosures' (previously a stub
returning the empty set). It walks the VolatileDB forward from the
immutable tip via 'succsOf', reading each header's Leios fields from
'VolatileDB.getLeiosFields', and flags every CertRB whose certified EB
closure is not yet in the completed-closure snapshot
('readCompletedClosures').
A CertRB certifies the EB announced by its immediate parent
('updateChainDepState' overwrites the tracked announcement on every
block), so the walk carries each block's announcement down to its
successors and pairs it with the child CertRB.
This stops a late-joining node from crashing in 'resolveLeiosBlock'
on a CertRB whose EB closure it never observed. Re-selection once the
closure arrives is handled by a later commit.
Squash-port of the real filter from leios-late-join af9ce50, adapted
to the v2 foundation (split hbIsCertRB / hbLeiosEbAnnouncement header
fields + getLeiosFields + readCompletedClosures, rather than
hbMayCertifiedEb + a cdbPendingEBs TVar).
Co-authored-by: Georgy Lukyanov <georgy.lukyanov@iohk.io>
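The forward walk described in this commit can be sketched as a breadth-first traversal that carries each block's announcement to its children. This is a simplified model, not the actual 'computeCertRBsWithPendingEbClosures': hashes are plain strings, the function name 'pendingCertRBs' is hypothetical, and a CertRB whose parent announced nothing is assumed not to be flagged.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Set (Set)
import qualified Data.Set as Set

type BlockHash = String
type EbHash    = String

-- Per-block Leios fields, as a 'getLeiosFields'-style lookup would return.
data LeiosFields = LeiosFields
  { isCertRB    :: Bool
  , announcedEb :: Maybe EbHash
  }

-- Walk forward from the immutable tip and flag every CertRB whose
-- certified EB (the one announced by its parent) is not yet completed.
-- Assumes the successor relation is acyclic, so the walk terminates.
pendingCertRBs
  :: (BlockHash -> [BlockHash])   -- successors of a block
  -> Map BlockHash LeiosFields    -- Leios fields per volatile block
  -> Set EbHash                   -- completed-closure snapshot
  -> BlockHash                    -- immutable tip
  -> Maybe EbHash                 -- announcement carried by the tip
  -> Set BlockHash
pendingCertRBs succsOf fields completed tip tipAnn =
  go Set.empty [(tip, tipAnn)]
  where
    go acc [] = acc
    go acc ((parent, parentAnn) : queue) =
      let children = [ (c, lf) | c <- succsOf parent
                               , Just lf <- [Map.lookup c fields] ]
          -- A child is pending if it is a CertRB and the EB announced
          -- by its parent is missing from the snapshot.
          flagged  = [ c | (c, lf) <- children
                         , isCertRB lf
                         , Just eb <- [parentAnn]
                         , eb `Set.notMember` completed ]
          -- Each child's own announcement is what its successors see.
          next     = [ (c, announcedEb lf) | (c, lf) <- children ]
      in go (foldr Set.insert acc flagged) (queue ++ next)
```

Because the result is recomputed from the current snapshot on every call, a closure that arrives later simply drops its CertRB from the set on the next pass.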
Add 'ebCompletionRunner', a background thread that subscribes to the
LeiosDB notification stream and, on every EB/closure acquisition,
enqueues a LoE-style reprocess.
A CertRB whose certified EB closure is missing is hidden from candidate
selection by 'computeCertRBsWithPendingEbClosures'. Once the closure
arrives, the filter stops hiding it, but without this thread nothing
would re-run chain selection, so a late-joining node's chain would
stay short. The deferred CertRB is a successor of a block on the
current chain, so the existing LoE reprocess path ('addReprocessLoEBlocks')
reconsiders it; because the filter is recomputed from scratch each pass,
a blanket re-trigger needs no dedicated pending-block state.
Squash-port of the reactive re-trigger from leios-late-join 75ef78b
(and follow-ups bcc9089 / 8414ae3 / ca87a5b), simplified onto the
v2 foundation: re-uses the LoE reprocess queue instead of a bespoke
'ChainSelReprocessBlock' message + 'cdbPendingEBs' map.
Co-authored-by: Georgy Lukyanov <georgy.lukyanov@iohk.io>
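The re-trigger loop can be sketched as follows. This is an illustration under assumptions, not the actual 'ebCompletionRunner': the notification stream is modelled as a 'Chan', the reprocess request as an opaque 'IO ()' action, and the constructor payloads are hypothetical.

```haskell
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Monad (forever)

-- Constructor names follow 'LeiosEbNotification'; the String payload
-- is a hypothetical stand-in for the real fields.
data LeiosEbNotification
  = AcquiredEb String
  | AcquiredEbTxs String

-- One iteration: block until any notification arrives, then request a
-- blanket reprocess. The payload is ignored because the filter is
-- recomputed from scratch on each chain selection pass.
ebCompletionStep :: Chan LeiosEbNotification -> IO () -> IO ()
ebCompletionStep notifications enqueueReprocess = do
  _ <- readChan notifications
  enqueueReprocess

-- The background thread body: repeat the step forever.
ebCompletionRunner :: Chan LeiosEbNotification -> IO () -> IO a
ebCompletionRunner notifications enqueueReprocess =
  forever (ebCompletionStep notifications enqueueReprocess)

-- Usage: two notifications produce two reprocess requests.
demo :: IO Int
demo = do
  notifications  <- newChan
  reprocessQueue <- newChan
  mapM_ (writeChan notifications) [AcquiredEb "eb1", AcquiredEbTxs "eb1"]
  ebCompletionStep notifications (writeChan reprocessQueue ())
  ebCompletionStep notifications (writeChan reprocessQueue ())
  length <$> sequence [readChan reprocessQueue, readChan reprocessQueue]
```

Ignoring the notification payload is what lets this design avoid dedicated pending-block state, at the cost of some redundant reprocess passes.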
Make a late-joining node actually pull the EB closures it is missing,
so the CertRBs that ChainSel is holding back can eventually be selected
and the node converges with its peers.
ChainDB side:
- 'ChainSel' records the EB closures it is waiting on in a new
'cdbPendingEBs' TVar, keyed by the missing EB's 'LeiosPoint' with that
EB's announced byte size (both produced by
'computeCertRBsWithPendingEbClosures', which already walks the volatile
chain). Exposed via a new 'ChainDB.getPendingCertRBs' query.
NodeKernel side:
- 'pendingEbReconciler' mirrors 'getPendingCertRBs' into the Leios
outstanding 'missingEbBodies' at the announced size (so the fetch
request matches the body the peer returns), making the LeiosFetch
client request those closures.
- 'leiosFetchLogic' derives, per peer, the certified-EB hashes implied
by that peer's ChainSync candidate fragment and passes them to
'leiosFetchLogicIteration' as a fallback peer source. A CertRB
certifies the EB announced by its immediate predecessor, so the
fragment is folded oldest-to-newest tracking the running announcement.
LeiosDemoLogic side:
- 'leiosFetchLogicIteration' takes the per-peer candidate-certified-EB
map and falls back to it in 'choosePeerEb' / 'choosePeerTx' when no
peer has offered the EB body, which is the case for closures that
pre-date a late-joining node.
Squash-port of leios-late-join 525a3ef / 0ad359d / 8311774 / 181cdf6 /
8fc0444, adapted to the v2 split-header model (the certified EB is
derived from the parent's 'headerEbAnnouncement' rather than a
'hbMayCertifiedEb' field) and to this branch's pending-state shape
('cdbPendingEBs' keyed by 'LeiosPoint', populated by the filter walk).
Co-authored-by: Georgy Lukyanov <georgy.lukyanov@iohk.io>
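The oldest-to-newest fold over a peer's candidate fragment can be sketched as below. The fragment is modelled as a plain list of per-header fields, and 'certifiedEbsOfFragment' is a hypothetical name; the real code folds over a ChainSync fragment type.

```haskell
import Data.List (foldl')
import Data.Set (Set)
import qualified Data.Set as Set

type EbHash = String

-- Per-header Leios fields (simplified, hypothetical record).
data LeiosFields = LeiosFields
  { isCertRB    :: Bool
  , announcedEb :: Maybe EbHash
  }

-- Fold a fragment oldest-to-newest. The running announcement is
-- whatever the previous header announced; a CertRB certifies exactly
-- that EB. Every header overwrites the running announcement.
certifiedEbsOfFragment
  :: Maybe EbHash     -- announcement of the fragment's anchor, if known
  -> [LeiosFields]    -- headers, oldest first
  -> Set EbHash
certifiedEbsOfFragment anchorAnn = snd . foldl' step (anchorAnn, Set.empty)
  where
    step (running, acc) hdr =
      let acc' = case (isCertRB hdr, running) of
            (True, Just eb) -> Set.insert eb acc
            _               -> acc
      in (announcedEb hdr, acc')
```

A peer whose candidate fragment implies a certified EB can reasonably be expected to hold that EB's closure, which is why the resulting set is usable as a fallback peer source when no explicit offer exists.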
Now that a late-joining node filters CertRBs with missing EB closures,
fetches those closures, and re-runs chain selection when they arrive,
strengthen the property from "must not crash" to "must converge":
- Cap the random join slot at numSlots/4 so the late node always has at
least three quarters of the run to catch up; samples near the end would
otherwise fail for catch-up-bandwidth reasons unrelated to the
late-join logic.
- Replace the exception-catching body with a 'conjoin' that requires a
non-empty result and that all nodes end on the same chain.
Drops the now-unused 'Control.Exception' import and the
'ioProperty'/'property' QuickCheck imports.
Squash-port of leios-late-join ec16b11 / dd47734 / b64f327, on the v2
Dijkstra test.
Co-authored-by: Georgy Lukyanov <georgy.lukyanov@iohk.io>
Closing this in favour of #2048.
This is a combination of #2040 and #2048, rebased on top of the latest
leios-prototype.