Leios: late-join support, rebased #2057

Closed
geo2a wants to merge 12 commits into leios-prototype from geo2a/leios-late-join-v2


geo2a commented Jun 2, 2026

Contributor

This is a combination of #2040 and #2048, rebased on top of the latest leios-prototype.

dnadales and others added 12 commits June 1, 2026 16:52
Parameterise runThreadNet over NodeJoinPlan and add a new property
that starts node 3 at a random slot while nodes 0-2 run from slot 0.
Demonstrates the crash in 'resolveLeiosBlock' when a late-joining
node encounters a CertRB referencing an EB it never received.

The fix lives later in the workstream; this commit just makes the
gap visible.
Add 'hbIsCertRB' to 'HeaderBody' (and mirror on 'HeaderView') for the
CIP-0164 header bit signalling that this RB certifies a
previously-announced EB.  Thread the bit through 'mkHeader' (Praos
and TPraos; the latter ignores it) and the Shelley forge path.

Encode canonically: every header is len=12 carrying
@(Bool, Maybe EbAnnouncement)@.  Decode still accepts len=10
(pre-Leios) and len=11 (announcement-only) for back-compat with
existing on-disk data; new encodings never produce those shapes.
Two valid encodings for the same logical header would have made
hashing / signature over the encoded form non-canonical.

Per-header cost: one byte for the Bool plus one for the
Maybe-Nothing tag when no announcement is present.
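
A minimal sketch of the "encode one shape, decode three" rule described above. It is not the real CBOR codec: the header is modelled as a plain list of fields, and the 'Field' type, 'legacyFields', 'encodeHeader' and 'decodeLeiosFields' are hypothetical names introduced for illustration. The field order (Bool before the Maybe) and the reading of a len=11 header as "not a CertRB" are assumptions.

```haskell
module Main where

-- Hypothetical stand-in for the real announcement payload.
newtype EbAnnouncement = EbAnnouncement String
  deriving (Eq, Show)

-- One already-decoded header field.  'Other' stands for any of the ten
-- pre-Leios fields, which this sketch leaves opaque.
data Field
  = Other
  | BoolField Bool
  | MaybeAnnouncement (Maybe EbAnnouncement)
  deriving (Eq, Show)

legacyFields :: [Field]
legacyFields = replicate 10 Other

-- Encoding is canonical: it always produces twelve fields.
encodeHeader :: Bool -> Maybe EbAnnouncement -> [Field]
encodeHeader isCertRB ann =
  legacyFields ++ [BoolField isCertRB, MaybeAnnouncement ann]

-- Decoding additionally accepts the two legacy shapes (len=10, len=11).
decodeLeiosFields :: [Field] -> Either String (Bool, Maybe EbAnnouncement)
decodeLeiosFields fields =
  case (length fields, drop 10 fields) of
    (10, [])                                   -> Right (False, Nothing)
    (11, [MaybeAnnouncement ann])              -> Right (False, ann)
    (12, [BoolField b, MaybeAnnouncement ann]) -> Right (b, ann)
    (n, _) -> Left ("unexpected header shape, len=" ++ show n)

main :: IO ()
main = do
  print (decodeLeiosFields legacyFields)
  print (decodeLeiosFields (encodeHeader True (Just (EbAnnouncement "eb-1"))))
  print (length (encodeHeader False Nothing))
```

Because 'encodeHeader' has a single output shape, re-encoding a header decoded from a legacy shape yields the canonical len=12 form.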

Co-authored-by: Damian Nadales <damian.only@gmail.com>, Georgy Lukyanov <georgy.lukyanov@iohk.io>
Co-authored-by: Georgy Lukyanov <georgy.lukyanov@iohk.io>
…ults

ChainSel needs two header-level queries to identify CertRBs whose
certified EB closure is not locally available: 'headerIsCertRB' on
the candidate header, and 'headerEbAnnouncement' on its parent.

Add both methods to 'ResolveLeiosBlock' without defaults: a silent
'False' / 'Nothing' default would let a future block-type author
forget to override, and ChainSel would silently degrade to "never
filter a CertRB" without a compile error.  Every instance now
defines all three methods.
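
As a rough illustration of the "no defaults" design, the class can be sketched as below. The method signatures are simplified assumptions (the real class works on headers and the real resolution step is not a pure identity), and 'ByronLikeBlock' is a hypothetical block type. Note that GHC reports a missing method through -Wmissing-methods, so the "compile error" relies on that warning being promoted to an error (for example via -Werror), which is an assumption about the build flags.

```haskell
module Main where

-- Hypothetical stand-in for the real announcement type.
newtype EbAnnouncement = EbAnnouncement String
  deriving (Eq, Show)

-- Simplified signatures, assumed for illustration only.  No method has a
-- default, so every instance must state its position on CertRBs.
class ResolveLeiosBlock blk where
  resolveLeiosBlock    :: blk -> blk
  headerIsCertRB       :: blk -> Bool
  headerEbAnnouncement :: blk -> Maybe EbAnnouncement

-- A hypothetical pre-Leios block type that can never be a CertRB.
data ByronLikeBlock = ByronLikeBlock
  deriving Show

instance ResolveLeiosBlock ByronLikeBlock where
  resolveLeiosBlock      = id
  headerIsCertRB       _ = False   -- spelled out, not inherited
  headerEbAnnouncement _ = Nothing -- spelled out, not inherited

main :: IO ()
main = do
  print (headerIsCertRB ByronLikeBlock)
  print (headerEbAnnouncement ByronLikeBlock)
```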

The Praos Shelley instance reads 'hbIsCertRB' / 'hbMayEbAnnouncement'
off the body.  Cardano dispatches to Conway for both methods.  Every
other instance (Byron, mock, test, single-era HFC wrappers) spells
out the "never a CertRB" stance explicitly.

Co-authored-by: Georgy Lukyanov <georgy.lukyanov@iohk.io>
Add 'readCompletedClosures :: m (Set EbHash)' to 'LeiosDbHandle'.  The
handle owns a TVar; ChainSel will read it on the block-add hot path
(O(1) 'readTVarIO').

Seed at construction:
- SQLite: 'SELECT ebHashBytes FROM ebs WHERE missingTxCount IS NOT
  NULL AND missingTxCount <= 0'.  Covers both "just completed" (0)
  and "completed and notified" (-1); both states mean the closure is
  in the DB.  The query runs on a short-lived connection, which also
  guarantees schema initialisation before any later 'open'-ed connection.
- In-memory: derive from 'imTxs' / 'imEbBodies' via the same
  predicate the insert paths use.

Update inside the existing insert paths:
- Both SQLite insert paths share a 'findAndMarkCompletedEbs' helper
  inside the BEGIN and a 'notifyAndCacheCompleted' helper after
  COMMIT.  The notify+cache step pushes the just-transitioned
  closures into the cache.
- In-memory insert paths do the same update inside their STM
  transaction, so the state mutation and the cache update are
  atomic.
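
The in-memory variant of this scheme might look roughly like the sketch below. The 'Handle' record, 'MissingCounts' map, 'newHandle' and 'recordTx' are hypothetical and much simpler than the real 'LeiosDbHandle'; only the shape (a cache TVar seeded at construction and updated in the same STM transaction as the state) follows the description above. It uses the 'stm' and 'containers' packages that ship with GHC.

```haskell
module Main where

import Control.Concurrent.STM
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type EbHash = String

-- Hypothetical state: per EB, how many txs are still missing.
-- 'Nothing' means the EB body itself has not arrived yet.
type MissingCounts = Map.Map EbHash (Maybe Int)

data Handle = Handle
  { hState     :: TVar MissingCounts
  , hCompleted :: TVar (Set.Set EbHash)
  }

-- Same predicate as the SQL seed query: count known and <= 0.
isCompleted :: Maybe Int -> Bool
isCompleted = maybe False (<= 0)

-- Seed the cache from the initial state at construction time.
newHandle :: MissingCounts -> IO Handle
newHandle initial = do
  st    <- newTVarIO initial
  cache <- newTVarIO (Map.keysSet (Map.filter isCompleted initial))
  pure (Handle st cache)

-- Record one arrived tx; state and cache change in a single transaction.
recordTx :: Handle -> EbHash -> STM ()
recordTx h eb = do
  st <- readTVar (hState h)
  let st' = Map.adjust (fmap (\n -> max 0 (n - 1))) eb st
  writeTVar (hState h) st'
  case Map.lookup eb st' of
    Just c | isCompleted c -> modifyTVar' (hCompleted h) (Set.insert eb)
    _                      -> pure ()

-- The hot-path read: a single TVar read.
readCompletedClosures :: Handle -> IO (Set.Set EbHash)
readCompletedClosures = readTVarIO . hCompleted

main :: IO ()
main = do
  h <- newHandle (Map.fromList [("eb-a", Just 0), ("eb-b", Just 1), ("eb-c", Nothing)])
  readCompletedClosures h >>= print
  atomically (recordTx h "eb-b")
  readCompletedClosures h >>= print
```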

'LeiosDemoLogic.msgLeiosBlock' now also emits
'TraceLeiosBlockTxsAcquired' for closures completed by a body
insert (not just tx inserts), matching the symmetry the cache update
exposes.

The cache is unbounded for now; future work will cap it to a k-window
with a DB query on a miss.  See the 'readCompletedClosures' TODO and the
late-join plan.

(cherry picked from commit f3c2484)
Co-authored-by: Georgy Lukyanov <georgy.lukyanov@iohk.io>
ChainSel must not select a chain that includes a CertRB whose certified
EB closure is not locally available; otherwise 'resolveLeiosBlock'
crashes when the block-add path tries to recover the closure.

Wire-up:
- 'CDB' carries 'cdbLeiosDbHandle :: LeiosDbHandle m' so ChainSel can
  read the closure cache on the block-add hot path without threading
  the snapshot through every caller.
- 'chainSelectionForBlock' reads 'readCompletedClosures' on each
  iteration; the read is O(1) (TVar) so this is cheap.
- New 'ignorePendingCertRBs' wrapper around 'lookupBlockInfo'.  Mirrors
  the existing 'ignoreInvalid' wrapper.  Both lookup paths
  ('lookupBlockInfo'' and 'succsOf'') filter against the same set.
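
The wrapper idea can be sketched as follows, assuming heavily simplified shapes for the two lookup functions. The type synonyms and the names 'ignorePendingLookup' / 'ignorePendingSuccs' are hypothetical; in particular 'Maybe Hash' stands in for whatever the real 'succsOf' takes as its predecessor argument.

```haskell
module Main where

import qualified Data.Set as Set

type Hash = String

-- Hypothetical simplified shapes of the two lookup paths.
type LookupBlockInfo a = Hash -> Maybe a
type SuccsOf           = Maybe Hash -> Set.Set Hash

-- Hide any block that is in the pending set from point lookups.
ignorePendingLookup :: Set.Set Hash -> LookupBlockInfo a -> LookupBlockInfo a
ignorePendingLookup pending lookupInfo h
  | h `Set.member` pending = Nothing
  | otherwise              = lookupInfo h

-- Hide the same blocks from successor queries.
ignorePendingSuccs :: Set.Set Hash -> SuccsOf -> SuccsOf
ignorePendingSuccs pending succsOf prev =
  succsOf prev `Set.difference` pending

main :: IO ()
main = do
  let pending = Set.fromList ["b"]
      succs _ = Set.fromList ["a", "b"]
  print (ignorePendingLookup pending Just "a")
  print (ignorePendingLookup pending Just "b")
  print (ignorePendingSuccs pending succs Nothing)
```

Filtering both paths against one set keeps them consistent: a block hidden from direct lookup is never reachable as a successor either.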

The filter body itself is a stub:
'computeCertRBsWithPendingEbClosures' returns 'Set.empty'.  The real
implementation walks the VolatileDB forward from the immutable tip and,
for each header satisfying 'headerIsCertRB', extracts the certified
'EbHash' from the parent's 'headerEbAnnouncement' and checks it against
'readCompletedClosures'.  Lands in the next step of the late-join
workstream.

'Node.hs' / 'Test/ThreadNet/Network.hs' pass the handle to
'openChainDB'; no more 'LeiosOutstanding' MVar lifting.

(cherry picked from commit a62689d)
Co-authored-by: Georgy Lukyanov <georgy.lukyanov@iohk.io>
…names

'LeiosOfferBlock' / 'LeiosOfferBlockTxs' are constructor names from an
earlier iteration of the notification ADT.  The current type
('LeiosEbNotification' in 'LeiosDemoDb.Common') has 'AcquiredEb' /
'AcquiredEbTxs'; the old names linger only in stale comments.

Four sites updated: the 'leiosDbInsertTxs' haddock in
'LeiosDemoDb.Common' and three test comments in
'Test.LeiosDemoDb'.

Comment-only change; no behaviour delta.

(cherry picked from commit a080a5a)
Co-authored-by: Georgy Lukyanov <georgy.lukyanov@iohk.io>
Add an STM accessor on the VolatileDB API that returns the
'(IsCertRB, Maybe EbAnnouncement)' pair extracted from each block's
header at parse time.

Storage:
- 'ParsedBlockInfo' gains 'pbiLeiosFields'.
- 'InternalBlockInfo' gains 'ibiLeiosFields'.
- Kept separate from 'BlockInfo' so the public record stays
  Leios-free.

Population:
- A new 'extractLeiosFields' helper in 'Impl.Parser' applies
  'headerIsCertRB' and 'headerEbAnnouncement' to the block's header.
- The parse loop snapshots the fields alongside 'pbiBlockInfo'.
- 'putBlockImpl' snapshots them when a fresh block enters the
  in-memory index.

Constraints:
- 'ResolveLeiosBlock blk' is threaded through 'parseBlockFile',
  'mkOpenState', 'mkOpenStateHelper', 'openDB', and 'putBlockImpl'.
- The ChainDB open path already carries this constraint, so the only
  external caller that needs touching is 'DBImmutaliser.withDBs'.

The mock VolatileDB returns 'const Nothing' to keep the tests
compiling; a later change can refine it.

No consumer reads 'getLeiosFields' yet.  The accessor is wiring for
the upcoming pending-CertRB cache.

(cherry picked from commit 281bfab)
Co-authored-by: Georgy Lukyanov <georgy.lukyanov@iohk.io>
Fill in 'computeCertRBsWithPendingEbClosures' (previously a stub
returning the empty set).  It walks the VolatileDB forward from the
immutable tip via 'succsOf', reading each header's Leios fields from
'VolatileDB.getLeiosFields', and flags every CertRB whose certified EB
closure is not yet in the completed-closure snapshot
('readCompletedClosures').

A CertRB certifies the EB announced by its immediate parent
('updateChainDepState' overwrites the tracked announcement on every
block), so the walk carries each block's announcement down to its
successors and pairs it with the child CertRB.
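
The walk described above can be sketched as a pure function over a successor relation. This is an illustration under assumptions, not the actual implementation: 'pendingCertRBs', the 'LeiosFields' pair of plain values, and the example maps are hypothetical, and treating a CertRB whose parent announced nothing as "not pending" is a choice made only for this sketch.

```haskell
module Main where

import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Hash   = String
type EbHash = String

-- Hypothetical per-block view: (is CertRB?, EB announced by this block).
type LeiosFields = (Bool, Maybe EbHash)

pendingCertRBs
  :: (Hash -> Set.Set Hash)       -- successors of a block
  -> (Hash -> Maybe LeiosFields)  -- Leios fields of a block, if known
  -> Set.Set EbHash               -- completed-closure snapshot
  -> Hash                         -- block to start from (immutable tip)
  -> Maybe EbHash                 -- announcement made by that block
  -> Set.Set Hash
pendingCertRBs succsOf getLeiosFields completed = go
  where
    -- Carry the parent's announcement down to each child.
    go parent parentAnn =
      Set.unions (map visit (Set.toList (succsOf parent)))
      where
        visit child = case getLeiosFields child of
          Nothing -> Set.empty
          Just (isCertRB, childAnn) ->
            let missing = case parentAnn of
                  Just eb -> isCertRB && eb `Set.notMember` completed
                  Nothing -> False
                here = if missing then Set.singleton child else Set.empty
            in here `Set.union` go child childAnn

-- Example: tip -> a (announces eb1) -> b (CertRB).
exampleSuccs :: Hash -> Set.Set Hash
exampleSuccs h = Map.findWithDefault Set.empty h $
  Map.fromList [("tip", Set.fromList ["a"]), ("a", Set.fromList ["b"])]

exampleFields :: Hash -> Maybe LeiosFields
exampleFields h = Map.lookup h $
  Map.fromList [("a", (False, Just "eb1")), ("b", (True, Nothing))]

main :: IO ()
main = do
  print (pendingCertRBs exampleSuccs exampleFields Set.empty "tip" Nothing)
  print (pendingCertRBs exampleSuccs exampleFields (Set.fromList ["eb1"]) "tip" Nothing)
```

The result depends only on the current volatile blocks and the snapshot, so it can be recomputed from scratch on every pass.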

This stops a late-joining node from crashing in 'resolveLeiosBlock'
on a CertRB whose EB closure it never observed.  Re-selection once the
closure arrives is handled by a later commit.

Squash-port of the real filter from leios-late-join af9ce50, adapted
to the v2 foundation (split hbIsCertRB / hbLeiosEbAnnouncement header
fields + getLeiosFields + readCompletedClosures, rather than
hbMayCertifiedEb + a cdbPendingEBs TVar).

Co-authored-by: Georgy Lukyanov <georgy.lukyanov@iohk.io>
Add 'ebCompletionRunner', a background thread that subscribes to the
LeiosDB notification stream and, on every EB/closure acquisition,
enqueues a LoE-style reprocess.

A CertRB whose certified EB closure is missing is hidden from candidate
selection by 'computeCertRBsWithPendingEbClosures'.  Once the closure
arrives, the filter stops hiding it, but without this thread nothing
would re-run chain selection, so a late-joining node's chain would
stay short.  The deferred CertRB is a successor of a block on the
current chain, so the existing LoE reprocess path ('addReprocessLoEBlocks')
reconsiders it; because the filter is recomputed from scratch each pass,
a blanket re-trigger needs no dedicated pending-block state.
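
A minimal sketch of such a runner, using plain 'stm' channels rather than the node's actual concurrency abstractions. The notification payloads, the 'ChainSelMessage' type and the queue are assumptions made for illustration; only the constructor names 'AcquiredEb' / 'AcquiredEbTxs' come from the PR.

```haskell
module Main where

import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (forever, void)

-- Constructor names follow the PR; the String payloads are assumed.
data LeiosEbNotification
  = AcquiredEb String
  | AcquiredEbTxs String

-- Hypothetical stand-in for the LoE reprocess request.
data ChainSelMessage = ReprocessLoEBlocks
  deriving (Eq, Show)

-- Every acquisition triggers the same blanket reprocess.
toReprocess :: LeiosEbNotification -> ChainSelMessage
toReprocess _ = ReprocessLoEBlocks

-- Block on the notification stream; enqueue one reprocess per event.
ebCompletionRunner
  :: TChan LeiosEbNotification -> TQueue ChainSelMessage -> IO ()
ebCompletionRunner notifications chainSelQueue =
  forever $ atomically $ do
    n <- readTChan notifications
    writeTQueue chainSelQueue (toReprocess n)

main :: IO ()
main = do
  notifications <- newTChanIO
  queue         <- newTQueueIO
  void (forkIO (ebCompletionRunner notifications queue))
  atomically (writeTChan notifications (AcquiredEbTxs "eb-1"))
  -- Blocks until the runner has enqueued the reprocess request.
  atomically (readTQueue queue) >>= print
```

The runner ignores which EB arrived, so it needs no per-block bookkeeping.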

Squash-port of the reactive re-trigger from leios-late-join 75ef78b
(and follow-ups bcc9089 / 8414ae3 / ca87a5b), simplified onto the
v2 foundation: re-uses the LoE reprocess queue instead of a bespoke
'ChainSelReprocessBlock' message + 'cdbPendingEBs' map.

Co-authored-by: Georgy Lukyanov <georgy.lukyanov@iohk.io>
Make a late-joining node actually pull the EB closures it is missing,
so the CertRBs that ChainSel is holding back can eventually be selected
and the node converges with its peers.

ChainDB side:
- 'ChainSel' records the EB closures it is waiting on in a new
  'cdbPendingEBs' TVar: a map from each missing EB's 'LeiosPoint' to that
  EB's announced byte size (both produced by
  'computeCertRBsWithPendingEbClosures', which already walks the volatile
  chain).  Exposed via a new 'ChainDB.getPendingCertRBs' query.

NodeKernel side:
- 'pendingEbReconciler' mirrors 'getPendingCertRBs' into the Leios
  outstanding 'missingEbBodies' at the announced size (so the fetch
  request matches the body the peer returns), making the LeiosFetch
  client request those closures.
- 'leiosFetchLogic' derives, per peer, the certified-EB hashes implied
  by that peer's ChainSync candidate fragment and passes them to
  'leiosFetchLogicIteration' as a fallback peer source.  A CertRB
  certifies the EB announced by its immediate predecessor, so the
  fragment is folded oldest-to-newest tracking the running announcement.
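
The oldest-to-newest fold over a candidate fragment might be sketched like this. 'HeaderView' here is a hypothetical two-field record, not the real type, and 'certifiedEbs' is an illustrative name; the anchor's announcement is passed in explicitly as an assumption about how the fold is seeded.

```haskell
module Main where

import Data.List (foldl')
import qualified Data.Set as Set

type EbHash = String

-- Hypothetical minimal header view.
data HeaderView = HeaderView
  { hvIsCertRB     :: Bool
  , hvAnnouncement :: Maybe EbHash
  }

-- Fold oldest-to-newest.  The running announcement is overwritten by
-- every header; a CertRB certifies whatever its predecessor announced.
certifiedEbs :: Maybe EbHash -> [HeaderView] -> Set.Set EbHash
certifiedEbs anchorAnn = snd . foldl' step (anchorAnn, Set.empty)
  where
    step (running, acc) hdr =
      let acc' = case running of
            Just eb | hvIsCertRB hdr -> Set.insert eb acc
            _                        -> acc
      in (hvAnnouncement hdr, acc')

main :: IO ()
main = print $ certifiedEbs Nothing
  [ HeaderView False (Just "eb1")  -- announces eb1
  , HeaderView True  Nothing       -- certifies eb1
  , HeaderView True  Nothing       -- predecessor announced nothing
  ]
```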

LeiosDemoLogic side:
- 'leiosFetchLogicIteration' takes the per-peer candidate-certified-EB
  map and falls back to it in 'choosePeerEb' / 'choosePeerTx' when no
  peer has offered the EB body — the case for closures that pre-date a
  late-joining node.

Squash-port of leios-late-join 525a3ef / 0ad359d / 8311774 /
181cdf6 / 8fc0444, adapted to the v2 split-header model (the
certified EB is derived from the parent's 'headerEbAnnouncement' rather
than a 'hbMayCertifiedEb' field) and to this branch's pending-state
shape ('cdbPendingEBs' keyed by 'LeiosPoint', populated by the filter
walk).

Co-authored-by: Georgy Lukyanov <georgy.lukyanov@iohk.io>
Now that a late-joining node filters CertRBs with missing EB closures,
fetches those closures, and re-runs chain selection when they arrive,
strengthen the property from "must not crash" to "must converge":

- Cap the random join slot at numSlots/4 so the late node always has at
  least three quarters of the run to catch up; samples near the end
  would otherwise fail for catch-up-bandwidth reasons unrelated to the
  late-join logic.
- Replace the exception-catching body with a 'conjoin' that requires a
  non-empty result and that all nodes end on the same chain.
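
The two conditions can be written down without any test framework. The sketch below is only an illustration: 'maxJoinSlot' and 'converged' are hypothetical helper names, and the real property is expressed with QuickCheck's 'conjoin' over the ThreadNet output rather than over plain lists.

```haskell
module Main where

-- Upper bound for the late node's join slot: the first quarter of the run.
maxJoinSlot :: Int -> Int
maxJoinSlot numSlots = numSlots `div` 4

-- Convergence: at least one node reported, and all final chains agree.
converged :: Eq chain => [chain] -> Bool
converged []       = False
converged (c : cs) = all (== c) cs

main :: IO ()
main = do
  print (maxJoinSlot 100)
  print (converged [[1, 2, 3], [1, 2, 3 :: Int]])
  print (converged [[1, 2, 3], [1, 2 :: Int]])
  print (converged ([] :: [[Int]]))
```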

Drops the now-unused 'Control.Exception' import and the
'ioProperty'/'property' QuickCheck imports.

Squash-port of leios-late-join ec16b11 / dd47734 / b64f327,
on the v2 Dijkstra test.

Co-authored-by: Georgy Lukyanov <georgy.lukyanov@iohk.io>

geo2a commented Jun 4, 2026

Contributor Author

Closing this in favour of #2048.

geo2a closed this Jun 4, 2026
3 participants