fix(memory): atomic cross-process MySQL checkpoint index (resolves #301) by fede-kamel · Pull Request #304 · oracle-samples/locus

fede-kamel · 2026-06-04T01:36:04Z

Summary

Resolves #301. Makes the MySQL checkpoint index updates atomic across processes, closing the cross-process race that #300's per-instance lock could not cover.

Stacked on #303 (base branch = fix/mysql-pool-idle-in-transaction). The new integration test's DROP TABLE teardown needs #303's rollback-on-release fix to avoid the idle-in-transaction MDL hang. Retarget to main once #303 merges.

The bug

StorageBackendAdapter keeps the {thread}:_checkpoints index as a blob and updates it with a load-modify-save guarded only by a per-instance asyncio.Lock (#300). That serializes writers within one process, but two processes / adapter instances over a shared store hold independent locks, so the read-modify-write interleaves and drops index entries. The checkpoint data keys are never lost (distinct keys, no RMW) — only the index under-reports, so list_checkpoints / time-travel miss checkpoints.
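The interleaving can be illustrated without a database. The following is a minimal sketch, not the adapter's actual code: `SharedStore`, `LockedIndexAdapter`, and `saved_index_size` are hypothetical names, and `asyncio.sleep(0)` stands in for a network round-trip. Separate adapter instances in one event loop model separate processes, since each holds its own lock.

```python
import asyncio


class SharedStore:
    """In-memory stand-in for a store shared by several adapter instances."""

    def __init__(self):
        self.blobs = {}

    async def load(self, key):
        await asyncio.sleep(0)  # simulated round-trip: other tasks may run here
        return list(self.blobs.get(key, []))

    async def save(self, key, value):
        await asyncio.sleep(0)  # simulated round-trip
        self.blobs[key] = value


class LockedIndexAdapter:
    """Load-modify-save guarded only by a per-instance lock (hypothetical)."""

    def __init__(self, store):
        self.store = store
        self.lock = asyncio.Lock()  # not shared with other instances

    async def add(self, key, checkpoint_id):
        async with self.lock:
            index = await self.store.load(key)
            index.insert(0, checkpoint_id)  # newest-first
            await self.store.save(key, index)


async def saved_index_size(n_adapters, total=20):
    """Run `total` concurrent saves spread over `n_adapters` instances."""
    store = SharedStore()
    adapters = [LockedIndexAdapter(store) for _ in range(n_adapters)]
    key = "thread-1:_checkpoints"
    await asyncio.gather(
        *(adapters[i % n_adapters].add(key, f"cp-{i}") for i in range(total))
    )
    return len(store.blobs[key])
```

With one instance every save goes through the same lock and all 20 entries survive. With two instances, writers holding different locks read the same stale blob and the later save overwrites the earlier one, so the index ends up short.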

Reproduced against MySQL 9.6 (20 concurrent saves to one thread):

one-instance (shared lock)     = 20, 20, 20   ✅  (#300 in-process fix works)
two-instances (separate locks) = 11, 10, 10   ❌  (cross-process loss)

The fix

Give the MySQL backend atomic index primitives and have the adapter delegate to them when present (other backends keep the lock+blob fallback unchanged):

index_add — single INSERT … ON DUPLICATE KEY UPDATE using JSON_ARRAY_INSERT(data, '$.checkpoints[0]', …). The append is serialized by InnoDB's row lock, so concurrent cross-process writers can't clobber each other. Keeps the adapter's {"checkpoints": [...]} shape, newest-first.
index_remove — SELECT … FOR UPDATE + rewrite, so the index RMW is row-locked across processes.
list_checkpoints — de-duplicates by checkpoint_id at read time (the atomic append doesn't de-dup a re-saved id).
_async_backend_op — capability detection requires a real coroutine method, so MagicMock test doubles aren't mistaken for the atomic-index backend.
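The two write primitives above might look roughly like the statements below. This is a sketch under stated assumptions, not the SQL from the PR: the table name `checkpoint_store`, the key column `store_key` (assumed to be the primary key), and the entry shape `{"checkpoint_id": ...}` are all hypothetical; only the `data` column, the `{"checkpoints": [...]}` shape, and the use of `JSON_ARRAY_INSERT` at `$.checkpoints[0]` come from the description. The `%s` placeholders follow the style used by common Python MySQL drivers.

```python
import json

# Hypothetical table name; store_key is assumed to be the PRIMARY KEY so that
# ON DUPLICATE KEY UPDATE fires when the index row already exists.
TABLE = "checkpoint_store"

# index_add: one statement. First writer creates the row, later writers
# prepend to the existing array under InnoDB's row lock.
INDEX_ADD_SQL = f"""
INSERT INTO {TABLE} (store_key, data)
VALUES (%s, JSON_OBJECT('checkpoints', JSON_ARRAY(CAST(%s AS JSON))))
ON DUPLICATE KEY UPDATE
    data = JSON_ARRAY_INSERT(data, '$.checkpoints[0]', CAST(%s AS JSON))
"""

# index_remove: lock the row, rewrite in Python, write back, all inside one
# transaction (transaction control is left to the driver in this sketch).
INDEX_REMOVE_SELECT_SQL = f"""
SELECT data FROM {TABLE} WHERE store_key = %s FOR UPDATE
"""
INDEX_REMOVE_UPDATE_SQL = f"""
UPDATE {TABLE} SET data = %s WHERE store_key = %s
"""


def index_add_params(index_key, entry):
    """Parameters for INDEX_ADD_SQL: the entry is bound twice as JSON text."""
    payload = json.dumps(entry)
    return (index_key, payload, payload)


def rewrite_without(data_json, checkpoint_id):
    """Rewrite step between SELECT ... FOR UPDATE and UPDATE.

    Returns None when the row is missing, so the caller can skip the UPDATE.
    """
    if data_json is None:
        return None
    doc = json.loads(data_json)
    doc["checkpoints"] = [
        e for e in doc.get("checkpoints", [])
        if e.get("checkpoint_id") != checkpoint_id
    ]
    return json.dumps(doc)
```

The design point is that `index_add` never reads the blob into the client, so there is no window in which another process can interleave. `index_remove` still reads and rewrites, which is why it needs the `FOR UPDATE` row lock.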

After the fix: two-instances is 20/20.
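Why the atomic append restores 20/20, and why the read path then needs de-duplication, can be shown with an in-memory model. This is an illustrative sketch only: `AtomicIndexBackend`, `Adapter`, and `demo` are hypothetical, and the "atomic statement" is modelled by doing the read and the write with no `await` in between.

```python
import asyncio


class AtomicIndexBackend:
    """In-memory stand-in for a backend whose index_add is one atomic step."""

    def __init__(self):
        self.rows = {}

    async def index_add(self, key, entry):
        await asyncio.sleep(0)  # simulated round-trip before the statement runs
        # No await between read and write: models a single row-locked
        # statement rather than a client-side load-modify-save.
        row = self.rows.setdefault(key, {"checkpoints": []})
        row["checkpoints"].insert(0, entry)  # newest-first

    async def load(self, key):
        await asyncio.sleep(0)
        return self.rows.get(key, {"checkpoints": []})


class Adapter:
    """Delegates index writes to the backend and de-duplicates on read."""

    def __init__(self, backend):
        self.backend = backend

    async def save_checkpoint(self, thread, checkpoint_id):
        entry = {"checkpoint_id": checkpoint_id}
        await self.backend.index_add(f"{thread}:_checkpoints", entry)

    async def list_checkpoints(self, thread):
        doc = await self.backend.load(f"{thread}:_checkpoints")
        seen, result = set(), []
        for entry in doc["checkpoints"]:  # newest-first, keep first occurrence
            if entry["checkpoint_id"] not in seen:
                seen.add(entry["checkpoint_id"])
                result.append(entry)
        return result


async def demo():
    backend = AtomicIndexBackend()
    a, b = Adapter(backend), Adapter(backend)  # two instances, one store
    await asyncio.gather(
        *((a if i % 2 == 0 else b).save_checkpoint("t1", f"cp-{i}")
          for i in range(20))
    )
    await a.save_checkpoint("t1", "cp-0")  # re-save an existing id
    raw = len(backend.rows["t1:_checkpoints"]["checkpoints"])
    listed = await b.list_checkpoints("t1")
    return raw, len(listed), listed[0]["checkpoint_id"]
```

In this model `demo()` returns `(21, 20, "cp-0")`: the raw index holds the re-saved id twice, while the listing reports 20 distinct checkpoints with the re-saved one first.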

Validation

New integration test test_mysql_adapter_cross_process_index_no_loss (two adapter instances, one table): all 20 entries retained.
MySQL integration suite: 12 passed against MySQL 9.6.
Full unit suite green; coverage 93.3% (≥90% gate). mysql.py 97%, adapters.py 99%. Added unit tests for index_add/index_remove (atomic SQL + FOR UPDATE + missing-row no-op) and adapter delegation incl. the MagicMock-safety guard.
ruff, ruff format --check, hatch run typecheck all clean.
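The MagicMock-safety guard mentioned above matters because a `MagicMock` answers `hasattr` for any attribute name. A minimal sketch of the idea follows; `has_coroutine_method` and the two backend classes are hypothetical and are not the PR's `_async_backend_op`.

```python
import inspect
from unittest.mock import MagicMock


def has_coroutine_method(backend, name):
    """True only if `name` resolves to a real `async def` method."""
    return inspect.iscoroutinefunction(getattr(backend, name, None))


class AtomicBackend:
    """Hypothetical backend exposing the atomic index primitives."""

    async def index_add(self, key, entry):
        pass

    async def index_remove(self, key, checkpoint_id):
        pass


class BlobBackend:
    """Hypothetical backend with only load/save, so the fallback applies."""

    async def load(self, key):
        return None

    async def save(self, key, value):
        pass
```

A plain `hasattr(backend, "index_add")` check would route a `MagicMock` test double down the atomic path, whereas the coroutine check sends it to the lock+blob fallback along with `BlobBackend`.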

Resolves #301.

The StorageBackendAdapter maintained the {thread}:_checkpoints index blob with a load-modify-save guarded only by a per-instance asyncio.Lock (added in #300). That serializes concurrent writers within one process, but two processes / adapter instances over a shared store hold independent locks, so the read-modify-write still interleaves and drops index entries.

Reproduced against MySQL 9.6:

one-instance (shared lock)     = 20/20  (in-process fix works)
two-instances (separate locks) = 10/20  (cross-process loss)

Give the MySQL backend atomic index primitives and have the adapter use them when present (falling back to the lock+blob path for other backends):

- index_add: single INSERT ... ON DUPLICATE KEY UPDATE with JSON_ARRAY_INSERT at $.checkpoints[0]; the append is serialized by InnoDB's row lock, so concurrent cross-process writers can't clobber.
- index_remove: SELECT ... FOR UPDATE + rewrite so the index RMW is row-locked across processes.
- list_checkpoints de-duplicates by checkpoint_id at read time, since the atomic append does not de-dup a re-saved id.

Capability detection requires a real coroutine method (_async_backend_op), so MagicMock test doubles don't get mistaken for the atomic-index backend.

With the fix, two-instances is 20/20.

Stacked on the #303 pool fix (the integration test's DROP TABLE teardown needs the rollback-on-release fix to avoid the idle-in-transaction MDL hang).

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

oracle-contributor-agreement Bot added the "OCA Verified" label (All contributors have signed the Oracle Contributor Agreement.) Jun 4, 2026

fede-kamel mentioned this pull request Jun 4, 2026

cross-process: checkpoint-index updates aren't atomic across processes (follow-up to #300) #301

Open

fede-kamel wants to merge 1 commit into fix/mysql-pool-idle-in-transaction from fix/mysql-cross-process-index
