fix(memory): atomic cross-process MySQL checkpoint index (resolves #301)#304
Open
fede-kamel wants to merge 1 commit into
Open
fix(memory): atomic cross-process MySQL checkpoint index (resolves #301)#304fede-kamel wants to merge 1 commit into
fede-kamel wants to merge 1 commit into
Conversation
Resolves #301. The StorageBackendAdapter maintained the {thread}:_checkpoints index blob with a load-modify-save guarded only by a per-instance asyncio.Lock (added in #300). That serializes concurrent writers within one process, but two processes / adapter instances over a shared store hold independent locks, so the read-modify-write still interleaves and drops index entries. Reproduced against MySQL 9.6: one-instance (shared lock) = 20/20 (in-process fix works) two-instances (separate locks) = 10/20 (cross-process loss) Give the MySQL backend atomic index primitives and have the adapter use them when present (falling back to the lock+blob path for other backends): - index_add: single INSERT ... ON DUPLICATE KEY UPDATE with JSON_ARRAY_INSERT at $.checkpoints[0]; the append is serialized by InnoDB's row lock, so concurrent cross-process writers can't clobber. - index_remove: SELECT ... FOR UPDATE + rewrite so the index RMW is row-locked across processes. - list_checkpoints de-duplicates by checkpoint_id at read time, since the atomic append does not de-dup a re-saved id. Capability detection requires a real coroutine method (_async_backend_op), so MagicMock test doubles don't get mistaken for the atomic-index backend. With the fix, two-instances is 20/20. Stacked on the #303 pool fix (the integration test's DROP TABLE teardown needs the rollback-on-release fix to avoid the idle-in-transaction MDL hang). Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Resolves #301. Makes the MySQL checkpoint index updates atomic across processes, closing the cross-process race that #300's per-instance lock could not cover.
The bug
StorageBackendAdapterkeeps the{thread}:_checkpointsindex as a blob and updates it with a load-modify-save guarded only by a per-instanceasyncio.Lock(#300). That serializes writers within one process, but two processes / adapter instances over a shared store hold independent locks, so the read-modify-write interleaves and drops index entries. The checkpoint data keys are never lost (distinct keys, no RMW) — only the index under-reports, solist_checkpoints/ time-travel miss checkpoints.Reproduced against MySQL 9.6 (20 concurrent saves to one thread):
The fix
Give the MySQL backend atomic index primitives and have the adapter delegate to them when present (other backends keep the lock+blob fallback unchanged):
index_add— singleINSERT … ON DUPLICATE KEY UPDATEusingJSON_ARRAY_INSERT(data, '$.checkpoints[0]', …). The append is serialized by InnoDB's row lock, so concurrent cross-process writers can't clobber each other. Keeps the adapter's{"checkpoints": [...]}shape, newest-first.index_remove—SELECT … FOR UPDATE+ rewrite, so the index RMW is row-locked across processes.list_checkpoints— de-duplicates bycheckpoint_idat read time (the atomic append doesn't de-dup a re-saved id)._async_backend_op— capability detection requires a real coroutine method, soMagicMocktest doubles aren't mistaken for the atomic-index backend.After the fix: two-instances is 20/20.
Validation
test_mysql_adapter_cross_process_index_no_loss(two adapter instances, one table): all 20 entries retained.mysql.py97%,adapters.py99%. Added unit tests forindex_add/index_remove(atomic SQL +FOR UPDATE+ missing-row no-op) and adapter delegation incl. the MagicMock-safety guard.ruff,ruff format --check,hatch run typecheckall clean.