fix(memory): parallelize entity boost searches in AsyncMemory.search #5298
Closed
Bartok9 wants to merge 1 commit into
Closes mem0ai#5214
Problem: AsyncMemory._compute_entity_boosts_async processed up to 8 deduped entities sequentially, awaiting each entity's embedding call and entity-store search before starting the next. With a remote embedding provider the latency is roughly the sum of all per-entity calls, making normal searches take several seconds or hit request timeouts.
Fix: gather the per-entity embed+search work with asyncio.gather so the independent calls run concurrently. Wall time drops from roughly the sum of all calls to roughly the slowest single call. return_exceptions=True keeps one failed entity from aborting the others; failures are logged and skipped. Scoring is unchanged: the max-aggregated boost math runs over the gathered results, and max() is order-independent, so concurrency preserves identical output.
Verification: 3 new tests in tests/memory/test_main.py: preserve-scoring (matches reference math), one-entity-failure-resilience, and a concurrency proof (4 sleeping searches finish in <0.6s, not ~0.8s).
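The gather pattern described in the commit message can be sketched as follows. This is a minimal illustration and not the code from the PR: `search_entity` is a hypothetical stand-in for one entity's embed and entity-store search, and `gather_entity_hits` is an illustrative name. Only `asyncio.gather(..., return_exceptions=True)` and the log-and-skip handling come from the description above.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def search_entity(entity: str) -> list[tuple[str, float]]:
    """Hypothetical stand-in for one entity's embed + entity-store search."""
    await asyncio.sleep(0.05)  # simulated remote latency
    if entity == "bad":
        raise TimeoutError("simulated provider timeout")
    # (memory_id, similarity) pairs; the values are made up for the sketch.
    return [(f"mem-{entity}", 0.9)]


async def gather_entity_hits(entities: list[str]) -> dict[str, list[tuple[str, float]]]:
    # Start every per-entity call at once; exceptions are returned as
    # results instead of cancelling the whole batch.
    results = await asyncio.gather(
        *(search_entity(e) for e in entities),
        return_exceptions=True,
    )
    hits: dict[str, list[tuple[str, float]]] = {}
    for entity, result in zip(entities, results):
        if isinstance(result, BaseException):
            # Best-effort behaviour: log the failure and skip that entity.
            logger.warning("entity search failed for %r: %s", entity, result)
            continue
        hits[entity] = result
    return hits


def demo() -> dict[str, list[tuple[str, float]]]:
    return asyncio.run(gather_entity_hits(["alice", "bad", "bob"]))
```

Calling `demo()` returns hits for `alice` and `bob` only; the failing `bad` entity is logged and dropped, and the total wait is close to one simulated call rather than three.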
Closing: superseded by #5377, which covers both the Python and TypeScript SDKs with a unified fix (always-on parallelism with a concurrency cap, no opt-in flag needed). Thank you for the contribution!
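For readers unfamiliar with the "concurrency cap" mentioned in the closing comment, one common way to bound parallel work in asyncio is a semaphore. The sketch below is an assumption about the general technique, not the implementation in #5377: `run_capped`, `MAX_CONCURRENCY`, and the value 4 are hypothetical names and numbers chosen for illustration.

```python
import asyncio

MAX_CONCURRENCY = 4  # assumed value for the sketch, not taken from #5377


async def run_capped(items, worker, limit=MAX_CONCURRENCY):
    """Run worker(item) for every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await worker(item)

    # Results come back in input order; failures are returned, not raised.
    return await asyncio.gather(
        *(guarded(item) for item in items),
        return_exceptions=True,
    )


async def _demo():
    active = 0
    peak = 0

    async def worker(item):
        nonlocal active, peak
        active += 1
        peak = max(peak, active)  # track how many workers overlap
        await asyncio.sleep(0.01)
        active -= 1
        return item * 2

    results = await run_capped(range(8), worker, limit=3)
    return results, peak


def demo():
    return asyncio.run(_demo())
```

In this sketch the observed peak never exceeds the limit, while the calls inside the limit still overlap, which is the trade-off a cap is meant to provide against provider rate limits.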
Summary
- Run the per-entity embed and search work in `AsyncMemory._compute_entity_boosts_async` concurrently instead of sequentially.
Motivation
Closes #5214.
`AsyncMemory.search()` can become slow when entity boost runs for multiple extracted entities. `_compute_entity_boosts_async()` processed up to 8 deduped entities in a `for` loop, `await`ing each entity's embedding call and entity-store search before starting the next. With a remote embedding provider, wall time is roughly the sum of every per-entity call, which can push otherwise-normal searches into multi-second territory or request timeouts, even though the per-entity work is fully independent.
Fix
- The per-entity embed and search work is moved into a `_search_entity` coroutine and dispatched via `asyncio.gather`, so the independent remote calls overlap.
- `return_exceptions=True` keeps a single failing entity (provider timeout, throttle, transient 5xx) from aborting the others: the failure is logged at `WARNING` and that entity is skipped, matching the previous best-effort behaviour.
- Scoring is unchanged: the `similarity * ENTITY_BOOST_WEIGHT * memory_count_weight` math and `max()` aggregation run over the gathered results. Because `max()` is order-independent, concurrent completion produces identical boosts to the old sequential order.
Verification
- `python3 -m pytest tests/memory/test_main.py`: 37 passed (3 new + 34 existing).
New tests in `tests/memory/test_main.py::TestAsyncEntityBoostParallelism`:
- `test_boosts_preserve_scoring`: output matches the reference scoring math, including the `max()` aggregation when one memory is linked by two entities.
- `test_one_entity_failure_does_not_abort_others`: a raising entity search is logged and skipped while the surviving entity still contributes a boost.
- `test_searches_run_concurrently`: 4 sleeping searches finish in < 0.6 s (vs ~0.8 s sequential) and the observed concurrency peak is ≥ 2, showing that the calls overlap.
Diff is scoped to 2 files (`mem0/memory/main.py`, `tests/memory/test_main.py`); the sync `_compute_entity_boosts` path is intentionally left untouched.
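The claim that `max()` aggregation makes the boosts independent of completion order can be checked with a small sketch. It assumes a simplified hit shape of `(memory_id, similarity, memory_count_weight)` and a placeholder value for `ENTITY_BOOST_WEIGHT`; the real constant, the way `memory_count_weight` is derived, and the function name `aggregate_boosts` are not taken from the mem0 source.

```python
import itertools

ENTITY_BOOST_WEIGHT = 0.5  # placeholder; the real constant is defined in mem0


def aggregate_boosts(hits):
    """Max-aggregate boosts per memory.

    hits: iterable of (memory_id, similarity, memory_count_weight) tuples,
    a simplified shape assumed for this sketch.
    """
    boosts = {}
    for memory_id, similarity, count_weight in hits:
        boost = similarity * ENTITY_BOOST_WEIGHT * count_weight
        # Keep the strongest boost when several entities link to one memory.
        boosts[memory_id] = max(boosts.get(memory_id, 0.0), boost)
    return boosts


# "m1" is linked by two entities; only the larger boost should survive.
hits = [("m1", 0.9, 1.0), ("m1", 0.6, 1.0), ("m2", 0.8, 0.5)]
reference = aggregate_boosts(hits)

# Every arrival order yields the same per-memory boosts.
order_independent = all(
    aggregate_boosts(order) == reference
    for order in itertools.permutations(hits)
)
```

Because each hit's boost is computed on its own and combined only through `max()`, reordering the hits (as concurrent completion would) cannot change the result.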