feat(oss): opt-in parallel entity-boost in Memory.search() — 2-4x recall speedup#5046
Closed
DmitryPogodaev wants to merge 1 commit into
Conversation
… speedup)

Adds Memory config flag `parallelEntityBoost` (default false). When true, the entity-boost embed+search loop in Memory.search() runs concurrently via Promise.all instead of sequentially. With remote embedders this turns N+1 sequential RTTs into ~1 RTT.

Measured on production setup (ollama embed:latest, 9-entity query):
- sequential: 6595ms
- parallel: 2089ms (3.16x speedup)

Safety: per-iteration writes go to entityBoosts[memId] = Math.max(prev, boost), which is order-independent under interleaved single-threaded JS writes.

Default kept at false to preserve back-compat for users with rate-limited or single-slot embedder backends.
Dmitry Pogodaev seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
please merge this 🙏 🥺
Contributor
Closing: superseded by #5377, which covers both Python and TypeScript SDKs with a unified fix (always-on parallelism with concurrency cap, no opt-in flag needed). Thank you for the contribution!
feat(oss): opt-in parallel entity-boost in Memory.search() — 2-4x recall speedup with remote embedders
Problem
`Memory.search()` runs the entity-boost computation as a sequential `for...await` loop.

With remote embedders this becomes the dominant latency. We measured a single `recall` for an entity-rich query (9 entities) taking ~6.5 seconds, with ~95% of that time spent waiting on serial embed RTTs.

The block has up to 8 iterations (`.slice(0, 8)`) plus the initial query embed, so the worst case is 9 sequential `embedder.embed()` calls per search.

This regressed real production latency for us when v3.0.0 added the multi-signal hybrid retrieval (entity boost). Before v3.0.0 a single `Memory.search()` did exactly one embed call.
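A minimal sketch of the sequential shape described in this section, to make the call count concrete. It is not the code from `mem0-ts`: the `Embedder` type, `makeCountingEmbedder`, and `sequentialEmbeds` are hypothetical names, and the embedder counts calls instead of doing network I/O.

```typescript
// Hypothetical stand-in for a remote embedder (assumption, not the mem0 API).
type Embedder = { embed(text: string): Promise<number[]> };

function makeCountingEmbedder(): { embedder: Embedder; calls: () => number } {
  let count = 0;
  const embedder: Embedder = {
    async embed(text: string): Promise<number[]> {
      count += 1; // one remote round trip in a real setup
      return [text.length]; // dummy vector
    },
  };
  return { embedder, calls: () => count };
}

// One query embed, then one awaited embed per entity, capped at 8 entities.
async function sequentialEmbeds(
  embedder: Embedder,
  query: string,
  entities: string[],
): Promise<number[][]> {
  const vectors: number[][] = [await embedder.embed(query)];
  // Dedupe, then cap at 8, mirroring the `.slice(0, 8)` mentioned above.
  const deduped = entities.filter((e, i) => entities.indexOf(e) === i).slice(0, 8);
  for (const entity of deduped) {
    // Each await blocks the next iteration, so latencies add up.
    vectors.push(await embedder.embed(entity));
  }
  return vectors;
}
```

With ten distinct entities, this sketch makes 1 + 8 = 9 embed calls, each waiting for the previous one to finish.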
Fix
Add an opt-in config flag `parallelEntityBoost` (default: `false`, preserves upstream behavior). When `true`, the entity-boost loop runs via `Promise.all(deduped.map(...))`.

Safety: each iteration writes to `entityBoosts[memId] = Math.max(prev, boost)`. This is order-independent and safe under JS's single-threaded event loop: interleaved Promise resolutions cannot race because each `Math.max` + assign is one synchronous block per microtask.
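Default kept at `false` to preserve back-compat for users with rate-limited or single-slot embedder backends (e.g. rate-limited OpenAI plans, single-slot ollama).

Users with parallel-friendly embedders (managed services, multi-slot ollama, batched embedder backends) opt in by setting `parallelEntityBoost: true` in the `Memory` config.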
Measurements
Reproduced on production setup: ollama embed:latest (qwen3-embedding 4B Q4) on RTX 5090 with `OLLAMA_NUM_PARALLEL=2`, accessed via WireGuard tunnel (~218ms RTT), through a multi-threaded HTTP proxy.
Same prompts, same Qdrant collection (~15k memories), same gateway process; the only difference is the `parallelEntityBoost` flag flipped. For the 9-entity query: 6595ms sequential vs. 2089ms parallel (3.16x speedup).

Sequential `embed_ms` (sum of all per-call durations) ≈ wallclock `total_ms` in baseline (each call blocks the next). Parallel `embed_ms` (sum) is 2-3x larger than wallclock, which is direct evidence that the calls overlap.
For prompts that extract no entities (≤1 embed call, common for short conversational queries), behavior is identical: no regression is possible since the patched branch is only entered when `deduped.length > 0`.
Files changed
- `mem0-ts/src/oss/src/types/index.ts`: add `parallelEntityBoost` to `MemoryConfigSchema`
- `mem0-ts/src/oss/src/config/manager.ts`: propagate the flag through `ConfigManager.mergeConfig` (default `false`)
- `mem0-ts/src/oss/src/memory/index.ts`: gate the entity-boost loop on `this.config.parallelEntityBoost`
Backward compatibility
- Default `false` → identical to current behavior
- `optional` in schema → existing configs validate unchanged
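A minimal sketch of how an optional flag with a `false` default keeps existing configs working. The types and `mergeConfigSketch` below are hypothetical and heavily trimmed; they are not the real `MemoryConfigSchema` or `ConfigManager.mergeConfig`.

```typescript
// Hypothetical, trimmed-down config shape (assumption); the real config has many more fields.
interface MemoryConfigSketch {
  parallelEntityBoost?: boolean; // optional, so configs without it stay valid
}

interface ResolvedConfigSketch {
  parallelEntityBoost: boolean;
}

// Fill in the default so downstream code always sees a boolean.
function mergeConfigSketch(user: MemoryConfigSketch = {}): ResolvedConfigSketch {
  return { parallelEntityBoost: user.parallelEntityBoost ?? false };
}

// Existing config without the flag: resolves to the sequential path.
const legacy = mergeConfigSketch({});
// Opt-in config: resolves to the parallel path.
const optedIn = mergeConfigSketch({ parallelEntityBoost: true });
```

Here `legacy.parallelEntityBoost` is `false` and `optedIn.parallelEntityBoost` is `true`, so only callers who set the flag change behavior.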