fix(oss): parallelize entity boost searches in Memory.search by kartik-mem0 · Pull Request #5377 · mem0ai/mem0

kartik-mem0 · 2026-06-04T15:48:49Z

Linked Issue

Description

Memory.search() entity boost computation processed up to 8 extracted entities sequentially: each entity's embed call and entity store search had to finish before the next entity started. With remote embedding providers (50-200ms RTT per call), that meant up to 16 serial round-trips (8 entities × 2 calls), which dominated search latency and pushed entity-rich queries into multi-second territory or request timeouts.

This PR fixes entity boost performance across both Python and TypeScript SDKs with a batch-first, parallel-search architecture, plus several correctness hardening fixes found during review.

1. Batch embedding (biggest win)

Instead of calling embed() per entity (N API round-trips), all entity texts are now embedded in a single embed_batch() / embedBatch() call. Providers with native batching (e.g. OpenAI) send one HTTP request for all 8 entities. Providers without native batching fall back to sequential embed() via EmbeddingBase.embed_batch() — no regression.

At scale (1000 concurrent users × 8 entities): reduces embed API calls from 8000 → 1000 (8x).
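The fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not the SDK's code: the class bodies, the `api_calls` counter, and the simplified `embed(text)` signature are assumptions made for the example; only the names `EmbeddingBase`, `embed`, and `embed_batch` come from the description.

```python
from typing import List


class EmbeddingBase:
    """Hypothetical stand-in for the base embedder (not the real mem0 class)."""

    def embed(self, text: str) -> List[float]:
        raise NotImplementedError

    def embed_batch(self, texts: List[str]) -> List[List[float]]:
        # Default fallback: one embed() call per text, in input order.
        return [self.embed(text) for text in texts]


class CountingEmbedder(EmbeddingBase):
    """Provider without native batching; counts simulated API calls."""

    def __init__(self) -> None:
        self.api_calls = 0

    def embed(self, text: str) -> List[float]:
        self.api_calls += 1
        return [float(len(text))]


class NativeBatchEmbedder(CountingEmbedder):
    """Provider with native batching: one simulated request for all texts."""

    def embed_batch(self, texts: List[str]) -> List[List[float]]:
        self.api_calls += 1
        return [[float(len(text))] for text in texts]


entities = ["alice", "bob", "paris"]
fallback = CountingEmbedder()
native = NativeBatchEmbedder()
fallback_vectors = fallback.embed_batch(entities)
native_vectors = native.embed_batch(entities)
```

Both embedders return the same vectors in the same order; the difference is that the fallback issues one simulated call per entity while the native-batch variant issues one call in total.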

2. Parallel entity store searches

After batch embedding, entity store searches are parallelized:

SDK	Mechanism	Concurrency Cap
Python (sync)	`ThreadPoolExecutor(max_workers=4)`	4
Python (async)	`asyncio.gather` + `Semaphore(4)`	4
TypeScript	`Promise.allSettled`	unbounded (JS single-threaded)
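For the Python (sync) row, a bounded thread pool might look like the sketch below. The function `search_entity_store` and its return shape are hypothetical placeholders for the real entity store call; the only detail taken from the table is `ThreadPoolExecutor(max_workers=4)`.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

MAX_WORKERS = 4  # concurrency cap from the table above


def search_entity_store(vector: List[float]) -> Dict[str, float]:
    """Hypothetical entity store search; sleeps to simulate network latency."""
    time.sleep(0.05)
    return {"memory-1": vector[0]}


def parallel_entity_searches(vectors: List[List[float]]) -> List[Dict[str, float]]:
    # executor.map yields results in input order, even if tasks finish out of order.
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        return list(executor.map(search_entity_store, vectors))


vectors = [[float(i)] for i in range(8)]
results = parallel_entity_searches(vectors)
```

With 8 vectors and 4 workers the searches run in two waves, so wall time is roughly two simulated round-trips instead of eight.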

3. Correctness hardening

LMStudio embedBatch sort: Added .sort((a, b) => a.index - b.index) to match OpenAI/Azure — prevents silent embedding-entity misalignment when the server returns results out of insertion order.
Batch length guard: All 3 code paths validate that embed_batch / embedBatch returns the same number of vectors as input texts. On mismatch: logs a warning and skips entity boost gracefully (no crash, no undefined access).
entity_store race fix (Python sync): Resolves self.entity_store once on the main thread before submitting to ThreadPoolExecutor — prevents concurrent lazy-init race when _entity_store is None on first search call.
TS entity search filter scoping: Strips effectiveFilters to only user_id/agent_id/run_id before querying the entity store, matching Python's search_filters behavior. Previously passed full effectiveFilters which could contain processed metadata operators, causing zero entity boost results on metadata-filtered searches.
Consistent failure logging: Both Python and TypeScript now log individual entity boost failures via logger.warning / console.warn.
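Three of the guards above (index sort, batch length check, filter scoping) can be illustrated in Python. This is a sketch under stated assumptions: the function names, the response item shape (`index`, `embedding`), and the log message are invented for the example and are not the SDK's actual code.

```python
import logging
from typing import Any, Callable, Dict, List, Optional, Sequence

logger = logging.getLogger(__name__)

# Scoping keys named in the description above.
ENTITY_FILTER_KEYS = ("user_id", "agent_id", "run_id")


def order_by_index(items: List[Dict[str, Any]]) -> List[List[float]]:
    """Restore input order for a batch response whose items carry an 'index'."""
    return [item["embedding"] for item in sorted(items, key=lambda item: item["index"])]


def embed_entities_checked(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
) -> Optional[List[List[float]]]:
    """Return one vector per text, or None when the response is misaligned."""
    vectors = embed_batch(list(texts))
    if len(vectors) != len(texts):
        logger.warning(
            "embed_batch returned %d vectors for %d texts; skipping entity boost",
            len(vectors),
            len(texts),
        )
        return None
    return vectors


def scope_entity_filters(filters: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the scoping keys; drop metadata operators the entity store lacks."""
    return {key: filters[key] for key in ENTITY_FILTER_KEYS if key in filters}


shuffled = [{"index": 1, "embedding": [2.0]}, {"index": 0, "embedding": [1.0]}]
ordered = order_by_index(shuffled)
short_response = embed_entities_checked(["a", "b"], lambda texts: [[1.0]])
scoped = scope_entity_filters({"user_id": "u1", "category": {"in": ["work"]}})
```

Returning `None` on a length mismatch lets the caller skip the boost entirely rather than pair entities with the wrong vectors.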

Combined latency improvement

Entities	Old (sequential)	New (batch + parallel)	Speedup
1	~200ms	~200ms	1x
4	~800ms	~200ms	4x
8 (max)	~1600ms	~300ms	5x

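The table assumes roughly 100ms per round-trip. At 200ms RTT, the sequential worst case (8 entities, 16 serial round-trips) reaches ~3.2s, while the batch + parallel path needs only a few round-trips regardless of entity count and stays in the hundreds of milliseconds.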
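The table figures are consistent with a simple round-trip model, sketched here. The model is an assumption for illustration: every embed or search call costs one fixed RTT, the old path makes two calls per entity, and the new path makes one batch embed plus `ceil(n / 4)` waves of searches. Real latencies will vary by provider and store.

```python
import math


def sequential_ms(entities: int, rtt_ms: int) -> int:
    """Old path: one embed plus one search per entity, all serial."""
    return 2 * entities * rtt_ms


def batch_parallel_ms(entities: int, rtt_ms: int, cap: int = 4) -> int:
    """New path: one batch embed, then searches in waves of at most `cap`."""
    return rtt_ms + math.ceil(entities / cap) * rtt_ms


# (entities, old latency, new latency) at an assumed 100ms per call
rows = [(n, sequential_ms(n, 100), batch_parallel_ms(n, 100)) for n in (1, 4, 8)]
```

At 100ms per call this reproduces the 200/200, 800/200, and 1600/300 rows; at 200ms per call the sequential path for 8 entities comes to 3200ms, in line with the ~3.2s figure.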

Key Design Decisions

embed_batch() first, parallel search second: Separates the two I/O phases cleanly. Batch embedding is strictly better than parallelizing individual embed calls — fewer API calls, less provider rate-limit pressure, simpler thread/coroutine management.
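Concurrency cap of 4 (Python): Bounds the number of in-flight entity store searches per query; embedding is already a single batched call, so the cap protects the store backend rather than the embedding provider. Hardcoded rather than configurable to keep the API surface clean.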
return_exceptions=True / Promise.allSettled: One entity failure doesn't abort others. Failed entities are logged at WARNING and skipped (best-effort).
Scoring is unchanged: max() aggregation over per-entity boosts is order-independent, so concurrent completion produces identical scores to the old sequential order.
No opt-in flag: Parallelism is always-on. The semaphore/pool cap provides the safety valve without requiring users to discover and set a config flag.
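The async variant of these decisions (semaphore cap, `return_exceptions=True`, `max()` aggregation) might look like the following sketch. `search_entity`, the `"broken"` failure trigger, and the boost values are hypothetical; the real search and scoring code is not shown in this description.

```python
import asyncio
import logging
from typing import Dict, List

logger = logging.getLogger(__name__)
MAX_CONCURRENCY = 4  # same cap as the sync path


async def search_entity(entity: str) -> Dict[str, float]:
    """Hypothetical async entity store search returning memory_id -> boost."""
    await asyncio.sleep(0.01)
    if entity == "broken":
        raise RuntimeError("entity store unavailable")
    return {"m1": float(len(entity)), "m2": 0.5}


async def entity_boosts(entities: List[str]) -> Dict[str, float]:
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)

    async def bounded(entity: str) -> Dict[str, float]:
        async with semaphore:  # at most MAX_CONCURRENCY searches in flight
            return await search_entity(entity)

    # return_exceptions=True: a failed search comes back as an exception object
    # in the results list instead of cancelling the other searches.
    results = await asyncio.gather(
        *(bounded(entity) for entity in entities), return_exceptions=True
    )

    boosts: Dict[str, float] = {}
    for entity, result in zip(entities, results):
        if isinstance(result, BaseException):
            logger.warning("entity boost failed for %r: %s", entity, result)
            continue
        for memory_id, boost in result.items():
            # max() is order-independent, so completion order cannot change scores.
            boosts[memory_id] = max(boosts.get(memory_id, 0.0), boost)
    return boosts


boosts = asyncio.run(entity_boosts(["bob", "broken", "alice"]))
```

The failing entity is logged and skipped, and reversing the entity order yields the same boosts, which is the property the scoring argument relies on.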

Supersedes

This PR covers the scope of all three community PRs with a unified fix:

feat(oss): opt-in parallel entity-boost in Memory.search() — 2-4x recall speedup #5046 (TS SDK only, opt-in flag) — closed
feat(oss): add opt-in parallel entity boost for Python #5227 (Python only, opt-in flag) — closed
fix(memory): parallelize entity boost searches in AsyncMemory.search #5298 (Python async only, no sync fix, no concurrency cap) — closed

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Refactor (no functional changes)
Documentation update

Breaking Changes

N/A — no public API changes. Internal implementation detail only.

Test Coverage

I added/updated unit tests
I added/updated integration tests
I tested manually (describe below)
No tests needed (explain why)

Python (8 new tests in `TestEntityBoostParallelism`)

test_sync_boosts_preserve_scoring — scoring math matches reference with batch embed + ThreadPoolExecutor
test_sync_embed_batch_called_once — verifies embed_batch is called exactly once with all entity texts
test_async_boosts_preserve_scoring — scoring math matches reference with batch embed + asyncio.gather
test_async_embed_batch_called_once — verifies embed_batch is called exactly once (async path)
test_sync_one_entity_failure_does_not_abort_others — partial failure resilience (sync)
test_async_one_entity_failure_does_not_abort_others — partial failure resilience (async)
test_sync_searches_run_concurrently — timing + peak concurrency proves overlap (sync)
test_async_searches_run_concurrently — timing + peak concurrency proves overlap (async)
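One way a "peak concurrency" check like the last two tests could work is sketched below. `ConcurrencyProbe` is a hypothetical helper written for this illustration and is not taken from the test suite.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


class ConcurrencyProbe:
    """Fake search that records how many calls are in flight at once."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def search(self, vector: List[float]) -> list:
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.05)  # simulated store latency, held outside the lock
        with self._lock:
            self.active -= 1
        return []


probe = ConcurrencyProbe()
with ThreadPoolExecutor(max_workers=4) as executor:
    list(executor.map(probe.search, [[0.0]] * 8))
```

A peak above 1 shows the searches overlapped, and a peak of at most 4 shows the cap held.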

TypeScript (4 new tests in `memory.entity-boost.test.ts`)

should use Promise.allSettled for concurrent entity searches
should preserve scoring math with parallel execution
should survive one entity search failure without losing other boosts — also verifies console.warn is called
should call entity searches concurrently, not sequentially

Verification

Python: 42/42 passed (pytest tests/memory/test_main.py)
TypeScript: 4/4 passed (jest memory.entity-boost.test.ts), Prettier passes
Ruff + isort pass

Checklist

My code follows the project's style guidelines
I have performed a self-review of my code
I have added tests that prove my fix/feature works
New and existing tests pass locally
I have updated documentation if needed

Entity boost computation processed up to 8 entities sequentially, making each search wait for the previous embed+search round-trip. With remote providers this dominated search latency (measured 4-8x overhead).

Parallelize across both Python and TypeScript SDKs:

- Python async: asyncio.gather with Semaphore(4)
- Python sync: concurrent.futures.ThreadPoolExecutor(max_workers=4)
- TypeScript: Promise.allSettled

Scoring is unchanged: max() aggregation is order-independent. Individual entity failures are logged and skipped (best-effort).

The TS SDK silently skipped rejected entity boost searches while the Python SDK logged a warning for each. Add console.warn on rejection and update the test to verify the warning fires (using a query that actually produces extractable entities).

…ed calls

Batch all entity embeddings into a single API call via embed_batch(), then parallelize only the entity store searches. Reduces N embed round-trips to 1 (providers with native batching like OpenAI send a single HTTP request), cutting API calls by up to 8x at scale while keeping the search parallelism from the prior commit.

… length mismatch

- LMStudio embedBatch: add .sort((a,b) => a.index - b.index) to match the OpenAI/Azure pattern; prevents silent embedding-entity misalignment when the server returns results out of insertion order.
- Python (sync + async): validate that embed_batch returns the same count as input texts; log a warning and skip boost if mismatched.
- TypeScript: same length guard before Promise.allSettled; prevents an undefined vector being passed to entityStore.search on a short response.
- Remove dead searchResponses variable from TS test.
- Relax timing bound in concurrency test (350ms → 500ms) to avoid flakes from embedBatch overhead on slow CI runners.

- Python sync: resolve self.entity_store once on the main thread before submitting to ThreadPoolExecutor; prevents a concurrent lazy-init race when _entity_store is None on the first search call.
- TypeScript: strip effectiveFilters to only user_id/agent_id/run_id before querying the entity store, matching Python's search_filters behavior. Previously the full effectiveFilters were passed, which could contain processed metadata operators that entity store records don't have, causing zero entity boost results with metadata-filtered searches.

markymark2001 · 2026-06-05T09:07:26Z

great

kartik-mem0 added 2 commits June 4, 2026 21:18

This was referenced Jun 5, 2026

feat(oss): opt-in parallel entity-boost in Memory.search() — 2-4x recall speedup #5046

Closed

feat(oss): add opt-in parallel entity boost for Python #5227

Closed

fix(memory): parallelize entity boost searches in AsyncMemory.search #5298

Closed

kartik-mem0 added 3 commits June 5, 2026 10:19

whysosaket approved these changes Jun 5, 2026

kartik-mem0 merged commit d817aa9 into main Jun 5, 2026
14 checks passed

kartik-mem0 deleted the fix/parallelize-entity-boost-searches branch June 5, 2026 15:33

kartik-mem0 merged 5 commits into main from fix/parallelize-entity-boost-searches
