fix(vector_stores): use platform user-data dir instead of /tmp/{provider} #5350

Open
alvinttang wants to merge 1 commit into mem0ai:main from alvinttang:fix/vector-store-default-data-dir
@alvinttang

Refs #4279.

The default vector store path was /tmp/{provider}. That breaks in four common deployments:

  • macOS LaunchAgents: sandbox often blocks /tmp writes; /tmp is periodically cleaned by the OS, so any persisted index gets wiped.
  • systemd services with PrivateTmp= or noexec /tmp.
  • Windows: no /tmp at all.
  • Docker: /tmp is ephemeral unless mounted as a volume, so embedded stores silently lose data on restart.

The maintainer comment on the issue said the right move is to allow an explicit path and pick a more robust default. This PR does both.

New resolution order:

  1. MEM0_DATA_DIR environment variable (explicit override).
  2. Platform convention via stdlib (no new dep):
    • macOS: ~/Library/Application Support/mem0
    • Windows: %LOCALAPPDATA%/mem0
    • Linux/BSD: $XDG_DATA_HOME/mem0 or ~/.local/share/mem0
  3. Provider name appended as the subdir, so faiss keeps its own folder.

Explicit path= in the user config still wins over both the env var and the default, so anyone already setting path sees no change.
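
For reviewers who want the precedence in one place, here is a minimal sketch of the resolution order described above. It is not the code in this PR's diff: the function names `default_data_dir` and `resolve_path`, the injectable `platform`, `env`, and `home` parameters, the `~/AppData/Local` fallback when `LOCALAPPDATA` is unset, and the choice to append the provider under `MEM0_DATA_DIR` as well are all assumptions made for illustration.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def default_data_dir(
    provider: str,
    platform: str = sys.platform,
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> Path:
    """Hypothetical helper: pick a per-provider data dir without touching /tmp."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home

    override = env.get("MEM0_DATA_DIR")
    if override:
        # 1. The environment variable beats the platform convention.
        base = Path(override).expanduser()
    elif platform == "darwin":
        # 2a. macOS convention.
        base = home / "Library" / "Application Support" / "mem0"
    elif platform == "win32":
        # 2b. Windows convention; the fallback below is an assumption.
        local = env.get("LOCALAPPDATA")
        base = (Path(local) if local else home / "AppData" / "Local") / "mem0"
    else:
        # 2c. Linux/BSD: XDG_DATA_HOME if set, else ~/.local/share.
        xdg = env.get("XDG_DATA_HOME")
        base = (Path(xdg) if xdg else home / ".local" / "share") / "mem0"

    # 3. Each provider keeps its own subdirectory.
    return base / provider


def resolve_path(provider: str, explicit_path: Optional[str] = None, **kwargs) -> str:
    """Hypothetical wrapper: an explicit path from user config always wins."""
    if explicit_path:
        return explicit_path
    return str(default_data_dir(provider, **kwargs))
```

Passing `platform` and `env` as arguments, rather than reading `sys.platform` and `os.environ` inside each branch, is one way to exercise every platform branch from a single machine, for example `default_data_dir("faiss", platform="linux", env={"XDG_DATA_HOME": "/xdg"})`.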

RED

def test_default_path_is_not_tmp():
    cfg = VectorStoreConfig(provider="faiss", config={})
    assert not cfg.config.path.startswith("/tmp/")
E   AssertionError: default path still falls back to /tmp: /tmp/faiss

GREEN

test_default_path_is_not_tmp                          PASSED
test_env_var_override                                 PASSED
test_explicit_path_wins                               PASSED
test_macos_default_uses_application_support           PASSED
test_linux_default_respects_xdg_data_home             SKIPPED (linux-only)
4 passed, 1 skipped in 0.52s

tests/configs/ also still passes. The other tests/vector_stores/test_* files require optional backends (chromadb, pgvector, mongodb, etc.) so I couldn't run them locally without installing each, but the only line I touched in configs.py is the path default, which never reaches those backend paths.

…der}

The default vector store path was /tmp/{provider}, which breaks in a few
real deployments:

- macOS LaunchAgents (sandbox often blocks /tmp writes; /tmp is periodically cleaned)
- systemd services with PrivateTmp= or noexec /tmp
- Windows (no /tmp at all)
- Docker (/tmp is ephemeral unless mounted)

Resolution order is now:
  1. MEM0_DATA_DIR env var (explicit override)
  2. Platform convention via stdlib:
     - macOS: ~/Library/Application Support/mem0
     - Windows: %LOCALAPPDATA%/mem0
     - Linux/BSD: $XDG_DATA_HOME/mem0 or ~/.local/share/mem0
  3. provider name appended as the subdir

Explicit path in user config still wins over both the env var and the
default, so behavior for anyone already setting path= is unchanged.

Refs mem0ai#4279
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


alvinttang seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@kaushikmk186-sudo

Good fix — the /tmp default is a real ops hazard, especially in Docker and systemd environments where silent data loss on restart is exactly the kind of thing that surfaces as a retrieval quality problem rather than a write-path error.

One related note: if you're running an embedded vector store and also hitting the silent embedding drop issue (#5245), the combination is particularly bad — writes silently fail at the embedding layer, and any that do land are potentially wiped on restart. vector-router (https://github.com/Adelagric/vector-router/releases/tag/v0.1.0) sits between the embedding provider and the store and surfaces per-item failures before they reach the store, so it composes with this fix rather than depending on it.

Apache 2.0, drop-in via OPENAI_BASE_URL if useful while both PRs work through review.
