fix(vector_stores): use platform user-data dir instead of /tmp/{provider} #5350
alvinttang wants to merge 1 commit into
The default vector store path was /tmp/{provider}, which breaks in a few
real deployments:
- macOS LaunchAgents (sandbox often blocks /tmp writes; /tmp is periodically cleaned)
- systemd services with PrivateTmp= or noexec /tmp
- Windows (no /tmp at all)
- Docker (/tmp is ephemeral unless mounted)
Resolution order is now:
1. MEM0_DATA_DIR env var (explicit override)
2. Platform convention via stdlib:
   - macOS: ~/Library/Application Support/mem0
   - Windows: %LOCALAPPDATA%/mem0
   - Linux/BSD: $XDG_DATA_HOME/mem0 or ~/.local/share/mem0
3. provider name appended as the subdir
Explicit path in user config still wins over both the env var and the
default, so behavior for anyone already setting path= is unchanged.
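
The resolution order above could be sketched roughly as follows. This is an illustration, not the code in this PR: the function names `default_data_dir` and `resolve_store_path` are hypothetical, and the `~/AppData/Local` fallback when `LOCALAPPDATA` is unset is an assumption the description does not state.

```python
import os
import sys
from pathlib import Path
from typing import Optional


def default_data_dir() -> Path:
    """Hypothetical helper: pick the base data dir in the order described above."""
    # 1. Explicit override via environment variable.
    override = os.environ.get("MEM0_DATA_DIR")
    if override:
        return Path(override).expanduser()

    # 2. Platform convention, using only the standard library.
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Application Support" / "mem0"
    if sys.platform == "win32":
        local = os.environ.get("LOCALAPPDATA")
        # Assumption: fall back to ~/AppData/Local if the variable is missing.
        base = Path(local) if local else Path.home() / "AppData" / "Local"
        return base / "mem0"

    # Linux/BSD: honor XDG_DATA_HOME, else ~/.local/share.
    xdg = os.environ.get("XDG_DATA_HOME")
    base = Path(xdg) if xdg else Path.home() / ".local" / "share"
    return base / "mem0"


def resolve_store_path(provider: str, path: Optional[str] = None) -> Path:
    """Hypothetical helper: an explicit path wins; otherwise use <data dir>/<provider>."""
    if path:
        return Path(path)
    # 3. Provider name appended as the subdir.
    return default_data_dir() / provider
```

With this shape, `resolve_store_path("faiss", path="/custom/idx")` ignores both the env var and the platform default, which matches the "explicit path still wins" rule.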
Refs mem0ai#4279
alvinttang seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
Good fix. The /tmp default is a real ops hazard, especially in Docker and systemd environments, where silent data loss on restart surfaces as a retrieval quality problem rather than a write-path error. One related note: if you're running an embedded vector store and also hitting the silent embedding drop issue (#5245), the combination is particularly bad: writes silently fail at the embedding layer, and any that do land are potentially wiped on restart. vector-router (https://github.com/Adelagric/vector-router/releases/tag/v0.1.0) sits between the embedding provider and the store and surfaces per-item failures before they reach the store, so it composes with this fix rather than depending on it. Apache 2.0, drop-in via OPENAI_BASE_URL if useful while both PRs work through review.
Refs #4279.
The default vector store path was `/tmp/{provider}`. Four cases where that breaks:

- macOS LaunchAgents: the sandbox often blocks `/tmp` writes; `/tmp` is periodically cleaned by the OS, so any persisted index gets wiped.
- systemd services with `PrivateTmp=` or `noexec` `/tmp`.
- Windows: no `/tmp` at all.
- Docker: `/tmp` is ephemeral unless mounted as a volume, so embedded stores silently lose data on restart.

The maintainer comment on the issue said the right move is to allow an explicit path and pick a more robust default. This PR does both.
New resolution order:

1. `MEM0_DATA_DIR` environment variable (explicit override).
2. Platform convention:
   - macOS: `~/Library/Application Support/mem0`
   - Windows: `%LOCALAPPDATA%/mem0`
   - Linux/BSD: `$XDG_DATA_HOME/mem0` or `~/.local/share/mem0`
3. The provider name is appended as the subdir, so `faiss` keeps its own folder.

Explicit `path=` in the user config still wins over both the env var and the default, so anyone already setting `path` sees no change.

RED
GREEN
`tests/configs/` also still passes. The other `tests/vector_stores/test_*` files require optional backends (chromadb, pgvector, mongodb, etc.), so I couldn't run them locally without installing each. The only line I touched in `configs.py` is the path default, which never reaches those backend paths.
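
For readers who want to check the precedence rules without installing any optional backend, a backend-free test could look something like the sketch below. It is not the test added in this PR: `resolve_store_path` here is a simplified stand-in (it only models the env var, the explicit path, and a Linux-style fallback), and the test names are made up for illustration.

```python
import os
import tempfile
import unittest
from pathlib import Path
from typing import Optional
from unittest import mock


def resolve_store_path(provider: str, path: Optional[str] = None) -> Path:
    """Simplified stand-in for the default-path logic; NOT mem0's actual function."""
    if path:
        return Path(path)
    base = os.environ.get("MEM0_DATA_DIR") or str(Path.home() / ".local" / "share" / "mem0")
    return Path(base) / provider


class DefaultPathTests(unittest.TestCase):
    def test_env_var_overrides_default(self):
        with tempfile.TemporaryDirectory() as tmp:
            # patch.dict restores os.environ when the block exits.
            with mock.patch.dict(os.environ, {"MEM0_DATA_DIR": tmp}):
                self.assertEqual(resolve_store_path("faiss"), Path(tmp) / "faiss")

    def test_explicit_path_wins_over_env_var(self):
        with mock.patch.dict(os.environ, {"MEM0_DATA_DIR": "/ignored"}):
            self.assertEqual(resolve_store_path("faiss", path="/custom"), Path("/custom"))

    def test_default_ends_with_mem0_and_provider(self):
        # Run with MEM0_DATA_DIR removed but the rest of the environment intact.
        env = {k: v for k, v in os.environ.items() if k != "MEM0_DATA_DIR"}
        with mock.patch.dict(os.environ, env, clear=True):
            self.assertEqual(resolve_store_path("faiss").parts[-2:], ("mem0", "faiss"))
```

Saved as a test module, this runs with `python -m unittest` and touches neither the real data directory nor any vector store backend.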