Summary
The Lemonade Server bump to v10.7.0 (#1571) broke every CI job that loads the embedding model. llama-server in v10.7.0 (build ≥ b6524) cannot load nomic-embed-text-v2-moe-GGUF, so all RAG/embedding CI jobs fail at server startup — before any test runs. This is a known upstream regression: lemonade-sdk/lemonade#612.
This is not flaky: it is deterministic and reproduces on every branch. The tell: jobs that load only an LLM (API Tests, Chat, Code, Unit Tests) pass; only jobs that load the embedding model fail.
Error
model_load_error: Failed to load model 'nomic-embed-text-v2-moe-GGUF': llama-server failed to start
(higher-level form in some jobs: [ERROR] Server health check failed after 60 seconds / Server failed to start)
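For triage, the two error forms quoted above can be told apart mechanically. The following is a minimal sketch, assuming the log text contains the strings exactly as quoted in this issue; the function name and the returned labels are hypothetical and not part of any existing tooling.

```python
import re

# Patterns copied from the error strings quoted in this issue.
MODEL_LOAD_RE = re.compile(
    r"model_load_error: Failed to load model '(?P<model>[^']+)': "
    r"llama-server failed to start"
)
HEALTH_CHECK_RE = re.compile(
    r"Server health check failed after \d+ seconds|Server failed to start"
)


def classify_log(text: str) -> str:
    """Label a CI log as 'model_load:<name>', 'server_start', or 'none'.

    The labels are illustrative only (an assumption of this sketch).
    """
    match = MODEL_LOAD_RE.search(text)
    if match:
        # The specific form names the model, so prefer it when present.
        return f"model_load:{match.group('model')}"
    if HEALTH_CHECK_RE.search(text):
        # Higher-level form: the server never became healthy.
        return "server_start"
    return "none"
```

The specific form is checked first because it names the model that failed to load; the higher-level form alone does not say which model was involved.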
Affected workflows / jobs
| Job | Workflow |
| --- | --- |
| RAG Integration Tests | .github/workflows/test_rag.yml |
| Test Lemonade Embeddings API | .github/workflows/test_embeddings.yml |
| Lemonade Server Smoke Test (stx) | .github/workflows/test_lemonade_server.yml |
| Example Agents Integration Tests (stx) | .github/workflows/test_examples.yml (intermittent) |
Evidence it is environment-wide
Test RAG / Test Lemonade Embeddings fail across multiple unrelated branches in the same window — e.g. claudia/task-8fa7ecef (#1455), feat/npu-flm-embedder, autofix/issue-1745, autofix/issue-1743. Onset (~2026-06-18) coincides with the v10.7.0 bump landing on main.
Root cause
LEMONADE_VERSION (in src/gaia/version.py) was bumped to v10.7.0 in #1571. v10.7.0 ships llama-server ≥ b6524, which per upstream lemonade-sdk/lemonade#612 does not work with nomic-embed-text-v2-moe. LLM GGUFs are unaffected, which is why only the embedding path breaks.
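The build boundary described above can be expressed as a simple comparison. This is a sketch under stated assumptions: build tags look like `b6524`, and every build at or above b6524 is affected, which is only what this issue reports (an upstream fix would need an upper bound too). The names `FIRST_BAD_BUILD`, `parse_build`, and `embedding_model_expected_to_load` are hypothetical.

```python
import re

# Boundary taken from this issue: builds at or above b6524 fail to load
# nomic-embed-text-v2-moe. Treating it as open-ended is an assumption.
FIRST_BAD_BUILD = 6524


def parse_build(tag: str) -> int:
    """Parse a build tag such as 'b6524' into its integer part."""
    match = re.fullmatch(r"b(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"unrecognized build tag: {tag!r}")
    return int(match.group(1))


def embedding_model_expected_to_load(tag: str) -> bool:
    """True if the build is below the boundary reported in this issue."""
    return parse_build(tag) < FIRST_BAD_BUILD
```

A check like this could gate a version bump before it lands, though how the llama-server build tag is obtained from a given Lemonade release is left open here.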
Fix
[…] llama-server below the b6524 boundary).
[…] llama-server build that loads nomic-embed-text-v2-moe.
[…] llama-server child stderr and only surface "Server failed to start"; they should print the child exit code/stderr so the next backend regression is debuggable from the CI log alone.
Related upstream issues
[…] /api/embed regressions after lemonade > 10.0.0
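The request in the Fix section, that a failed start should print the child exit code and stderr, could look roughly like the sketch below. It is not how the CI or Lemonade currently launches llama-server; `start_or_report` and its `startup_seconds` parameter are hypothetical, and a real launcher would still run its health check after this returns.

```python
import subprocess
import tempfile
import time


def start_or_report(cmd, startup_seconds=5.0):
    """Start a child process and fail loudly if it dies during startup.

    Returns the running Popen object if the child is still alive after
    `startup_seconds`; otherwise raises RuntimeError carrying the child's
    exit code and everything it wrote to stderr.
    """
    # Capture stderr in a temp file so a chatty child cannot block on a
    # full pipe while nobody is reading it.
    err_file = tempfile.TemporaryFile(mode="w+")
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=err_file)
    deadline = time.monotonic() + startup_seconds
    while time.monotonic() < deadline:
        code = proc.poll()
        if code is not None:
            # Child exited early: rewind and read what it wrote.
            err_file.seek(0)
            stderr_text = err_file.read()
            err_file.close()
            raise RuntimeError(
                f"child exited with code {code} during startup; "
                f"stderr:\n{stderr_text}"
            )
        time.sleep(0.1)
    return proc
```

With this shape, a backend that refuses to load a model would leave its own error message in the CI log instead of only "Server failed to start".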