fix(opensearch): stop mapping knn similarity threshold onto boost #15681

Open
ef-rintaro wants to merge 1 commit into infiniflow:main from ef-rintaro:fix/opensearch-knn-boost

@ef-rintaro
Contributor

What problem does this PR solve?

On the OpenSearch backend (DOC_ENGINE=opensearch), dense/vector retrieval is
silently dead for an entire KB: semantic queries and exact-id lookups return
"nothing found" even though chunks and their vectors are correctly indexed.
Only BM25/term matching works.

Root cause. OSConnection.search() maps MatchDenseExpr.extra_options["similarity"]
onto the OpenSearch knn boost. But similarity is a minimum-similarity threshold
(Elasticsearch knn semantics), not a score multiplier: es_conn.py correctly
passes it as the knn similarity argument, not as boost.
rag/nlp/search.py:_knn_scores() runs a clean-cosine second pass and
deliberately passes similarity=0.0 to disable thresholding. On OpenSearch that
became boost=0.0, which multiplied every knn _score by zero. As a result,
get_scores() returned all zeros, vector_similarity was always 0, and
Dealer.retrieval()'s similarity_threshold (default 0.2) then dropped every chunk.
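
The failure chain above can be reproduced with a small stand-alone model. This is only an illustrative sketch, not the RAGFlow code: knn_scores() and apply_threshold() are hypothetical helpers, and the sketch assumes, as this description states, that the boost multiplies the knn _score.

```python
# Hypothetical, simplified model of the failure; none of these helpers
# exist in RAGFlow. Assumption: boost scales the knn _score linearly.

def knn_scores(cosines, boost):
    """Model the backend score as raw cosine similarity times boost."""
    return [c * boost for c in cosines]


def apply_threshold(scores, similarity_threshold=0.2):
    """Model the downstream filter: keep scores at or above the threshold."""
    return [s for s in scores if s >= similarity_threshold]


cosines = [0.91, 0.55, 0.12]

# Buggy path: similarity=0.0 (meaning "no threshold") is reused as boost,
# so every score becomes 0.0 and nothing survives the 0.2 threshold.
buggy = apply_threshold(knn_scores(cosines, boost=0.0))

# Fixed path: boost stays at 1.0, so scores are the raw cosines and the
# threshold only removes the genuinely weak match (0.12).
fixed = apply_threshold(knn_scores(cosines, boost=1.0))

print(buggy)
print(fixed)
```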

Fix

Keep the knn boost at 1.0 so the _score is the raw cosine. The similarity
threshold is still enforced downstream in Dealer.retrieval() (post_threshold),
so no behavior is lost. Main search ranking is unaffected: boost is a uniform
positive scale and the text query is only used as a knn filter, so candidates
are ranked by cosine either way.
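
As a rough sketch of the shape of the fix (not the actual diff), the knn clause can be built without ever reading the similarity option. The helper name build_knn_clause, the field name q_vec, and the exact dict layout are assumptions made for illustration; the layout follows the OpenSearch knn query DSL as understood here and should be checked against rag/utils/opensearch_conn.py.

```python
from typing import Any, Dict, List, Optional


def build_knn_clause(
    field: str,
    vector: List[float],
    k: int,
    extra_options: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a knn clause with a fixed boost of 1.0.

    extra_options may carry a "similarity" value, but it is a threshold,
    not a multiplier, so it is intentionally not mapped onto boost here.
    Thresholding is assumed to happen later, outside the query.
    """
    return {"knn": {field: {"vector": list(vector), "k": k, "boost": 1.0}}}


# "q_vec" is a made-up field name used only for this example.
clause = build_knn_clause("q_vec", [0.1, 0.2, 0.3], k=5,
                          extra_options={"similarity": 0.0})
print(clause["knn"]["q_vec"]["boost"])
```

Passing similarity=0.0 no longer affects the clause, so the second-pass scoring call would receive unscaled scores under this assumption.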

Type of change

  • Bug Fix (non-breaking change which fixes an issue)

Affected backends

OpenSearch only. Elasticsearch already passes similarity as the knn
similarity argument (not boost); Infinity / OceanBase use their own
similarity paths.

@dosubot dosubot Bot added the size:S and 🐞 bug labels Jun 5, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Jun 5, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f1259f88-5218-4504-b512-9ffedbdcbf30

📥 Commits

Reviewing files that changed from the base of the PR and between 7789862 and ee4df88.

📒 Files selected for processing (1)
  • rag/utils/opensearch_conn.py

📝 Walkthrough

The change updates KNN query construction in OSConnection.search's dense vector search handling. Instead of mapping m.extra_options["similarity"] to the KNN boost parameter, the code now always sets boost to 1.0. Explanatory comments document the semantic mismatch between OpenSearch/Elasticsearch KNN implementations and clarify that downstream logic handles similarity thresholding.

Changes

KNN Vector Search

| Layer / File(s) | Summary |
| --- | --- |
| KNN boost standardization and semantic documentation (rag/utils/opensearch_conn.py) | KNN query construction removes the mapping of similarity values to boost, standardizes boost to 1.0, and adds detailed comments explaining OpenSearch/Elasticsearch semantic differences and how downstream filtering applies similarity thresholds. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

size:L, lgtm

Suggested reviewers

  • dcc123456
  • KevinHuSh

Poem

A vector query once boosted high, 🐰
Now 1.0 is our reply,
With comments clear that explain the way,
OpenSearch semantics win the day! 🎯

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: stopping the mapping of KNN similarity threshold onto boost in OpenSearch code. |
| Description check | ✅ Passed | The description comprehensively covers the problem, root cause, fix, and affected backends with clear context, and includes the required Type of change checkbox. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

