fix(opensearch): stop mapping knn similarity threshold onto boost #15681
ef-rintaro wants to merge 1 commit into
What problem does this PR solve?
On the OpenSearch backend (DOC_ENGINE=opensearch), dense/vector retrieval is
silently dead for an entire KB: semantic queries and exact-id lookups return
"nothing found" even though chunks and their vectors are correctly indexed.
Only BM25/term matching works.
Root cause. OSConnection.search() maps
MatchDenseExpr.extra_options["similarity"] onto the OpenSearch knn boost. But
similarity is a minimum-similarity threshold (Elasticsearch knn semantics), not
a score multiplier -- es_conn.py correctly passes it as the knn similarity
argument, not boost.

rag/nlp/search.py:_knn_scores() runs a clean-cosine second pass and
deliberately passes similarity=0.0 to disable thresholding. On OpenSearch that
became boost=0.0, multiplying every knn _score by zero, so get_scores()
returned all zeros, vector_similarity was always 0, and Dealer.retrieval()'s
similarity_threshold (default 0.2) then dropped every chunk.

Fix
Keep the knn boost at 1.0 so the _score is the raw cosine. The similarity
threshold is still enforced downstream in Dealer.retrieval() (post_threshold),
so no behavior is lost. Main search ranking is unaffected: boost is a uniform
positive scale and the text query is only used as a knn filter, so candidates
are ranked by cosine either way.
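
The failure mode and the fix can be reproduced without a cluster. The sketch below is a toy model, not RAGFlow code: `knn_clause`, `scored`, `apply_threshold`, and the field name `q_vec` are hypothetical, and the engine is modelled simply as `_score = cosine * boost`.

```python
from typing import Dict, List


def knn_clause(field: str, vector: List[float], k: int,
               similarity: float, fixed: bool) -> dict:
    """Build an OpenSearch-style knn clause (shape assumed for illustration)."""
    # Old behaviour: the similarity threshold leaks into `boost`.
    # New behaviour: `boost` stays at 1.0 and `similarity` is not sent as boost.
    boost = 1.0 if fixed else similarity
    return {"knn": {field: {"vector": vector, "k": k, "boost": boost}}}


def scored(cosines: Dict[str, float], clause: dict) -> Dict[str, float]:
    """Toy engine: every hit's _score is its cosine multiplied by boost."""
    (params,) = clause["knn"].values()
    return {cid: cos * params["boost"] for cid, cos in cosines.items()}


def apply_threshold(scores: Dict[str, float], threshold: float = 0.2) -> List[str]:
    """Stand-in for the downstream similarity_threshold check (default 0.2)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [cid for cid, score in ranked if score >= threshold]


# Made-up cosine similarities for three indexed chunks.
cosines = {"chunk-a": 0.91, "chunk-b": 0.47, "chunk-c": 0.12}
vector = [0.1, 0.2]

# The second pass asks for similarity=0.0 to disable thresholding.
buggy = scored(cosines, knn_clause("q_vec", vector, 3, similarity=0.0, fixed=False))
fixed = scored(cosines, knn_clause("q_vec", vector, 3, similarity=0.0, fixed=True))

print(apply_threshold(buggy))  # boost=0.0 zeroes every score, nothing survives
print(apply_threshold(fixed))  # boost=1.0 keeps the cosines, chunk-a and chunk-b pass
```

Under this model any positive boost would leave the ordering of `cosines` unchanged, which is why keeping it at 1.0 should not alter main search ranking.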
Type of change
Affected backends
OpenSearch only. Elasticsearch already passes similarity as the knn similarity
argument (not boost); Infinity / OceanBase use their own similarity paths.
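
For comparison, the two knn request shapes look roughly like the following. These dicts are hand-written sketches of the public query DSLs, not what es_conn.py or OSConnection.search() literally emit; the field name `q_vec`, the vector, `k`, and `num_candidates` are placeholders.

```python
# Placeholder query embedding; real vectors match the index dimension.
query_vector = [0.12, -0.03, 0.88]

# Elasticsearch knn search: `similarity` is a minimum-similarity cutoff.
# Per the PR description, 0.0 is passed to disable thresholding.
es_knn = {
    "knn": {
        "field": "q_vec",
        "query_vector": query_vector,
        "k": 10,
        "num_candidates": 100,
        "similarity": 0.0,
    }
}

# OpenSearch knn query: parameters are nested under the field name, and
# `boost` multiplies the resulting _score, so it stays at 1.0 after the fix.
os_knn = {
    "query": {
        "knn": {
            "q_vec": {"vector": query_vector, "k": 10, "boost": 1.0}
        }
    }
}

print(es_knn["knn"]["similarity"], os_knn["query"]["knn"]["q_vec"]["boost"])
```

The point of the contrast is that the two parameters are not interchangeable: one filters candidates, the other rescales their scores.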