feat: Dify-compatible retrieval API endpoint#15704
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a Dify-compatible retrieval handler (GET/POST + health), wires it into server/router, adds a searchbot retrieval-test endpoint, and refactors KG retrieval/search into package ChangesDify Retrieval Feature with KG Refactoring
Sequence DiagramsequenceDiagram
participant Client
participant Router
participant DifyHandler as DifyRetrievalHandler
participant KBService as KBService
participant RetrievalSvc as RetrievalService
participant KGPipeline as KG Pipeline
participant DocDAO as DocumentDAO
Client->>Router: POST/GET /api/v1/dify/retrieval
Router->>DifyHandler: forward request
DifyHandler->>DifyHandler: bind & validate input
DifyHandler->>KBService: Accessible(knowledge_id, user)
DifyHandler->>RetrievalSvc: Retrieval(nlp.RetrievalRequest)
RetrievalSvc-->>DifyHandler: chunks
alt use_kg == true
DifyHandler->>KGPipeline: NewPipeline(...).Retrieval(ctx)
KGPipeline-->>DifyHandler: KG chunks
DifyHandler->>DifyHandler: prepend KG chunk
end
DifyHandler->>DocDAO: GetByIDs(aggregated doc IDs)
DocDAO-->>DifyHandler: documents with metadata
DifyHandler-->>Client: { "records": [...] }
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly Related PRs
Suggested Labels
Suggested Reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
- Add DifyRetrievalHandler with interfaces for testability - POST/GET /api/v1/dify/retrieval endpoint (matching Python impl) - GET /api/v1/dify/retrieval/health endpoint - Wire handler + routes in router.go and server_main.go - Add SetChatModel/SetEmbModel setters to KGSearchPipeline - 12 tests covering all paths (success, errors, KG, meta filter) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Fix metadata condition JSON field mapping (name→Key, comparison_operator→Op) - Extract top_k/score_threshold from GET query params manually - Return 500 on KG chat model fetch failure (instead of silent skip) - Default metadata condition logic to "and" when omitted - Remove extra "message" field from health check response Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Option A — delete unused interface. KG pipeline is created inline where needed (6 lines), no polymorphism required. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Move Pipeline, scoring, search, types, helpers to internal/service/kg/ - Keep backward-compat type aliases and function wrappers in service package - Update handler to import kg package directly - Fix test package boundaries (consistent package kg) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Pipeline (was KGSearchPipeline) - Option (was KGSearchOption) - NewPipeline (was NewKGSearchPipeline) - WithSimThreshold / WithDenseTopK (was WithKG*) - Retrieval (was KGSearchRetrieval) - defaultSimThreshold / defaultDenseTopK (was defaultKG*) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Top and SimilarityThreshold now passed as nil when user omits them - Service layer nil-guard handles defaults (1024, 0.0, 0.3) - Remove redundant default-setting in handler - Single source of truth for defaults in nlp/service layer
7b0a9a6 to
ff8b4df
Compare
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
internal/service/kg/search.go (1)
97-103:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winAvoid nil dereference in embedding path (
embModel.ModelDriver).At Line 102,
embModel.ModelDriver.Embed(...)is called without verifyingModelDriveris non-nil. This can panic when an embedding model record is loaded but driver initialization failed/skipped.🔧 Proposed fix
-func buildKGDenseExpr(embModel *modelModule.EmbeddingModel, question string, topN int) (*types.MatchDenseExpr, error) { - if embModel == nil || question == "" { +func buildKGDenseExpr(embModel *modelModule.EmbeddingModel, question string, topN int) (*types.MatchDenseExpr, error) { + if embModel == nil || embModel.ModelDriver == nil || question == "" { return nil, nil }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/service/kg/search.go` around lines 97 - 103, buildKGDenseExpr may panic because it calls embModel.ModelDriver.Embed without checking ModelDriver; update buildKGDenseExpr to validate that embModel.ModelDriver is non-nil (and return a sensible nil/err) before calling Embed, e.g., check embModel != nil && embModel.ModelDriver != nil and return an error or nil result when the driver is absent, ensuring the function handles that case gracefully.internal/service/kg/pipeline.go (2)
195-202:⚠️ Potential issue | 🟠 Major | ⚡ Quick winInclude
_scorein KG search projections before score filtering.At Line 199 and Line 261,
SelectFieldsomits_score, but Line 215 and Line 277 immediately filter by score. If the engine does not return score fields by default, all rows can be filtered out incorrectly.🔧 Proposed fix
- SelectFields: []string{"entity_kwd", "entity_type_kwd", "rank_flt", "content_with_weight", "n_hop_with_weight"}, + SelectFields: []string{"entity_kwd", "entity_type_kwd", "rank_flt", "content_with_weight", "n_hop_with_weight", "_score"}, ... - SelectFields: []string{"from_entity_kwd", "to_entity_kwd", "weight_int", "content_with_weight"}, + SelectFields: []string{"from_entity_kwd", "to_entity_kwd", "weight_int", "content_with_weight", "_score"},Also applies to: 257-264
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/service/kg/pipeline.go` around lines 195 - 202, In Pipeline.searchEntities the SelectFields for KG searches (e.g., the entsReq and the other SearchRequest used later) omit "_score" but the code immediately filters results by score; add "_score" to the SelectFields slices so the engine returns score values before applying the score-based filtering in searchEntities (update the SelectFields in the SearchRequest instances referenced in searchEntities to include "_score").
177-183:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftSynthetic KG chunk currently violates downstream record contract.
At Line 181,
doc_idis empty. Ininternal/handler/dify_retrieval_handler.go(Line 326-359), records are skipped whendoc_idcannot resolve to a document, so KG content can be silently dropped from API responses.Please align the contract across layers: either (a) special-case synthetic KG chunks in handler serialization, or (b) attach a resolvable synthetic document contract.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/service/kg/pipeline.go` around lines 177 - 183, The synthetic KG chunk map currently leaves "doc_id" empty causing records to be dropped by the handler; fix by assigning a resolvable synthetic document id when building the chunk in pipeline.go (populate the "doc_id" key using a stable synthetic pattern tied to p.kbIDs and the chunk identifier, e.g. a "kg:<kb_id>:<chunk_id>" style), and ensure any document-metadata fields expected downstream (e.g., "docnm_kwd", "kb_id", and the chunk id key) are set consistently so the handler's resolution logic (in dify_retrieval_handler.go) will treat the KG chunk as a valid document rather than skipping it.internal/service/kg/retrieval.go (1)
235-242:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winGuard
chatModel.ModelDriverbefore invoking chat completion.At Line 241,
chatModel.ModelDriver.ChatWithMessages(...)is called without checkingchatModel.ModelDriver != nil. A partially initialized model can panic this path.🔧 Proposed fix
-if chatModel != nil && chatModel.ModelName != nil && chatModel.APIConfig != nil { +if chatModel != nil && chatModel.ModelDriver != nil && chatModel.ModelName != nil && chatModel.APIConfig != nil {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/service/kg/retrieval.go` around lines 235 - 242, The call to chatModel.ModelDriver.ChatWithMessages can panic if chatModel.ModelDriver is nil; before invoking ChatWithMessages (in the block checking chatModel != nil && chatModel.ModelName != nil && chatModel.APIConfig != nil) add a nil-check for chatModel.ModelDriver and return/handle the error path if it's nil, e.g. log or return an error indicating the model driver is uninitialized; ensure you reference the same symbols (chatModel, ModelDriver, ChatWithMessages, ModelName, APIConfig) when adding the guard so the code only calls ChatWithMessages on a non-nil ModelDriver.
🧹 Nitpick comments (1)
internal/handler/dify_retrieval_handler_test.go (1)
322-337: ⚡ Quick winStrengthen KG-path assertions in
TestDifyRetrieval_UseKG.This test only checks status code, so KG integration can regress without test failure. Assert that
recordsincludes the KG-prepended content whenuse_kg=true.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/handler/dify_retrieval_handler_test.go` around lines 322 - 337, Update TestDifyRetrieval_UseKG to parse the JSON response body and assert the returned retrieval records include the KG-prepended content when use_kg=true: after calling r.ServeHTTP, unmarshal w.Body into a response struct (e.g., with Records []struct{ Content string `json:"content"` }) and add an assertion that the first record's Content contains the KG label produced by your mock (for example contains "tag_1" or the expected KG prefix), referencing TestDifyRetrieval_UseKG, mockMetadataService and its labelQuestionFn to locate where the KG label is defined.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/handler/dify_retrieval_handler_test.go`:
- Around line 262-277: TestDifyRetrieval_NoAuth mounts a stub that always
returns 401 so it never exercises DifyRetrievalHandler.Retrieval or GetUser;
replace the stub route with the real handler (register
DifyRetrievalHandler.Retrieval on POST "/api/v1/dify/retrieval") and arrange the
auth path to fail by mocking/stubbing GetUser (or the auth middleware used by
Retrieval) to return an unauthorized error for the request; then send the same
POST and assert w.Code == http.StatusUnauthorized so the actual handler/auth
logic is exercised.
In `@internal/handler/dify_retrieval_handler.go`:
- Around line 317-324: The current logic swallows errors from docDAO.GetByIDs
and drops retrieval rows when a document isn't hydrated; change it so GetByIDs
failures are returned as errors (do not convert them to empty results) and when
iterating chunks (the loop that uses docMap and checks docMap[d.ID]) do not
unconditionally skip rows if the document is missing—preserve the retrieval row
(e.g., keep doc nil or attach a placeholder) so KG rows without doc_id still
pass through; refer to allDocIDs, h.docDAO.GetByIDs, and docMap to locate and
implement these changes.
- Around line 174-188: The current parsing of c.Query("top_k") and
c.Query("score_threshold") in the handler silently ignores parse errors and
allows invalid values; update the parsing logic around c.Query("top_k")
(strconv.Atoi) and c.Query("score_threshold") (strconv.ParseFloat) to validate
inputs and return a 400 error on invalid values instead of silently skipping
them: for TopK ensure parsed > 0 (and reject non-positive integers) before
assigning to req.RetrievalSetting.TopK, and for ScoreThreshold ensure the parsed
float is within an acceptable range (e.g. 0.0–1.0) before assigning to
req.RetrievalSetting.ScoreThreshold; use the handler's error response path (e.g.
c.JSON/c.AbortWithStatusJSON) to surface clear validation messages when strconv
parsing fails or values are out of range.
- Around line 202-205: The current handler treats any error from h.kbSvc.GetByID
as a 404; change the logic in the retrieval block that calls h.kbSvc.GetByID so
that if err != nil you return a 500 (use c.JSON with
http.StatusInternalServerError and an appropriate message/log) and only return
404 when kb == nil (keeping common.CodeNotFound for that branch); update the
error response for the 500 path to include minimal context (e.g., "Failed to
fetch Knowledgebase") while preserving the 404 branch for truly missing
resources.
In `@internal/service/kg/retrieval.go`:
- Around line 175-186: kgRelationFromChunk currently leaves KGRelation.Sim
unset, causing downstream scoring (FuseRelationScores/SortAndTrimRelations) to
zero out text-matched relations; update kgRelationFromChunk to read similarity
from the chunk keys "_score" or "score" (handle numeric types like float64 and
int) and assign that value to r.Sim after parsing PageRank, ensuring the
function sets KGRelation.Sim from chunk["_score"] or chunk["score"] when
present.
---
Outside diff comments:
In `@internal/service/kg/pipeline.go`:
- Around line 195-202: In Pipeline.searchEntities the SelectFields for KG
searches (e.g., the entsReq and the other SearchRequest used later) omit
"_score" but the code immediately filters results by score; add "_score" to the
SelectFields slices so the engine returns score values before applying the
score-based filtering in searchEntities (update the SelectFields in the
SearchRequest instances referenced in searchEntities to include "_score").
- Around line 177-183: The synthetic KG chunk map currently leaves "doc_id"
empty causing records to be dropped by the handler; fix by assigning a
resolvable synthetic document id when building the chunk in pipeline.go
(populate the "doc_id" key using a stable synthetic pattern tied to p.kbIDs and
the chunk identifier, e.g. a "kg:<kb_id>:<chunk_id>" style), and ensure any
document-metadata fields expected downstream (e.g., "docnm_kwd", "kb_id", and
the chunk id key) are set consistently so the handler's resolution logic (in
dify_retrieval_handler.go) will treat the KG chunk as a valid document rather
than skipping it.
In `@internal/service/kg/retrieval.go`:
- Around line 235-242: The call to chatModel.ModelDriver.ChatWithMessages can
panic if chatModel.ModelDriver is nil; before invoking ChatWithMessages (in the
block checking chatModel != nil && chatModel.ModelName != nil &&
chatModel.APIConfig != nil) add a nil-check for chatModel.ModelDriver and
return/handle the error path if it's nil, e.g. log or return an error indicating
the model driver is uninitialized; ensure you reference the same symbols
(chatModel, ModelDriver, ChatWithMessages, ModelName, APIConfig) when adding the
guard so the code only calls ChatWithMessages on a non-nil ModelDriver.
In `@internal/service/kg/search.go`:
- Around line 97-103: buildKGDenseExpr may panic because it calls
embModel.ModelDriver.Embed without checking ModelDriver; update buildKGDenseExpr
to validate that embModel.ModelDriver is non-nil (and return a sensible nil/err)
before calling Embed, e.g., check embModel != nil && embModel.ModelDriver != nil
and return an error or nil result when the driver is absent, ensuring the
function handles that case gracefully.
---
Nitpick comments:
In `@internal/handler/dify_retrieval_handler_test.go`:
- Around line 322-337: Update TestDifyRetrieval_UseKG to parse the JSON response
body and assert the returned retrieval records include the KG-prepended content
when use_kg=true: after calling r.ServeHTTP, unmarshal w.Body into a response
struct (e.g., with Records []struct{ Content string `json:"content"` }) and add
an assertion that the first record's Content contains the KG label produced by
your mock (for example contains "tag_1" or the expected KG prefix), referencing
TestDifyRetrieval_UseKG, mockMetadataService and its labelQuestionFn to locate
where the KG label is defined.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ca6403fe-063c-4df6-ab55-c4a17fbbb554
📒 Files selected for processing (13)
cmd/server_main.gointernal/handler/dify_retrieval_handler.gointernal/handler/dify_retrieval_handler_test.gointernal/router/router.gointernal/service/kg/pipeline.gointernal/service/kg/retrieval.gointernal/service/kg/retrieval_test.gointernal/service/kg/scoring.gointernal/service/kg/search.gointernal/service/kg/search_test.gointernal/service/kg/testutil_test.gointernal/service/kg/types.gointernal/service/kg_scoring_funcs_test.go
💤 Files with no reviewable changes (1)
- internal/service/kg_scoring_funcs_test.go
| if v := c.Query("top_k"); v != "" { | ||
| if parsed, err := strconv.Atoi(v); err == nil { | ||
| if req.RetrievalSetting == nil { | ||
| req.RetrievalSetting = &difyRetrievalSetting{} | ||
| } | ||
| req.RetrievalSetting.TopK = &parsed | ||
| } | ||
| } | ||
| if v := c.Query("score_threshold"); v != "" { | ||
| if parsed, err := strconv.ParseFloat(v, 64); err == nil { | ||
| if req.RetrievalSetting == nil { | ||
| req.RetrievalSetting = &difyRetrievalSetting{} | ||
| } | ||
| req.RetrievalSetting.ScoreThreshold = &parsed | ||
| } |
There was a problem hiding this comment.
Validate top_k / score_threshold instead of silently accepting invalid inputs.
Line 175 and Line 183 ignore parse errors, and Line 224 accepts non-positive top_k. This can send invalid pagination/scoring parameters downstream with unpredictable behavior.
Proposed fix
if v := c.Query("top_k"); v != "" {
- if parsed, err := strconv.Atoi(v); err == nil {
- if req.RetrievalSetting == nil {
- req.RetrievalSetting = &difyRetrievalSetting{}
- }
- req.RetrievalSetting.TopK = &parsed
- }
+ parsed, err := strconv.Atoi(v)
+ if err != nil || parsed <= 0 {
+ c.JSON(http.StatusBadRequest, gin.H{"code": common.CodeArgumentError, "message": "top_k must be a positive integer"})
+ return
+ }
+ if req.RetrievalSetting == nil {
+ req.RetrievalSetting = &difyRetrievalSetting{}
+ }
+ req.RetrievalSetting.TopK = &parsed
}
if v := c.Query("score_threshold"); v != "" {
- if parsed, err := strconv.ParseFloat(v, 64); err == nil {
- if req.RetrievalSetting == nil {
- req.RetrievalSetting = &difyRetrievalSetting{}
- }
- req.RetrievalSetting.ScoreThreshold = &parsed
- }
+ parsed, err := strconv.ParseFloat(v, 64)
+ if err != nil {
+ c.JSON(http.StatusBadRequest, gin.H{"code": common.CodeArgumentError, "message": "score_threshold must be a number"})
+ return
+ }
+ if req.RetrievalSetting == nil {
+ req.RetrievalSetting = &difyRetrievalSetting{}
+ }
+ req.RetrievalSetting.ScoreThreshold = &parsed
}Also applies to: 223-225
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/handler/dify_retrieval_handler.go` around lines 174 - 188, The
current parsing of c.Query("top_k") and c.Query("score_threshold") in the
handler silently ignores parse errors and allows invalid values; update the
parsing logic around c.Query("top_k") (strconv.Atoi) and
c.Query("score_threshold") (strconv.ParseFloat) to validate inputs and return a
400 error on invalid values instead of silently skipping them: for TopK ensure
parsed > 0 (and reject non-positive integers) before assigning to
req.RetrievalSetting.TopK, and for ScoreThreshold ensure the parsed float is
within an acceptable range (e.g. 0.0–1.0) before assigning to
req.RetrievalSetting.ScoreThreshold; use the handler's error response path (e.g.
c.JSON/c.AbortWithStatusJSON) to surface clear validation messages when strconv
parsing fails or values are out of range.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #15704 +/- ##
=======================================
Coverage 93.16% 93.16%
=======================================
Files 10 10
Lines 717 717
Branches 118 118
=======================================
Hits 668 668
Misses 29 29
Partials 20 20 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Otherwise FuseRelationScores and SortAndTrimRelations multiply zero, making all relations score-zero regardless of relevance.
- Weight int was never read downstream; SortAndTrimRelations uses Sim*PageRank - Python has no Weight field either; weight_int is used as pagerank directly - ParseKGRelationChunks now sets PageRank from weight_int (consistent with kgRelationFromChunk) - Tests updated to assert PageRank instead of Weight
- Use errors.Is(err, gorm.ErrRecordNotFound) to differentiate - Add TestDifyRetrieval_KBDBError for the 500 path - Update KBNotFound test to use gorm.ErrRecordNotFound
- GetByIDs error now returns 500 instead of silently dropping records - Add TestDifyRetrieval_DocLoadError
- Add setupDifyTestNoAuth() without user middleware - Test exercises GetUser → CodeUnauthorized → 401 - Replaces broken stub that bypassed the handler entirely
- SearchKGEntities → SearchEntities - ParseKGEntityChunks → ParseEntityChunks - BuildKGContent → BuildContent - kgEntityFromChunk → entityFromChunk - searchKGTypeSamples → searchTypeSamples - etc. (15 renames total) Package is already kg, so kg.SearchKGEntities stutters.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/service/kg/pipeline.go (1)
68-77:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix the constructor docs to match the actual API.
Line 76 points callers to
WithChatModel/WithEmbModel, but this type only exposesSetChatModelandSetEmbModel, and the fields are unexported. That comment currently documents a usage path that does not exist.✏️ Suggested diff
-// chatModel and embModel should be set via WithChatModel/WithEmbModel setters -// or passed directly after construction. +// chatModel and embModel can be attached after construction via +// SetChatModel and SetEmbModel.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/service/kg/pipeline.go` around lines 68 - 77, The constructor docs for NewPipeline incorrectly reference WithChatModel/WithEmbModel; update the comment to match the actual API by replacing that reference with SetChatModel and SetEmbModel (or explain that chatModel and embModel must be set via the SetChatModel/SetEmbModel methods or passed through the provided options), and note that the model fields are unexported so callers must use those setter methods; ensure the symbols NewPipeline, SetChatModel, and SetEmbModel are mentioned so readers can locate the correct API.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@internal/service/kg/pipeline.go`:
- Around line 68-77: The constructor docs for NewPipeline incorrectly reference
WithChatModel/WithEmbModel; update the comment to match the actual API by
replacing that reference with SetChatModel and SetEmbModel (or explain that
chatModel and embModel must be set via the SetChatModel/SetEmbModel methods or
passed through the provided options), and note that the model fields are
unexported so callers must use those setter methods; ensure the symbols
NewPipeline, SetChatModel, and SetEmbModel are mentioned so readers can locate
the correct API.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2d8ed88e-f724-4a20-95dd-85e92bc3b5ba
📒 Files selected for processing (8)
internal/handler/dify_retrieval_handler.gointernal/handler/dify_retrieval_handler_test.gointernal/service/kg/pipeline.gointernal/service/kg/retrieval.gointernal/service/kg/retrieval_test.gointernal/service/kg/scoring.gointernal/service/kg/search.gointernal/service/kg/search_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
- internal/handler/dify_retrieval_handler.go
- internal/handler/dify_retrieval_handler_test.go
- internal/service/kg/retrieval.go
- internal/service/kg/search.go
- Add RetrievalTest method to SearchbotHandler - Thin handler around chunkService.RetrievalTest (logic already implemented) - 7 tests covering success, errors, auth, and not_found paths
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
internal/handler/searchbot_test.go (1)
45-63: ⚡ Quick winStrengthen success test by asserting request mapping into
RetrievalTestRequest.Current tests validate status outcomes but not that input fields are correctly passed to
chunkSvc. Capture the incomingreqinmockChunkServiceand assert key mappings (e.g.,kb_id,question,meta_data_filter, pagination/top-k) to catch contract regressions.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/handler/searchbot_test.go` around lines 45 - 63, Update TestSearchbotsRetrieval_Basic to capture and assert the request object passed into the mockChunkService: modify the test's mockChunkService implementation to store the incoming RetrievalTestRequest (or the parameter type used by chunkSvc.Search/RunRetrievalTest) into a local variable, call the handler (TestSearchbotsRetrieval_Basic), then assert that the captured request's fields (kb_id, question, meta_data_filter, pagination/top_k) match the JSON input; reference the mockChunkService, RetrievalTestRequest type, and chunkSvc call used in the handler to locate where to capture and assert the mapping.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/handler/searchbot.go`:
- Around line 217-218: Add a nil guard for h.chunkSvc before calling
h.chunkSvc.RetrievalTest; if h.chunkSvc is nil return a clear error (or wrapped
error) instead of invoking RetrievalTest to avoid panics. Locate the call to
h.chunkSvc.RetrievalTest(svcReq, user.ID) in the handler method in searchbot.go,
check if h.chunkSvc == nil and return an appropriate error (e.g., fmt.Errorf or
the handler's standard error type) explaining the missing service, otherwise
proceed to call RetrievalTest.
- Around line 217-224: The handler's error branch for h.chunkSvc.RetrievalTest
currently returns err.Error() to the client which leaks internals; change this
to return a stable generic message (e.g. "internal server error" or "failed to
retrieve chunk") with the common.CodeServerError, and write the full err (and
any contextual fields like user.ID and svcReq) to the server log instead of the
response. Locate the error handling after the call to h.chunkSvc.RetrievalTest
in the searchbot handler and replace the c.JSON call that exposes err.Error()
with a generic client message while using the service/logger available in the
handler (e.g. h.logger, log, or processLogger) to record the detailed error.
- Around line 57-75: The SearchbotRetrievalTestRequest struct accepts a
Highlight field from the client request, but when constructing the service
request (svcReq), this field is not being forwarded. Locate where svcReq is
built and add the Highlight field from the request object to ensure it is
properly passed to the service layer, maintaining consistency with other request
parameters being forwarded.
---
Nitpick comments:
In `@internal/handler/searchbot_test.go`:
- Around line 45-63: Update TestSearchbotsRetrieval_Basic to capture and assert
the request object passed into the mockChunkService: modify the test's
mockChunkService implementation to store the incoming RetrievalTestRequest (or
the parameter type used by chunkSvc.Search/RunRetrievalTest) into a local
variable, call the handler (TestSearchbotsRetrieval_Basic), then assert that the
captured request's fields (kb_id, question, meta_data_filter, pagination/top_k)
match the JSON input; reference the mockChunkService, RetrievalTestRequest type,
and chunkSvc call used in the handler to locate where to capture and assert the
mapping.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 959a53e7-f1bb-457a-a631-6bc89db44dbf
📒 Files selected for processing (3)
internal/handler/searchbot.gointernal/handler/searchbot_test.gointernal/router/router.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/router/router.go
| // SearchbotRetrievalTestRequest is the request body for POST /api/v1/searchbots/retrieval_test. | ||
| type SearchbotRetrievalTestRequest struct { | ||
| KbIDs []string `json:"kb_id" binding:"required"` | ||
| Question string `json:"question" binding:"required"` | ||
| Page *int `json:"page,omitempty"` | ||
| Size *int `json:"size,omitempty"` | ||
| DocIDs []string `json:"doc_ids,omitempty"` | ||
| UseKG *bool `json:"use_kg,omitempty"` | ||
| TopK *int `json:"top_k,omitempty"` | ||
| CrossLanguages []string `json:"cross_languages,omitempty"` | ||
| SearchID *string `json:"search_id,omitempty"` | ||
| MetaDataFilter *map[string]interface{} `json:"meta_data_filter,omitempty"` | ||
| TenantRerankID *string `json:"tenant_rerank_id,omitempty"` | ||
| RerankID *string `json:"rerank_id,omitempty"` | ||
| Keyword *bool `json:"keyword,omitempty"` | ||
| SimilarityThreshold *float64 `json:"similarity_threshold,omitempty"` | ||
| VectorSimilarityWeight *float64 `json:"vector_similarity_weight,omitempty"` | ||
| Highlight *bool `json:"highlight,omitempty"` | ||
| } |
There was a problem hiding this comment.
highlight is accepted in request but never forwarded to service.
The handler binds highlight in SearchbotRetrievalTestRequest (Line 74), but svcReq construction does not include it (Line 199 onward). This silently drops client input and breaks request-contract expectations.
Also applies to: 199-215
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/handler/searchbot.go` around lines 57 - 75, The
SearchbotRetrievalTestRequest struct accepts a Highlight field from the client
request, but when constructing the service request (svcReq), this field is not
being forwarded. Locate where svcReq is built and add the Highlight field from
the request object to ensure it is properly passed to the service layer,
maintaining consistency with other request parameters being forwarded.
| result, err := h.chunkSvc.RetrievalTest(svcReq, user.ID) | ||
| if err != nil { |
There was a problem hiding this comment.
Add a nil guard for chunkSvc before invoking it.
h.chunkSvc.RetrievalTest(...) is called unconditionally. If wiring ever passes a nil service, this path will panic instead of returning a controlled error.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/handler/searchbot.go` around lines 217 - 218, Add a nil guard for
h.chunkSvc before calling h.chunkSvc.RetrievalTest; if h.chunkSvc is nil return
a clear error (or wrapped error) instead of invoking RetrievalTest to avoid
panics. Locate the call to h.chunkSvc.RetrievalTest(svcReq, user.ID) in the
handler method in searchbot.go, check if h.chunkSvc == nil and return an
appropriate error (e.g., fmt.Errorf or the handler's standard error type)
explaining the missing service, otherwise proceed to call RetrievalTest.
| result, err := h.chunkSvc.RetrievalTest(svcReq, user.ID) | ||
| if err != nil { | ||
| if strings.Contains(err.Error(), "not_found") { | ||
| c.JSON(http.StatusNotFound, gin.H{"code": common.CodeNotFound, "message": "No chunk found! Check the chunk status please!"}) | ||
| return | ||
| } | ||
| c.JSON(http.StatusInternalServerError, gin.H{"code": common.CodeServerError, "message": err.Error()}) | ||
| return |
There was a problem hiding this comment.
Avoid exposing internal error details in API responses.
Line 223 returns err.Error() directly to clients. That can leak backend internals. Return a stable generic message and log the detailed error server-side.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/handler/searchbot.go` around lines 217 - 224, The handler's error
branch for h.chunkSvc.RetrievalTest currently returns err.Error() to the client
which leaks internals; change this to return a stable generic message (e.g.
"internal server error" or "failed to retrieve chunk") with the
common.CodeServerError, and write the full err (and any contextual fields like
user.ID and svcReq) to the server log instead of the response. Locate the error
handling after the call to h.chunkSvc.RetrievalTest in the searchbot handler and
replace the c.JSON call that exposes err.Error() with a generic client message
while using the service/logger available in the handler (e.g. h.logger, log, or
processLogger) to record the detailed error.
This reverts commit c12f227.
Summary
Dify-compatible retrieval API for external knowledge base integration.
Changes
Files