feat: Dify-compatible retrieval API endpoint by xugangqiang · Pull Request #15704 · infiniflow/ragflow

xugangqiang · 2026-06-05T09:19:37Z

Summary

Dify-compatible retrieval API for external knowledge base integration.

Changes

New handler: DifyRetrievalHandler with POST/GET /api/v1/dify/retrieval
Health check: GET /api/v1/dify/retrieval/health
Full pipeline: KB validation -> permission check -> embedding -> metadata filter -> chunk retrieval -> child chunk aggregation -> optional KG search -> response assembly
12 tests covering all paths (success, errors, metadata filter, KG mode)
Testability: Handler dependencies defined as interfaces (KBServiceIface, ModelServiceIface, etc.)
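To make the endpoint's contract concrete, here is a minimal sketch of the request and response shapes in Go. It assumes the field names of Dify's external knowledge API: `knowledge_id`, `query`, and `retrieval_setting` with `top_k`/`score_threshold` in the request, and a `records` array of `content`/`score`/`title`/`metadata` in the response. The type and function names are illustrative, not the PR's. Optional settings are pointers so an omitted value stays nil, which is how the PR's `difyRetrievalSetting` carries `TopK` and `ScoreThreshold`.

```go
package main

import "encoding/json"

// retrievalSetting holds the optional knobs Dify may send. Pointers keep
// "omitted" distinguishable from an explicit zero.
type retrievalSetting struct {
	TopK           *int     `json:"top_k,omitempty"`
	ScoreThreshold *float64 `json:"score_threshold,omitempty"`
}

// retrievalRequest is a hypothetical model of the Dify request body.
type retrievalRequest struct {
	KnowledgeID      string            `json:"knowledge_id"`
	Query            string            `json:"query"`
	RetrievalSetting *retrievalSetting `json:"retrieval_setting,omitempty"`
}

// record is one entry in the {"records": [...]} response.
type record struct {
	Content  string         `json:"content"`
	Score    float64        `json:"score"`
	Title    string         `json:"title"`
	Metadata map[string]any `json:"metadata,omitempty"`
}

// retrievalResponse wraps the records array returned to Dify.
type retrievalResponse struct {
	Records []record `json:"records"`
}

// decodeRequest parses a POST body into retrievalRequest.
func decodeRequest(body []byte) (retrievalRequest, error) {
	var req retrievalRequest
	err := json.Unmarshal(body, &req)
	return req, err
}
```

Because `score_threshold` is absent in a body like `{"retrieval_setting":{"top_k":3}}`, it decodes to nil rather than 0, so a later layer can still tell "not provided" from "threshold of zero".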

Files

File	Type
internal/handler/dify_retrieval_handler.go	New — handler + interfaces
internal/handler/dify_retrieval_handler_test.go	New — 12 tests
internal/router/router.go	Modified — route registration
cmd/server_main.go	Modified — handler wiring
internal/service/kg/pipeline.go	Modified — SetChatModel/SetEmbModel
internal/service/kg/retrieval.go	New — helper functions
internal/service/kg/scoring.go	Moved from service package
internal/service/kg/search.go	New — KG search functions
internal/service/kg/types.go	New — type definitions

coderabbitai · 2026-06-05T09:19:44Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.


📝 Walkthrough


Adds a Dify-compatible retrieval handler (GET/POST + health), wires it into server/router, adds a searchbot retrieval-test endpoint, and refactors KG retrieval/search into package kg with renamed APIs, types, and updated tests.

Changes

Dify Retrieval Feature with KG Refactoring

Layer / File(s)	Summary
KG types, pipeline and scoring API rename `internal/service/kg/types.go`, `internal/service/kg/pipeline.go`, `internal/service/kg/scoring.go`, `internal/service/kg/testutil_test.go`	Adds KG domain structs and renames/refactors pipeline/scoring exported APIs to package `kg` (`Pipeline`/`Option`/`NewPipeline`/`BuildContent`) and updates receivers.
KG retrieval, search, and parsing `internal/service/kg/retrieval.go`, `internal/service/kg/search.go`	Renames entrypoint to `Retrieval`, consolidates defaults/fusion weights, updates hybrid/dense fallback behavior, and hardens entity/relation/community/type-sample parsing.
KG tests and integration updates `internal/service/kg/retrieval_test.go`, `internal/service/kg/search_test.go`	Aligns tests to `kg` package and renamed helpers/constants; updates assertions and integration calls; adds small test util helper.
Dify Retrieval Handler and tests `internal/handler/dify_retrieval_handler.go`, `internal/handler/dify_retrieval_handler_test.go`	Adds `DifyRetrievalHandler` (GET/POST binding, auth/authorization, metadata filters, model loading, retrieval and child-chunk enrichment, optional KG pipeline prepend, document fetch/normalization) and comprehensive tests plus a health endpoint.
Searchbot retrieval-test and router/server wiring `internal/handler/searchbot.go`, `internal/handler/searchbot_test.go`, `internal/router/router.go`, `cmd/server_main.go`	Adds `RetrievalTest` endpoint with `ChunkServiceIface`, updates `SearchbotHandler` constructor, wires new routes, and constructs/passes `difyRetrievalHandler` in server startup and router creation.

Sequence Diagram

sequenceDiagram
  participant Client
  participant Router
  participant DifyHandler as DifyRetrievalHandler
  participant KBService as KBService
  participant RetrievalSvc as RetrievalService
  participant KGPipeline as KG Pipeline
  participant DocDAO as DocumentDAO

  Client->>Router: POST/GET /api/v1/dify/retrieval
  Router->>DifyHandler: forward request
  DifyHandler->>DifyHandler: bind & validate input
  DifyHandler->>KBService: Accessible(knowledge_id, user)
  DifyHandler->>RetrievalSvc: Retrieval(nlp.RetrievalRequest)
  RetrievalSvc-->>DifyHandler: chunks
  alt use_kg == true
    DifyHandler->>KGPipeline: NewPipeline(...).Retrieval(ctx)
    KGPipeline-->>DifyHandler: KG chunks
    DifyHandler->>DifyHandler: prepend KG chunk
  end
  DifyHandler->>DocDAO: GetByIDs(aggregated doc IDs)
  DocDAO-->>DifyHandler: documents with metadata
  DifyHandler-->>Client: { "records": [...] }

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly Related PRs

infiniflow/ragflow#15639: Related router/handler wiring for searchbot endpoints and integration surface overlap.
infiniflow/ragflow#15690: Related KG search/retrieval pipeline logic that this PR refactors into package kg.

Suggested Labels

💞 feature, 🧪 test, 🐖api

Suggested Reviewers

yingfeng
yuzhichang
JinHai-CN

Poem

🐰 I hopped through types and pipeline threads,
I coaxed the KG to wake from beds,
A Dify door now greets each query,
Tests hum bright and routes not weary,
The rabbit ships — small, fast, and merry.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.49% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	❓ Inconclusive	The PR description is well-structured with Summary, Changes, and Files sections, but does not follow the template with 'What problem does this PR solve?' and 'Type of change' checkboxes.	Consider structuring the description to match the template: add a 'What problem does this PR solve?' section with background context, and check the appropriate 'Type of change' checkbox (likely 'New Feature').

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat: Dify-compatible retrieval API endpoint' accurately and concisely describes the main change: adding a new Dify-compatible retrieval API endpoint.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


- Add DifyRetrievalHandler with interfaces for testability
- POST/GET /api/v1/dify/retrieval endpoint (matching Python impl)
- GET /api/v1/dify/retrieval/health endpoint
- Wire handler + routes in router.go and server_main.go
- Add SetChatModel/SetEmbModel setters to KGSearchPipeline
- 12 tests covering all paths (success, errors, KG, meta filter)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

- Fix metadata condition JSON field mapping (name→Key, comparison_operator→Op)
- Extract top_k/score_threshold from GET query params manually
- Return 500 on KG chat model fetch failure (instead of silent skip)
- Default metadata condition logic to "and" when omitted
- Remove extra "message" field from health check response

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
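The first and fourth fixes in this commit amount to a field rename plus a default. A minimal sketch, assuming `name` arrives as a single string and using hypothetical internal types (`filterCondition`, `metaFilter`) in place of whatever the retrieval layer actually consumes:

```go
package main

// difyCondition is a hypothetical shape for one Dify metadata condition;
// this sketch assumes "name" is a single string.
type difyCondition struct {
	Name               string `json:"name"`
	ComparisonOperator string `json:"comparison_operator"`
	Value              any    `json:"value"`
}

// difyMetadataCondition groups conditions under one logical operator.
type difyMetadataCondition struct {
	LogicalOperator string          `json:"logical_operator"`
	Conditions      []difyCondition `json:"conditions"`
}

// filterCondition and metaFilter are hypothetical internal types; the
// commit only tells us the target fields are called Key and Op.
type filterCondition struct {
	Key   string
	Op    string
	Value any
}

type metaFilter struct {
	Logic      string
	Conditions []filterCondition
}

// toMetaFilter maps name->Key and comparison_operator->Op, and defaults
// the logic to "and" when logical_operator is omitted.
func toMetaFilter(mc *difyMetadataCondition) *metaFilter {
	if mc == nil || len(mc.Conditions) == 0 {
		return nil
	}
	logic := mc.LogicalOperator
	if logic == "" {
		logic = "and"
	}
	out := &metaFilter{Logic: logic}
	for _, c := range mc.Conditions {
		out.Conditions = append(out.Conditions, filterCondition{
			Key:   c.Name,
			Op:    c.ComparisonOperator,
			Value: c.Value,
		})
	}
	return out
}
```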

Option A: delete unused interface. KG pipeline is created inline where needed (6 lines); no polymorphism required.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

- Move Pipeline, scoring, search, types, helpers to internal/service/kg/
- Keep backward-compat type aliases and function wrappers in service package
- Update handler to import kg package directly
- Fix test package boundaries (consistent package kg)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

- Pipeline (was KGSearchPipeline)
- Option (was KGSearchOption)
- NewPipeline (was NewKGSearchPipeline)
- WithSimThreshold / WithDenseTopK (was WithKG*)
- Retrieval (was KGSearchRetrieval)
- defaultSimThreshold / defaultDenseTopK (was defaultKG*)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
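The renamed API follows Go's functional-options pattern. A sketch of how `NewPipeline`, `WithSimThreshold`/`WithDenseTopK`, and the `SetChatModel`/`SetEmbModel` setters could fit together; the struct fields, model stand-in types, constructor parameters and default values are all placeholders, since the real signatures are not shown here:

```go
package main

// Placeholder defaults for illustration only; the PR's actual
// defaultSimThreshold / defaultDenseTopK values are not shown here.
const (
	defaultSimThreshold = 0.5
	defaultDenseTopK    = 10
)

// chatModel and embModel are stand-ins for the real model types.
type chatModel struct{ Name string }
type embModel struct{ Name string }

// Pipeline keeps its fields unexported; callers configure it through
// options at construction time and setters afterwards.
type Pipeline struct {
	kbIDs        []string
	simThreshold float64
	denseTopK    int
	chat         *chatModel
	emb          *embModel
}

// Option configures a Pipeline at construction time.
type Option func(*Pipeline)

func WithSimThreshold(v float64) Option { return func(p *Pipeline) { p.simThreshold = v } }
func WithDenseTopK(n int) Option        { return func(p *Pipeline) { p.denseTopK = n } }

// NewPipeline applies defaults first, then caller-supplied options.
func NewPipeline(kbIDs []string, opts ...Option) *Pipeline {
	p := &Pipeline{kbIDs: kbIDs, simThreshold: defaultSimThreshold, denseTopK: defaultDenseTopK}
	for _, opt := range opts {
		opt(p)
	}
	return p
}

// SetChatModel and SetEmbModel attach models after construction.
func (p *Pipeline) SetChatModel(m *chatModel) { p.chat = m }
func (p *Pipeline) SetEmbModel(m *embModel)   { p.emb = m }
```

Dropping the `KG` prefix reads naturally at call sites such as `kg.NewPipeline(...)` and `kg.WithDenseTopK(...)`, which is the stutter argument made in a later commit.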

…to 0.3 when nil)

- Top and SimilarityThreshold now passed as nil when user omits them
- Service layer nil-guard handles defaults (1024, 0.0, 0.3)
- Remove redundant default-setting in handler
- Single source of truth for defaults in nlp/service layer
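Passing nil through and resolving it once in the service layer keeps "omitted" distinct from an explicit zero. A sketch of that nil-guard with a hypothetical `retrievalParams` type; the defaults are parameters because this excerpt lists 1024, 0.0 and 0.3 without mapping each value to a field:

```go
package main

// valueOr returns *p when the caller supplied a value and def otherwise.
func valueOr[T any](p *T, def T) T {
	if p == nil {
		return def
	}
	return *p
}

// retrievalParams is a hypothetical service-layer request in which nil
// means "the caller did not say", as opposed to an explicit zero.
type retrievalParams struct {
	Top                 *int
	SimilarityThreshold *float64
}

// resolve applies defaults in one place, so the handler can forward nil
// untouched instead of duplicating default values.
func resolve(p retrievalParams, defTop int, defSim float64) (int, float64) {
	return valueOr(p.Top, defTop), valueOr(p.SimilarityThreshold, defSim)
}
```

An explicit `score_threshold` of 0 therefore survives as 0 instead of being overwritten by a non-zero default.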

…nExpr

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)

internal/service/kg/search.go (1)
97-103: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Avoid nil dereference in embedding path (embModel.ModelDriver).

At Line 102, embModel.ModelDriver.Embed(...) is called without verifying ModelDriver is non-nil. This can panic when an embedding model record is loaded but driver initialization failed/skipped.
🔧 Proposed fix
-func buildKGDenseExpr(embModel *modelModule.EmbeddingModel, question string, topN int) (*types.MatchDenseExpr, error) {
-	if embModel == nil || question == "" {
+func buildKGDenseExpr(embModel *modelModule.EmbeddingModel, question string, topN int) (*types.MatchDenseExpr, error) {
+	if embModel == nil || embModel.ModelDriver == nil || question == "" {
 		return nil, nil
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/service/kg/search.go` around lines 97 - 103, buildKGDenseExpr may
panic because it calls embModel.ModelDriver.Embed without checking ModelDriver;
update buildKGDenseExpr to validate that embModel.ModelDriver is non-nil (and
return a sensible nil/err) before calling Embed, e.g., check embModel != nil &&
embModel.ModelDriver != nil and return an error or nil result when the driver is
absent, ensuring the function handles that case gracefully.
internal/service/kg/pipeline.go (2)
195-202: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Include _score in KG search projections before score filtering.

At Line 199 and Line 261, SelectFields omits _score, but Line 215 and Line 277 immediately filter by score. If the engine does not return score fields by default, all rows can be filtered out incorrectly.
🔧 Proposed fix
- SelectFields: []string{"entity_kwd", "entity_type_kwd", "rank_flt", "content_with_weight", "n_hop_with_weight"},
+ SelectFields: []string{"entity_kwd", "entity_type_kwd", "rank_flt", "content_with_weight", "n_hop_with_weight", "_score"},
...
- SelectFields: []string{"from_entity_kwd", "to_entity_kwd", "weight_int", "content_with_weight"},
+ SelectFields: []string{"from_entity_kwd", "to_entity_kwd", "weight_int", "content_with_weight", "_score"},
Also applies to: 257-264
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/service/kg/pipeline.go` around lines 195 - 202, In
Pipeline.searchEntities the SelectFields for KG searches (e.g., the entsReq and
the other SearchRequest used later) omit "_score" but the code immediately
filters results by score; add "_score" to the SelectFields slices so the engine
returns score values before applying the score-based filtering in searchEntities
(update the SelectFields in the SearchRequest instances referenced in
searchEntities to include "_score").
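Why a missing `_score` is dangerous can be shown in a few lines. This sketch assumes rows are decoded as `map[string]any` and that an absent score reads as 0; the helper names and the `>=` comparison are hypothetical:

```go
package main

// scoreOf reads "_score" from a row; a missing or non-numeric value reads as 0.
func scoreOf(row map[string]any) float64 {
	switch v := row["_score"].(type) {
	case float64:
		return v
	case int:
		return float64(v)
	default:
		return 0
	}
}

// filterByScore keeps rows whose score meets the threshold. If the search
// projection omitted "_score", every row scores 0 and any positive
// threshold silently drops all of them.
func filterByScore(rows []map[string]any, threshold float64) []map[string]any {
	var kept []map[string]any
	for _, r := range rows {
		if scoreOf(r) >= threshold {
			kept = append(kept, r)
		}
	}
	return kept
}
```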
177-183: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Synthetic KG chunk currently violates downstream record contract.

At Line 181, doc_id is empty. In internal/handler/dify_retrieval_handler.go (Lines 326-359), records are skipped when doc_id cannot be resolved to a document, so KG content can be silently dropped from API responses.

Please align the contract across layers: either (a) special-case synthetic KG chunks in handler serialization, or (b) attach a resolvable synthetic document contract.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/service/kg/pipeline.go` around lines 177 - 183, The synthetic KG
chunk map currently leaves "doc_id" empty causing records to be dropped by the
handler; fix by assigning a resolvable synthetic document id when building the
chunk in pipeline.go (populate the "doc_id" key using a stable synthetic pattern
tied to p.kbIDs and the chunk identifier, e.g. a "kg:<kb_id>:<chunk_id>" style),
and ensure any document-metadata fields expected downstream (e.g., "docnm_kwd",
"kb_id", and the chunk id key) are set consistently so the handler's resolution
logic (in dify_retrieval_handler.go) will treat the KG chunk as a valid document
rather than skipping it.
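Option (a) from the comment could look roughly like this: synthetic KG chunks (empty `doc_id`) pass through with a fixed title, while unresolved regular chunks are still skipped. All types are simplified stand-ins, and the `"Knowledge Graph"` title is a hypothetical choice:

```go
package main

// chunk, doc and record are simplified stand-ins for the handler's types.
type chunk struct {
	DocID   string
	Content string
	Score   float64
}

type doc struct {
	ID   string
	Name string
}

type record struct {
	Content string
	Score   float64
	Title   string
}

// buildRecords resolves each chunk's document. Synthetic KG chunks (empty
// DocID) are kept with a fixed title instead of being dropped.
func buildRecords(chunks []chunk, docs map[string]doc) []record {
	out := make([]record, 0, len(chunks))
	for _, c := range chunks {
		if c.DocID == "" {
			out = append(out, record{Content: c.Content, Score: c.Score, Title: "Knowledge Graph"})
			continue
		}
		d, ok := docs[c.DocID]
		if !ok {
			continue // unresolved regular chunk: still skipped
		}
		out = append(out, record{Content: c.Content, Score: c.Score, Title: d.Name})
	}
	return out
}
```

Option (b), a synthetic `kg:<kb_id>:<chunk_id>` id, would instead require the document lookup itself to recognize that prefix.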
internal/service/kg/retrieval.go (1)
235-242: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Guard chatModel.ModelDriver before invoking chat completion.

At Line 241, chatModel.ModelDriver.ChatWithMessages(...) is called without checking chatModel.ModelDriver != nil. A partially initialized model can panic this path.
🔧 Proposed fix
-if chatModel != nil && chatModel.ModelName != nil && chatModel.APIConfig != nil {
+if chatModel != nil && chatModel.ModelDriver != nil && chatModel.ModelName != nil && chatModel.APIConfig != nil {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/service/kg/retrieval.go` around lines 235 - 242, The call to
chatModel.ModelDriver.ChatWithMessages can panic if chatModel.ModelDriver is
nil; before invoking ChatWithMessages (in the block checking chatModel != nil &&
chatModel.ModelName != nil && chatModel.APIConfig != nil) add a nil-check for
chatModel.ModelDriver and return/handle the error path if it's nil, e.g. log or
return an error indicating the model driver is uninitialized; ensure you
reference the same symbols (chatModel, ModelDriver, ChatWithMessages, ModelName,
APIConfig) when adding the guard so the code only calls ChatWithMessages on a
non-nil ModelDriver.

🧹 Nitpick comments (1)

internal/handler/dify_retrieval_handler_test.go (1)

322-337: ⚡ Quick win

Strengthen KG-path assertions in TestDifyRetrieval_UseKG.

This test only checks the status code, so the KG integration could regress without any test failing. Assert that records include the KG-prepended content when use_kg=true.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/handler/dify_retrieval_handler_test.go` around lines 322 - 337,
Update TestDifyRetrieval_UseKG to parse the JSON response body and assert the
returned retrieval records include the KG-prepended content when use_kg=true:
after calling r.ServeHTTP, unmarshal w.Body into a response struct (e.g., with
Records []struct{ Content string `json:"content"` }) and add an assertion that
the first record's Content contains the KG label produced by your mock (for
example contains "tag_1" or the expected KG prefix), referencing
TestDifyRetrieval_UseKG, mockMetadataService and its labelQuestionFn to locate
where the KG label is defined.
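A sketch of the assertion helper this nitpick asks for, decoding only the `content` field of each record. The helper name is hypothetical, and the expected label (`tag_1` below) depends on what the mock's `labelQuestionFn` returns:

```go
package main

import (
	"encoding/json"
	"strings"
)

// difyResponse decodes just enough of {"records": [...]} for assertions.
type difyResponse struct {
	Records []struct {
		Content string `json:"content"`
	} `json:"records"`
}

// firstRecordContains reports whether the first record's content contains
// want. In the test it would run on w.Body.Bytes() after r.ServeHTTP.
func firstRecordContains(body []byte, want string) (bool, error) {
	var resp difyResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return false, err
	}
	if len(resp.Records) == 0 {
		return false, nil
	}
	return strings.Contains(resp.Records[0].Content, want), nil
}
```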

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/handler/dify_retrieval_handler_test.go`:
- Around line 262-277: TestDifyRetrieval_NoAuth mounts a stub that always
returns 401 so it never exercises DifyRetrievalHandler.Retrieval or GetUser;
replace the stub route with the real handler (register
DifyRetrievalHandler.Retrieval on POST "/api/v1/dify/retrieval") and arrange the
auth path to fail by mocking/stubbing GetUser (or the auth middleware used by
Retrieval) to return an unauthorized error for the request; then send the same
POST and assert w.Code == http.StatusUnauthorized so the actual handler/auth
logic is exercised.

In `@internal/handler/dify_retrieval_handler.go`:
- Around line 317-324: The current logic swallows errors from docDAO.GetByIDs
and drops retrieval rows when a document isn't hydrated; change it so GetByIDs
failures are returned as errors (do not convert them to empty results) and when
iterating chunks (the loop that uses docMap and checks docMap[d.ID]) do not
unconditionally skip rows if the document is missing—preserve the retrieval row
(e.g., keep doc nil or attach a placeholder) so KG rows without doc_id still
pass through; refer to allDocIDs, h.docDAO.GetByIDs, and docMap to locate and
implement these changes.
- Around line 174-188: The current parsing of c.Query("top_k") and
c.Query("score_threshold") in the handler silently ignores parse errors and
allows invalid values; update the parsing logic around c.Query("top_k")
(strconv.Atoi) and c.Query("score_threshold") (strconv.ParseFloat) to validate
inputs and return a 400 error on invalid values instead of silently skipping
them: for TopK ensure parsed > 0 (and reject non-positive integers) before
assigning to req.RetrievalSetting.TopK, and for ScoreThreshold ensure the parsed
float is within an acceptable range (e.g. 0.0–1.0) before assigning to
req.RetrievalSetting.ScoreThreshold; use the handler's error response path (e.g.
c.JSON/c.AbortWithStatusJSON) to surface clear validation messages when strconv
parsing fails or values are out of range.
- Around line 202-205: The current handler treats any error from h.kbSvc.GetByID
as a 404; change the logic in the retrieval block that calls h.kbSvc.GetByID so
that if err != nil you return a 500 (use c.JSON with
http.StatusInternalServerError and an appropriate message/log) and only return
404 when kb == nil (keeping common.CodeNotFound for that branch); update the
error response for the 500 path to include minimal context (e.g., "Failed to
fetch Knowledgebase") while preserving the 404 branch for truly missing
resources.

In `@internal/service/kg/retrieval.go`:
- Around line 175-186: kgRelationFromChunk currently leaves KGRelation.Sim
unset, causing downstream scoring (FuseRelationScores/SortAndTrimRelations) to
zero out text-matched relations; update kgRelationFromChunk to read similarity
from the chunk keys "_score" or "score" (handle numeric types like float64 and
int) and assign that value to r.Sim after parsing PageRank, ensuring the
function sets KGRelation.Sim from chunk["_score"] or chunk["score"] when
present.
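A sketch of the suggested `kgRelationFromChunk` fix: similarity is read from `_score` with a fallback to `score`, tolerating the numeric types a decoded map commonly holds. `relation` and `relationFromChunk` are simplified, hypothetical stand-ins for `KGRelation` and `kgRelationFromChunk`; `PageRank` is read from `weight_int`, as a later commit in this PR describes:

```go
package main

// relation is a simplified stand-in for KGRelation.
type relation struct {
	From, To string
	PageRank float64
	Sim      float64
}

// toFloat converts the numeric types a decoded chunk map commonly holds.
func toFloat(v any) (float64, bool) {
	switch n := v.(type) {
	case float64:
		return n, true
	case float32:
		return float64(n), true
	case int:
		return float64(n), true
	case int64:
		return float64(n), true
	}
	return 0, false
}

// simFromChunk prefers "_score" and falls back to "score".
func simFromChunk(chunk map[string]any) float64 {
	for _, key := range []string{"_score", "score"} {
		if f, ok := toFloat(chunk[key]); ok {
			return f
		}
	}
	return 0
}

// relationFromChunk fills both PageRank (from weight_int) and Sim.
func relationFromChunk(chunk map[string]any) relation {
	var r relation
	r.From, _ = chunk["from_entity_kwd"].(string)
	r.To, _ = chunk["to_entity_kwd"].(string)
	if pr, ok := toFloat(chunk["weight_int"]); ok {
		r.PageRank = pr
	}
	r.Sim = simFromChunk(chunk)
	return r
}
```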


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ca6403fe-063c-4df6-ab55-c4a17fbbb554

📥 Commits

Reviewing files that changed from the base of the PR and between ea79d65 and ff8b4df.

📒 Files selected for processing (13)

cmd/server_main.go
internal/handler/dify_retrieval_handler.go
internal/handler/dify_retrieval_handler_test.go
internal/router/router.go
internal/service/kg/pipeline.go
internal/service/kg/retrieval.go
internal/service/kg/retrieval_test.go
internal/service/kg/scoring.go
internal/service/kg/search.go
internal/service/kg/search_test.go
internal/service/kg/testutil_test.go
internal/service/kg/types.go
internal/service/kg_scoring_funcs_test.go

💤 Files with no reviewable changes (1)

internal/service/kg_scoring_funcs_test.go

coderabbitai · 2026-06-05T10:13:57Z

+		if v := c.Query("top_k"); v != "" {
+			if parsed, err := strconv.Atoi(v); err == nil {
+				if req.RetrievalSetting == nil {
+					req.RetrievalSetting = &difyRetrievalSetting{}
+				}
+				req.RetrievalSetting.TopK = &parsed
+			}
+		}
+		if v := c.Query("score_threshold"); v != "" {
+			if parsed, err := strconv.ParseFloat(v, 64); err == nil {
+				if req.RetrievalSetting == nil {
+					req.RetrievalSetting = &difyRetrievalSetting{}
+				}
+				req.RetrievalSetting.ScoreThreshold = &parsed
+			}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate top_k / score_threshold instead of silently accepting invalid inputs.

Line 175 and Line 183 ignore parse errors, and Line 224 accepts non-positive top_k. This can send invalid pagination/scoring parameters downstream with unpredictable behavior.

Proposed fix

 		if v := c.Query("top_k"); v != "" {
-			if parsed, err := strconv.Atoi(v); err == nil {
-				if req.RetrievalSetting == nil {
-					req.RetrievalSetting = &difyRetrievalSetting{}
-				}
-				req.RetrievalSetting.TopK = &parsed
-			}
+			parsed, err := strconv.Atoi(v)
+			if err != nil || parsed <= 0 {
+				c.JSON(http.StatusBadRequest, gin.H{"code": common.CodeArgumentError, "message": "top_k must be a positive integer"})
+				return
+			}
+			if req.RetrievalSetting == nil {
+				req.RetrievalSetting = &difyRetrievalSetting{}
+			}
+			req.RetrievalSetting.TopK = &parsed
 		}
 		if v := c.Query("score_threshold"); v != "" {
-			if parsed, err := strconv.ParseFloat(v, 64); err == nil {
-				if req.RetrievalSetting == nil {
-					req.RetrievalSetting = &difyRetrievalSetting{}
-				}
-				req.RetrievalSetting.ScoreThreshold = &parsed
-			}
+			parsed, err := strconv.ParseFloat(v, 64)
+			if err != nil {
+				c.JSON(http.StatusBadRequest, gin.H{"code": common.CodeArgumentError, "message": "score_threshold must be a number"})
+				return
+			}
+			if req.RetrievalSetting == nil {
+				req.RetrievalSetting = &difyRetrievalSetting{}
+			}
+			req.RetrievalSetting.ScoreThreshold = &parsed
 		}

Also applies to: 223-225

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/handler/dify_retrieval_handler.go` around lines 174 - 188, The current parsing of c.Query("top_k") and c.Query("score_threshold") in the handler silently ignores parse errors and allows invalid values; update the parsing logic around c.Query("top_k") (strconv.Atoi) and c.Query("score_threshold") (strconv.ParseFloat) to validate inputs and return a 400 error on invalid values instead of silently skipping them: for TopK ensure parsed > 0 (and reject non-positive integers) before assigning to req.RetrievalSetting.TopK, and for ScoreThreshold ensure the parsed float is within an acceptable range (e.g. 0.0–1.0) before assigning to req.RetrievalSetting.ScoreThreshold; use the handler's error response path (e.g. c.JSON/c.AbortWithStatusJSON) to surface clear validation messages when strconv parsing fails or values are out of range.
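The proposed fix checks `top_k > 0` but only verifies that `score_threshold` parses, while the AI prompt also asks for a range check. A sketch of standalone parse helpers covering both, assuming [0, 1] is the valid threshold range (that bound comes from the review, not from the retrieval backend); the function names are hypothetical:

```go
package main

import (
	"fmt"
	"strconv"
)

// parseTopK rejects non-integers and non-positive values.
func parseTopK(raw string) (int, error) {
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return 0, fmt.Errorf("top_k must be a positive integer, got %q", raw)
	}
	return n, nil
}

// parseScoreThreshold rejects non-numbers and values outside [0, 1].
// The check is written as !(f >= 0 && f <= 1) so that "NaN", which
// strconv.ParseFloat accepts, is rejected as well.
func parseScoreThreshold(raw string) (float64, error) {
	f, err := strconv.ParseFloat(raw, 64)
	if err != nil || !(f >= 0 && f <= 1) {
		return 0, fmt.Errorf("score_threshold must be a number in [0, 1], got %q", raw)
	}
	return f, nil
}
```

In the handler, a non-nil error from either helper would map to the 400 response with `common.CodeArgumentError` shown in the proposed fix.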

codecov · 2026-06-05T10:21:01Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.16%. Comparing base (aab01af) to head (ef984a1).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main   #15704   +/-   ##
=======================================
  Coverage   93.16%   93.16%           
=======================================
  Files          10       10           
  Lines         717      717           
  Branches      118      118           
=======================================
  Hits          668      668           
  Misses         29       29           
  Partials       20       20


Otherwise FuseRelationScores and SortAndTrimRelations multiply by zero, so every relation scores zero regardless of relevance.

- The Weight (int) field was never read downstream; SortAndTrimRelations uses Sim*PageRank
- Python has no Weight field either; weight_int is used as pagerank directly
- ParseKGRelationChunks now sets PageRank from weight_int (consistent with kgRelationFromChunk)
- Tests updated to assert PageRank instead of Weight
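Since ranking is by `Sim*PageRank`, a zero `Sim` sinks a relation regardless of its weight. A simplified sketch of sort-and-trim under that rule; this is not the PR's `SortAndTrimRelations`, and the `relation` type is a stand-in:

```go
package main

import "sort"

// relation is a simplified stand-in for KGRelation.
type relation struct {
	From, To string
	Sim      float64
	PageRank float64
}

// sortAndTrim orders relations by Sim*PageRank (descending) and keeps at
// most topN. The input slice is copied, so the caller's order is untouched.
func sortAndTrim(rels []relation, topN int) []relation {
	sorted := append([]relation(nil), rels...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Sim*sorted[i].PageRank > sorted[j].Sim*sorted[j].PageRank
	})
	if topN >= 0 && len(sorted) > topN {
		sorted = sorted[:topN]
	}
	return sorted
}
```

A relation with PageRank 10 but Sim 0 scores 0 and falls below one with Sim 0.5 and PageRank 4 (score 2.0).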

- Use errors.Is(err, gorm.ErrRecordNotFound) to differentiate
- Add TestDifyRetrieval_KBDBError for the 500 path
- Update KBNotFound test to use gorm.ErrRecordNotFound

- GetByIDs error now returns 500 instead of silently dropping records
- Add TestDifyRetrieval_DocLoadError

- Add setupDifyTestNoAuth() without user middleware
- Test exercises GetUser → CodeUnauthorized → 401
- Replaces broken stub that bypassed the handler entirely

- SearchKGEntities → SearchEntities
- ParseKGEntityChunks → ParseEntityChunks
- BuildKGContent → BuildContent
- kgEntityFromChunk → entityFromChunk
- searchKGTypeSamples → searchTypeSamples
- etc. (15 renames total)

Package is already kg, so kg.SearchKGEntities stutters.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

internal/service/kg/pipeline.go (1)

68-77: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the constructor docs to match the actual API.

Line 76 points callers to WithChatModel/WithEmbModel, but this type only exposes SetChatModel and SetEmbModel, and the fields are unexported. That comment currently documents a usage path that does not exist.

✏️ Suggested diff

-// chatModel and embModel should be set via WithChatModel/WithEmbModel setters
-// or passed directly after construction.
+// chatModel and embModel can be attached after construction via
+// SetChatModel and SetEmbModel.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/service/kg/pipeline.go` around lines 68 - 77, The constructor docs
for NewPipeline incorrectly reference WithChatModel/WithEmbModel; update the
comment to match the actual API by replacing that reference with SetChatModel
and SetEmbModel (or explain that chatModel and embModel must be set via the
SetChatModel/SetEmbModel methods or passed through the provided options), and
note that the model fields are unexported so callers must use those setter
methods; ensure the symbols NewPipeline, SetChatModel, and SetEmbModel are
mentioned so readers can locate the correct API.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2d8ed88e-f724-4a20-95dd-85e92bc3b5ba

📥 Commits

Reviewing files that changed from the base of the PR and between 938e489 and 936bfde.

📒 Files selected for processing (8)

internal/handler/dify_retrieval_handler.go
internal/handler/dify_retrieval_handler_test.go
internal/service/kg/pipeline.go
internal/service/kg/retrieval.go
internal/service/kg/retrieval_test.go
internal/service/kg/scoring.go
internal/service/kg/search.go
internal/service/kg/search_test.go

🚧 Files skipped from review as they are similar to previous changes (4)

internal/handler/dify_retrieval_handler.go
internal/handler/dify_retrieval_handler_test.go
internal/service/kg/retrieval.go
internal/service/kg/search.go

- Add RetrievalTest method to SearchbotHandler
- Thin handler around chunkService.RetrievalTest (logic already implemented)
- 7 tests covering success, errors, auth, and not_found paths

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

internal/handler/searchbot_test.go (1)
45-63: ⚡ Quick win

Strengthen success test by asserting request mapping into RetrievalTestRequest.

Current tests validate status outcomes but not that input fields are correctly passed to chunkSvc. Capture the incoming req in mockChunkService and assert key mappings (e.g., kb_id, question, meta_data_filter, pagination/top-k) to catch contract regressions.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/handler/searchbot_test.go` around lines 45 - 63, update
TestSearchbotsRetrieval_Basic to capture and assert the request object passed
into the mockChunkService: modify the test's mockChunkService implementation to
store the incoming RetrievalTestRequest (or whatever parameter type
chunkSvc.RetrievalTest takes) in a local variable, invoke the handler from
TestSearchbotsRetrieval_Basic, then assert that the captured request's fields
(kb_id, question, meta_data_filter, pagination/top_k) match the JSON input;
reference the mockChunkService, the RetrievalTestRequest type, and the chunkSvc
call in the handler to locate where to capture and assert the mapping.
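A minimal sketch of the capture-and-assert pattern this nitpick describes, written without gin so it runs on its own. It assumes a hypothetical `chunkRetriever` interface shaped like the handler's `h.chunkSvc.RetrievalTest(svcReq, user.ID)` call; `RetrievalTestRequest`, `mockChunkService`, and `mapAndCall` here are illustrative stand-ins, not ragflow's actual types.

```go
package main

import "fmt"

// RetrievalTestRequest is a hypothetical stand-in for the service-layer
// request; the real ragflow type may use different fields and types.
type RetrievalTestRequest struct {
	KbIDs          []string
	Question       string
	TopK           *int
	MetaDataFilter map[string]interface{}
}

// chunkRetriever is an assumed interface shaped like the handler's call
// h.chunkSvc.RetrievalTest(svcReq, user.ID).
type chunkRetriever interface {
	RetrievalTest(req *RetrievalTestRequest, userID string) (map[string]interface{}, error)
}

// mockChunkService records what it was called with, so the test can assert
// on the request mapping and not only on the HTTP status.
type mockChunkService struct {
	captured       *RetrievalTestRequest
	capturedUserID string
}

func (m *mockChunkService) RetrievalTest(req *RetrievalTestRequest, userID string) (map[string]interface{}, error) {
	m.captured = req
	m.capturedUserID = userID
	return map[string]interface{}{"chunks": []interface{}{}, "total": 0}, nil
}

// mapAndCall is a hypothetical, framework-free stand-in for the handler body:
// it copies client input into the service request and calls the service.
func mapAndCall(svc chunkRetriever, kbIDs []string, question string, topK *int,
	filter map[string]interface{}, userID string) error {
	req := &RetrievalTestRequest{KbIDs: kbIDs, Question: question, TopK: topK, MetaDataFilter: filter}
	_, err := svc.RetrievalTest(req, userID)
	return err
}

func main() {
	mock := &mockChunkService{}
	topK := 5
	filter := map[string]interface{}{"example_key": "example_value"}
	if err := mapAndCall(mock, []string{"kb-1"}, "what is RAG?", &topK, filter, "user-1"); err != nil {
		panic(err)
	}
	// Assert on the captured request, not just on the outcome.
	got := mock.captured
	fmt.Println(got.KbIDs, got.Question, *got.TopK, got.MetaDataFilter, mock.capturedUserID)
}
```

In the real test the same assertions would run after the request has gone through the gin router, comparing `captured` against the JSON body that was posted.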

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/handler/searchbot.go`:
- Around lines 217-218: Add a nil guard for h.chunkSvc before calling
h.chunkSvc.RetrievalTest; if h.chunkSvc is nil return a clear error (or wrapped
error) instead of invoking RetrievalTest to avoid panics. Locate the call to
h.chunkSvc.RetrievalTest(svcReq, user.ID) in the handler method in searchbot.go,
check if h.chunkSvc == nil and return an appropriate error (e.g., fmt.Errorf or
the handler's standard error type) explaining the missing service, otherwise
proceed to call RetrievalTest.
- Around lines 217-224: The handler's error branch for h.chunkSvc.RetrievalTest
currently returns err.Error() to the client which leaks internals; change this
to return a stable generic message (e.g. "internal server error" or "failed to
retrieve chunk") with the common.CodeServerError, and write the full err (and
any contextual fields like user.ID and svcReq) to the server log instead of the
response. Locate the error handling after the call to h.chunkSvc.RetrievalTest
in the searchbot handler and replace the c.JSON call that exposes err.Error()
with a generic client message while using the service/logger available in the
handler (e.g. h.logger, log, or processLogger) to record the detailed error.
- Around lines 57-75: The SearchbotRetrievalTestRequest struct accepts a
Highlight field from the client request, but when constructing the service
request (svcReq), this field is not being forwarded. Locate where svcReq is
built and add the Highlight field from the request object to ensure it is
properly passed to the service layer, maintaining consistency with other request
parameters being forwarded.

---

Nitpick comments:
In `@internal/handler/searchbot_test.go`:
- Around lines 45-63: Update TestSearchbotsRetrieval_Basic to capture and assert
the request object passed into the mockChunkService: modify the test's
mockChunkService implementation to store the incoming RetrievalTestRequest (or
whatever parameter type chunkSvc.RetrievalTest takes) in a local variable,
invoke the handler from TestSearchbotsRetrieval_Basic, then assert that the
captured request's fields (kb_id, question, meta_data_filter, pagination/top_k)
match the JSON input; reference the mockChunkService, the RetrievalTestRequest
type, and the chunkSvc call in the handler to locate where to capture and
assert the mapping.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 959a53e7-f1bb-457a-a631-6bc89db44dbf

📥 Commits

Reviewing files that changed from the base of the PR and between 936bfde and c12f227.

📒 Files selected for processing (3)

internal/handler/searchbot.go
internal/handler/searchbot_test.go
internal/router/router.go

🚧 Files skipped from review as they are similar to previous changes (1)

internal/router/router.go

coderabbitai · 2026-06-05T11:50:38Z

+// SearchbotRetrievalTestRequest is the request body for POST /api/v1/searchbots/retrieval_test.
+type SearchbotRetrievalTestRequest struct {
+	KbIDs                  []string                `json:"kb_id" binding:"required"`
+	Question               string                  `json:"question" binding:"required"`
+	Page                   *int                    `json:"page,omitempty"`
+	Size                   *int                    `json:"size,omitempty"`
+	DocIDs                 []string                `json:"doc_ids,omitempty"`
+	UseKG                  *bool                   `json:"use_kg,omitempty"`
+	TopK                   *int                    `json:"top_k,omitempty"`
+	CrossLanguages         []string                `json:"cross_languages,omitempty"`
+	SearchID               *string                 `json:"search_id,omitempty"`
+	MetaDataFilter         *map[string]interface{} `json:"meta_data_filter,omitempty"`
+	TenantRerankID         *string                 `json:"tenant_rerank_id,omitempty"`
+	RerankID               *string                 `json:"rerank_id,omitempty"`
+	Keyword                *bool                   `json:"keyword,omitempty"`
+	SimilarityThreshold    *float64                `json:"similarity_threshold,omitempty"`
+	VectorSimilarityWeight *float64                `json:"vector_similarity_weight,omitempty"`
+	Highlight              *bool                   `json:"highlight,omitempty"`
+}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The highlight field is accepted in the request but never forwarded to the service.

The handler binds highlight in SearchbotRetrievalTestRequest (Line 74), but svcReq construction does not include it (Line 199 onward). This silently drops client input and breaks request-contract expectations.

Also applies to: 199-215

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/handler/searchbot.go` around lines 57 - 75, The SearchbotRetrievalTestRequest struct accepts a Highlight field from the client request, but when constructing the service request (svcReq), this field is not being forwarded. Locate where svcReq is built and add the Highlight field from the request object to ensure it is properly passed to the service layer, maintaining consistency with other request parameters being forwarded.
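A sketch of the fix, assuming a hypothetical service request type `svcRetrievalRequest`; the client struct below copies a subset of the fields and JSON tags from the diff (gin `binding` tags omitted). The point is only that `Highlight` is copied alongside the other optional fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SearchbotRetrievalTestRequest mirrors a subset of the handler's request
// struct from the diff, with the same JSON tags.
type SearchbotRetrievalTestRequest struct {
	KbIDs     []string `json:"kb_id"`
	Question  string   `json:"question"`
	TopK      *int     `json:"top_k,omitempty"`
	Highlight *bool    `json:"highlight,omitempty"`
}

// svcRetrievalRequest is a hypothetical service-layer request; the real
// chunk service type and its field names may differ.
type svcRetrievalRequest struct {
	KbIDs     []string
	Question  string
	TopK      *int
	Highlight *bool
}

// buildSvcReq copies every accepted client field, including Highlight,
// so no bound input is silently dropped.
func buildSvcReq(req SearchbotRetrievalTestRequest) svcRetrievalRequest {
	return svcRetrievalRequest{
		KbIDs:     req.KbIDs,
		Question:  req.Question,
		TopK:      req.TopK,
		Highlight: req.Highlight, // the field the review found missing
	}
}

func main() {
	var req SearchbotRetrievalTestRequest
	body := []byte(`{"kb_id":["kb-1"],"question":"q","highlight":true}`)
	if err := json.Unmarshal(body, &req); err != nil {
		panic(err)
	}
	svc := buildSvcReq(req)
	fmt.Println(svc.Highlight != nil && *svc.Highlight)
}
```

Keeping `Highlight` as `*bool` lets the service tell an omitted flag (nil) from an explicit `false`.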

coderabbitai · 2026-06-05T11:50:38Z

+	result, err := h.chunkSvc.RetrievalTest(svcReq, user.ID)
+	if err != nil {


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a nil guard for chunkSvc before invoking it.

h.chunkSvc.RetrievalTest(...) is called unconditionally. If wiring ever passes a nil service, this path will panic instead of returning a controlled error.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/handler/searchbot.go` around lines 217 - 218, Add a nil guard for h.chunkSvc before calling h.chunkSvc.RetrievalTest; if h.chunkSvc is nil return a clear error (or wrapped error) instead of invoking RetrievalTest to avoid panics. Locate the call to h.chunkSvc.RetrievalTest(svcReq, user.ID) in the handler method in searchbot.go, check if h.chunkSvc == nil and return an appropriate error (e.g., fmt.Errorf or the handler's standard error type) explaining the missing service, otherwise proceed to call RetrievalTest.
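A minimal sketch of the suggested nil guard. `retriever`, `searchbotHandler`, `fakeRetriever`, and `errChunkSvcMissing` are hypothetical simplifications; the real handler would presumably turn the error into its usual `common.CodeServerError` JSON response.

```go
package main

import (
	"errors"
	"fmt"
)

// errChunkSvcMissing is a hypothetical sentinel for a missing dependency.
var errChunkSvcMissing = errors.New("chunk service is not configured")

// retriever is an assumed, simplified version of the chunk service interface.
type retriever interface {
	RetrievalTest(question, userID string) ([]string, error)
}

type searchbotHandler struct {
	chunkSvc retriever // nil if wiring forgot to inject it
}

// retrieve guards the dependency before use, so a missing service becomes a
// controlled error instead of a nil-pointer panic.
func (h *searchbotHandler) retrieve(question, userID string) ([]string, error) {
	if h.chunkSvc == nil {
		return nil, errChunkSvcMissing
	}
	return h.chunkSvc.RetrievalTest(question, userID)
}

// fakeRetriever is a trivial implementation used to exercise the happy path.
type fakeRetriever struct{}

func (fakeRetriever) RetrievalTest(question, userID string) ([]string, error) {
	return []string{"chunk for " + question}, nil
}

func main() {
	_, err := (&searchbotHandler{}).retrieve("q", "u")
	fmt.Println(errors.Is(err, errChunkSvcMissing))
	res, _ := (&searchbotHandler{chunkSvc: fakeRetriever{}}).retrieve("q", "u")
	fmt.Println(res)
}
```

One Go caveat: `h.chunkSvc == nil` only catches a nil interface. If wiring stores a nil pointer of a concrete service type in the field, the comparison is false and the call still goes through, so validating dependencies when the handler is constructed is the more thorough guard.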

coderabbitai · 2026-06-05T11:50:38Z

+	result, err := h.chunkSvc.RetrievalTest(svcReq, user.ID)
+	if err != nil {
+		if strings.Contains(err.Error(), "not_found") {
+			c.JSON(http.StatusNotFound, gin.H{"code": common.CodeNotFound, "message": "No chunk found! Check the chunk status please!"})
+			return
+		}
+		c.JSON(http.StatusInternalServerError, gin.H{"code": common.CodeServerError, "message": err.Error()})
+		return


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid exposing internal error details in API responses.

Line 223 returns err.Error() directly to clients. That can leak backend internals. Return a stable generic message and log the detailed error server-side.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/handler/searchbot.go` around lines 217 - 224, The handler's error branch for h.chunkSvc.RetrievalTest currently returns err.Error() to the client which leaks internals; change this to return a stable generic message (e.g. "internal server error" or "failed to retrieve chunk") with the common.CodeServerError, and write the full err (and any contextual fields like user.ID and svcReq) to the server log instead of the response. Locate the error handling after the call to h.chunkSvc.RetrievalTest in the searchbot handler and replace the c.JSON call that exposes err.Error() with a generic client message while using the service/logger available in the handler (e.g. h.logger, log, or processLogger) to record the detailed error.
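A sketch of the suggested error handling as a pure function, assuming the standard library `log` package stands in for whatever logger the handler actually uses. It keeps the handler's existing `strings.Contains(err.Error(), "not_found")` check and 404 message; `msgInternal` and the example errors are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net/http"
	"strings"
)

// Client-facing messages. msgNotFound is the handler's existing 404 text;
// msgInternal is an assumed generic wording for the 500 case.
const (
	msgNotFound = "No chunk found! Check the chunk status please!"
	msgInternal = "failed to retrieve chunks"
)

// Example errors; the backend one carries details (host and port) that
// should never reach the client.
var (
	exampleNotFound = errors.New("not_found: no chunk matched")
	exampleBackend  = errors.New("dial tcp 10.0.0.5:9200: connect: connection refused")
)

// retrievalErrorResponse maps "not_found" errors to 404; anything else is
// logged in full server-side and answered with a stable generic 500 message.
func retrievalErrorResponse(err error, userID string) (int, string) {
	if strings.Contains(err.Error(), "not_found") {
		return http.StatusNotFound, msgNotFound
	}
	log.Printf("retrieval_test failed: user=%s err=%v", userID, err)
	return http.StatusInternalServerError, msgInternal
}

func main() {
	for _, err := range []error{exampleNotFound, exampleBackend} {
		status, msg := retrievalErrorResponse(err, "user-1")
		fmt.Println(status, msg)
	}
}
```

Matching on the substring `not_found` is brittle; a sentinel error compared with `errors.Is` would survive message changes, but the sketch keeps the PR's current check.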


xugangqiang added the ci Continue Integration label Jun 5, 2026

xugangqiang and others added 10 commits June 5, 2026 18:02

chore: remove dead KGPipelineIface interface

9cce96d

Option A: delete the unused interface. The KG pipeline is created inline where needed (6 lines), so no polymorphism is required.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

chore: remove dead kg_retrieval.go wrapper (no external callers)

6734452

chore: remove dead kg_search.go wrapper (no external callers)

24d7aa5

chore: remove redundant VectorSimilarityWeight (nlp service defaults to 0.3 when nil)

8c82b85
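The commit relies on the nlp service treating a nil weight as 0.3. A sketch of that nil-means-default pattern, with a hypothetical helper and constant name; only the 0.3 value comes from the commit message.

```go
package main

import "fmt"

// defaultVectorSimilarityWeight is the 0.3 default the commit attributes to
// the nlp service; the constant name is hypothetical.
const defaultVectorSimilarityWeight = 0.3

// vectorWeightOrDefault resolves an optional weight: nil means "use the
// service default", while an explicit value (even 0) is kept as-is.
func vectorWeightOrDefault(w *float64) float64 {
	if w == nil {
		return defaultVectorSimilarityWeight
	}
	return *w
}

func main() {
	custom := 0.7
	fmt.Println(vectorWeightOrDefault(nil), vectorWeightOrDefault(&custom))
}
```

Keeping the field as a pointer is what lets `nil` (use the default) be told apart from an explicit `0`.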

refactor: replace inline fusion weights with constants and buildFusionExpr

ff8b4df

xugangqiang force-pushed the feat/dify-retrieval-api branch from 7b0a9a6 to ff8b4df Compare June 5, 2026 10:03

xugangqiang marked this pull request as ready for review June 5, 2026 10:03

dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Jun 5, 2026

coderabbitai Bot reviewed Jun 5, 2026


xugangqiang added 7 commits June 5, 2026 18:34

fix: initialize KGRelation.Sim from _score/score in kgRelationFromChunk

e5e1548

Otherwise FuseRelationScores and SortAndTrimRelations multiply by a zero Sim, so every relation scores zero regardless of relevance.
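A minimal sketch of the fix, assuming search hits arrive as `map[string]interface{}` with the score under `_score` or `score` (the precedence follows the commit title). The trimmed `KGRelation` and the `toFloat` helper are hypothetical; the real `kgRelationFromChunk` builds more fields.

```go
package main

import "fmt"

// KGRelation is a hypothetical relation record trimmed to the field this fix
// touches; the real kg type has more fields.
type KGRelation struct {
	Sim float64 // search-engine similarity; later scoring multiplies by it
}

// toFloat accepts the numeric types a decoded search hit commonly carries.
func toFloat(v interface{}) (float64, bool) {
	switch n := v.(type) {
	case float64:
		return n, true
	case float32:
		return float64(n), true
	case int:
		return float64(n), true
	case int64:
		return float64(n), true
	default:
		return 0, false
	}
}

// kgRelationFromChunk seeds Sim from "_score", falling back to "score".
// Leaving Sim at its zero value would zero out any later product with it.
func kgRelationFromChunk(chunk map[string]interface{}) KGRelation {
	var rel KGRelation
	for _, key := range []string{"_score", "score"} {
		if s, ok := toFloat(chunk[key]); ok {
			rel.Sim = s
			break
		}
	}
	return rel
}

func main() {
	fmt.Println(kgRelationFromChunk(map[string]interface{}{"_score": 0.82}).Sim)
	fmt.Println(kgRelationFromChunk(map[string]interface{}{"score": 0.4}).Sim)
}
```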

fix: distinguish 404 (not found) vs 500 (DB error) in KB lookup

ca25f98

- Use errors.Is(err, gorm.ErrRecordNotFound) to differentiate
- Add TestDifyRetrieval_KBDBError for the 500 path
- Update KBNotFound test to use gorm.ErrRecordNotFound
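A self-contained sketch of the 404/500 split. To avoid a GORM dependency it defines a local `errRecordNotFound` sentinel in place of `gorm.ErrRecordNotFound`; `lookupKB`, `KB`, `errExampleDB`, and `kbLookupStatus` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// errRecordNotFound stands in for gorm.ErrRecordNotFound so the sketch has no
// external dependency; the real handler compares against the gorm value.
var errRecordNotFound = errors.New("record not found")

// errExampleDB is a sample failure that is not "not found".
var errExampleDB = errors.New("database connection lost")

type KB struct{ ID, Name string }

// lookupKB is a hypothetical DAO call that wraps the sentinel with %w, as
// repository layers often do when adding context.
func lookupKB(store map[string]KB, id string) (*KB, error) {
	kb, ok := store[id]
	if !ok {
		return nil, fmt.Errorf("lookup kb %q: %w", id, errRecordNotFound)
	}
	return &kb, nil
}

// kbLookupStatus maps a lookup error to an HTTP status. errors.Is unwraps the
// %w chain, so a wrapped "not found" still yields 404; anything else is a
// server-side failure and yields 500.
func kbLookupStatus(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, errRecordNotFound):
		return http.StatusNotFound
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	store := map[string]KB{"kb-1": {ID: "kb-1", Name: "docs"}}
	_, err := lookupKB(store, "missing")
	fmt.Println(kbLookupStatus(err), kbLookupStatus(errExampleDB))
}
```

Comparing with `errors.Is` rather than `==` matters once a DAO wraps the error with `%w`, as `lookupKB` does here.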

fix: return 500 when document loading fails, add test

89857ac

- GetByIDs error now returns 500 instead of silently dropping records
- Add TestDifyRetrieval_DocLoadError

fix: TestDifyRetrieval_NoAuth now tests real handler auth path

9f58fc3

- Add setupDifyTestNoAuth() without user middleware
- Test exercises GetUser → CodeUnauthorized → 401
- Replaces broken stub that bypassed the handler entirely
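A sketch of testing the unauthenticated path through the handler itself, using only `net/http/httptest` rather than gin. `getUser`, `withUser`, and `serve` are hypothetical stand-ins for the PR's `GetUser` helper, user middleware, and `setupDifyTestNoAuth()` setup; the route path is the Dify endpoint from this PR.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// ctxKey is an unexported key type so context values cannot collide.
type ctxKey string

const userKey ctxKey = "user"

// getUser is a hypothetical stand-in for the handler's GetUser helper: it
// reads the user that an auth middleware would have put on the context.
func getUser(r *http.Request) (string, bool) {
	u, ok := r.Context().Value(userKey).(string)
	return u, ok && u != ""
}

// retrievalHandler answers 401 when no user is present, mirroring the
// GetUser -> CodeUnauthorized -> 401 path from the commit message.
func retrievalHandler(w http.ResponseWriter, r *http.Request) {
	if _, ok := getUser(r); !ok {
		http.Error(w, "unauthorized", http.StatusUnauthorized)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// withUser plays the role of the user middleware that the no-auth setup omits.
func withUser(id string, next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		next(w, r.WithContext(context.WithValue(r.Context(), userKey, id)))
	}
}

// serve sends one POST to the Dify retrieval path and returns the status code.
func serve(h http.HandlerFunc) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/api/v1/dify/retrieval", nil)
	h(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(serve(retrievalHandler), serve(withUser("u-1", retrievalHandler)))
}
```

Because no middleware injects a user, the 401 comes from the handler's own check rather than from a stub that never reaches the handler.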

test: add RetrievalNotFound path (search returns "not_found" → 404)

fd9e29a

coderabbitai Bot reviewed Jun 5, 2026


feat: implement POST /api/v1/searchbots/retrieval_test

c12f227

- Add RetrievalTest method to SearchbotHandler
- Thin handler around chunkService.RetrievalTest (logic already implemented)
- 7 tests covering success, errors, auth, and not_found paths

coderabbitai Bot reviewed Jun 5, 2026


Revert "feat: implement POST /api/v1/searchbots/retrieval_test"

ef984a1

This reverts commit c12f227.

xugangqiang requested review from yingfeng and yuzhichang June 5, 2026 13:13

xugangqiang requested a review from JinHai-CN June 5, 2026 13:13

yingfeng merged commit 5a04ac0 into infiniflow:main Jun 5, 2026
2 checks passed

coderabbitai Bot mentioned this pull request Jun 8, 2026

feat: implement POST /api/v1/searchbots/retrieval_test #15710

Merged

3 tasks

dripsmvcp mentioned this pull request Jun 8, 2026

feat[Go]: implement GET /dify/retrieval/health (issue #15240) #15571

Closed

3 tasks

coderabbitai Bot mentioned this pull request Jun 9, 2026

feat: implement POST /api/v1/searchbots/ask — streaming RAG with citations and think-tag processing #15825

Open


xugangqiang commented Jun 5, 2026


coderabbitai Bot commented Jun 5, 2026 • edited


❌ Failed checks (1 warning, 1 inconclusive)


coderabbitai Bot left a comment


coderabbitai Bot Jun 5, 2026


codecov Bot commented Jun 5, 2026 • edited


Codecov Report


coderabbitai Bot left a comment


coderabbitai Bot left a comment


coderabbitai Bot Jun 5, 2026


coderabbitai Bot Jun 5, 2026


coderabbitai Bot Jun 5, 2026


2 participants
