Skip to content

feat: Dify-compatible retrieval API endpoint#15704

Merged
yingfeng merged 19 commits into
infiniflow:mainfrom
xugangqiang:feat/dify-retrieval-api
Jun 5, 2026
Merged

feat: Dify-compatible retrieval API endpoint#15704
yingfeng merged 19 commits into
infiniflow:mainfrom
xugangqiang:feat/dify-retrieval-api

Conversation

@xugangqiang

Copy link
Copy Markdown
Collaborator

Summary

Dify-compatible retrieval API for external knowledge base integration.

Changes

  • New handler: DifyRetrievalHandler with POST/GET /api/v1/dify/retrieval
  • Health check: GET /api/v1/dify/retrieval/health
  • Full pipeline: KB validation -> permission check -> embedding -> metadata filter -> chunk retrieval -> child chunk aggregation -> optional KG search -> response assembly
  • 12 tests covering all paths (success, errors, metadata filter, KG mode)
  • Testability: Handler dependencies defined as interfaces (KBServiceIface, ModelServiceIface, etc.)

Files

File Type
internal/handler/dify_retrieval_handler.go New — handler + interfaces
internal/handler/dify_retrieval_handler_test.go New — 12 tests
internal/router/router.go Modified — route registration
cmd/server_main.go Modified — handler wiring
internal/service/kg/pipeline.go Modified — SetChatModel/SetEmbModel
internal/service/kg/retrieval.go New — helper functions
internal/service/kg/scoring.go Moved from service package
internal/service/kg/search.go New — KG search functions
internal/service/kg/types.go New — type definitions

@xugangqiang xugangqiang added the ci Continue Integration label Jun 5, 2026
@coderabbitai

coderabbitai Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a Dify-compatible retrieval handler (GET/POST + health), wires it into server/router, adds a searchbot retrieval-test endpoint, and refactors KG retrieval/search into package kg with renamed APIs, types, and updated tests.

Changes

Dify Retrieval Feature with KG Refactoring

Layer / File(s) Summary
KG types, pipeline and scoring API rename
internal/service/kg/types.go, internal/service/kg/pipeline.go, internal/service/kg/scoring.go, internal/service/kg/testutil_test.go
Adds KG domain structs and renames/refactors pipeline/scoring exported APIs to package kg (Pipeline/Option/NewPipeline/BuildContent) and updates receivers.
KG retrieval, search, and parsing
internal/service/kg/retrieval.go, internal/service/kg/search.go
Renames entrypoint to Retrieval, consolidates defaults/fusion weights, updates hybrid/dense fallback behavior, and hardens entity/relation/community/type-sample parsing.
KG tests and integration updates
internal/service/kg/retrieval_test.go, internal/service/kg/search_test.go
Aligns tests to kg package and renamed helpers/constants; updates assertions and integration calls; adds small test util helper.
Dify Retrieval Handler and tests
internal/handler/dify_retrieval_handler.go, internal/handler/dify_retrieval_handler_test.go
Adds DifyRetrievalHandler (GET/POST binding, auth/authorization, metadata filters, model loading, retrieval and child-chunk enrichment, optional KG pipeline prepend, document fetch/normalization) and comprehensive tests plus a health endpoint.
Searchbot retrieval-test and router/server wiring
internal/handler/searchbot.go, internal/handler/searchbot_test.go, internal/router/router.go, cmd/server_main.go
Adds RetrievalTest endpoint with ChunkServiceIface, updates SearchbotHandler constructor, wires new routes, and constructs/passes difyRetrievalHandler in server startup and router creation.

Sequence Diagram

sequenceDiagram
  participant Client
  participant Router
  participant DifyHandler as DifyRetrievalHandler
  participant KBService as KBService
  participant RetrievalSvc as RetrievalService
  participant KGPipeline as KG Pipeline
  participant DocDAO as DocumentDAO

  Client->>Router: POST/GET /api/v1/dify/retrieval
  Router->>DifyHandler: forward request
  DifyHandler->>DifyHandler: bind & validate input
  DifyHandler->>KBService: Accessible(knowledge_id, user)
  DifyHandler->>RetrievalSvc: Retrieval(nlp.RetrievalRequest)
  RetrievalSvc-->>DifyHandler: chunks
  alt use_kg == true
    DifyHandler->>KGPipeline: NewPipeline(...).Retrieval(ctx)
    KGPipeline-->>DifyHandler: KG chunks
    DifyHandler->>DifyHandler: prepend KG chunk
  end
  DifyHandler->>DocDAO: GetByIDs(aggregated doc IDs)
  DocDAO-->>DifyHandler: documents with metadata
  DifyHandler-->>Client: { "records": [...] }
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly Related PRs

Suggested Labels

💞 feature, 🧪 test, 🐖api

Suggested Reviewers

  • yingfeng
  • yuzhichang
  • JinHai-CN

Poem

🐰 I hopped through types and pipeline threads,
I coaxed the KG to wake from beds,
A Dify door now greets each query,
Tests hum bright and routes not weary,
The rabbit ships — small, fast, and merry.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.49% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The PR description is well-structured with Summary, Changes, and Files sections, but does not follow the template with 'What problem does this PR solve?' and 'Type of change' checkboxes. Consider structuring the description to match the template: add a 'What problem does this PR solve?' section with background context, and check the appropriate 'Type of change' checkbox (likely 'New Feature').
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: Dify-compatible retrieval API endpoint' accurately and concisely describes the main change: adding a new Dify-compatible retrieval API endpoint.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

xugangqiang and others added 10 commits June 5, 2026 18:02
- Add DifyRetrievalHandler with interfaces for testability
- POST/GET /api/v1/dify/retrieval endpoint (matching Python impl)
- GET /api/v1/dify/retrieval/health endpoint
- Wire handler + routes in router.go and server_main.go
- Add SetChatModel/SetEmbModel setters to KGSearchPipeline
- 12 tests covering all paths (success, errors, KG, meta filter)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Fix metadata condition JSON field mapping (name→Key, comparison_operator→Op)
- Extract top_k/score_threshold from GET query params manually
- Return 500 on KG chat model fetch failure (instead of silent skip)
- Default metadata condition logic to "and" when omitted
- Remove extra "message" field from health check response

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Option A — delete unused interface. KG pipeline is created inline
where needed (6 lines), no polymorphism required.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Move Pipeline, scoring, search, types, helpers to internal/service/kg/
- Keep backward-compat type aliases and function wrappers in service package
- Update handler to import kg package directly
- Fix test package boundaries (consistent package kg)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Pipeline (was KGSearchPipeline)
- Option (was KGSearchOption)
- NewPipeline (was NewKGSearchPipeline)
- WithSimThreshold / WithDenseTopK (was WithKG*)
- Retrieval (was KGSearchRetrieval)
- defaultSimThreshold / defaultDenseTopK (was defaultKG*)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Top and SimilarityThreshold now passed as nil when user omits them
- Service layer nil-guard handles defaults (1024, 0.0, 0.3)
- Remove redundant default-setting in handler
- Single source of truth for defaults in nlp/service layer
@xugangqiang xugangqiang force-pushed the feat/dify-retrieval-api branch from 7b0a9a6 to ff8b4df Compare June 5, 2026 10:03
@xugangqiang xugangqiang marked this pull request as ready for review June 5, 2026 10:03
@dosubot dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Jun 5, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
internal/service/kg/search.go (1)

97-103: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Avoid nil dereference in embedding path (embModel.ModelDriver).

At Line 102, embModel.ModelDriver.Embed(...) is called without verifying ModelDriver is non-nil. This can panic when an embedding model record is loaded but driver initialization failed/skipped.

🔧 Proposed fix
-func buildKGDenseExpr(embModel *modelModule.EmbeddingModel, question string, topN int) (*types.MatchDenseExpr, error) {
-	if embModel == nil || question == "" {
+func buildKGDenseExpr(embModel *modelModule.EmbeddingModel, question string, topN int) (*types.MatchDenseExpr, error) {
+	if embModel == nil || embModel.ModelDriver == nil || question == "" {
 		return nil, nil
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/service/kg/search.go` around lines 97 - 103, buildKGDenseExpr may
panic because it calls embModel.ModelDriver.Embed without checking ModelDriver;
update buildKGDenseExpr to validate that embModel.ModelDriver is non-nil (and
return a sensible nil/err) before calling Embed, e.g., check embModel != nil &&
embModel.ModelDriver != nil and return an error or nil result when the driver is
absent, ensuring the function handles that case gracefully.
internal/service/kg/pipeline.go (2)

195-202: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Include _score in KG search projections before score filtering.

At Line 199 and Line 261, SelectFields omits _score, but Line 215 and Line 277 immediately filter by score. If the engine does not return score fields by default, all rows can be filtered out incorrectly.

🔧 Proposed fix
- SelectFields: []string{"entity_kwd", "entity_type_kwd", "rank_flt", "content_with_weight", "n_hop_with_weight"},
+ SelectFields: []string{"entity_kwd", "entity_type_kwd", "rank_flt", "content_with_weight", "n_hop_with_weight", "_score"},
...
- SelectFields: []string{"from_entity_kwd", "to_entity_kwd", "weight_int", "content_with_weight"},
+ SelectFields: []string{"from_entity_kwd", "to_entity_kwd", "weight_int", "content_with_weight", "_score"},

Also applies to: 257-264

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/service/kg/pipeline.go` around lines 195 - 202, In
Pipeline.searchEntities the SelectFields for KG searches (e.g., the entsReq and
the other SearchRequest used later) omit "_score" but the code immediately
filters results by score; add "_score" to the SelectFields slices so the engine
returns score values before applying the score-based filtering in searchEntities
(update the SelectFields in the SearchRequest instances referenced in
searchEntities to include "_score").

177-183: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Synthetic KG chunk currently violates downstream record contract.

At Line 181, doc_id is empty. In internal/handler/dify_retrieval_handler.go (Line 326-359), records are skipped when doc_id cannot resolve to a document, so KG content can be silently dropped from API responses.

Please align the contract across layers: either (a) special-case synthetic KG chunks in handler serialization, or (b) attach a resolvable synthetic document contract.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/service/kg/pipeline.go` around lines 177 - 183, The synthetic KG
chunk map currently leaves "doc_id" empty causing records to be dropped by the
handler; fix by assigning a resolvable synthetic document id when building the
chunk in pipeline.go (populate the "doc_id" key using a stable synthetic pattern
tied to p.kbIDs and the chunk identifier, e.g. a "kg:<kb_id>:<chunk_id>" style),
and ensure any document-metadata fields expected downstream (e.g., "docnm_kwd",
"kb_id", and the chunk id key) are set consistently so the handler's resolution
logic (in dify_retrieval_handler.go) will treat the KG chunk as a valid document
rather than skipping it.
internal/service/kg/retrieval.go (1)

235-242: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Guard chatModel.ModelDriver before invoking chat completion.

At Line 241, chatModel.ModelDriver.ChatWithMessages(...) is called without checking chatModel.ModelDriver != nil. A partially initialized model can panic this path.

🔧 Proposed fix
-if chatModel != nil && chatModel.ModelName != nil && chatModel.APIConfig != nil {
+if chatModel != nil && chatModel.ModelDriver != nil && chatModel.ModelName != nil && chatModel.APIConfig != nil {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/service/kg/retrieval.go` around lines 235 - 242, The call to
chatModel.ModelDriver.ChatWithMessages can panic if chatModel.ModelDriver is
nil; before invoking ChatWithMessages (in the block checking chatModel != nil &&
chatModel.ModelName != nil && chatModel.APIConfig != nil) add a nil-check for
chatModel.ModelDriver and return/handle the error path if it's nil, e.g. log or
return an error indicating the model driver is uninitialized; ensure you
reference the same symbols (chatModel, ModelDriver, ChatWithMessages, ModelName,
APIConfig) when adding the guard so the code only calls ChatWithMessages on a
non-nil ModelDriver.
🧹 Nitpick comments (1)
internal/handler/dify_retrieval_handler_test.go (1)

322-337: ⚡ Quick win

Strengthen KG-path assertions in TestDifyRetrieval_UseKG.

This test only checks status code, so KG integration can regress without test failure. Assert that records includes the KG-prepended content when use_kg=true.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/handler/dify_retrieval_handler_test.go` around lines 322 - 337,
Update TestDifyRetrieval_UseKG to parse the JSON response body and assert the
returned retrieval records include the KG-prepended content when use_kg=true:
after calling r.ServeHTTP, unmarshal w.Body into a response struct (e.g., with
Records []struct{ Content string `json:"content"` }) and add an assertion that
the first record's Content contains the KG label produced by your mock (for
example contains "tag_1" or the expected KG prefix), referencing
TestDifyRetrieval_UseKG, mockMetadataService and its labelQuestionFn to locate
where the KG label is defined.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/handler/dify_retrieval_handler_test.go`:
- Around line 262-277: TestDifyRetrieval_NoAuth mounts a stub that always
returns 401 so it never exercises DifyRetrievalHandler.Retrieval or GetUser;
replace the stub route with the real handler (register
DifyRetrievalHandler.Retrieval on POST "/api/v1/dify/retrieval") and arrange the
auth path to fail by mocking/stubbing GetUser (or the auth middleware used by
Retrieval) to return an unauthorized error for the request; then send the same
POST and assert w.Code == http.StatusUnauthorized so the actual handler/auth
logic is exercised.

In `@internal/handler/dify_retrieval_handler.go`:
- Around line 317-324: The current logic swallows errors from docDAO.GetByIDs
and drops retrieval rows when a document isn't hydrated; change it so GetByIDs
failures are returned as errors (do not convert them to empty results) and when
iterating chunks (the loop that uses docMap and checks docMap[d.ID]) do not
unconditionally skip rows if the document is missing—preserve the retrieval row
(e.g., keep doc nil or attach a placeholder) so KG rows without doc_id still
pass through; refer to allDocIDs, h.docDAO.GetByIDs, and docMap to locate and
implement these changes.
- Around line 174-188: The current parsing of c.Query("top_k") and
c.Query("score_threshold") in the handler silently ignores parse errors and
allows invalid values; update the parsing logic around c.Query("top_k")
(strconv.Atoi) and c.Query("score_threshold") (strconv.ParseFloat) to validate
inputs and return a 400 error on invalid values instead of silently skipping
them: for TopK ensure parsed > 0 (and reject non-positive integers) before
assigning to req.RetrievalSetting.TopK, and for ScoreThreshold ensure the parsed
float is within an acceptable range (e.g. 0.0–1.0) before assigning to
req.RetrievalSetting.ScoreThreshold; use the handler's error response path (e.g.
c.JSON/c.AbortWithStatusJSON) to surface clear validation messages when strconv
parsing fails or values are out of range.
- Around line 202-205: The current handler treats any error from h.kbSvc.GetByID
as a 404; change the logic in the retrieval block that calls h.kbSvc.GetByID so
that if err != nil you return a 500 (use c.JSON with
http.StatusInternalServerError and an appropriate message/log) and only return
404 when kb == nil (keeping common.CodeNotFound for that branch); update the
error response for the 500 path to include minimal context (e.g., "Failed to
fetch Knowledgebase") while preserving the 404 branch for truly missing
resources.

In `@internal/service/kg/retrieval.go`:
- Around line 175-186: kgRelationFromChunk currently leaves KGRelation.Sim
unset, causing downstream scoring (FuseRelationScores/SortAndTrimRelations) to
zero out text-matched relations; update kgRelationFromChunk to read similarity
from the chunk keys "_score" or "score" (handle numeric types like float64 and
int) and assign that value to r.Sim after parsing PageRank, ensuring the
function sets KGRelation.Sim from chunk["_score"] or chunk["score"] when
present.

---

Outside diff comments:
In `@internal/service/kg/pipeline.go`:
- Around line 195-202: In Pipeline.searchEntities the SelectFields for KG
searches (e.g., the entsReq and the other SearchRequest used later) omit
"_score" but the code immediately filters results by score; add "_score" to the
SelectFields slices so the engine returns score values before applying the
score-based filtering in searchEntities (update the SelectFields in the
SearchRequest instances referenced in searchEntities to include "_score").
- Around line 177-183: The synthetic KG chunk map currently leaves "doc_id"
empty causing records to be dropped by the handler; fix by assigning a
resolvable synthetic document id when building the chunk in pipeline.go
(populate the "doc_id" key using a stable synthetic pattern tied to p.kbIDs and
the chunk identifier, e.g. a "kg:<kb_id>:<chunk_id>" style), and ensure any
document-metadata fields expected downstream (e.g., "docnm_kwd", "kb_id", and
the chunk id key) are set consistently so the handler's resolution logic (in
dify_retrieval_handler.go) will treat the KG chunk as a valid document rather
than skipping it.

In `@internal/service/kg/retrieval.go`:
- Around line 235-242: The call to chatModel.ModelDriver.ChatWithMessages can
panic if chatModel.ModelDriver is nil; before invoking ChatWithMessages (in the
block checking chatModel != nil && chatModel.ModelName != nil &&
chatModel.APIConfig != nil) add a nil-check for chatModel.ModelDriver and
return/handle the error path if it's nil, e.g. log or return an error indicating
the model driver is uninitialized; ensure you reference the same symbols
(chatModel, ModelDriver, ChatWithMessages, ModelName, APIConfig) when adding the
guard so the code only calls ChatWithMessages on a non-nil ModelDriver.

In `@internal/service/kg/search.go`:
- Around line 97-103: buildKGDenseExpr may panic because it calls
embModel.ModelDriver.Embed without checking ModelDriver; update buildKGDenseExpr
to validate that embModel.ModelDriver is non-nil (and return a sensible nil/err)
before calling Embed, e.g., check embModel != nil && embModel.ModelDriver != nil
and return an error or nil result when the driver is absent, ensuring the
function handles that case gracefully.

---

Nitpick comments:
In `@internal/handler/dify_retrieval_handler_test.go`:
- Around line 322-337: Update TestDifyRetrieval_UseKG to parse the JSON response
body and assert the returned retrieval records include the KG-prepended content
when use_kg=true: after calling r.ServeHTTP, unmarshal w.Body into a response
struct (e.g., with Records []struct{ Content string `json:"content"` }) and add
an assertion that the first record's Content contains the KG label produced by
your mock (for example contains "tag_1" or the expected KG prefix), referencing
TestDifyRetrieval_UseKG, mockMetadataService and its labelQuestionFn to locate
where the KG label is defined.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ca6403fe-063c-4df6-ab55-c4a17fbbb554

📥 Commits

Reviewing files that changed from the base of the PR and between ea79d65 and ff8b4df.

📒 Files selected for processing (13)
  • cmd/server_main.go
  • internal/handler/dify_retrieval_handler.go
  • internal/handler/dify_retrieval_handler_test.go
  • internal/router/router.go
  • internal/service/kg/pipeline.go
  • internal/service/kg/retrieval.go
  • internal/service/kg/retrieval_test.go
  • internal/service/kg/scoring.go
  • internal/service/kg/search.go
  • internal/service/kg/search_test.go
  • internal/service/kg/testutil_test.go
  • internal/service/kg/types.go
  • internal/service/kg_scoring_funcs_test.go
💤 Files with no reviewable changes (1)
  • internal/service/kg_scoring_funcs_test.go

Comment thread internal/handler/dify_retrieval_handler_test.go
Comment on lines +174 to +188
if v := c.Query("top_k"); v != "" {
if parsed, err := strconv.Atoi(v); err == nil {
if req.RetrievalSetting == nil {
req.RetrievalSetting = &difyRetrievalSetting{}
}
req.RetrievalSetting.TopK = &parsed
}
}
if v := c.Query("score_threshold"); v != "" {
if parsed, err := strconv.ParseFloat(v, 64); err == nil {
if req.RetrievalSetting == nil {
req.RetrievalSetting = &difyRetrievalSetting{}
}
req.RetrievalSetting.ScoreThreshold = &parsed
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate top_k / score_threshold instead of silently accepting invalid inputs.

Line 175 and Line 183 ignore parse errors, and Line 224 accepts non-positive top_k. This can send invalid pagination/scoring parameters downstream with unpredictable behavior.

Proposed fix
 		if v := c.Query("top_k"); v != "" {
-			if parsed, err := strconv.Atoi(v); err == nil {
-				if req.RetrievalSetting == nil {
-					req.RetrievalSetting = &difyRetrievalSetting{}
-				}
-				req.RetrievalSetting.TopK = &parsed
-			}
+			parsed, err := strconv.Atoi(v)
+			if err != nil || parsed <= 0 {
+				c.JSON(http.StatusBadRequest, gin.H{"code": common.CodeArgumentError, "message": "top_k must be a positive integer"})
+				return
+			}
+			if req.RetrievalSetting == nil {
+				req.RetrievalSetting = &difyRetrievalSetting{}
+			}
+			req.RetrievalSetting.TopK = &parsed
 		}
 		if v := c.Query("score_threshold"); v != "" {
-			if parsed, err := strconv.ParseFloat(v, 64); err == nil {
-				if req.RetrievalSetting == nil {
-					req.RetrievalSetting = &difyRetrievalSetting{}
-				}
-				req.RetrievalSetting.ScoreThreshold = &parsed
-			}
+			parsed, err := strconv.ParseFloat(v, 64)
+			if err != nil {
+				c.JSON(http.StatusBadRequest, gin.H{"code": common.CodeArgumentError, "message": "score_threshold must be a number"})
+				return
+			}
+			if req.RetrievalSetting == nil {
+				req.RetrievalSetting = &difyRetrievalSetting{}
+			}
+			req.RetrievalSetting.ScoreThreshold = &parsed
 		}

Also applies to: 223-225

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/handler/dify_retrieval_handler.go` around lines 174 - 188, The
current parsing of c.Query("top_k") and c.Query("score_threshold") in the
handler silently ignores parse errors and allows invalid values; update the
parsing logic around c.Query("top_k") (strconv.Atoi) and
c.Query("score_threshold") (strconv.ParseFloat) to validate inputs and return a
400 error on invalid values instead of silently skipping them: for TopK ensure
parsed > 0 (and reject non-positive integers) before assigning to
req.RetrievalSetting.TopK, and for ScoreThreshold ensure the parsed float is
within an acceptable range (e.g. 0.0–1.0) before assigning to
req.RetrievalSetting.ScoreThreshold; use the handler's error response path (e.g.
c.JSON/c.AbortWithStatusJSON) to surface clear validation messages when strconv
parsing fails or values are out of range.

Comment thread internal/handler/dify_retrieval_handler.go
Comment thread internal/handler/dify_retrieval_handler.go
Comment thread internal/service/kg/retrieval.go Outdated
@codecov

codecov Bot commented Jun 5, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.16%. Comparing base (aab01af) to head (ef984a1).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #15704   +/-   ##
=======================================
  Coverage   93.16%   93.16%           
=======================================
  Files          10       10           
  Lines         717      717           
  Branches      118      118           
=======================================
  Hits          668      668           
  Misses         29       29           
  Partials       20       20           

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Otherwise FuseRelationScores and SortAndTrimRelations multiply zero,
making all relations score-zero regardless of relevance.
- Weight int was never read downstream; SortAndTrimRelations uses Sim*PageRank
- Python has no Weight field either; weight_int is used as pagerank directly
- ParseKGRelationChunks now sets PageRank from weight_int (consistent with kgRelationFromChunk)
- Tests updated to assert PageRank instead of Weight
- Use errors.Is(err, gorm.ErrRecordNotFound) to differentiate
- Add TestDifyRetrieval_KBDBError for the 500 path
- Update KBNotFound test to use gorm.ErrRecordNotFound
- GetByIDs error now returns 500 instead of silently dropping records
- Add TestDifyRetrieval_DocLoadError
- Add setupDifyTestNoAuth() without user middleware
- Test exercises GetUser → CodeUnauthorized → 401
- Replaces broken stub that bypassed the handler entirely
- SearchKGEntities → SearchEntities
- ParseKGEntityChunks → ParseEntityChunks
- BuildKGContent → BuildContent
- kgEntityFromChunk → entityFromChunk
- searchKGTypeSamples → searchTypeSamples
- etc. (15 renames total)

Package is already kg, so kg.SearchKGEntities stutters.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/service/kg/pipeline.go (1)

68-77: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the constructor docs to match the actual API.

Line 76 points callers to WithChatModel/WithEmbModel, but this type only exposes SetChatModel and SetEmbModel, and the fields are unexported. That comment currently documents a usage path that does not exist.

✏️ Suggested diff
-// chatModel and embModel should be set via WithChatModel/WithEmbModel setters
-// or passed directly after construction.
+// chatModel and embModel can be attached after construction via
+// SetChatModel and SetEmbModel.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/service/kg/pipeline.go` around lines 68 - 77, The constructor docs
for NewPipeline incorrectly reference WithChatModel/WithEmbModel; update the
comment to match the actual API by replacing that reference with SetChatModel
and SetEmbModel (or explain that chatModel and embModel must be set via the
SetChatModel/SetEmbModel methods or passed through the provided options), and
note that the model fields are unexported so callers must use those setter
methods; ensure the symbols NewPipeline, SetChatModel, and SetEmbModel are
mentioned so readers can locate the correct API.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@internal/service/kg/pipeline.go`:
- Around line 68-77: The constructor docs for NewPipeline incorrectly reference
WithChatModel/WithEmbModel; update the comment to match the actual API by
replacing that reference with SetChatModel and SetEmbModel (or explain that
chatModel and embModel must be set via the SetChatModel/SetEmbModel methods or
passed through the provided options), and note that the model fields are
unexported so callers must use those setter methods; ensure the symbols
NewPipeline, SetChatModel, and SetEmbModel are mentioned so readers can locate
the correct API.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2d8ed88e-f724-4a20-95dd-85e92bc3b5ba

📥 Commits

Reviewing files that changed from the base of the PR and between 938e489 and 936bfde.

📒 Files selected for processing (8)
  • internal/handler/dify_retrieval_handler.go
  • internal/handler/dify_retrieval_handler_test.go
  • internal/service/kg/pipeline.go
  • internal/service/kg/retrieval.go
  • internal/service/kg/retrieval_test.go
  • internal/service/kg/scoring.go
  • internal/service/kg/search.go
  • internal/service/kg/search_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • internal/handler/dify_retrieval_handler.go
  • internal/handler/dify_retrieval_handler_test.go
  • internal/service/kg/retrieval.go
  • internal/service/kg/search.go

- Add RetrievalTest method to SearchbotHandler
- Thin handler around chunkService.RetrievalTest (logic already implemented)
- 7 tests covering success, errors, auth, and not_found paths

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
internal/handler/searchbot_test.go (1)

45-63: ⚡ Quick win

Strengthen success test by asserting request mapping into RetrievalTestRequest.

Current tests validate status outcomes but not that input fields are correctly passed to chunkSvc. Capture the incoming req in mockChunkService and assert key mappings (e.g., kb_id, question, meta_data_filter, pagination/top-k) to catch contract regressions.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/handler/searchbot_test.go` around lines 45 - 63, Update
TestSearchbotsRetrieval_Basic to capture and assert the request object passed
into the mockChunkService: modify the test's mockChunkService implementation to
store the incoming RetrievalTestRequest (or the parameter type used by
chunkSvc.Search/RunRetrievalTest) into a local variable, call the handler
(TestSearchbotsRetrieval_Basic), then assert that the captured request's fields
(kb_id, question, meta_data_filter, pagination/top_k) match the JSON input;
reference the mockChunkService, RetrievalTestRequest type, and chunkSvc call
used in the handler to locate where to capture and assert the mapping.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/handler/searchbot.go`:
- Around line 217-218: Add a nil guard for h.chunkSvc before calling
h.chunkSvc.RetrievalTest; if h.chunkSvc is nil return a clear error (or wrapped
error) instead of invoking RetrievalTest to avoid panics. Locate the call to
h.chunkSvc.RetrievalTest(svcReq, user.ID) in the handler method in searchbot.go,
check if h.chunkSvc == nil and return an appropriate error (e.g., fmt.Errorf or
the handler's standard error type) explaining the missing service, otherwise
proceed to call RetrievalTest.
- Around line 217-224: The handler's error branch for h.chunkSvc.RetrievalTest
currently returns err.Error() to the client which leaks internals; change this
to return a stable generic message (e.g. "internal server error" or "failed to
retrieve chunk") with the common.CodeServerError, and write the full err (and
any contextual fields like user.ID and svcReq) to the server log instead of the
response. Locate the error handling after the call to h.chunkSvc.RetrievalTest
in the searchbot handler and replace the c.JSON call that exposes err.Error()
with a generic client message while using the service/logger available in the
handler (e.g. h.logger, log, or processLogger) to record the detailed error.
- Around line 57-75: The SearchbotRetrievalTestRequest struct accepts a
Highlight field from the client request, but when constructing the service
request (svcReq), this field is not being forwarded. Locate where svcReq is
built and add the Highlight field from the request object to ensure it is
properly passed to the service layer, maintaining consistency with other request
parameters being forwarded.

---

Nitpick comments:
In `@internal/handler/searchbot_test.go`:
- Around line 45-63: Update TestSearchbotsRetrieval_Basic to capture and assert
the request object passed into the mockChunkService: modify the test's
mockChunkService implementation to store the incoming RetrievalTestRequest (or
the parameter type used by chunkSvc.Search/RunRetrievalTest) into a local
variable, call the handler (TestSearchbotsRetrieval_Basic), then assert that the
captured request's fields (kb_id, question, meta_data_filter, pagination/top_k)
match the JSON input; reference the mockChunkService, RetrievalTestRequest type,
and chunkSvc call used in the handler to locate where to capture and assert the
mapping.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 959a53e7-f1bb-457a-a631-6bc89db44dbf

📥 Commits

Reviewing files that changed from the base of the PR and between 936bfde and c12f227.

📒 Files selected for processing (3)
  • internal/handler/searchbot.go
  • internal/handler/searchbot_test.go
  • internal/router/router.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/router/router.go

Comment thread internal/handler/searchbot.go Outdated
Comment on lines +57 to +75
// SearchbotRetrievalTestRequest is the request body for POST /api/v1/searchbots/retrieval_test.
type SearchbotRetrievalTestRequest struct {
KbIDs []string `json:"kb_id" binding:"required"`
Question string `json:"question" binding:"required"`
Page *int `json:"page,omitempty"`
Size *int `json:"size,omitempty"`
DocIDs []string `json:"doc_ids,omitempty"`
UseKG *bool `json:"use_kg,omitempty"`
TopK *int `json:"top_k,omitempty"`
CrossLanguages []string `json:"cross_languages,omitempty"`
SearchID *string `json:"search_id,omitempty"`
MetaDataFilter *map[string]interface{} `json:"meta_data_filter,omitempty"`
TenantRerankID *string `json:"tenant_rerank_id,omitempty"`
RerankID *string `json:"rerank_id,omitempty"`
Keyword *bool `json:"keyword,omitempty"`
SimilarityThreshold *float64 `json:"similarity_threshold,omitempty"`
VectorSimilarityWeight *float64 `json:"vector_similarity_weight,omitempty"`
Highlight *bool `json:"highlight,omitempty"`
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

highlight is accepted in request but never forwarded to service.

The handler binds highlight in SearchbotRetrievalTestRequest (Line 74), but svcReq construction does not include it (Line 199 onward). This silently drops client input and breaks request-contract expectations.

Also applies to: 199-215

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/handler/searchbot.go` around lines 57 - 75, The
SearchbotRetrievalTestRequest struct accepts a Highlight field from the client
request, but when constructing the service request (svcReq), this field is not
being forwarded. Locate where svcReq is built and add the Highlight field from
the request object to ensure it is properly passed to the service layer,
maintaining consistency with other request parameters being forwarded.

Comment thread internal/handler/searchbot.go Outdated
Comment on lines +217 to +218
result, err := h.chunkSvc.RetrievalTest(svcReq, user.ID)
if err != nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a nil guard for chunkSvc before invoking it.

h.chunkSvc.RetrievalTest(...) is called unconditionally. If wiring ever passes a nil service, this path will panic instead of returning a controlled error.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/handler/searchbot.go` around lines 217 - 218, Add a nil guard for
h.chunkSvc before calling h.chunkSvc.RetrievalTest; if h.chunkSvc is nil return
a clear error (or wrapped error) instead of invoking RetrievalTest to avoid
panics. Locate the call to h.chunkSvc.RetrievalTest(svcReq, user.ID) in the
handler method in searchbot.go, check if h.chunkSvc == nil and return an
appropriate error (e.g., fmt.Errorf or the handler's standard error type)
explaining the missing service, otherwise proceed to call RetrievalTest.

Comment thread internal/handler/searchbot.go Outdated
Comment on lines +217 to +224
result, err := h.chunkSvc.RetrievalTest(svcReq, user.ID)
if err != nil {
if strings.Contains(err.Error(), "not_found") {
c.JSON(http.StatusNotFound, gin.H{"code": common.CodeNotFound, "message": "No chunk found! Check the chunk status please!"})
return
}
c.JSON(http.StatusInternalServerError, gin.H{"code": common.CodeServerError, "message": err.Error()})
return

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid exposing internal error details in API responses.

Line 223 returns err.Error() directly to clients. That can leak backend internals. Return a stable generic message and log the detailed error server-side.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/handler/searchbot.go` around lines 217 - 224, The handler's error
branch for h.chunkSvc.RetrievalTest currently returns err.Error() to the client
which leaks internals; change this to return a stable generic message (e.g.
"internal server error" or "failed to retrieve chunk") with the
common.CodeServerError, and write the full err (and any contextual fields like
user.ID and svcReq) to the server log instead of the response. Locate the error
handling after the call to h.chunkSvc.RetrievalTest in the searchbot handler and
replace the c.JSON call that exposes err.Error() with a generic client message
while using the service/logger available in the handler (e.g. h.logger, log, or
processLogger) to record the detailed error.

@xugangqiang xugangqiang requested a review from JinHai-CN June 5, 2026 13:13
@yingfeng yingfeng merged commit 5a04ac0 into infiniflow:main Jun 5, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci Continue Integration size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants