feat[Go]: implement POST /api/v1/files/link-to-datasets by hunnyboy1217 · Pull Request #15674 · infiniflow/ragflow

hunnyboy1217 · 2026-06-04T14:25:07Z

What problem does this PR solve?

Closes #15673 — ports the Python file2document_api.py convert() endpoint to Go.

Method	Path	Handler
POST	`/api/v1/files/link-to-datasets`	`FileHandler.LinkToDatasets`

Type of change

New Feature (non-breaking change which adds functionality)

Implementation notes

Files changed:

internal/service/file2document.go  – new service (File2DocumentService)
internal/dao/file2document.go      – added Create method
internal/handler/file.go           – FileHandler gains file2DocumentService;
                                     LinkToDatasets HTTP handler
internal/router/router.go          – route registered

Functional parity table:

Concern	Go behaviour
Required fields	`file_ids` and `kb_ids` both required; missing either → `CodeDataError` mirroring Python `@validate_request`
File existence	`fileDAO.GetByIDs(fileIDs)` builds a set; any missing ID → `"File not found!"`
KB existence	`kbDAO.GetByID(kbID)` per KB; missing → `"Can't find this dataset!"`
Folder expansion	`getAllInnermostFileIDs` recursively calls `fileDAO.ListByParentID` — mirrors `FileService.get_all_innermost_file_ids`
File permissions	`checkFileTeamPermission`: `file.TenantID == userID` OR user in tenant's team — mirrors `check_file_team_permission`
KB permissions	`checkKBTeamPermission`: `kb.TenantID == userID` OR user in tenant's team — mirrors `check_kb_team_permission`
Fire-and-forget	`go convertFiles(...)` goroutine after all validation passes — mirrors `loop.run_in_executor(None, _convert_files, …)`
Conversion	`convertFiles`: for each file → delete existing mappings + hard-delete old documents → create new `Document` in each target KB → create `File2Document` mapping — mirrors Python `_convert_files`
`getParser`	Extension-based lookup with fallback to `kb.ParserID` — mirrors `FileService.get_parser`
Immediate return	`true` returned to caller as soon as goroutine is scheduled

Ports Python file2document_api.py convert() endpoint to Go, closing issue infiniflow#15673. POST /api/v1/files/link-to-datasets (FileHandler.LinkToDatasets) New files: internal/service/file2document.go – File2DocumentService with: LinkToDatasets – validates files/KBs, expands folders, checks team permissions, schedules _convertFiles goroutine convertFiles – deletes existing docs+mappings per file, then creates fresh documents in each target KB and new mappings getAllInnermostFileIDs – recursive folder expansion (mirrors Python FileService.get_all_innermost_file_ids) checkFileTeamPermission / checkKBTeamPermission – mirrors Python check_file_team_permission / check_kb_team_permission getParser – maps (file type, extension, kb.parser_id) → parser ID Changed files: internal/dao/file2document.go – added Create method internal/handler/file.go – FileHandler gains file2DocumentService; LinkToDatasets HTTP handler added internal/router/router.go – route registered in /api/v1/files group Behaviour matches Python exactly: - file_ids and kb_ids are required; missing either → error - All files validated before any work begins - All KBs validated before any work begins - Folder IDs expanded to leaf-file IDs recursively - File and KB team permissions checked for every expanded file - Goroutine (fire-and-forget) for the blocking DB work matches Python's loop.run_in_executor pattern - Returns true immediately without waiting for conversion

coderabbitai · 2026-06-04T14:25:26Z

📝 Walkthrough

Walkthrough

Adds File2DocumentService to link files/folders to knowledge bases, expand folders, enforce tenant/team permissions, and asynchronously convert and relink documents; exposes POST /api/v1/files/link-to-datasets, adds File2Document DAO create, and documents insertion/removal operations.

Changes

File-to-Dataset Linking Workflow

Layer / File(s)	Summary
Service errors, type, constructor `internal/service/file2document.go`	Exported sentinel errors; `File2DocumentService` type and `NewFile2DocumentService()` that wires required DAOs and DocumentService.
LinkToDatasets request flow and orchestration `internal/service/file2document.go`	`LinkToDatasets` validates file and KB existence, expands folder inputs to leaf file IDs, deduplicates IDs, enforces per-file and per-KB tenant/team authorization, and launches async `convertFiles`.
convertFiles: remove old mappings, create documents, and remap `internal/service/file2document.go`	For each file: load/remove existing File2Document mappings and documents (via `DocumentService.RemoveDocumentKeepFile`), delete mappings, create new `entity.Document` per target KB with parser/Kb config/location/pipeline, insert via `DocumentService.InsertDocument`, and create new File2Document mappings.
Helpers: folder expansion, auth, parser & dedupe `internal/service/file2document.go`	`getAllInnermostFileIDs` for recursive folder traversal; `checkFileTeamPermission`/`checkKBTeamPermission` for tenant/team checks; `getParser` for extension→parser mapping with KB fallback; `dedupeStrings`.
Document service persistence ops `internal/service/document.go`	`RemoveDocumentKeepFile` deletes document tasks/engine data and the document row while keeping file/storage/mappings; `InsertDocument` adds a document row and atomically increments KB `doc_num` in a transaction.
File2Document DAO Create Method `internal/dao/file2document.go`	Adds `Create` method persisting `*entity.File2Document` via `DB.Create(mapping)`.
HTTP handler and router `internal/handler/file.go`, `internal/router/router.go`	`FileHandler` gains `file2DocumentService` field and initializes it; `LinkToDatasets` handler binds request, validates non-empty `file_ids` and `kb_ids`, delegates to service, maps sentinel errors to `common.CodeDataError` (others to `common.CodeServerError`); router registers `POST /api/v1/files/link-to-datasets`.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

infiniflow/ragflow#14011: Adds File2Document DAO methods used by convertFiles to fetch and remove existing mappings.
infiniflow/ragflow#14649: Related changes around /files/link-to-datasets authorization and relinking flow.

Suggested labels

💞 feature, ci, size:XL

Suggested reviewers

Haruko386
JinHai-CN

Poem

🐰 I hop through code with gentle paws,

Files to datasets, I mend the cause.
Folders expand, parsers align,
Documents reborn, knowledge to shine.
Hooray — new links, a tidy sign!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely describes the main change: implementing the POST endpoint for linking files to datasets, matching the primary objective of the pull request.
Description check	✅ Passed	The description covers the problem statement, type of change, implementation notes with file structure, and a detailed functional parity table comparing Go to Python behavior.
Linked Issues check	✅ Passed	The PR implements the POST /api/v1/files/link-to-datasets endpoint [`#15673`] with all required components: service, DAO, handler, router integration, folder expansion, permission checks, parser selection, and fire-and-forget conversion model matching Python behavior.
Out of Scope Changes check	✅ Passed	All changes are focused on the file2document endpoint implementation across service, DAO, handler, and router layers with supporting document service methods; no unrelated modifications detected.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/service/file2document.go`:
- Around line 145-223: The convertFiles flow deletes existing document mappings
and documents then recreates documents per-kb without transactional guarantees,
causing inconsistent state if any step fails; modify convertFiles to use a
per-file transactional approach (or an atomic two-phase approach): for each
fileID wrap operations in a DB transaction (or create new documents and
file2document mappings first using documentDAO.Create and
file2DocumentDAO.Create, verify all creates succeeded, then call
documentDAO.Delete and file2DocumentDAO.DeleteByFileID or replace old mappings),
and ensure error handling rolls back the transaction or aborts cleanup on
failure; locate and update the logic around file2DocumentDAO.DeleteByFileID,
documentDAO.Delete, documentDAO.Create, and file2DocumentDAO.Create to implement
the transactional/atomic reorder and proper rollback.
- Around line 94-106: The loop that expands folder IDs (using filesSet,
FileTypeFolder and s.getAllInnermostFileIDs) can produce duplicate leaf IDs when
requests mix folders and direct file IDs; change the collection logic to
deduplicate by using a set (map[string]struct{}) named e.g. seen to track added
IDs and only append to allFileIDs the first time an ID is encountered (for both
inner results from getAllInnermostFileIDs and direct ids), ensuring each leaf
file is scheduled for conversion exactly once.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b79c150d-0fd8-492c-921a-620b2dd65e92

📥 Commits

Reviewing files that changed from the base of the PR and between 96a4166 and 2bfbc00.

📒 Files selected for processing (4)

internal/dao/file2document.go
internal/handler/file.go
internal/router/router.go
internal/service/file2document.go

coderabbitai · 2026-06-04T14:35:58Z

+		mappings, err := s.file2DocumentDAO.GetByFileID(fileID)
+		if err != nil {
+			common.Warn("convertFiles: GetByFileID failed", zap.String("fileID", fileID), zap.Error(err))
+		}
+		for _, m := range mappings {
+			if m.DocumentID == nil {
+				continue
+			}
+			doc, err := s.documentDAO.GetByID(*m.DocumentID)
+			if err != nil || doc == nil {
+				continue
+			}
+			// Get tenant from KB.
+			kb, err := s.kbDAO.GetByID(doc.KbID)
+			if err != nil || kb == nil {
+				continue
+			}
+			// Hard-delete document (ignoring chunk store cleanup for simplicity).
+			if _, err := s.documentDAO.Delete(*m.DocumentID); err != nil {
+				common.Warn("convertFiles: Delete document failed",
+					zap.String("docID", *m.DocumentID), zap.Error(err))
+			}
+		}
+		if err := s.file2DocumentDAO.DeleteByFileID(fileID); err != nil {
+			common.Warn("convertFiles: DeleteByFileID failed", zap.String("fileID", fileID), zap.Error(err))
+		}
+
+		// Get source file.
+		file, err := s.fileDAO.GetByID(fileID)
+		if err != nil || file == nil {
+			continue
+		}
+
+		// Create document + mapping in each target KB.
+		for _, kbID := range kbIDs {
+			kb, ok := func() (*entity.Knowledgebase, bool) {
+				kb, err := s.kbDAO.GetByID(kbID)
+				return kb, err == nil && kb != nil
+			}()
+			if !ok {
+				continue
+			}
+
+			parserID := getParser(file.Type, file.Name, kb.ParserID)
+			suffix := strings.TrimPrefix(filepath.Ext(file.Name), ".")
+			doc := &entity.Document{
+				ID:           common.GenerateUUID(),
+				KbID:         kb.ID,
+				ParserID:     parserID,
+				ParserConfig: entity.JSONMap(kb.ParserConfig),
+				CreatedBy:    userID,
+				Type:         file.Type,
+				Name:         &file.Name,
+				Suffix:       suffix,
+				Size:         file.Size,
+			}
+			if file.Location != nil {
+				doc.Location = file.Location
+			}
+			if kb.PipelineID != nil {
+				doc.PipelineID = kb.PipelineID
+			}
+
+			if err := s.documentDAO.Create(doc); err != nil {
+				common.Warn("convertFiles: Create document failed",
+					zap.String("kbID", kbID), zap.String("fileID", fileID), zap.Error(err))
+				continue
+			}
+
+			mapping := &entity.File2Document{
+				ID:         common.GenerateUUID(),
+				FileID:     &fileID,
+				DocumentID: &doc.ID,
+			}
+			if err := s.file2DocumentDAO.Create(mapping); err != nil {
+				common.Warn("convertFiles: Create file2document mapping failed",
+					zap.String("fileID", fileID), zap.String("docID", doc.ID), zap.Error(err))
+			}
+		}


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Current delete-then-recreate flow can leave inconsistent file/document state.

convertFiles performs destructive deletes with warn-and-continue behavior and no per-file transaction. If any intermediate step fails, you can end up with removed mappings and missing recreated links (or orphaned docs).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/service/file2document.go` around lines 145 - 223, The convertFiles flow deletes existing document mappings and documents then recreates documents per-kb without transactional guarantees, causing inconsistent state if any step fails; modify convertFiles to use a per-file transactional approach (or an atomic two-phase approach): for each fileID wrap operations in a DB transaction (or create new documents and file2document mappings first using documentDAO.Create and file2DocumentDAO.Create, verify all creates succeeded, then call documentDAO.Delete and file2DocumentDAO.DeleteByFileID or replace old mappings), and ensure error handling rolls back the transaction or aborts cleanup on failure; locate and update the logic around file2DocumentDAO.DeleteByFileID, documentDAO.Delete, documentDAO.Create, and file2DocumentDAO.Create to implement the transactional/atomic reorder and proper rollback.

Hz-186

@hunnyboy1217

The Go background conversion directly deletes and creates document rows through DAO calls. Python uses DocumentService.remove_document() and DocumentService.insert(), which update dataset/document counters and clean up related tasks, chunks, metadata, thumbnails, and graph/ index state.

In my local test, Go created the new document and file2document mapping, but the target KB doc_num stayed at 0, confirming the counter update path is skipped.

There are also response-shape mismatches for errors. Missing file_ids returns Python code=101, data=null with the validate_request message, while Go returns code=102, data=false and exposes the Gin binding error. 401 and file-not-found responses also omit/change data.

Please route document removal/insertion through equivalent service logic and align the error response shapes with Python.

…w#15674) Applies the final-review feedback from Hz-186 and CodeRabbit. Hz-186 — counters not updated / error shapes: - convertFiles now routes document removal/insertion through DocumentService instead of raw DAO calls, so KB doc_num/chunk_num/token_num counters are updated (target KB doc_num previously stayed at 0): * new DocumentService.RemoveDocumentKeepFile — mirrors Python remove_document: deletes tasks, doc-store chunks/metadata, and the document row with counter decrement, WITHOUT touching the file record or its mappings (caller clears mappings via delete_by_file_id) * new DocumentService.InsertDocument — mirrors Python insert: creates the document row and increments KB doc_num in a single transaction - error response shapes aligned with Python: * missing file_ids/kb_ids now returns code=101 (ARGUMENT_ERROR), data=null with the validate_request-style message "required argument are missing: <fields>; " (was code=102, data=false) * all error paths use jsonError → data=null (was data=false); file/dataset not-found and no-authorization map to code=102 like Python's get_data_error_result in convert() CodeRabbit — robustness: - no longer propagate raw DAO/runtime errors through the API: the service returns sentinel errors (ErrLinkFileNotFound / ErrLinkDatasetNotFound / ErrLinkNoAuthorization / ErrLinkInternal); internal failures surface a generic "Internal server error." message instead of leaking DB details - deduplicate expanded file IDs after folder expansion so a leaf file reached via both a folder and a direct ID (or overlapping folders) is converted once - InsertDocument wraps the document row + counter bump in one transaction so a failed insert cannot leave a stale counter No route or request/response field changes beyond the corrected error codes.

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

internal/service/file2document.go (1)

221-235: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Rollback the new document when mapping creation fails.

InsertDocument commits before file2DocumentDAO.Create(mapping). If the mapping insert fails, this leaves a live document in the KB with doc_num already incremented but no file2document row, so future relinks cannot find and clean it up.

💡 Proposed fix

 			if err := s.file2DocumentDAO.Create(mapping); err != nil {
 				common.Warn("convertFiles: Create file2document mapping failed",
 					zap.String("fileID", fileID), zap.String("docID", doc.ID), zap.Error(err))
+				if delErr := s.documentSvc.DeleteDocument(doc.ID); delErr != nil {
+					common.Warn("convertFiles: rollback orphan document failed",
+						zap.String("docID", doc.ID), zap.Error(delErr))
+				}
+				continue
 			}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/service/file2document.go` around lines 221 - 235, When file2document
mapping creation fails after a successful s.documentSvc.InsertDocument(doc),
delete or rollback the newly created document to avoid orphaned docs: after
s.file2DocumentDAO.Create(mapping) returns an error, call the document removal
API (e.g., s.documentSvc.DeleteDocument or the appropriate delete/remove method)
with doc.ID, log both the mapping create error and any delete error, and ensure
the code continues (or retries) only after attempting the cleanup; update the
existing warn log block to perform this delete using the same doc.ID and keep
the mapping creation and ID generation (entity.File2Document,
common.GenerateUUID) as-is.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/service/document.go`:
- Around line 288-296: In DocumentService.InsertDocument's transaction block,
guard the KB counter update by capturing the update result instead of only
checking Error: call tx.Model(&entity.Knowledgebase{}).Where("id = ?",
doc.KbID).Update("doc_num", gorm.Expr("doc_num + 1")) into a result variable,
return an error and rollback if result.Error != nil, and also return an error
when result.RowsAffected == 0 (indicating no KB row was updated) so the
transaction fails when the KB is missing (follow the same pattern used in
deleteDocRecordWithCounters); reference the tx variable, doc.KbID, and the
Update("doc_num", gorm.Expr("doc_num + 1")) call when making the change.

In `@internal/service/file2document.go`:
- Around line 143-145: Deduplicate the kb IDs before launching the background
goroutine: build a unique slice from the validated kbMap keys (instead of using
raw req.KbIDs) and pass that deduplicated slice into the goroutine that runs the
conversion; update the usage of kbIDs in the anonymous func launched in
file2document.go so it iterates over the unique KB IDs (derived from kbMap) to
avoid creating duplicate documents/mappings.

---

Duplicate comments:
In `@internal/service/file2document.go`:
- Around line 221-235: When file2document mapping creation fails after a
successful s.documentSvc.InsertDocument(doc), delete or rollback the newly
created document to avoid orphaned docs: after
s.file2DocumentDAO.Create(mapping) returns an error, call the document removal
API (e.g., s.documentSvc.DeleteDocument or the appropriate delete/remove method)
with doc.ID, log both the mapping create error and any delete error, and ensure
the code continues (or retries) only after attempting the cleanup; update the
existing warn log block to perform this delete using the same doc.ID and keep
the mapping creation and ID generation (entity.File2Document,
common.GenerateUUID) as-is.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 986c869b-e234-4eb4-95e0-9b61e287a1ad

📥 Commits

Reviewing files that changed from the base of the PR and between 2bfbc00 and bc345fd.

📒 Files selected for processing (3)

internal/handler/file.go
internal/service/document.go
internal/service/file2document.go

🚧 Files skipped from review as they are similar to previous changes (1)

internal/handler/file.go

coderabbitai · 2026-06-05T04:12:23Z

+	return dao.DB.Transaction(func(tx *gorm.DB) error {
+		if err := tx.Create(doc).Error; err != nil {
+			return fmt.Errorf("failed to create document: %w", err)
+		}
+		if err := tx.Model(&entity.Knowledgebase{}).
+			Where("id = ?", doc.KbID).
+			Update("doc_num", gorm.Expr("doc_num + 1")).Error; err != nil {
+			return fmt.Errorf("failed to increment doc_num for KB %s: %w", doc.KbID, err)
+		}


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== Document / Knowledgebase model definitions ==" rg -nC2 'type Document struct|type Knowledgebase struct|KbID|foreignKey|constraint' --type=go echo echo "== Migrations / schema references for documents.kb_id ==" rg -nC2 'documents|knowledgebase|kb_id|FOREIGN KEY|foreign key|constraint' --type=go --type=sql

Repository: infiniflow/ragflow

Length of output: 50375

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== 1) Inspect Document entity KbID gorm tags ==" sed -n '1,120p' internal/entity/document.go | cat -n echo echo "== 2) Inspect knowledgebase entity (for delete behavior / tags) ==" sed -n '1,140p' internal/entity/kb.go | cat -n echo echo "== 3) Inspect the function doing tx.Create + KB doc_num increment ==" sed -n '240,340p' internal/service/document.go | cat -n echo echo "== 4) Inspect the test that mentions inserting a doc with a missing KB row ==" rg -n "orphan-doc|no-such-kb|kb_id that has no KB" internal/service/document_test.go # Print around the referenced region without dumping whole file start=680 end=780 sed -n "${start},${end}p" internal/service/document_test.go | cat -n echo echo "== 5) Search repo for actual FK constraints referencing kb_id -> knowledgebase/id ==" rg -n --hidden --glob '*.{sql,psql,postgres,mysql,sqlite,duckdb,migration,ddl}' "FOREIGN KEY" . rg -n --hidden --glob '*.sql' "kb_id[^\\n]*REFERENCES|REFERENCES[^\\n]*kb_id|kb_id[^\\n]*references|references[^\\n]*kb_id" . rg -n --hidden --glob '*.go' "foreignKey|constraint:|Constraints|OnDelete|OnUpdate|references|REFERENCES" internal/entity internal/dao

Repository: infiniflow/ragflow

Length of output: 21725

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== 1) Find setupServiceTestDB definition (DB type + FK enforcement) ==" rg -n "func setupServiceTestDB|setupServiceTestDB\\(" internal/service/document_test.go internal/service -S rg -n "PRAGMA foreign_keys|foreign_keys=on|ForeignKey|constraint" internal/service -S echo echo "== 2) Inspect the test DB setup file(s) ==" # Prefer the file that contains setupServiceTestDB file="$(rg -n "func setupServiceTestDB" -S internal/service | head -n1 | awk -F: '{print $1}')" if [ -n "${file:-}" ]; then echo "Using: $file" sed -n '1,220p' "$file" | cat -n fi echo echo "== 3) Search for migrations/DDL mentioning document.kb_id foreign keys (kb_id -> knowledgebase.id) ==" rg -n --hidden --glob '*.{sql,psql,postgres,mysql,sqlite,migration,ddl}' "document.*kb_id|documents.*kb_id|kb_id.*REFERENCES|REFERENCES.*knowledgebase|knowledgebase.*id" . rg -n --hidden --glob '*.go' "AutoMigrate|Migrate|schema|foreignKey|constraint|OnDelete|OnUpdate" internal -S

Repository: infiniflow/ragflow

Length of output: 13752

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== A) Check if any SQL/DDL/migrations exist in repo ==" fd -t f -i -d 6 -e sql -e ddl -e psql -e postgres -e sqlite -e mysql -e migration -e go -a --max-results 2000 . > /tmp/ddl_files.txt || true wc -l /tmp/ddl_files.txt || true head -n 50 /tmp/ddl_files.txt || true echo echo "== B) Search repo for FK definitions involving kb_id -> knowledgebase/id ==" rg -n --hidden --glob '*.{sql,ddl,migration,psql,postgres,mysql,sqlite}' "kb_id.*REFERENCES|REFERENCES.*kb_id|document.*kb_id|documents.*kb_id|knowledgebase\\.id" . || true echo echo "== C) Inspect deleteDocRecordWithCounters for RowsAffected handling ==" rg -n "deleteDocRecordWithCounters\$" internal/service/document.go sed -n '90,170p' internal/service/document.go | cat -n

Repository: infiniflow/ragflow

Length of output: 6259

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== 1) Show deleteDocRecordWithCounters function body ==" ln=$(rg -n "func \\(s \\*DocumentService\$ deleteDocRecordWithCounters" -n internal/service/document.go | head -n1 | cut -d: -f1) echo "function starts at line: $ln" start=$((ln-20)) end=$((ln+80)) sed -n "${start},${end}p" internal/service/document.go | cat -n echo echo "== 2) Inspect docker init.sql for any FK/constraint on document/kb ==" # Only show relevant lines to keep output small rg -n "document|documents|knowledgebase|knowledgebase\\.id|kb_id|FOREIGN KEY|REFERENCES" docker/init.sql docker/oceanbase/init.d/vec_memory.sql 2>/dev/null || true # broaden to all docker init scripts rg -n "FOREIGN KEY|REFERENCES|documents|knowledgebase|kb_id" docker -S || true

Repository: infiniflow/ragflow

Length of output: 4922

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== 1) Search docker/init.sql (and all docker init scripts) for FK/REFERENCES ==" rg -n "FOREIGN KEY|REFERENCES" docker -S || true echo echo "== 2) Search for document table DDL + kb_id inside init scripts ==" rg -n "CREATE TABLE.*document|CREATE TABLE.*documents|\\bdocument\\b" docker -S || true rg -n "kb_id" docker -S || true echo echo "== 3) Search repo-wide for DDL that mentions knowledgebase/doc_num/documents.kb_id ==" rg -n --hidden --glob '*.{sql,ddl,psql,postgres,mysql,sqlite}' "knowledgebase|doc_num|CREATE TABLE.*document|\\bdocuments\\b|\\bkb_id\\b" . || true

Repository: infiniflow/ragflow

Length of output: 420

Guard the KB doc_num increment with RowsAffected in DocumentService.InsertDocument.
entity.Document.KbID has no gorm FK/constraint tags, and the test suite inserts an “orphan” document (KbID: "no-such-kb") successfully, so the DB-level FK enforcement isn’t reliably present. As a result, the current Update(...).Error can be nil even when the KB row doesn’t exist (0 rows updated), leaving the counters inconsistent. Capture the update result and roll back when RowsAffected == 0 (similar to the deleteDocRecordWithCounters pattern).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/service/document.go` around lines 288 - 296, In DocumentService.InsertDocument's transaction block, guard the KB counter update by capturing the update result instead of only checking Error: call tx.Model(&entity.Knowledgebase{}).Where("id = ?", doc.KbID).Update("doc_num", gorm.Expr("doc_num + 1")) into a result variable, return an error and rollback if result.Error != nil, and also return an error when result.RowsAffected == 0 (indicating no KB row was updated) so the transaction fails when the KB is missing (follow the same pattern used in deleteDocRecordWithCounters); reference the tx variable, doc.KbID, and the Update("doc_num", gorm.Expr("doc_num + 1")) call when making the change.

coderabbitai · 2026-06-05T04:12:23Z

+	// ── 6. Run conversion in background (fire-and-forget) ────────────────────
+	kbIDs := req.KbIDs
+	go func() {


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Deduplicate kb_ids before scheduling conversion.

kbMap collapses duplicates for validation, but the goroutine still receives raw req.KbIDs. A payload like ["kb-a", "kb-a"] will create two documents and two mappings for every file in the same dataset.

💡 Proposed fix

- kbIDs := req.KbIDs + kbIDs := dedupeStrings(req.KbIDs)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/service/file2document.go` around lines 143 - 145, Deduplicate the kb IDs before launching the background goroutine: build a unique slice from the validated kbMap keys (instead of using raw req.KbIDs) and pass that deduplicated slice into the goroutine that runs the conversion; update the usage of kbIDs in the anonymous func launched in file2document.go so it iterates over the unique KB IDs (derived from kbMap) to avoid creating duplicate documents/mappings.

hunnyboy1217 · 2026-06-05T04:21:07Z

Hi, @Hz-186 ,
Updated based on your review!

Hz-186

@hunnyboy1217, LGTM

codecov · 2026-06-05T05:22:48Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.16%. Comparing base (461c190) to head (bc345fd).
⚠️ Report is 17 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main   #15674   +/-   ##
=======================================
  Coverage   93.16%   93.16%           
=======================================
  Files          10       10           
  Lines         717      717           
  Branches      118      118           
=======================================
  Hits          668      668           
  Misses         29       29           
  Partials       20       20

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

hunnyboy1217 · 2026-06-05T07:19:51Z

Hi, @Haruko386,
All CI tests are successfully passed.

hunnyboy1217 · 2026-06-05T07:59:16Z

Hi, @yingfeng , Could you please review this PR?

dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jun 4, 2026

coderabbitai Bot reviewed Jun 4, 2026

View reviewed changes

Hz-186 suggested changes Jun 5, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 5, 2026

View reviewed changes

Hz-186 approved these changes Jun 5, 2026

View reviewed changes

Hz-186 added the ci Continue Integration label Jun 5, 2026

Haruko386 marked this pull request as draft June 5, 2026 05:01

Haruko386 marked this pull request as ready for review June 5, 2026 05:01

Conversation

hunnyboy1217 commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What problem does this PR solve?

Type of change

Implementation notes

Uh oh!

coderabbitai Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

Hz-186 left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

hunnyboy1217 commented Jun 5, 2026

Uh oh!

Hz-186 left a comment

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented Jun 5, 2026

Codecov Report

Uh oh!

hunnyboy1217 commented Jun 5, 2026

Uh oh!

hunnyboy1217 commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

hunnyboy1217 commented Jun 4, 2026 •

edited

Loading

coderabbitai Bot commented Jun 4, 2026 •

edited

Loading