Skip to content

refactor: decouple kotaemon from ktem and remove theflow dependency#837

Open
phv2312 wants to merge 17 commits into
feat/revamp-kotaemonfrom
refactor/kotaemon
Open

refactor: decouple kotaemon from ktem and remove theflow dependency#837
phv2312 wants to merge 17 commits into
feat/revamp-kotaemonfrom
refactor/kotaemon

Conversation

@phv2312

@phv2312 phv2312 commented Jun 4, 2026

Copy link
Copy Markdown
Collaborator

Description

This PR is a broad architecture refactor that separates core kotaemon primitives from ktem app concerns, removes the remaining theflow-driven plumbing from kotaemon.

  • decoupled kotaemon core logic from ktem application-specific behavior
  • replaced legacy theflow/flowsettings usage with internal typed/configured abstractions
  • split indexing and retrieval into dedicated base + implementation modules
  • decoupled vector store/doc store concerns from retriever/indexing flows
  • reorganized ktem.index into ktem.collections to better reflect file collection ownership
  • split DB models and CRUD logic into dedicated modules under ktem.db.models and ktem.db.cruds
  • added factories/registries for pluggable components across LLMs, embeddings, rerankers, stores, and web search
  • removed deprecated/unused surfaces such as promptui, templates, experimental dirs, and redundant folders
  • add coding agent rules in .agents/

Type of change

  • New features (non-breaking change).
  • Bug fix (non-breaking change).
  • Breaking change (fix or feature that would cause existing functionality not to work as expected).

Checklist

  • I have performed a self-review of my code.
  • I have added thorough tests if it is a core feature.
  • There is a reference to the original bug report and related work.
  • I have commented on my code, particularly in hard-to-understand areas.
  • The feature is well documented.

@phv2312

phv2312 commented Jun 4, 2026

Copy link
Copy Markdown
Collaborator Author


def hash_password(password: str) -> str:
"""Return the SHA-256 hex digest used for stored passwords."""
return hashlib.sha256(password.encode()).hexdigest()
@cin-niko cin-niko changed the base branch from main to feat/revamp-kotaemon June 9, 2026 09:44
@cin-niko cin-niko requested a review from Copilot June 9, 2026 09:44

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

@cin-niko

cin-niko commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Please help fix the github sec scan and Copilot review first
Also, take a look at the failed unit tests



@dataclass(kw_only=True)
class EndpointChatLLM(ChatLLM):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this openai compatible endpoint provider while we can use the OpenAI provider class only

LC_GEMINI = "LCGeminiChat"
LC_COHERE = "LCCohereChat"
LC_OLLAMA = "LCOllamaChat"
LLAMA_CPP = "LlamaCppChat"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need two class for same purpose
ChatOpenAI vs. LC_CHAT_OPENAI
AZURE_CHAT_OPENAI vs. LCAzureChatOpenAI

if self._stream:
return self.stream(messages, **kwargs) # type: ignore
if self.streaming:
return self.stream(messages, **kwargs) # type: ignore[return-value]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should seperate two function for normal run and streaming run

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please help check, I think we could remove the completions module and use the chat completion only?
*Completions API is a legacy endpoint that accepts a single text string.


text = self.inflow.flow().text
return self.__call__(text)
@classmethod

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the purpose of ' describe ' is to show the parameter in the UI only, consider moving the describe logic to ktem and removing the ChatLLM (use BaseLLM only)

if vs_ids and self.VS:
self.VS.delete(vs_ids)
if ds_ids:
self.DS.delete(ds_ids)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please help check the related issue about deleting files: #807

@cin-niko cin-niko changed the title refactor: kotaemon refactor: decouple kotaemon from ktem and remove theflow dependency Jun 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants