feat(ui): KG-scoped data source onboarding (k-extract flow) #737

Open
aredenba-rh wants to merge 143 commits into main from feature/manage-knowledge-graph

@aredenba-rh

Collaborator

Summary

  • Adds full-page data source onboarding at /knowledge-graphs/{kgId}/data-sources/new (URLs → configure → sequential initial sync → summary), modeled after k-extract designer/new.
  • Adds ongoing operations page at /knowledge-graphs/{kgId}/data-sources (phase1 equivalent) for sync, commits, diff, and maintenance focus.
  • KG manage workspace routes Data Sources to onboarding when dataSourceCount === 0, otherwise to the operations page.
  • Post–KG-create toast navigates to the new onboarding route.
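
For reviewers, the routing rule for the Data Sources entry can be sketched as below. This is an illustrative Python sketch rather than the UI code in this PR; the function name `data_sources_route` and its parameters are hypothetical, and only the two paths and the `dataSourceCount === 0` condition come from the summary.

```python
def data_sources_route(kg_id: str, data_source_count: int) -> str:
    """Return the path the KG manage workspace should open for Data Sources.

    Hypothetical helper: mirrors the rule described in the summary, not the
    actual component code.
    """
    base = f"/knowledge-graphs/{kg_id}/data-sources"
    # No data sources yet: send the user to the full-page onboarding flow.
    if data_source_count == 0:
        return f"{base}/new"
    # At least one data source exists: open the ongoing operations page.
    return base
```

Keeping the decision in one pure function makes the "return to manage, land on operations page (not wizard)" item in the test plan easy to cover with a unit test.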

Closes #736

Test plan

  • Create a KG → Manage → Data Sources → lands on /data-sources/new
  • Add GitHub URL(s), configure branch/token, connect → run Start initial sync → see progress and summary
  • Open data sources → operations page with cards, sync history, commit refs
  • Return to manage → Data Sources again → operations page (not wizard)
  • Maintain step → ?focus=maintain filters to maintenance-ready sources
  • Global sidebar /data-sources unchanged

Made with Cursor

aredenba-rh and others added 30 commits May 26, 2026 12:58
* chore(skills): add subagent delivery execution protocol

Add a reusable subagent skill that standardizes issue-based branching,
TDD execution, PR structure, and merge/conflict handling into
feature/manage-knowledge-graph.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(management): add knowledge graph workspace mode lifecycle

Implement schema_bootstrap as the default workspace mode and persist
irreversible transition state to extraction_operations across domain,
repository, API responses, and migration coverage.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
…681)

Add a workspace-status API projection with mode, readiness flags,
transition eligibility, and session pointers, including service and
route authorization coverage for manage workspace rendering.

Co-authored-by: Cursor <cursoragent@cursor.com>
…#682)

Enforce workspace readiness checks for minimum entity/relationship type
coverage and prepopulated type instance presence, and project blocking
reasons so validate/transition workflows can render actionable feedback.

Co-authored-by: Cursor <cursoragent@cursor.com>
Expose authorized validate and transition commands for knowledge graph
workspaces, persist session pointers, and create an extraction-mode
session identifier when moving from bootstrap to extraction operations.

Co-authored-by: Cursor <cursoragent@cursor.com>
Add durable run-level mutation metadata storage and lifecycle persistence
for session/scope identity, timestamps, token-cost totals, and
operation-count summaries linked to each sync run.

Co-authored-by: Cursor <cursoragent@cursor.com>
Emit operation-class counts and token/cost totals from mutation-log
application results into MutationsApplied payloads so downstream sync
lifecycle persistence can finalize run-level metadata.

Co-authored-by: Cursor <cursoragent@cursor.com>
#686)

Scaffold extraction application/presentation package structure and add
pytest-archon rules enforcing DDD layer boundaries plus cross-context
isolation so subsequent extraction features stay architecturally clean.

Co-authored-by: Cursor <cursoragent@cursor.com>
Implement per-user/per-knowledge-graph/per-mode extraction session
lifecycle behaviors with clear-chat reset semantics and archived-session
retention backed by repository ports and unit coverage.

Co-authored-by: Cursor <cursoragent@cursor.com>
Resolve mode-specific extraction skill templates from global defaults and
apply deterministic knowledge-graph override merges so session prompts are
stable, customizable, and repeatable.

Co-authored-by: Cursor <cursoragent@cursor.com>
)

Persist extraction agent sessions and expose scoped APIs for active/list/clear-chat so reset creates a fresh session while preserving archived history and runtime context audit records.

Co-authored-by: Cursor <cursoragent@cursor.com>
Persist clone-head, last-extraction baseline, and tracked-branch head
commit references for data sources and expose them in management API
responses for downstream ingestion and UI commit-status workflows.

Co-authored-by: Cursor <cursoragent@cursor.com>
Prepare Git-backed ingestion context by loading data-source commit references,
refreshing tracked branch head, and passing baseline commit plus resolved
credentials into the ingestion pipeline before packaging begins.

Co-authored-by: Cursor <cursoragent@cursor.com>
# Conflicts:
#	src/api/ingestion/application/services/ingestion_service.py
#	src/api/ingestion/infrastructure/event_handler.py
#	src/api/ingestion/ports/services.py
#	src/api/tests/unit/ingestion/infrastructure/test_ingestion_event_handler.py
Skip heavy extraction when tracked branch head equals the last extraction
baseline by emitting a completed lifecycle event and recording an explicit
no-change audit log entry on the sync run.

Co-authored-by: Cursor <cursoragent@cursor.com>
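
A minimal sketch of the no-change short circuit described in the commit above, assuming a simplified in-memory run record. `SyncRun`, `maybe_skip_extraction`, and the audit message text are hypothetical names for illustration; the real service emits a lifecycle event rather than mutating a dataclass.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SyncRun:
    """Hypothetical stand-in for a persisted sync run."""
    status: str = "pending"
    audit_log: list[str] = field(default_factory=list)


def maybe_skip_extraction(
    run: SyncRun, tracked_head: str, baseline: Optional[str]
) -> bool:
    """Mark the run completed and return True when nothing changed.

    Extraction is skipped only when a baseline exists and the tracked
    branch head is the same commit as that baseline.
    """
    if baseline is not None and tracked_head == baseline:
        run.status = "completed"
        run.audit_log.append(
            f"no-change: tracked head {tracked_head} equals last extraction baseline"
        )
        return True
    # First sync (no baseline) or new commits: the caller runs extraction.
    return False
```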
Expose a data-source diff summary API that compares the last extraction
baseline to tracked branch head and returns aggregate counts plus a
large-list-safe changed-file preview for maintenance decisions.

Co-authored-by: Cursor <cursoragent@cursor.com>
Show commit-based diff counts immediately on each data source card and
render the changed-file list as collapsed-by-default with explicit
expand/collapse controls for large-diff safe browsing.

Co-authored-by: Cursor <cursoragent@cursor.com>
…695)

Add explicit data-source actions to refresh tracked/clone commit references and adopt tracked head as the current extraction baseline. This lets the UI surface per-source changed-file counts with user-controlled commit context updates for maintenance decisioning.

Co-authored-by: Cursor <cursoragent@cursor.com>
Strengthen subagent delivery guidance with a parallel execution model, required context packs, and a blocker-question escalation flow so multiple agents can pause and ask focused questions without serializing delivery.

Co-authored-by: Cursor <cursoragent@cursor.com>
…678) (#697)

Add structured mode-specific agent configuration (system prompt, hierarchy, guardrails, and skill pack defaults) and wire session initialization to resolve and persist the configuration per knowledge graph scope.

Co-authored-by: Cursor <cursoragent@cursor.com>
) (#698)

Seed schema bootstrap sessions with a capabilities-intake prompt that offers first-pass or guided co-design paths, and persist the selected path/capability summary in session runtime context so the conversation remains continuous across requests.

Co-authored-by: Cursor <cursoragent@cursor.com>
…679) (#699)

Build a filesystem runtime context for extraction workloads by materializing ingestion package resources, reconstructing repository files, and exposing a deterministic skills directory path; wire it through extraction event handling and local/deployed container configuration.

Co-authored-by: Cursor <cursoragent@cursor.com>
#700)

Enhance schema browser rows to display prepopulated type indicators and live per-type instance counts with lazy query-backed loading, while extending shared type contracts and tests to cover the new inspector metadata behavior.

Co-authored-by: Cursor <cursoragent@cursor.com>
…671) (#701)

Add manage-authorized run-control operations (start, pause, halt, reset_running, reset_failed, reset_completed, reset_all) over data source sync runs, expose them via dedicated management routes, and verify behavior with unit tests for both service transitions and HTTP contract responses.

Co-authored-by: Cursor <cursoragent@cursor.com>
Expose sync-run token/cost metadata in management API responses and add an extraction telemetry dashboard in the data-sources workspace with active worker counts, status buckets, recent job events, and 24h cost trend indicators backed by auto-refreshing sync data.

Co-authored-by: Cursor <cursoragent@cursor.com>
Add knowledge-graph scoped maintenance schedule APIs with timezone-aware cron evaluation and persisted run outcomes, then expose the controls and history in the data-sources operations UI.

Co-authored-by: Cursor <cursoragent@cursor.com>
…704)

Extend the mutations console with a conversation-assisted draft flow and live entity/relationship inspector that highlights edited fields during the active session and resets highlights after apply/refresh.

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace legacy row actions with Manage, Query, and Delete, remove inline edit controls from the list surface, and align structural tests to the new action contract.

Co-authored-by: Cursor <cursoragent@cursor.com>
)

Implement a dedicated manage workspace route that loads workspace status projection, shows readiness and session pointers, and provides Validate and transition-to-extraction controls.

Co-authored-by: Cursor <cursoragent@cursor.com>
Extend the manage workspace page with an always-visible extraction conversation panel, clear-chat reset action, and a tabbed lower operations area for extraction jobs, manual mutations, and run/log navigation.

Co-authored-by: Cursor <cursoragent@cursor.com>
aredenba-rh and others added 30 commits June 9, 2026 11:58
Replace the stub executor and broken busybox worker loop with opendatahub-io/agentic-ci containers, add by_files materialization with target_files persistence, and wire dev compose for migrations, gcloud ADC, and the ai-helpers image.

Co-authored-by: Cursor <cursoragent@cursor.com>
Stopped kartograph-sticky-* containers kept their names after API reloads, causing docker run name conflicts when reopening Graph Management modes like Extraction Jobs.

Co-authored-by: Cursor <cursoragent@cursor.com>
Graph Management Assistant prompts come from API-resolved agent_configuration, not filesystem SKILL.md mounts. Remove the sticky /app/skills bind, related env vars, and dead skills_dir plumbing from compose and deploy.

Co-authored-by: Cursor <cursoragent@cursor.com>
Expose extraction-jobs read/write on the workload API and wire MCP tools
so the assistant persists approved job set configs instead of directing
operators to fill the UI manually.

Co-authored-by: Cursor <cursoragent@cursor.com>
Default by_instances descriptions to full property and relationship
coverage with explicit per-field notes, and align GMA skills, save tool
guidance, and worker prompts on the same contract.

Co-authored-by: Cursor <cursoragent@cursor.com>
…and edge

Require EntityType -> rel -> CounterpartType lines and forbid theme-only
sections so GMA writes ontology-grounded extraction briefs.

Co-authored-by: Cursor <cursoragent@cursor.com>
Background polls no longer toggle the loading state that hid Job Status
every 1.5s; stale data stays on screen with a subtle header spinner.

Co-authored-by: Cursor <cursoragent@cursor.com>
Enable/disable job sets with partial pending-job sync while runs are active,
individual job cancel with container teardown, clearer regenerate UX, and
relationship ownership rules for per-instance descriptions. Fix agentic-ci
prompt delivery, host-reachable API URLs, GCP mounts, and activity watch parsing.

Co-authored-by: Cursor <cursoragent@cursor.com>
Mirror agentic-ci context/verdict patterns: unpack JobPackages into
repository-files (instance path targeting with full fallback), copy
helpers/workload-mutations.sh, pre-create writable mutations/, and fail
jobs unless mutations/result.json reports operations_applied > 0. Tighten
per-instance description skills with Adapter/Resource/ComponentTest counts.

Co-authored-by: Cursor <cursoragent@cursor.com>
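
The verdict rule in the commit above (fail unless `mutations/result.json` reports `operations_applied > 0`) could look roughly like the following. This is a sketch under assumptions: the function name `job_succeeded` is hypothetical, and the file is assumed to be a JSON object with an integer `operations_applied` field; the actual schema is not shown in this PR.

```python
import json
from pathlib import Path


def job_succeeded(workdir: Path) -> bool:
    """Return True only if the job wrote a result with applied operations.

    Assumes <workdir>/mutations/result.json holds a JSON object such as
    {"operations_applied": 3}. A missing, unreadable, or malformed file
    counts as failure.
    """
    result_path = workdir / "mutations" / "result.json"
    try:
        result = json.loads(result_path.read_text())
    except (OSError, json.JSONDecodeError):
        return False
    if not isinstance(result, dict):
        return False
    applied = result.get("operations_applied", 0)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(applied, bool) or not isinstance(applied, int):
        return False
    return applied > 0
```

Treating a missing or malformed file as failure keeps a crashed worker from being reported as a successful job with zero writes.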
…iptions

Job set descriptions must list counterpart-owned relationships under
'Ignore these relationships:' with IGNORE lines and instance counts.
Raise parallel extraction worker default from 2 to 20. Enforce per-instance
description ownership on save, expose relationship authoring hints in config
API, and keep assistant prompts correct on follow-up turns. Kill and Reset
Running now stop orphaned extraction containers.

Co-authored-by: Cursor <cursoragent@cursor.com>
…lization

Re-fetch ingest-only archives when ZIPs are absent on disk so extraction
jobs and sticky sessions populate repository-files. Gate readiness on archive
presence and inject workload credentials into agentic-ci container env.

Co-authored-by: Cursor <cursoragent@cursor.com>
…pare

Persist successful extraction jobs as archived with mutation history and surface that in graph management. Validate relationship authoring against ontology and merge token/graph-write metrics from JSONL and agent streams. Use tarball-based GitHub full refresh with auth fallback, and order sync runs newest-first so prepare retries show accurate UI state.

Co-authored-by: Cursor <cursoragent@cursor.com>
…aph failures

Add one-command dev DB backup and restore, auto-repair corrupt tenant AGE
graphs, return HTTP 503 for graph storage errors, and update GMA instructions
to smoke-test prepopulation and stop on infrastructure failures.

Co-authored-by: Cursor <cursoragent@cursor.com>
…pply chaining

Add run_scanner.py to combine scan-to-JSONL in one step, enrich readiness tasks
with order/run_command and underscore relationship paths, and return next_action
from apply so agents can chain labels without polling readiness every batch.

Co-authored-by: Cursor <cursoragent@cursor.com>
…one mutation log

Enable CREATE/UPDATE/DELETE in workload validation and tools, accumulate applied
JSONL per assistant session, and write one ARCHIVED extraction job when Clear chat
ends the session so it appears in Extraction Archive history.

Co-authored-by: Cursor <cursoragent@cursor.com>
… usage

Teach the Graph Management Assistant that each relationship UI row needs a
distinct edge_types label, with read-back verification before claiming saves.
Also propagate Claude SDK token/cost metrics into session journals and chat
turn handling for operator visibility.

Co-authored-by: Cursor <cursoragent@cursor.com>
…rhead

Front-load graph_id, property gaps, JSONL examples, and directory-prefix
file materialization so enrichment jobs spend less time probing formats and paths.

Co-authored-by: Cursor <cursoragent@cursor.com>
Implement GMA one-off mutations with session archiving, rename Mutation logs
to Graph Writes History, fix job set labels and cost display, and add a
template-driven manual mutation authoring panel with schema instance views.

Co-authored-by: Cursor <cursoragent@cursor.com>
Drop the session pointers rail item and detail panel from all GMA modes;
session history now lives in Graph Writes History when chat is cleared.

Co-authored-by: Cursor <cursoragent@cursor.com>
…schema explorer

Add bulk instance edit workflow guidance, helpers/sync_instances.py for diff-and-generate JSONL, and clearer list-instances MCP tool docs so agents batch deletes instead of per-slug loops. Replace manage overview type badges with GraphSchemaExplorer and extract reusable entity/relationship type list components.

Co-authored-by: Cursor <cursoragent@cursor.com>
…nagement

Load 100 instances per type instead of a global cap, merge observed properties into schema display, and add paginated instance APIs with property search plus load-more UI on entity and relationship panels.

Co-authored-by: Cursor <cursoragent@cursor.com>
Scope GMA containers and conversations by graph-management UI mode (three
parallel sessions per user/KG), add start/end/clear session APIs, terminate
containers without auto-restart, expire idle sessions after 1 hour, and archive
Graph Writes History only when a closed session has write_ops > 0. Update the
manage UI with Start/End session controls and fix archived write count sourcing.

Co-authored-by: Cursor <cursoragent@cursor.com>
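
The two session rules from the commit above (expire after 1 hour idle, archive only closed sessions with `write_ops > 0`) can be sketched as two small predicates. The names `IDLE_TIMEOUT`, `is_idle_expired`, and `should_archive` are hypothetical, and treating exactly one hour of inactivity as expired is an assumption; the commit message does not say whether the boundary is inclusive.

```python
from datetime import datetime, timedelta

# One-hour idle window, as stated in the commit message.
IDLE_TIMEOUT = timedelta(hours=1)


def is_idle_expired(last_activity: datetime, now: datetime) -> bool:
    """True when the session has been idle for the full timeout.

    Both datetimes must be either naive or timezone-aware; mixing the
    two raises TypeError on subtraction.
    """
    return now - last_activity >= IDLE_TIMEOUT


def should_archive(closed: bool, write_ops: int) -> bool:
    """Archive to Graph Writes History only for closed sessions that wrote."""
    return closed and write_ops > 0
```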
…ckend

Secure GMA agent containers with session-bound /v1/turn auth, Docker
hardening flags, and per-turn workload tokens instead of long-lived env
JWTs. Add OpenShell-backed sticky sessions and extraction jobs with
per-mode network policies, dev compose wiring, and prod manifest stubs.

Co-authored-by: Cursor <cursoragent@cursor.com>
…and manage UX

Move batch extraction to one reusable OpenShell sandbox per worker, route GMA
through inference.local with Vertex effort capping, and add maintain/archive
workspace improvements plus token-efficient partial UPDATE tooling for jobs.

Co-authored-by: Cursor <cursoragent@cursor.com>
…us sync

Repair OpenShell extraction start failures, cap workers at 50 without sandbox
UI noise, and keep recent job events accurate with status filters including archived.

Co-authored-by: Cursor <cursoragent@cursor.com>
…rkers

Split job prepare from long OpenShell execution to avoid pool exhaustion
with high worker counts, scale up live runs on Start, and add Failed filter
to recent job events.

Co-authored-by: Cursor <cursoragent@cursor.com>
Development

Successfully merging this pull request may close these issues.

KG-scoped data source onboarding (k-extract-style full-page flow)

2 participants