feat(ui): KG-scoped data source onboarding (k-extract flow) by aredenba-rh · Pull Request #737 · openshift-hyperfleet/kartograph

aredenba-rh · 2026-05-26T18:26:15Z

Summary

Adds full-page data source onboarding at /knowledge-graphs/{kgId}/data-sources/new (URLs → configure → sequential initial sync → summary), modeled after k-extract designer/new.
Adds ongoing operations page at /knowledge-graphs/{kgId}/data-sources (phase1 equivalent) for sync, commits, diff, and maintenance focus.
KG manage workspace routes Data Sources to onboarding when dataSourceCount === 0, otherwise to the operations page.
Post–KG-create toast navigates to the new onboarding route.
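
The routing rule above can be sketched as a small function. This is a hypothetical Python illustration, not the PR's UI code: the name `data_sources_route` is assumed, and only the two paths and the `dataSourceCount === 0` condition come from the summary.

```python
def data_sources_route(kg_id: str, data_source_count: int) -> str:
    """Return the page the manage workspace's Data Sources entry opens.

    Hypothetical helper; the real routing lives in the UI code.
    """
    base = f"/knowledge-graphs/{kg_id}/data-sources"
    # No data sources yet: send the user to the onboarding wizard.
    if data_source_count == 0:
        return f"{base}/new"
    # Otherwise open the ongoing operations page.
    return base
```

Under this rule, returning to Data Sources after onboarding opens the operations page rather than the wizard, as the test plan expects.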

Closes #736

Test plan

Create a KG → Manage → Data Sources → lands on /data-sources/new
Add GitHub URL(s), configure branch/token, connect → run Start initial sync → see progress and summary
Open data sources → operations page with cards, sync history, commit refs
Return to manage → Data Sources again → operations page (not wizard)
Maintain step → ?focus=maintain filters to maintenance-ready sources
Global sidebar /data-sources unchanged

Made with Cursor

* chore(skills): add subagent delivery execution protocol

  Add a reusable subagent skill that standardizes issue-based branching, TDD execution, PR structure, and merge/conflict handling into feature/manage-knowledge-graph.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(management): add knowledge graph workspace mode lifecycle

  Implement schema_bootstrap as the default workspace mode and persist irreversible transition state to extraction_operations across domain, repository, API responses, and migration coverage.

  Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>

…681) Add a workspace-status API projection with mode, readiness flags, transition eligibility, and session pointers, including service and route authorization coverage for manage workspace rendering. Co-authored-by: Cursor <cursoragent@cursor.com>

…#682) Enforce workspace readiness checks for minimum entity/relationship type coverage and prepopulated type instance presence, and project blocking reasons so validate/transition workflows can render actionable feedback. Co-authored-by: Cursor <cursoragent@cursor.com>
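
A minimal sketch of how readiness checks might project blocking reasons, assuming hypothetical names (`Readiness`, `check_readiness`) and placeholder thresholds. The commit message does not state the actual minimums or the rule for prepopulated types, so both are assumptions here.

```python
from dataclasses import dataclass, field


@dataclass
class Readiness:
    ready: bool
    blocking_reasons: list = field(default_factory=list)


def check_readiness(entity_types: int, relationship_types: int,
                    prepopulated_instance_counts: dict,
                    min_entity_types: int = 1,
                    min_relationship_types: int = 1) -> Readiness:
    """Collect every blocking reason instead of stopping at the first."""
    reasons = []
    if entity_types < min_entity_types:
        reasons.append(f"need at least {min_entity_types} entity type(s)")
    if relationship_types < min_relationship_types:
        reasons.append(
            f"need at least {min_relationship_types} relationship type(s)")
    # Assumption: each prepopulated type must have at least one instance.
    for type_name in sorted(prepopulated_instance_counts):
        if prepopulated_instance_counts[type_name] < 1:
            reasons.append(f"prepopulated type {type_name} has no instances")
    return Readiness(ready=not reasons, blocking_reasons=reasons)
```

Returning the full list of reasons is what lets a validate or transition view show all outstanding problems at once.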

Expose authorized validate and transition commands for knowledge graph workspaces, persist session pointers, and create an extraction-mode session identifier when moving from bootstrap to extraction operations. Co-authored-by: Cursor <cursoragent@cursor.com>

Add durable run-level mutation metadata storage and lifecycle persistence for session/scope identity, timestamps, token-cost totals, and operation-count summaries linked to each sync run. Co-authored-by: Cursor <cursoragent@cursor.com>

Emit operation-class counts and token/cost totals from mutation-log application results into MutationsApplied payloads so downstream sync lifecycle persistence can finalize run-level metadata. Co-authored-by: Cursor <cursoragent@cursor.com>

#686) Scaffold extraction application/presentation package structure and add pytest-archon rules enforcing DDD layer boundaries plus cross-context isolation so subsequent extraction features stay architecturally clean. Co-authored-by: Cursor <cursoragent@cursor.com>

Implement per-user/per-knowledge-graph/per-mode extraction session lifecycle behaviors with clear-chat reset semantics and archived-session retention backed by repository ports and unit coverage. Co-authored-by: Cursor <cursoragent@cursor.com>
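
The clear-chat semantics described above can be illustrated with an in-memory stand-in for the repository port. The class and method names below are hypothetical; only the per-user, per-knowledge-graph, per-mode scoping and the archive-then-create-fresh behavior come from the commit message.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Session:
    session_id: int
    user_id: str
    kg_id: str
    mode: str
    archived: bool = False


class InMemorySessionStore:
    """Hypothetical in-memory stand-in for the session repository."""

    def __init__(self):
        self._ids = count(1)
        self._sessions = []

    def active(self, user_id, kg_id, mode):
        """Return the active session for this scope, creating one if needed."""
        for s in self._sessions:
            if (s.user_id, s.kg_id, s.mode) == (user_id, kg_id, mode) \
                    and not s.archived:
                return s
        session = Session(next(self._ids), user_id, kg_id, mode)
        self._sessions.append(session)
        return session

    def clear_chat(self, user_id, kg_id, mode):
        """Archive the current session and start a fresh one."""
        self.active(user_id, kg_id, mode).archived = True
        return self.active(user_id, kg_id, mode)

    def archived_sessions(self, user_id, kg_id, mode):
        """Archived sessions are retained, not deleted."""
        return [s for s in self._sessions
                if (s.user_id, s.kg_id, s.mode) == (user_id, kg_id, mode)
                and s.archived]
```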

Resolve mode-specific extraction skill templates from global defaults and apply deterministic knowledge-graph override merges so session prompts are stable, customizable, and repeatable. Co-authored-by: Cursor <cursoragent@cursor.com>
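
One way to get a deterministic override merge is to let knowledge-graph overrides win per template name and emit the result in sorted key order. The sketch below assumes templates are plain name-to-text mappings; `resolve_skill_templates` is a hypothetical name, and the real merge may work at a finer granularity.

```python
def resolve_skill_templates(global_defaults: dict, kg_overrides: dict) -> dict:
    """Merge templates so the same inputs always yield the same output."""
    # Overrides replace defaults with the same name.
    merged = {**global_defaults, **kg_overrides}
    # Sort keys so iteration order does not depend on insertion order.
    return {name: merged[name] for name in sorted(merged)}
```

Sorting the keys means a prompt assembled by iterating over the result is stable across requests, which is the repeatability property the commit message calls out.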

) Persist extraction agent sessions and expose scoped APIs for active/list/clear-chat so reset creates a fresh session while preserving archived history and runtime context audit records. Co-authored-by: Cursor <cursoragent@cursor.com>

Persist clone-head, last-extraction baseline, and tracked-branch head commit references for data sources and expose them in management API responses for downstream ingestion and UI commit-status workflows. Co-authored-by: Cursor <cursoragent@cursor.com>

Prepare Git-backed ingestion context by loading data-source commit references, refreshing tracked branch head, and passing baseline commit plus resolved credentials into the ingestion pipeline before packaging begins. Co-authored-by: Cursor <cursoragent@cursor.com>

# Conflicts:
#   src/api/ingestion/application/services/ingestion_service.py
#   src/api/ingestion/infrastructure/event_handler.py
#   src/api/ingestion/ports/services.py
#   src/api/tests/unit/ingestion/infrastructure/test_ingestion_event_handler.py

Skip heavy extraction when tracked branch head equals the last extraction baseline by emitting a completed lifecycle event and recording an explicit no-change audit log entry on the sync run. Co-authored-by: Cursor <cursoragent@cursor.com>
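
The no-change check amounts to comparing two commit references. A minimal sketch, assuming a hypothetical function name and treating a missing reference as "cannot prove no change" (that fallback is an assumption, not something the commit message states):

```python
from typing import Optional


def should_skip_extraction(tracked_head: Optional[str],
                           last_baseline: Optional[str]) -> bool:
    """True when the tracked branch head equals the last extraction baseline."""
    # Assumption: with no baseline or no known head, run the extraction.
    if tracked_head is None or last_baseline is None:
        return False
    return tracked_head == last_baseline
```

When this returns true, the caller would emit the completed lifecycle event and record the no-change audit entry instead of starting the heavy extraction.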

Expose a data-source diff summary API that compares the last extraction baseline to tracked branch head and returns aggregate counts plus a large-list-safe changed-file preview for maintenance decisions. Co-authored-by: Cursor <cursoragent@cursor.com>
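
A diff summary of this shape could be computed as below. The field names, the `A`/`M`/`D` status letters (borrowed from git's name-status convention), and the preview limit of 50 are assumptions for illustration; the actual API response is not shown in this PR description.

```python
from collections import Counter


def summarize_diff(changes: list, preview_limit: int = 50) -> dict:
    """Summarize (status, path) pairs with counts and a bounded file preview."""
    counts = Counter(status for status, _ in changes)
    return {
        "total": len(changes),
        "added": counts.get("A", 0),
        "modified": counts.get("M", 0),
        "deleted": counts.get("D", 0),
        # Only the first preview_limit paths are returned, so very large
        # diffs do not produce very large responses.
        "preview": [path for _, path in changes[:preview_limit]],
        "truncated": len(changes) > preview_limit,
    }
```

The `truncated` flag gives a client enough information to show that more files changed than are listed.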

Show commit-based diff counts immediately on each data source card and render the changed-file list as collapsed-by-default with explicit expand/collapse controls for large-diff safe browsing. Co-authored-by: Cursor <cursoragent@cursor.com>

…695) Add explicit data-source actions to refresh tracked/clone commit references and adopt tracked head as the current extraction baseline. This lets the UI surface per-source changed-file counts with user-controlled commit context updates for maintenance decisioning. Co-authored-by: Cursor <cursoragent@cursor.com>

Strengthen subagent delivery guidance with a parallel execution model, required context packs, and a blocker-question escalation flow so multiple agents can pause and ask focused questions without serializing delivery. Co-authored-by: Cursor <cursoragent@cursor.com>

…678) (#697) Add structured mode-specific agent configuration (system prompt, hierarchy, guardrails, and skill pack defaults) and wire session initialization to resolve and persist the configuration per knowledge graph scope. Co-authored-by: Cursor <cursoragent@cursor.com>

) (#698) Seed schema bootstrap sessions with a capabilities-intake prompt that offers first-pass or guided co-design paths, and persist the selected path/capability summary in session runtime context so the conversation remains continuous across requests. Co-authored-by: Cursor <cursoragent@cursor.com>

…679) (#699) Build a filesystem runtime context for extraction workloads by materializing ingestion package resources, reconstructing repository files, and exposing a deterministic skills directory path; wire it through extraction event handling and local/deployed container configuration. Co-authored-by: Cursor <cursoragent@cursor.com>

#700) Enhance schema browser rows to display prepopulated type indicators and live per-type instance counts with lazy query-backed loading, while extending shared type contracts and tests to cover the new inspector metadata behavior. Co-authored-by: Cursor <cursoragent@cursor.com>

…671) (#701) Add manage-authorized run-control operations (start, pause, halt, reset_running, reset_failed, reset_completed, reset_all) over data source sync runs, expose them via dedicated management routes, and verify behavior with unit tests for both service transitions and HTTP contract responses. Co-authored-by: Cursor <cursoragent@cursor.com>
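
The reset operations listed above suggest a mapping from operation name to the run statuses it affects. The sketch below is a guess at that shape: the status names and the choice of `pending` as the reset target are assumptions, since the commit message names the operations but not their transitions.

```python
# Assumed mapping from reset operation to the statuses it targets.
RESET_TARGETS = {
    "reset_running": {"running"},
    "reset_failed": {"failed"},
    "reset_completed": {"completed"},
    "reset_all": {"running", "failed", "completed"},
}


def apply_reset(operation: str, run_statuses: dict) -> dict:
    """Return new statuses with targeted runs moved back to 'pending'."""
    if operation not in RESET_TARGETS:
        raise ValueError(f"unknown reset operation: {operation}")
    targets = RESET_TARGETS[operation]
    return {run_id: ("pending" if status in targets else status)
            for run_id, status in run_statuses.items()}
```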

Expose sync-run token/cost metadata in management API responses and add an extraction telemetry dashboard in the data-sources workspace with active worker counts, status buckets, recent job events, and 24h cost trend indicators backed by auto-refreshing sync data. Co-authored-by: Cursor <cursoragent@cursor.com>

Add knowledge-graph scoped maintenance schedule APIs with timezone-aware cron evaluation and persisted run outcomes, then expose the controls and history in the data-sources operations UI. Co-authored-by: Cursor <cursoragent@cursor.com>

…704) Extend the mutations console with a conversation-assisted draft flow and live entity/relationship inspector that highlights edited fields during the active session and resets highlights after apply/refresh. Co-authored-by: Cursor <cursoragent@cursor.com>

Replace legacy row actions with Manage, Query, and Delete, remove inline edit controls from the list surface, and align structural tests to the new action contract. Co-authored-by: Cursor <cursoragent@cursor.com>

) Implement a dedicated manage workspace route that loads workspace status projection, shows readiness and session pointers, and provides Validate and transition-to-extraction controls. Co-authored-by: Cursor <cursoragent@cursor.com>

Extend the manage workspace page with an always-visible extraction conversation panel, clear-chat reset action, and a tabbed lower operations area for extraction jobs, manual mutations, and run/log navigation. Co-authored-by: Cursor <cursoragent@cursor.com>

Replace the stub executor and broken busybox worker loop with opendatahub-io/agentic-ci containers, add by_files materialization with target_files persistence, and wire dev compose for migrations, gcloud ADC, and the ai-helpers image. Co-authored-by: Cursor <cursoragent@cursor.com>

Stopped kartograph-sticky-* containers kept their names after API reloads, causing docker run name conflicts when reopening Graph Management modes like Extraction Jobs. Co-authored-by: Cursor <cursoragent@cursor.com>

Graph Management Assistant prompts come from API-resolved agent_configuration, not filesystem SKILL.md mounts. Remove the sticky /app/skills bind, related env vars, and dead skills_dir plumbing from compose and deploy. Co-authored-by: Cursor <cursoragent@cursor.com>

Expose extraction-jobs read/write on the workload API and wire MCP tools so the assistant persists approved job set configs instead of directing operators to fill the UI manually. Co-authored-by: Cursor <cursoragent@cursor.com>

Default by_instances descriptions to full property and relationship coverage with explicit per-field notes, and align GMA skills, save tool guidance, and worker prompts on the same contract. Co-authored-by: Cursor <cursoragent@cursor.com>

…and edge Require EntityType -> rel -> CounterpartType lines and forbid theme-only sections so GMA writes ontology-grounded extraction briefs. Co-authored-by: Cursor <cursoragent@cursor.com>

Background polls no longer toggle the loading state that hid Job Status every 1.5s; stale data stays on screen with a subtle header spinner. Co-authored-by: Cursor <cursoragent@cursor.com>

Enable/disable job sets with partial pending-job sync while runs are active, individual job cancel with container teardown, clearer regenerate UX, and relationship ownership rules for per-instance descriptions. Fix agentic-ci prompt delivery, host-reachable API URLs, GCP mounts, and activity watch parsing. Co-authored-by: Cursor <cursoragent@cursor.com>

Mirror agentic-ci context/verdict patterns: unpack JobPackages into repository-files (instance path targeting with full fallback), copy helpers/workload-mutations.sh, pre-create writable mutations/, and fail jobs unless mutations/result.json reports operations_applied > 0. Tighten per-instance description skills with Adapter/Resource/ComponentTest counts. Co-authored-by: Cursor <cursoragent@cursor.com>
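
The verdict rule (fail unless `mutations/result.json` reports `operations_applied > 0`) can be sketched as follows. The function names are hypothetical, and treating a missing or malformed file as a failure is an assumption consistent with the rule rather than something the commit message spells out.

```python
import json
from pathlib import Path
from typing import Optional


def verdict_from_result(text: Optional[str]) -> bool:
    """True only when the result JSON reports operations_applied > 0."""
    if text is None:
        return False
    try:
        result = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not isinstance(result, dict):
        return False
    applied = result.get("operations_applied", 0)
    return isinstance(applied, int) and applied > 0


def job_succeeded(workdir: Path) -> bool:
    """Read mutations/result.json under workdir and apply the verdict rule."""
    result_path = workdir / "mutations" / "result.json"
    text = result_path.read_text() if result_path.is_file() else None
    return verdict_from_result(text)
```

Keeping the parsing separate from the file read makes the rule easy to unit test without touching the filesystem.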

…iptions Job set descriptions must list counterpart-owned relationships under 'Ignore these relationships:' with IGNORE lines and instance counts.

Raise parallel extraction worker default from 2 to 20. Enforce per-instance description ownership on save, expose relationship authoring hints in config API, and keep assistant prompts correct on follow-up turns. Kill and Reset Running now stop orphaned extraction containers. Co-authored-by: Cursor <cursoragent@cursor.com>

…lization Re-fetch ingest-only archives when ZIPs are absent on disk so extraction jobs and sticky sessions populate repository-files. Gate readiness on archive presence and inject workload credentials into agentic-ci container env. Co-authored-by: Cursor <cursoragent@cursor.com>

…pare Persist successful extraction jobs as archived with mutation history and surface that in graph management. Validate relationship authoring against ontology and merge token/graph-write metrics from JSONL and agent streams. Use tarball-based GitHub full refresh with auth fallback, and order sync runs newest-first so prepare retries show accurate UI state. Co-authored-by: Cursor <cursoragent@cursor.com>

…aph failures Add one-command dev DB backup and restore, auto-repair corrupt tenant AGE graphs, return HTTP 503 for graph storage errors, and update GMA instructions to smoke-test prepopulation and stop on infrastructure failures. Co-authored-by: Cursor <cursoragent@cursor.com>

…pply chaining Add run_scanner.py to combine scan-to-JSONL in one step, enrich readiness tasks with order/run_command and underscore relationship paths, and return next_action from apply so agents can chain labels without polling readiness every batch. Co-authored-by: Cursor <cursoragent@cursor.com>

…one mutation log Enable CREATE/UPDATE/DELETE in workload validation and tools, accumulate applied JSONL per assistant session, and write one ARCHIVED extraction job when Clear chat ends the session so it appears in Extraction Archive history. Co-authored-by: Cursor <cursoragent@cursor.com>

… usage Teach the Graph Management Assistant that each relationship UI row needs a distinct edge_types label, with read-back verification before claiming saves. Also propagate Claude SDK token/cost metrics into session journals and chat turn handling for operator visibility. Co-authored-by: Cursor <cursoragent@cursor.com>

…rhead Front-load graph_id, property gaps, JSONL examples, and directory-prefix file materialization so enrichment jobs spend less time probing formats and paths. Co-authored-by: Cursor <cursoragent@cursor.com>

Implement GMA one-off mutations with session archiving, rename Mutation logs to Graph Writes History, fix job set labels and cost display, and add a template-driven manual mutation authoring panel with schema instance views. Co-authored-by: Cursor <cursoragent@cursor.com>

Drop the session pointers rail item and detail panel from all GMA modes; session history now lives in Graph Writes History when chat is cleared. Co-authored-by: Cursor <cursoragent@cursor.com>

…schema explorer Add bulk instance edit workflow guidance, helpers/sync_instances.py for diff-and-generate JSONL, and clearer list-instances MCP tool docs so agents batch deletes instead of per-slug loops. Replace manage overview type badges with GraphSchemaExplorer and extract reusable entity/relationship type list components. Co-authored-by: Cursor <cursoragent@cursor.com>

…nagement Load 100 instances per type instead of a global cap, merge observed properties into schema display, and add paginated instance APIs with property search plus load-more UI on entity and relationship panels. Co-authored-by: Cursor <cursoragent@cursor.com>
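
A load-more API with property search might page results roughly like this. The function name, the `next_offset` cursor, and the case-insensitive substring match are assumptions; only the page size of 100 per type comes from the commit message.

```python
from typing import Optional


def page_instances(instances: list, offset: int = 0, limit: int = 100,
                   search: Optional[str] = None) -> dict:
    """Return one page of instances plus the offset of the next page."""
    if search:
        needle = search.lower()
        # Keep instances where any property value contains the search text.
        instances = [inst for inst in instances
                     if any(needle in str(value).lower()
                            for value in inst.values())]
    page = instances[offset:offset + limit]
    has_more = offset + limit < len(instances)
    return {"items": page, "next_offset": offset + limit if has_more else None}
```

A client would keep requesting with the returned `next_offset` until it is `None`, which maps onto a load-more button.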

Scope GMA containers and conversations by graph-management UI mode (three parallel sessions per user/KG), add start/end/clear session APIs, terminate containers without auto-restart, expire idle sessions after 1 hour, and archive Graph Writes History only when a closed session has write_ops > 0. Update the manage UI with Start/End session controls and fix archived write count sourcing. Co-authored-by: Cursor <cursoragent@cursor.com>
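
The two session rules above (expire after 1 hour idle, archive only closed sessions with `write_ops > 0`) reduce to two small predicates. The function names are hypothetical, and whether the boundary at exactly one hour counts as expired is an assumption.

```python
from datetime import datetime, timedelta, timezone

IDLE_TIMEOUT = timedelta(hours=1)


def is_idle_expired(last_activity: datetime, now: datetime,
                    timeout: timedelta = IDLE_TIMEOUT) -> bool:
    """True once a session has been idle for at least the timeout."""
    return now - last_activity >= timeout


def should_archive(closed: bool, write_ops: int) -> bool:
    """Only closed sessions that wrote something go to Graph Writes History."""
    return closed and write_ops > 0
```

For example, with `now = datetime.now(timezone.utc)`, a session last active 59 minutes ago is kept and one last active 60 minutes ago is expired.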

…ckend Secure GMA agent containers with session-bound /v1/turn auth, Docker hardening flags, and per-turn workload tokens instead of long-lived env JWTs. Add OpenShell-backed sticky sessions and extraction jobs with per-mode network policies, dev compose wiring, and prod manifest stubs. Co-authored-by: Cursor <cursoragent@cursor.com>

…and manage UX Move batch extraction to one reusable OpenShell sandbox per worker, route GMA through inference.local with Vertex effort capping, and add maintain/archive workspace improvements plus token-efficient partial UPDATE tooling for jobs. Co-authored-by: Cursor <cursoragent@cursor.com>

…us sync Repair OpenShell extraction start failures, cap workers at 50 without sandbox UI noise, and keep recent job events accurate with status filters including archived. Co-authored-by: Cursor <cursoragent@cursor.com>
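
The worker limits mentioned in this PR (a default of 20 from the earlier commit and a cap of 50 here) could be enforced with a simple clamp. The function name and the lower bound of 1 are assumptions for illustration.

```python
from typing import Optional

DEFAULT_WORKERS = 20  # default raised from 2 to 20 in an earlier commit
MAX_WORKERS = 50      # cap introduced by this commit


def resolve_worker_count(requested: Optional[int] = None) -> int:
    """Clamp a requested worker count into the range 1..MAX_WORKERS."""
    if requested is None:
        return DEFAULT_WORKERS
    return max(1, min(requested, MAX_WORKERS))
```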

…rkers Split job prepare from long OpenShell execution to avoid pool exhaustion with high worker counts, scale up live runs on Start, and add Failed filter to recent job events. Co-authored-by: Cursor <cursoragent@cursor.com>

aredenba-rh and others added 30 commits May 26, 2026 12:58

manage kg specs

48e428b

minor edits to specs; github issues created

fcdbe4d

feat(dev-ui): switch KG row actions to manage/query/delete (#705)

565d100

aredenba-rh and others added 30 commits June 9, 2026 11:58

agent session service

9fe1569

feat(extraction): template per-instance job descriptions by property …

bccf1b7

…and edge Require EntityType -> rel -> CounterpartType lines and forbid theme-only sections so GMA writes ontology-grounded extraction briefs. Co-authored-by: Cursor <cursoragent@cursor.com>

fix(ui): keep job status visible during extraction polling refresh

05612f2

docs(extraction): require explicit IGNORE lines in per-instance descr…

7ad79e9

…iptions Job set descriptions must list counterpart-owned relationships under 'Ignore these relationships:' with IGNORE lines and instance counts.

feat(extraction): pre-seed job context and reduce agent discovery ove…

14e1fd5

…rhead Front-load graph_id, property gaps, JSONL examples, and directory-prefix file materialization so enrichment jobs spend less time probing formats and paths. Co-authored-by: Cursor <cursoragent@cursor.com>

refactor(dev-ui): remove session pointers from graph management

2897324

Dont show #RelationshipTypes x2.. show real count

fd006a3

kg-backups

96f3340

aredenba-rh wants to merge 143 commits into main from feature/manage-knowledge-graph

2 participants
