feat(ui): KG-scoped data source onboarding (k-extract flow) #737
Open
aredenba-rh wants to merge 143 commits into
* chore(skills): add subagent delivery execution protocol
  Add a reusable subagent skill that standardizes issue-based branching, TDD execution, PR structure, and merge/conflict handling into feature/manage-knowledge-graph.
  Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(management): add knowledge graph workspace mode lifecycle
  Implement schema_bootstrap as the default workspace mode and persist irreversible transition state to extraction_operations across domain, repository, API responses, and migration coverage.
  Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
…681) Add a workspace-status API projection with mode, readiness flags, transition eligibility, and session pointers, including service and route authorization coverage for manage workspace rendering. Co-authored-by: Cursor <cursoragent@cursor.com>
…#682) Enforce workspace readiness checks for minimum entity/relationship type coverage and prepopulated type instance presence, and project blocking reasons so validate/transition workflows can render actionable feedback. Co-authored-by: Cursor <cursoragent@cursor.com>
Expose authorized validate and transition commands for knowledge graph workspaces, persist session pointers, and create an extraction-mode session identifier when moving from bootstrap to extraction operations. Co-authored-by: Cursor <cursoragent@cursor.com>
Add durable run-level mutation metadata storage and lifecycle persistence for session/scope identity, timestamps, token-cost totals, and operation-count summaries linked to each sync run. Co-authored-by: Cursor <cursoragent@cursor.com>
Emit operation-class counts and token/cost totals from mutation-log application results into MutationsApplied payloads so downstream sync lifecycle persistence can finalize run-level metadata. Co-authored-by: Cursor <cursoragent@cursor.com>
#686) Scaffold extraction application/presentation package structure and add pytest-archon rules enforcing DDD layer boundaries plus cross-context isolation so subsequent extraction features stay architecturally clean. Co-authored-by: Cursor <cursoragent@cursor.com>
Implement per-user/per-knowledge-graph/per-mode extraction session lifecycle behaviors with clear-chat reset semantics and archived-session retention backed by repository ports and unit coverage. Co-authored-by: Cursor <cursoragent@cursor.com>
Resolve mode-specific extraction skill templates from global defaults and apply deterministic knowledge-graph override merges so session prompts are stable, customizable, and repeatable. Co-authored-by: Cursor <cursoragent@cursor.com>
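A minimal sketch of a deterministic override merge, assuming (hypothetically) that a skill template is a map of named sections and that a knowledge-graph override replaces a global section by name. Sorting section names is one way to keep the rendered prompt identical regardless of insertion order; `resolveSkillTemplate` and the section format are illustrative, not the PR's actual merge rules.

```typescript
// Hypothetical template shape: section name -> section text.
type SkillTemplate = Record<string, string>;

// Merge global defaults with knowledge-graph overrides (override wins per
// section), then render sections in sorted order so the same inputs always
// produce the same prompt text.
function resolveSkillTemplate(
  globalDefault: SkillTemplate,
  kgOverride: SkillTemplate = {},
): string {
  const merged: SkillTemplate = { ...globalDefault, ...kgOverride };
  return Object.keys(merged)
    .sort()
    .map((name) => `## ${name}\n${merged[name]}`)
    .join("\n\n");
}
```

Sorting trades author-controlled section order for reproducibility; an explicit ordering list would be an alternative if section order matters to the prompt.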
Persist clone-head, last-extraction baseline, and tracked-branch head commit references for data sources and expose them in management API responses for downstream ingestion and UI commit-status workflows. Co-authored-by: Cursor <cursoragent@cursor.com>
Prepare Git-backed ingestion context by loading data-source commit references, refreshing tracked branch head, and passing baseline commit plus resolved credentials into the ingestion pipeline before packaging begins. Co-authored-by: Cursor <cursoragent@cursor.com>
# Conflicts:
#   src/api/ingestion/application/services/ingestion_service.py
#   src/api/ingestion/infrastructure/event_handler.py
#   src/api/ingestion/ports/services.py
#   src/api/tests/unit/ingestion/infrastructure/test_ingestion_event_handler.py
Skip heavy extraction when tracked branch head equals the last extraction baseline by emitting a completed lifecycle event and recording an explicit no-change audit log entry on the sync run. Co-authored-by: Cursor <cursoragent@cursor.com>
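The no-change short circuit can be sketched as a pure decision function. `CommitRefs`, `SyncDecision`, and `decideSync` are hypothetical names, and treating a missing baseline as "extract everything" is an assumption about first-run behavior.

```typescript
// Hypothetical commit references stored per data source.
interface CommitRefs {
  lastExtractionBaseline: string | null;
  trackedBranchHead: string | null;
}

type SyncDecision =
  | { kind: "skip"; auditMessage: string }
  | { kind: "extract"; baseline: string | null; head: string };

// Skip heavy extraction when the tracked head equals the last baseline;
// otherwise extract the range baseline..head (or everything on first run).
function decideSync(refs: CommitRefs): SyncDecision {
  const { lastExtractionBaseline, trackedBranchHead } = refs;
  if (trackedBranchHead === null) {
    throw new Error("tracked branch head must be refreshed before syncing");
  }
  if (lastExtractionBaseline === trackedBranchHead) {
    return { kind: "skip", auditMessage: `no changes since ${trackedBranchHead}` };
  }
  return { kind: "extract", baseline: lastExtractionBaseline, head: trackedBranchHead };
}
```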
Expose a data-source diff summary API that compares the last extraction baseline to tracked branch head and returns aggregate counts plus a large-list-safe changed-file preview for maintenance decisions. Co-authored-by: Cursor <cursoragent@cursor.com>
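A sketch of a large-list-safe summary: counts are computed over every changed file, while only a bounded preview is returned along with a `truncated` flag. The `ChangeKind` values, field names, and the default limit of 50 are assumptions, not the API's actual contract.

```typescript
// Hypothetical change categories and payload shape.
type ChangeKind = "added" | "modified" | "deleted" | "renamed";

interface ChangedFile {
  path: string;
  kind: ChangeKind;
}

interface DiffSummary {
  counts: Record<ChangeKind, number>;
  total: number;
  preview: ChangedFile[]; // at most previewLimit entries
  truncated: boolean; // true when the preview omits some files
}

// Aggregate over the full diff, but cap the list that gets serialized.
function summarizeDiff(files: ChangedFile[], previewLimit = 50): DiffSummary {
  const counts: Record<ChangeKind, number> = { added: 0, modified: 0, deleted: 0, renamed: 0 };
  for (const f of files) {
    counts[f.kind] += 1;
  }
  return {
    counts,
    total: files.length,
    preview: files.slice(0, previewLimit),
    truncated: files.length > previewLimit,
  };
}
```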
Show commit-based diff counts immediately on each data source card and render the changed-file list as collapsed-by-default with explicit expand/collapse controls for large-diff safe browsing. Co-authored-by: Cursor <cursoragent@cursor.com>
…695) Add explicit data-source actions to refresh tracked/clone commit references and adopt tracked head as the current extraction baseline. This lets the UI surface per-source changed-file counts with user-controlled commit context updates for maintenance decisioning. Co-authored-by: Cursor <cursoragent@cursor.com>
Strengthen subagent delivery guidance with a parallel execution model, required context packs, and a blocker-question escalation flow so multiple agents can pause and ask focused questions without serializing delivery. Co-authored-by: Cursor <cursoragent@cursor.com>
) (#698) Seed schema bootstrap sessions with a capabilities-intake prompt that offers first-pass or guided co-design paths, and persist the selected path/capability summary in session runtime context so the conversation remains continuous across requests. Co-authored-by: Cursor <cursoragent@cursor.com>
…679) (#699) Build a filesystem runtime context for extraction workloads by materializing ingestion package resources, reconstructing repository files, and exposing a deterministic skills directory path; wire it through extraction event handling and local/deployed container configuration. Co-authored-by: Cursor <cursoragent@cursor.com>
#700) Enhance schema browser rows to display prepopulated type indicators and live per-type instance counts with lazy query-backed loading, while extending shared type contracts and tests to cover the new inspector metadata behavior. Co-authored-by: Cursor <cursoragent@cursor.com>
…671) (#701) Add manage-authorized run-control operations (start, pause, halt, reset_running, reset_failed, reset_completed, reset_all) over data source sync runs, expose them via dedicated management routes, and verify behavior with unit tests for both service transitions and HTTP contract responses. Co-authored-by: Cursor <cursoragent@cursor.com>
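The PR names the seven run-control operations but not the exact transition table. The sketch below encodes one plausible table as a pure function so the rules are easy to unit test; the status names and which statuses each reset touches are assumptions.

```typescript
// Hypothetical status model for a data source sync run.
type RunStatus = "pending" | "running" | "paused" | "halted" | "failed" | "completed";
type RunControl =
  | "start" | "pause" | "halt"
  | "reset_running" | "reset_failed" | "reset_completed" | "reset_all";

interface SyncRun {
  id: string;
  status: RunStatus;
}

// Statuses that each reset operation returns to "pending" (assumed table).
const RESET_TARGETS: Partial<Record<RunControl, RunStatus[]>> = {
  reset_running: ["running"],
  reset_failed: ["failed"],
  reset_completed: ["completed"],
  reset_all: ["running", "paused", "halted", "failed", "completed"],
};

// Apply one control operation; operations that do not apply leave the run unchanged.
function applyControl(run: SyncRun, op: RunControl): SyncRun {
  const targets = RESET_TARGETS[op];
  if (targets !== undefined) {
    return targets.indexOf(run.status) !== -1 ? { ...run, status: "pending" } : run;
  }
  switch (op) {
    case "start":
      return run.status === "pending" || run.status === "paused" ? { ...run, status: "running" } : run;
    case "pause":
      return run.status === "running" ? { ...run, status: "paused" } : run;
    case "halt":
      return run.status === "pending" || run.status === "running" || run.status === "paused"
        ? { ...run, status: "halted" }
        : run;
    default:
      return run;
  }
}
```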
Expose sync-run token/cost metadata in management API responses and add an extraction telemetry dashboard in the data-sources workspace with active worker counts, status buckets, recent job events, and 24h cost trend indicators backed by auto-refreshing sync data. Co-authored-by: Cursor <cursoragent@cursor.com>
Add knowledge-graph scoped maintenance schedule APIs with timezone-aware cron evaluation and persisted run outcomes, then expose the controls and history in the data-sources operations UI. Co-authored-by: Cursor <cursoragent@cursor.com>
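Timezone-aware cron evaluation means matching the expression against the wall clock in the schedule's IANA time zone rather than in UTC. A minimal sketch using only `Intl.DateTimeFormat`, assuming plain five-field expressions with `*`, single values, and comma lists (no ranges or steps); `cronMatches` is a hypothetical helper and the real scheduler's syntax support is not shown in this PR.

```typescript
const WEEKDAYS: Record<string, number> = { Sun: 0, Mon: 1, Tue: 2, Wed: 3, Thu: 4, Fri: 5, Sat: 6 };

// Supports "*", a single number, or a comma list of numbers.
function fieldMatches(field: string, value: number): boolean {
  if (field === "*") return true;
  return field.split(",").some((part) => Number(part) === value);
}

// Wall-clock components of `instant` as seen in the IANA zone `timeZone`.
function wallClock(instant: Date, timeZone: string) {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour12: false,
    weekday: "short",
    month: "numeric",
    day: "numeric",
    hour: "numeric",
    minute: "numeric",
  }).formatToParts(instant);
  const get = (type: string): string => parts.find((p) => p.type === type)?.value ?? "";
  return {
    minute: Number(get("minute")),
    hour: Number(get("hour")) % 24, // some engines render midnight as "24"
    day: Number(get("day")),
    month: Number(get("month")),
    weekday: WEEKDAYS[get("weekday")],
  };
}

// Note: standard cron ORs day-of-month and day-of-week when both are
// restricted; this simplified sketch ANDs all five fields.
function cronMatches(expr: string, instant: Date, timeZone: string): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) throw new Error(`expected 5 cron fields, got ${fields.length}`);
  const [min, hour, dom, mon, dow] = fields;
  const t = wallClock(instant, timeZone);
  return (
    fieldMatches(min, t.minute) &&
    fieldMatches(hour, t.hour) &&
    fieldMatches(dom, t.day) &&
    fieldMatches(mon, t.month) &&
    fieldMatches(dow, t.weekday)
  );
}
```

For example, 2024-01-15T14:30Z is Monday 09:30 in America/New_York, so `"30 9 * * 1"` matches there but not in UTC.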
…704) Extend the mutations console with a conversation-assisted draft flow and live entity/relationship inspector that highlights edited fields during the active session and resets highlights after apply/refresh. Co-authored-by: Cursor <cursoragent@cursor.com>
Replace legacy row actions with Manage, Query, and Delete, remove inline edit controls from the list surface, and align structural tests to the new action contract. Co-authored-by: Cursor <cursoragent@cursor.com>
Extend the manage workspace page with an always-visible extraction conversation panel, clear-chat reset action, and a tabbed lower operations area for extraction jobs, manual mutations, and run/log navigation. Co-authored-by: Cursor <cursoragent@cursor.com>
Replace the stub executor and broken busybox worker loop with opendatahub-io/agentic-ci containers, add by_files materialization with target_files persistence, and wire dev compose for migrations, gcloud ADC, and the ai-helpers image. Co-authored-by: Cursor <cursoragent@cursor.com>
Stopped kartograph-sticky-* containers kept their names after API reloads, causing docker run name conflicts when reopening Graph Management modes like Extraction Jobs. Co-authored-by: Cursor <cursoragent@cursor.com>
Graph Management Assistant prompts come from API-resolved agent_configuration, not filesystem SKILL.md mounts. Remove the sticky /app/skills bind, related env vars, and dead skills_dir plumbing from compose and deploy. Co-authored-by: Cursor <cursoragent@cursor.com>
Expose extraction-jobs read/write on the workload API and wire MCP tools so the assistant persists approved job set configs instead of directing operators to fill the UI manually. Co-authored-by: Cursor <cursoragent@cursor.com>
Default by_instances descriptions to full property and relationship coverage with explicit per-field notes, and align GMA skills, save tool guidance, and worker prompts on the same contract. Co-authored-by: Cursor <cursoragent@cursor.com>
…and edge Require EntityType -> rel -> CounterpartType lines and forbid theme-only sections so GMA writes ontology-grounded extraction briefs. Co-authored-by: Cursor <cursoragent@cursor.com>
Background polls no longer toggle the loading state that hid Job Status every 1.5s; stale data stays on screen with a subtle header spinner. Co-authored-by: Cursor <cursoragent@cursor.com>
Enable/disable job sets with partial pending-job sync while runs are active, individual job cancel with container teardown, clearer regenerate UX, and relationship ownership rules for per-instance descriptions. Fix agentic-ci prompt delivery, host-reachable API URLs, GCP mounts, and activity watch parsing. Co-authored-by: Cursor <cursoragent@cursor.com>
Mirror agentic-ci context/verdict patterns: unpack JobPackages into repository-files (instance path targeting with full fallback), copy helpers/workload-mutations.sh, pre-create writable mutations/, and fail jobs unless mutations/result.json reports operations_applied > 0. Tighten per-instance description skills with Adapter/Resource/ComponentTest counts. Co-authored-by: Cursor <cursoragent@cursor.com>
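The verdict rule (fail unless `mutations/result.json` reports `operations_applied > 0`) can be sketched as follows. Only `operations_applied` comes from the commit message; `judgeMutationResult` is hypothetical, and treating a missing or malformed file as a failure is an assumption.

```typescript
interface JobVerdict {
  ok: boolean;
  reason: string;
}

// `raw` is the file's text, or null when the file does not exist.
function judgeMutationResult(raw: string | null): JobVerdict {
  if (raw === null) return { ok: false, reason: "mutations/result.json missing" };
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, reason: "mutations/result.json is not valid JSON" };
  }
  const applied =
    typeof parsed === "object" && parsed !== null
      ? (parsed as { operations_applied?: unknown }).operations_applied
      : undefined;
  if (typeof applied !== "number") {
    return { ok: false, reason: "operations_applied missing or not a number" };
  }
  return applied > 0
    ? { ok: true, reason: `${applied} operations applied` }
    : { ok: false, reason: "no operations applied" };
}
```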
…iptions Job set descriptions must list counterpart-owned relationships under 'Ignore these relationships:' with IGNORE lines and instance counts.
Raise parallel extraction worker default from 2 to 20. Enforce per-instance description ownership on save, expose relationship authoring hints in config API, and keep assistant prompts correct on follow-up turns. Kill and Reset Running now stop orphaned extraction containers. Co-authored-by: Cursor <cursoragent@cursor.com>
…lization Re-fetch ingest-only archives when ZIPs are absent on disk so extraction jobs and sticky sessions populate repository-files. Gate readiness on archive presence and inject workload credentials into agentic-ci container env. Co-authored-by: Cursor <cursoragent@cursor.com>
…pare Persist successful extraction jobs as archived with mutation history and surface that in graph management. Validate relationship authoring against ontology and merge token/graph-write metrics from JSONL and agent streams. Use tarball-based GitHub full refresh with auth fallback, and order sync runs newest-first so prepare retries show accurate UI state. Co-authored-by: Cursor <cursoragent@cursor.com>
…aph failures Add one-command dev DB backup and restore, auto-repair corrupt tenant AGE graphs, return HTTP 503 for graph storage errors, and update GMA instructions to smoke-test prepopulation and stop on infrastructure failures. Co-authored-by: Cursor <cursoragent@cursor.com>
…pply chaining Add run_scanner.py to combine scan-to-JSONL in one step, enrich readiness tasks with order/run_command and underscore relationship paths, and return next_action from apply so agents can chain labels without polling readiness every batch. Co-authored-by: Cursor <cursoragent@cursor.com>
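Returning `next_action` from apply lets a caller chain batches without polling readiness in between. A sketch of such a loop, with a hypothetical `ApplyResult` shape and a step cap added as a safety assumption:

```typescript
// Hypothetical apply response; only the next_action idea comes from the commit.
interface ApplyResult {
  applied: number;
  next_action: { label: string } | null;
}

// Follow next_action hints until nothing is left, returning the labels visited.
async function chainApplies(
  firstLabel: string,
  apply: (label: string) => Promise<ApplyResult>,
  maxSteps = 100,
): Promise<string[]> {
  const visited: string[] = [];
  let label: string | null = firstLabel;
  while (label !== null) {
    if (visited.length >= maxSteps) throw new Error("next_action chain exceeded maxSteps");
    visited.push(label);
    const result: ApplyResult = await apply(label);
    label = result.next_action ? result.next_action.label : null;
  }
  return visited;
}
```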
…one mutation log Enable CREATE/UPDATE/DELETE in workload validation and tools, accumulate applied JSONL per assistant session, and write one ARCHIVED extraction job when Clear chat ends the session so it appears in Extraction Archive history. Co-authored-by: Cursor <cursoragent@cursor.com>
… usage Teach the Graph Management Assistant that each relationship UI row needs a distinct edge_types label, with read-back verification before claiming saves. Also propagate Claude SDK token/cost metrics into session journals and chat turn handling for operator visibility. Co-authored-by: Cursor <cursoragent@cursor.com>
…rhead Front-load graph_id, property gaps, JSONL examples, and directory-prefix file materialization so enrichment jobs spend less time probing formats and paths. Co-authored-by: Cursor <cursoragent@cursor.com>
Implement GMA one-off mutations with session archiving, rename Mutation logs to Graph Writes History, fix job set labels and cost display, and add a template-driven manual mutation authoring panel with schema instance views. Co-authored-by: Cursor <cursoragent@cursor.com>
Drop the session pointers rail item and detail panel from all GMA modes; session history now lives in Graph Writes History when chat is cleared. Co-authored-by: Cursor <cursoragent@cursor.com>
…schema explorer Add bulk instance edit workflow guidance, helpers/sync_instances.py for diff-and-generate JSONL, and clearer list-instances MCP tool docs so agents batch deletes instead of per-slug loops. Replace manage overview type badges with GraphSchemaExplorer and extract reusable entity/relationship type list components. Co-authored-by: Cursor <cursoragent@cursor.com>
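A diff-and-generate step like the one `helpers/sync_instances.py` is described as performing can be sketched as: compare current and desired instances by slug and emit one JSONL operation per difference, so deletes are batched instead of issued per slug. `diffInstances` and the op field names are hypothetical; the actual mutation-log schema is not shown in this PR.

```typescript
interface Instance {
  slug: string;
  properties: Record<string, string>;
}

type Op =
  | { op: "CREATE" | "UPDATE"; slug: string; properties: Record<string, string> }
  | { op: "DELETE"; slug: string };

function sameProps(a: Record<string, string>, b: Record<string, string>): boolean {
  const ka = Object.keys(a);
  return ka.length === Object.keys(b).length && ka.every((k) => b[k] === a[k]);
}

// One JSONL line per CREATE/UPDATE (desired order), then DELETEs (current order).
function diffInstances(current: Instance[], desired: Instance[]): string {
  const bySlug = new Map<string, Instance>();
  for (const c of current) bySlug.set(c.slug, c);
  const wanted = new Set(desired.map((d) => d.slug));
  const ops: Op[] = [];
  for (const d of desired) {
    const c = bySlug.get(d.slug);
    if (!c) ops.push({ op: "CREATE", slug: d.slug, properties: d.properties });
    else if (!sameProps(c.properties, d.properties)) ops.push({ op: "UPDATE", slug: d.slug, properties: d.properties });
  }
  for (const c of current) {
    if (!wanted.has(c.slug)) ops.push({ op: "DELETE", slug: c.slug });
  }
  return ops.map((o) => JSON.stringify(o)).join("\n");
}
```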
…nagement Load 100 instances per type instead of a global cap, merge observed properties into schema display, and add paginated instance APIs with property search plus load-more UI on entity and relationship panels. Co-authored-by: Cursor <cursoragent@cursor.com>
Scope GMA containers and conversations by graph-management UI mode (three parallel sessions per user/KG), add start/end/clear session APIs, terminate containers without auto-restart, expire idle sessions after 1 hour, and archive Graph Writes History only when a closed session has write_ops > 0. Update the manage UI with Start/End session controls and fix archived write count sourcing. Co-authored-by: Cursor <cursoragent@cursor.com>
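A minimal in-memory sketch of the session rules above: one live session per user, knowledge graph, and mode; idle expiry after one hour; and a Graph Writes History entry only when the closed session has `writeOps > 0`. The mode names, `SessionRegistry`, and all field names are hypothetical, and container teardown is omitted.

```typescript
type GmaMode = "schema" | "extraction" | "mutations"; // hypothetical mode names

interface GmaSession {
  userId: string;
  kgId: string;
  mode: GmaMode;
  lastActivityMs: number;
  writeOps: number;
}

const IDLE_TIMEOUT_MS = 60 * 60 * 1000; // 1 hour

function sessionKey(userId: string, kgId: string, mode: GmaMode): string {
  return `${userId}:${kgId}:${mode}`;
}

class SessionRegistry {
  private sessions = new Map<string, GmaSession>();
  readonly archived: GmaSession[] = [];

  // At most one live session per user/KG/mode; starting again returns it.
  start(userId: string, kgId: string, mode: GmaMode, nowMs: number): GmaSession {
    const key = sessionKey(userId, kgId, mode);
    const existing = this.sessions.get(key);
    if (existing) return existing;
    const session: GmaSession = { userId, kgId, mode, lastActivityMs: nowMs, writeOps: 0 };
    this.sessions.set(key, session);
    return session;
  }

  // Close a session; archive it only if it actually wrote to the graph.
  end(userId: string, kgId: string, mode: GmaMode): void {
    const key = sessionKey(userId, kgId, mode);
    const session = this.sessions.get(key);
    if (!session) return;
    this.sessions.delete(key);
    if (session.writeOps > 0) this.archived.push(session);
  }

  // End every session idle for longer than the timeout; returns how many ended.
  expireIdle(nowMs: number): number {
    let expired = 0;
    for (const s of Array.from(this.sessions.values())) {
      if (nowMs - s.lastActivityMs > IDLE_TIMEOUT_MS) {
        this.end(s.userId, s.kgId, s.mode);
        expired += 1;
      }
    }
    return expired;
  }
}
```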
…ckend Secure GMA agent containers with session-bound /v1/turn auth, Docker hardening flags, and per-turn workload tokens instead of long-lived env JWTs. Add OpenShell-backed sticky sessions and extraction jobs with per-mode network policies, dev compose wiring, and prod manifest stubs. Co-authored-by: Cursor <cursoragent@cursor.com>
…and manage UX Move batch extraction to one reusable OpenShell sandbox per worker, route GMA through inference.local with Vertex effort capping, and add maintain/archive workspace improvements plus token-efficient partial UPDATE tooling for jobs. Co-authored-by: Cursor <cursoragent@cursor.com>
…us sync Repair OpenShell extraction start failures, cap workers at 50 without sandbox UI noise, and keep recent job events accurate with status filters including archived. Co-authored-by: Cursor <cursoragent@cursor.com>
…rkers Split job prepare from long OpenShell execution to avoid pool exhaustion with high worker counts, scale up live runs on Start, and add Failed filter to recent job events. Co-authored-by: Cursor <cursoragent@cursor.com>
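A bounded pool is a common way to keep many workers from exhausting a shared resource. A sketch, assuming the default of 20 and the cap of 50 from these commits and hypothetical `resolveWorkerCount`/`runPool` helpers; it does not model the prepare/execute split itself.

```typescript
const DEFAULT_WORKERS = 20;
const MAX_WORKERS = 50;

// Clamp a requested worker count into [1, MAX_WORKERS].
function resolveWorkerCount(requested?: number): number {
  const n = requested ?? DEFAULT_WORKERS;
  return Math.max(1, Math.min(MAX_WORKERS, Math.floor(n)));
}

// Run `task` over `items` with at most `workers` tasks in flight; results keep input order.
async function runPool<T, R>(
  items: T[],
  workers: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker claims the next unclaimed index until the queue is drained.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i]);
    }
  }
  const n = Math.min(resolveWorkerCount(workers), items.length);
  await Promise.all(Array.from({ length: n }, () => worker()));
  return results;
}
```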
Summary
- `/knowledge-graphs/{kgId}/data-sources/new` (URLs → configure → sequential initial sync → summary), modeled after k-extract `designer/new`.
- `/knowledge-graphs/{kgId}/data-sources` (phase1 equivalent) for sync, commits, diff, and maintenance focus.
- `dataSourceCount === 0`, otherwise to the operations page.

Closes #736
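An illustrative sketch of the routing and sync order described in the summary. The route shapes come from the PR; `dataSourcesEntryPath`, the step names, and the choice to continue past a failed sync are assumptions.

```typescript
type WizardStep = "urls" | "configure" | "sync" | "summary";
const WIZARD_STEPS: WizardStep[] = ["urls", "configure", "sync", "summary"];

// Onboarding wizard when the KG has no data sources, operations page otherwise.
function dataSourcesEntryPath(kgId: string, dataSourceCount: number): string {
  const base = `/knowledge-graphs/${encodeURIComponent(kgId)}/data-sources`;
  return dataSourceCount === 0 ? `${base}/new` : base;
}

function nextStep(step: WizardStep): WizardStep | null {
  const i = WIZARD_STEPS.indexOf(step);
  return i >= 0 && i < WIZARD_STEPS.length - 1 ? WIZARD_STEPS[i + 1] : null;
}

type SyncOutcome = "ok" | "failed";

// Initial syncs run one source at a time, not in parallel. A failure is
// recorded and the next source still runs (assumed policy).
async function runInitialSyncs(
  sourceIds: string[],
  sync: (id: string) => Promise<void>,
): Promise<Record<string, SyncOutcome>> {
  const outcomes: Record<string, SyncOutcome> = {};
  for (const id of sourceIds) {
    try {
      await sync(id);
      outcomes[id] = "ok";
    } catch {
      outcomes[id] = "failed";
    }
  }
  return outcomes;
}
```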
Test plan
- `/data-sources/new?focus=maintain` filters to maintenance-ready sources
- `/data-sources` unchanged

Made with Cursor