
3quarter/jarvis-runtime


jarvis-runtime

A production-grade, multi-channel personal AI-assistant runtime in TypeScript — built to orchestrate Claude Code and Gemini CLIs as long-running, tool-using, autonomous agents.

jarvis-runtime is a long-lived Node service that turns CLI-based coding agents (Claude Code, Gemini CLI) into a persistent assistant you can talk to from Telegram or a local web UI. It manages sessions, routes work across LLM providers, exposes an in-process tool layer (filesystem, full-text search, headless browser, sandboxed shell), and can run multi-step engineering projects autonomously — including overnight, under budget caps and PM-style oversight.

Snapshot notice: This repository is a March 2026 snapshot of a real, working system. It is published as an engineering portfolio reference. The agent ecosystem has moved fast since — see Status & 2026 Context for an honest account of what still holds up and what I'd rebuild today.


Why this exists

Most "AI assistant" demos are a single chat loop around one API. This is the opposite: a runtime with the unglamorous parts that make an agent usable day-to-day — session persistence, provider failover, rate limiting, crash recovery, a scheduler, budget enforcement, and a unified message layer across channels. The interesting design decisions are in the seams between subsystems, not in any single prompt.


Architecture

Nine cooperating subsystems, wired together in src/index.ts:

| # | Subsystem | What it does | Key source |
|---|---|---|---|
| 1 | Channels | Unified message interface across Telegram (grammY, locked to a single allowed user ID) and a local web UI (browser chat with Web Speech API voice input, localhost-only). Both feed the same gateway/session/tool stack. | src/channels/ |
| 2 | Gateway | The agentic loop. Builds context, routes to a provider, runs a bounded tool-call loop, applies token-bucket rate limiting, and does retry-with-provider-fallback on failure. | src/gateway/, src/util/rate-limiter.ts |
| 3 | LLM Router & Cascade | Task-type routing (chat, worker, engineering, …) with provider:model override syntax. Providers: Claude Code CLI, Gemini CLI, OpenRouter (HTTP), and Google AI REST. A cascade provider chains free-tier Gemini API keys → paid Anthropic fallbacks for cost-controlled bulk work. | src/llm/ |
| 4 | Session + Transcripts | In-memory session manager (history window + idle timeout) backed by an append-only JSONL transcript store per chat, plus persistence of CLI session IDs so conversations survive restarts. | src/session/, src/state/ |
| 5 | Search (SQLite + FTS5) | Full-text index over the markdown knowledge base using SQLite FTS5 (porter tokenizer), with incremental mtime-based refresh and snippet/rank scoring. Rebuilt on boot, refreshed on an interval. | src/search/ |
| 6 | Tool Registry | A single in-process tool layer exposed to LLMs: filesystem read/write/append, search, headless browser (Playwright), sandboxed shell, project control, orchestrator control, and reminders. | src/tools/, src/browser/, src/shell/ |
| 7 | Project Autonomy | Autonomous multi-step engineering: plan → execute → validate, with a git branch per project, dependency-ordered tasks, retry/self-healing, and validation commands as the success gate. | src/project/ |
| 8 | Orchestrator | Spawns long-lived Claude Code sessions over the stream-json protocol, supervised by a cheaper PM model (via OpenRouter) that classifies output and decides continue / wait_for_human / stop / escalate. Includes per-session budget limits and loop detection. | src/orchestrator/ |
| 9 | Overnight Runner & Scheduler | A time-windowed autonomous queue (e.g. 22:00–06:00) with nightly + per-session USD budget caps, plus an interval scheduler driving proactive checks (inbox size, stale memory, stale projects) and morning digests. | src/overnight/, src/scheduler/ |
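The gateway's token-bucket rate limiting (row 2 above) fits in a few lines. What follows is a minimal sketch under assumptions, not the contents of src/util/rate-limiter.ts: the class name `TokenBucket`, its constructor parameters, and the injectable clock are all hypothetical, chosen so the behavior can be tested without real waiting.

```typescript
// Hypothetical token-bucket limiter: a sketch, not the repo's src/util/rate-limiter.ts.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private readonly now: () => number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSec: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;         // largest burst allowed
    this.refillPerSec = refillPerSec; // sustained rate, in tokens per second
    this.now = now;                   // injectable millisecond clock, handy in tests
    this.tokens = capacity;           // start full so an initial burst succeeds
    this.lastRefillMs = now();
  }

  // Credit tokens earned since the last call, never exceeding capacity.
  private refill(): void {
    const nowMs = this.now();
    const elapsedSec = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = nowMs;
  }

  // Take `cost` tokens if available; false means the caller should wait or reject.
  tryRemove(cost = 1): boolean {
    this.refill();
    if (this.tokens >= cost) {
      this.tokens -= cost;
      return true;
    }
    return false;
  }
}
```

A bucket with capacity 2 refilling at 1 token/s allows a burst of two calls and then roughly one call per second; when `tryRemove()` returns false, a gateway would queue or delay the request rather than hit the provider.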

Cross-cutting concerns: structured logging with rotation (winston), a /health HTTP endpoint, global crash handlers, request-scoped IDs, and graceful shutdown with a hard timeout guard.
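Graceful shutdown with a hard timeout guard typically races the cleanup work against a timer, so one hung subsystem (for example, a browser page or CLI child process that never finishes closing) cannot block exit forever. The sketch below uses hypothetical names (`shutdownWithTimeout`, `CleanupHook`); a real entry point would likely call it from SIGTERM/SIGINT handlers and then `process.exit(1)` when the result is `"timeout"`.

```typescript
// Hypothetical cleanup hook: close a channel, flush transcripts, stop the browser pool, etc.
type CleanupHook = () => Promise<void>;

// Run all hooks concurrently, but stop waiting after `timeoutMs`.
// Hooks still running after a timeout are abandoned, not cancelled.
async function shutdownWithTimeout(hooks: CleanupHook[], timeoutMs: number): Promise<"clean" | "timeout"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const guard = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });
  // allSettled: one failing hook does not prevent the others from running.
  // Promise.resolve().then(hook) also turns a synchronous throw into a rejection.
  const cleanup = Promise.allSettled(hooks.map((hook) => Promise.resolve().then(hook))).then(
    () => "clean" as const,
  );
  try {
    return await Promise.race([cleanup, guard]);
  } finally {
    clearTimeout(timer); // release the guard timer so it cannot keep the process alive
  }
}
```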


Tech stack

  • Language / runtime: TypeScript (strict), Node.js ≥ 20, ES modules
  • Storage: better-sqlite3 with FTS5 for search; append-only JSONL for transcripts
  • Channels: grammy (Telegram); native ws + a static web UI for the browser channel
  • Browser automation: playwright (headless Chromium, pooled)
  • Validation / config: zod schema over a json5 config file with ${ENV} interpolation
  • Observability: winston + winston-daily-rotate-file
  • Process management: PM2 (ecosystem.config.cjs)
  • Testing: vitest
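`${ENV}` interpolation can run as a pre-pass over the parsed json5 object, before the zod schema validates it. This sketch is an assumption about how that step might look: `interpolateEnv` is a hypothetical name, the environment is passed in rather than read from `process.env` so the function is easy to test, and unset variables resolve to an empty string (also an assumption) so that missing credentials surface as empty values downstream.

```typescript
// Any value that can appear in a parsed JSON/json5 document.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Hypothetical pre-pass: replace "${NAME}" placeholders in every string of a parsed config.
function interpolateEnv(value: Json, env: Record<string, string | undefined>): Json {
  if (typeof value === "string") {
    // Unset variables become "" (an assumption), so missing credentials show up as empty strings.
    return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match, name: string) => env[name] ?? "");
  }
  if (Array.isArray(value)) {
    return value.map((item) => interpolateEnv(item, env));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, item] of Object.entries(value)) {
      out[key] = interpolateEnv(item, env);
    }
    return out;
  }
  return value; // numbers, booleans, and null pass through unchanged
}
```

A loader built this way would parse the file with json5, run this pass, and only then hand the result to the zod schema, so validation sees real values rather than placeholders.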

Test status

505 unit/integration tests passing (vitest, 518 total; 13 live-CLI end-to-end tests are env-gated and skipped by default), spanning every subsystem: router/cascade, providers, gateway loop, session/transcript persistence, FTS5 search, tool registry, shell security, browser pool, project planner/runner/healing, orchestrator store/PM classification, overnight scheduling/budgets, and channel command handling.

```bash
npm test
```

Quick start

Prerequisites

  • Node.js ≥ 20
  • (Optional) Claude Code CLI authenticated (claude login) for the Anthropic provider
  • (Optional) Gemini CLI authenticated for the Gemini provider
  • A Telegram bot token (from @BotFather) if you use the Telegram channel

Install & build

```bash
npm install
npm run build
npm test
```

Configure

Copy the example environment file and fill in your own values:

```bash
cp .env.example .env
```

The Claude and Gemini providers use CLI subscription auth (no API key in env). OpenRouter is the only required key for the pay-per-token fallback path; the Google AI REST keys are optional and power the free-tier worker cascade. Provider blocks whose credentials resolve to empty are stripped automatically at boot, so you can run with whatever subset you have.
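Stripping provider blocks whose credentials resolved to empty can be a plain filter applied after interpolation. The `ProviderBlock` shape, its `credentialFields` list, and the provider names in the check below are hypothetical; the repository's actual config schema is not reproduced here. CLI-authenticated providers declare no credential fields in this sketch, so they always survive the filter.

```typescript
// Hypothetical shape of a provider block after ${ENV} interpolation.
interface ProviderBlock {
  credentialFields: string[];       // fields that must be non-empty, e.g. ["apiKey"]
  settings: Record<string, string>; // interpolated values
}

// Keep a provider only if every credential it declares resolved to a non-empty string.
function stripUnconfiguredProviders(
  providers: Record<string, ProviderBlock>,
): Record<string, ProviderBlock> {
  const kept: Record<string, ProviderBlock> = {};
  for (const [name, block] of Object.entries(providers)) {
    const hasAllCredentials = block.credentialFields.every(
      (field) => (block.settings[field] ?? "").trim() !== "",
    );
    // CLI-authenticated providers declare no credential fields, so every() is true for them.
    if (hasAllCredentials) kept[name] = block;
  }
  return kept;
}
```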

Runtime behavior (which channels, tools, autonomy levels, budgets, and routing are enabled) is configured in config/default.json5. Browser, shell, project, orchestrator, and overnight modes are all individually toggleable and default to safe limits (allowed base dirs, timeouts, max output sizes, USD budget caps).

Run

```bash
# Development (watch mode)
npm run dev

# Production
npm start

# Or under PM2
pm2 start ecosystem.config.cjs
```

Health check: GET /health on the configured health port. Local web UI (if enabled): http://localhost:3000.


Security model

This is a single-user system by design. The Telegram channel only responds to one configured user ID; the shell executor restricts execution to an allowlist of base directories with timeout, output-size, and concurrency caps plus blocked-pattern filtering; the browser runs headless and pooled; and autonomous modes enforce per-session and nightly USD budget limits with loop detection. Secrets live only in .env (gitignored); config references them via ${ENV} interpolation, never literals.
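The shell executor's directory allowlist and blocked-pattern filtering can be sketched as a pre-flight check that runs before anything is spawned. `checkShellRequest`, `ShellPolicy`, and the sample patterns are hypothetical; the timeout, output-size, and concurrency caps would be enforced separately at spawn time. The sketch compares resolved paths but does not follow symlinks; a stricter version would resolve them first with `fs.realpathSync`.

```typescript
import * as path from "node:path";

// Hypothetical policy shape for the sandboxed shell.
interface ShellPolicy {
  allowedBaseDirs: string[]; // absolute directories commands may run in (or beneath)
  blockedPatterns: RegExp[]; // refuse any command matching one of these (avoid the /g flag)
}

type Verdict = { ok: true } | { ok: false; reason: string };

function checkShellRequest(command: string, cwd: string, policy: ShellPolicy): Verdict {
  const resolvedCwd = path.resolve(cwd);
  // path.relative() rejects sibling prefixes such as "/srv/app-evil" vs "/srv/app",
  // which a naive startsWith() check would wrongly accept.
  const insideAllowed = policy.allowedBaseDirs.some((base) => {
    const rel = path.relative(path.resolve(base), resolvedCwd);
    return rel === "" || (rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel));
  });
  if (!insideAllowed) {
    return { ok: false, reason: `working directory outside allowlist: ${resolvedCwd}` };
  }
  const hit = policy.blockedPatterns.find((pattern) => pattern.test(command));
  if (hit) {
    return { ok: false, reason: `command matches blocked pattern ${hit.source}` };
  }
  return { ok: true };
}
```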


Status & 2026 Context

I'm publishing this as an honest snapshot: a real system, frozen at its last commit (2026-03-01), with a clear-eyed read on how it ages. That last commit pre-dates the Claude Agent SDK (Apr 8, 2026) and a wave of agent infrastructure that landed right after — so several subsystems I hand-rolled here now have first-party or best-in-class equivalents. Knowing exactly which is the point.

Still defensible — I'd keep these:

  • In-process tool registry (subsystem 6). Running tools in-process instead of behind out-of-process servers is a deliberate token-economics call: external tool servers re-inject large schemas into context on every turn. For a latency- and cost-sensitive personal runtime, keeping the hot tools in-process is still the right tradeoff.
  • SQLite + FTS5 for memory/search (subsystem 5). The 2026 consensus has converged toward embedded SQLite-backed agent memory, not away from it. This was ahead of the curve. The clear next step is hybrid retrieval — add a vector column and fuse lexical + semantic ranking with Reciprocal Rank Fusion.
  • Unified multi-channel session layer (subsystems 1 & 4). No mainstream agent framework cleanly owns session unification across heterogeneous channels (Telegram + web sharing one session/router/tool stack). This remains genuinely useful glue.
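Reciprocal Rank Fusion scores each document by summing 1/(k + rank) over every ranked list it appears in, with ranks starting at 1 and k = 60 as the conventional constant. A minimal sketch, assuming the FTS5 query and a (not yet existing) vector search each return document IDs ordered best-first; `reciprocalRankFusion` is a hypothetical helper, not code from this repository.

```typescript
// Hypothetical helper: fuse best-first ranked lists of document IDs with Reciprocal Rank Fusion.
function reciprocalRankFusion(rankings: string[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first; documents found by both retrievers tend to rise to the top.
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

Because RRF uses only rank positions, it can merge FTS5's bm25-based ordering with a cosine-similarity ordering without normalizing the two score scales.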

What I'd rebuild on newer tooling today:

  • Overnight queue (subsystem 9) → a durable-execution platform (Trigger.dev v3 / Inngest). My cron-window + budget-cap runner works, but durable execution gives you retries, replay, and observability for free.
  • Model cascade / provider failover (subsystem 3) → a model gateway (Vercel AI Gateway / OpenRouter routing). My hand-rolled cascade is sound, but gateways now do failover, cost routing, and key rotation as a managed concern.
  • Orchestrator & project autonomy (subsystems 7 & 8) → the Claude Agent SDK, Vercel AI SDK's agent primitives, or Claude Code's own goal-driven modes. The patterns here (PM-supervised sessions, plan/execute/validate, loop detection, budget gating) are still exactly right — I'd just implement them on the SDK rather than against the raw stream-json protocol.
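Whatever platform ends up hosting it, the overnight runner's admission gate (the 22:00–06:00 window plus nightly and per-session USD caps from the architecture table) reduces to a couple of checks, the subtle one being a window that wraps past midnight. In this sketch `OvernightPolicy`, `inWindow`, and `canStartTask` are hypothetical, and the per-task cost estimate is an assumption; the real runner may instead enforce the per-session cap while a session runs.

```typescript
// Hypothetical overnight policy; minutes are counted from local midnight.
interface OvernightPolicy {
  windowStartMinute: number;   // e.g. 22 * 60 for 22:00
  windowEndMinute: number;     // e.g. 6 * 60 for 06:00 (earlier than start: window wraps midnight)
  nightlyBudgetUsd: number;    // cap on total spend per night
  perSessionBudgetUsd: number; // cap on any single session
}

// True if minuteOfDay falls in [start, end), including windows that cross midnight.
function inWindow(minuteOfDay: number, start: number, end: number): boolean {
  return start <= end
    ? minuteOfDay >= start && minuteOfDay < end
    : minuteOfDay >= start || minuteOfDay < end;
}

// Admission check before dequeuing a task; estimatedUsd is an assumed per-task cost estimate.
function canStartTask(now: Date, spentTonightUsd: number, estimatedUsd: number, policy: OvernightPolicy): boolean {
  const minuteOfDay = now.getHours() * 60 + now.getMinutes(); // local time
  if (!inWindow(minuteOfDay, policy.windowStartMinute, policy.windowEndMinute)) return false;
  if (estimatedUsd > policy.perSessionBudgetUsd) return false; // one session would exceed its cap
  return spentTonightUsd + estimatedUsd <= policy.nightlyBudgetUsd; // stay under the nightly cap
}
```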

Net: the architecture and the judgment calls hold up; some of the plumbing now has better off-the-shelf parts. That's the honest state of any system built on the leading edge of a fast-moving field — and being precise about it is the whole point of publishing this.


License

MIT. See LICENSE.

About

Multi-channel personal AI assistant runtime orchestrating Claude Code + Gemini CLIs — SQLite/FTS5 state, cascade LLM router, autonomous project/overnight modes (March 2026 snapshot).
