A contract-first agentic AI engineering platform for observable, replay-aware, human-supervised automation.
ARIA is built for the engineering problems that appear after an agent demo starts becoming a system: structured planning, tool boundaries, runtime safety, human approval, traceability, replay, and learning from execution outcomes.
ARIA separates the system into explicit runtime planes:
- Brain: planning, reasoning, orchestration, HITL routing, and state transitions.
- Eye: screenshot capture, VLM/OCR perception, UI state recognition, and UIRef extraction.
- Hand: browser, desktop, ML, and vendor-backed execution adapters.
- Memory: working, episodic, semantic, and learning-oriented memory layers.
- Safety: domain policy, risk detection, PII protection, captcha handling, rate limits, and human approval gates.
- Event & Replay: structured events, trace ids, step ids, deterministic trace envelopes, and audit-friendly execution history.
The result is not a single automation bot. It is a platform architecture for building reliable agentic workflows under real-world constraints.
| Area | Public v0.2 Snapshot |
|---|---|
| System type | Agentic AI engineering platform |
| Architecture | Brain / Eye / Hand / Memory / Safety / Event & Replay |
| Orchestration | LangGraph-style state-machine execution |
| Runtime surfaces | FastAPI, WebSocket, legacy Streamlit operator UI |
| Safety posture | HITL-first, domain-aware, PII-aware, rate-limited |
| Public code slice | Replay trace contracts and Job Apply model restoration |
| Verification | 96 unit tests passing; 27 integration tests passing, 7 skipped |
| License | AGPL-3.0-or-later |
This public release is designed to show engineering judgment, not only feature count:
- how an agent runtime is separated into durable system boundaries,
- how planning and execution are kept apart through capability contracts,
- how sensitive actions route through Safety and HITL,
- how traces can be shaped for deterministic replay and audit,
- how a domain plugin can sit on top of the platform instead of becoming the platform,
- how a private research workspace can be published as a clean, reviewable public release.
This repository is a curated public preview of ARIA.
The earlier public line, v0.1.x, covered the first foundation phases. The current v0.2 preview refreshes those foundations and publishes the architecture up to Phase 12 without dumping the full private workspace.
Release notes: ARIA v0.2.0 Public Preview.
- Refreshed public documentation for Phase 00 through Phase 12.
- A clearer architecture story from infrastructure to core completion.
- A small but real replay/trace contract module:
  - `TraceEnvelope`
  - `StepRecord`
  - `ReplayRequest`
  - deterministic content hashing
  - Pydantic validation of replay-critical invariants
- Targeted tests for the replay contract slice.
- A roadmap for the next public releases: observability, artifacts/replay hardening, trust governance, MCP, and control-plane UI.
Some newer internal work is intentionally not published in this preview: large evidence artifacts, private run outputs, QLoRA experiments, long-horizon planning work, advanced policy-learning internals, full Next.js control-plane implementation, private traces, and environment-specific runtime data.
That boundary is deliberate. The public repo is meant to be readable, reviewable, and safe to evaluate.
If you are reviewing this project quickly, start here:
- Read this README for the system story and public/private boundary.
- Open Phase 12: Platform Consolidation to understand the v0.2 architecture checkpoint.
- Inspect src/aria/core/replay/trace.py for the public replay contract.
- Run tests/unit/test_replay_trace.py for the smallest verification slice.
- Browse Docs/English/phases/README.md for the phased release map.
```text
                     ARIA Runtime

    User / API / UI
           |
           v
+----------------------+       +----------------------+
|        Brain         |<----->|        Memory        |
| planner / executor   |       | working / episodic   |
| observer / HITL      |       | semantic / learning  |
+----------+-----------+       +----------+-----------+
           |                              ^
           v                              |
+----------------------+       +----------+-----------+
|   Safety & Policy    |<----->| Event / Trace Plane  |
| risk / PII / HITL    |       | envelope / replay    |
| domain / rate limit  |       | audit / evidence     |
+----------+-----------+       +----------+-----------+
           |
           v
+----------------------+       +----------------------+
|         Hand         |<----->|         Eye          |
| browser / desktop    |       | screenshots / VLM    |
| tools / vendors      |       | OCR / UIRef          |
+----------------------+       +----------------------+
```
The main design choice is explicit separation of responsibilities. The Brain should not know browser internals. The Hand should not invent policy. The Eye should describe state, not execute actions. The Event/Trace plane should make every meaningful action reconstructable.
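A minimal sketch of how these boundaries could be expressed in Python, assuming the Brain receives its Eye and Hand through narrow interfaces. `Observation`, `ActionRequest`, the `Eye`/`Hand` protocols, and the `Brain` class below are hypothetical names for illustration, not ARIA's actual interfaces; Safety and HITL routing are deliberately omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class Observation:
    """Eye output: a description of UI state, never an action."""

    ui_refs: tuple[str, ...]


@dataclass(frozen=True)
class ActionRequest:
    """Brain output: a capability name plus arguments, with no browser details."""

    capability: str
    args: dict[str, str] = field(default_factory=dict)


class Eye(Protocol):
    def observe(self) -> Observation: ...


class Hand(Protocol):
    def execute(self, request: ActionRequest) -> str: ...


class Brain:
    """Chooses capabilities; only the injected Hand knows how to perform them."""

    def __init__(self, eye: Eye, hand: Hand) -> None:
        self.eye = eye
        self.hand = hand
        self.trace: list[tuple[str, str]] = []  # (capability, result) per step

    def step(self) -> str:
        observation = self.eye.observe()
        # Toy planning rule for the sketch: act on the first visible UI reference.
        request = ActionRequest("click", {"target": observation.ui_refs[0]})
        result = self.hand.execute(request)
        self.trace.append((request.capability, result))
        return result


# Test doubles standing in for a real perception stack and browser adapter.
class FakeEye:
    def observe(self) -> Observation:
        return Observation(ui_refs=("button#submit",))


class FakeHand:
    def execute(self, request: ActionRequest) -> str:
        return f"{request.capability}:{request.args['target']}"


brain = Brain(FakeEye(), FakeHand())
print(brain.step())  # click:button#submit
```

Because the Brain depends only on the protocols, a browser adapter can be swapped for a desktop or vendor-backed one without touching planning code.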
The public v0.2 release line now documents the project through Phase 12.
| Phase | Public Status | Focus |
|---|---|---|
| 00 | refreshed | Repository, configuration, Docker, logging, and base layout |
| 01 | refreshed | Event envelope, Kafka/Redpanda, Redis state, topic taxonomy |
| 02 | refreshed | Working, episodic, semantic memory and vector storage |
| 03 | refreshed | LangGraph Brain, planner, executor, observer, HITL |
| 04 | refreshed | Screenshot, VLM/OCR perception, UIRef extraction |
| 05 | refreshed | Browser/desktop execution adapters and capability routing |
| 06 | refreshed | First domain plugin: job search, matching, application flow |
| 07 | refreshed | Skill extraction, policy learning, feedback loops |
| 08 | refreshed | Operator UI, live view, HITL, bilingual/RTL support |
| 09 | refreshed | Unit, integration, E2E, CI, documentation hardening |
| 10 | refreshed | Safety gate, domain policy, PII, captcha, rate limits |
| 11 | public preview | AIHawk, Skyvern, OpenAdapt, browser-use integration boundaries |
| 12 | public preview | Core completion, event-bus abstraction, trace/replay contracts |
See the full phase index: Docs/English/phases/README.md.
The code added in this preview is intentionally small and reviewable:
```text
src/aria/core/replay/
├── __init__.py
└── trace.py
tests/unit/
└── test_replay_trace.py
```
It introduces replay-safe contracts without exposing private traces or internal evidence:
- terminal traces require `completed_at`,
- failed steps require an explicit `error`,
- step ids are unique within a trace,
- trace hashes are deterministic,
- replay requests can verify integrity before execution.
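The invariants above can be illustrated with a small, dependency-free sketch. It uses stdlib dataclasses as a stand-in for the Pydantic models in `trace.py`; the field names, the status vocabulary, and the hashing scheme (SHA-256 over canonical JSON) are assumptions for illustration, not the actual API of `src/aria/core/replay/trace.py`.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass

# Assumed status vocabulary; the real module may use different values.
TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled"})


@dataclass(frozen=True)
class StepRecord:
    step_id: str
    action: str
    status: str  # assumed: "ok" or "failed"
    error: str | None = None

    def __post_init__(self) -> None:
        # Failed steps must carry an explicit error.
        if self.status == "failed" and not self.error:
            raise ValueError(f"step {self.step_id!r} failed without an error")


@dataclass(frozen=True)
class TraceEnvelope:
    trace_id: str
    status: str
    steps: tuple[StepRecord, ...]
    completed_at: str | None = None  # ISO-8601 timestamp (assumed string-typed)

    def __post_init__(self) -> None:
        # Terminal traces must record when they finished.
        if self.status in TERMINAL_STATUSES and self.completed_at is None:
            raise ValueError("terminal traces require completed_at")
        # Step ids must be unique within a trace.
        ids = [step.step_id for step in self.steps]
        if len(ids) != len(set(ids)):
            raise ValueError("step ids must be unique within a trace")

    def content_hash(self) -> str:
        # Canonical JSON (sorted keys, compact separators) keeps the hash
        # stable across runs and independent of dict key order.
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ReplayRequest:
    trace: TraceEnvelope
    expected_hash: str

    def verify(self) -> bool:
        # Recompute the hash and compare before any step is re-executed.
        return self.trace.content_hash() == self.expected_hash


steps = (
    StepRecord("s1", "open_page", "ok"),
    StepRecord("s2", "submit_form", "failed", error="timeout"),
)
trace = TraceEnvelope("t-1", "failed", steps, completed_at="2025-01-01T00:00:00+00:00")
request = ReplayRequest(trace, expected_hash=trace.content_hash())
print(request.verify())  # True
```

Validating at construction time means an invalid trace can never be handed to a replay runner in the first place; the hash check then only has to detect tampering or drift.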
The latest local verification for this public branch passed:
```bash
pytest tests/unit -q
# 96 passed

pytest tests/integration -q
# 27 passed, 7 skipped

ruff check src/aria/core/replay src/aria/plugins/job_apply/models \
  tests/unit/test_replay_trace.py tests/unit/plugins/job_apply \
  tests/integration/test_hand.py tests/integration/test_brain_graph.py \
  --select E,F,I,ANN,UP,DTZ,TC,PLC,PLW
# All checks passed
```
The skipped integration tests are service-dependent paths that require external runtime services such as Redis/Redpanda in specific configurations.
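One common way to make such tests skip rather than fail is to probe the service before the test class is collected. The sketch below assumes a plain TCP reachability check is enough; the environment variable names `ARIA_TEST_REDIS_HOST` and `ARIA_TEST_REDIS_PORT` and the `RedisStateTest` class are hypothetical, not taken from the ARIA test suite. It uses the stdlib `unittest.skipUnless` decorator, which pytest also honors.

```python
import os
import socket
import unittest


def service_available(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


# Hypothetical variable names; 6379 is Redis's default port.
REDIS_HOST = os.environ.get("ARIA_TEST_REDIS_HOST", "127.0.0.1")
REDIS_PORT = int(os.environ.get("ARIA_TEST_REDIS_PORT", "6379"))


@unittest.skipUnless(service_available(REDIS_HOST, REDIS_PORT), "Redis not reachable")
class RedisStateTest(unittest.TestCase):
    def test_placeholder(self) -> None:
        # A real test would exercise a Redis-backed state adapter here.
        self.assertTrue(True)
```

The probe runs once at import time, so an unavailable service costs at most one short timeout instead of a failure per test.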
Requirements:
- Python 3.11+
- Docker and Docker Compose
- Redis, Redpanda/Kafka, and Qdrant for full integration scenarios
- Local or remote LLM provider configured through environment variables
```bash
git clone https://github.com/MahdiNavaei/aria.git
cd aria
python -m venv .venv

# Windows PowerShell
.\.venv\Scripts\Activate.ps1

# Linux/macOS
source .venv/bin/activate

pip install -e ".[dev]"
cp .env.example .env
```

Run the replay contract checks:

```bash
pytest tests/unit/test_replay_trace.py -q
ruff check src/aria/core/replay tests/unit/test_replay_trace.py
```

Expected targeted result:

```text
4 passed
All checks passed
```

To start the full local stack:

```bash
docker compose up -d
# uvicorn and streamlit each run in the foreground; use separate terminals
uvicorn aria.api.main:app --host 0.0.0.0 --port 8000
streamlit run src/aria/ui/app.py
```

The legacy Streamlit dashboard is available at http://localhost:8501.
```text
aria/
├── src/aria/
│   ├── adapters/        # Browser, desktop, Redis, Kafka, ML adapters
│   ├── api/             # FastAPI routes and WebSocket runtime
│   ├── core/
│   │   ├── brain/       # Planner, executor, observer, HITL graph nodes
│   │   ├── eye/         # Screenshot, VLM/OCR, UIRef perception
│   │   ├── hand/        # Capability abstraction and execution boundary
│   │   ├── learning/    # Skill extraction and policy feedback
│   │   ├── memory/      # Working, episodic, semantic memory
│   │   ├── replay/      # v0.2 public trace/replay contract
│   │   └── safety/      # Domain, risk, PII, captcha, rate-limiting
│   ├── plugins/         # Domain plugins, starting with job apply
│   └── ui/              # Legacy Streamlit operator surface
├── config/              # YAML configuration
├── Docs/English/        # Public architecture and phase docs
├── tests/               # Unit, integration, and E2E tests
└── vendor/              # Vendored integrations and license-governed sources
```
ARIA is built around production-oriented agent constraints:
- Schema-first boundaries: agent state, tool calls, traces, and replay requests are structured.
- Human authority: sensitive actions route through HITL instead of silent execution.
- Auditability: every serious runtime path should be explainable through ids, events, and artifacts.
- Local-first capability: local LLMs and local state stores are first-class for privacy and cost control.
- Progressive public releases: new private capabilities are published only after they can be explained, tested, and separated from private artifacts.
The next public releases should be staged rather than dumped all at once.
Goal: publish the first clean observability slice with structured logs, metrics naming, trace correlation, latency budgets, and operator-facing health signals.
Expected public outputs:
- trace context propagation notes,
- metric taxonomy,
- health and readiness documentation,
- minimal tests around trace ids and event correlation.
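As an illustration of the kind of trace context propagation this slice targets, here is a minimal sketch using the stdlib `contextvars` module. Asyncio tasks copy the current context when they are created, so a trace id bound this way follows the work without being passed explicitly. The names `trace_context` and `log_event` are hypothetical and not part of the published code.

```python
from __future__ import annotations

import contextvars
import json
import uuid
from collections.abc import Iterator
from contextlib import contextmanager

# Active trace id for the current thread or asyncio task; None outside a trace.
_current_trace_id: contextvars.ContextVar[str | None] = contextvars.ContextVar(
    "trace_id", default=None
)


@contextmanager
def trace_context(trace_id: str | None = None) -> Iterator[str]:
    """Bind a trace id for the duration of a block, restoring the previous one after."""
    tid = trace_id or uuid.uuid4().hex
    token = _current_trace_id.set(tid)
    try:
        yield tid
    finally:
        _current_trace_id.reset(token)


def log_event(name: str, **fields: object) -> str:
    """Emit one structured JSON log line correlated with the active trace."""
    record = {"event": name, "trace_id": _current_trace_id.get(), **fields}
    line = json.dumps(record, sort_keys=True)
    print(line)
    return line


with trace_context("trace-123"):
    log_event("step.started", step_id="s1")
# prints {"event": "step.started", "step_id": "s1", "trace_id": "trace-123"}
```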
Goal: show how ARIA records execution evidence without leaking private data. This phase should introduce artifact manifests, redaction rules, replay-safe summaries, and failure records.
Expected public outputs:
- artifact manifest schema,
- redaction policy notes,
- replay/failure examples with synthetic data,
- evidence-pack validation tests.
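To make "redaction rules" and "artifact manifest" concrete, the sketch below redacts content before hashing it into a manifest entry, so the recorded digest never covers raw PII. The regex rules, the `ArtifactEntry` shape, and the placeholder format are assumptions for illustration; real PII detection needs far more than two regular expressions.

```python
from __future__ import annotations

import hashlib
import re
from dataclasses import dataclass

# Illustrative rules only; production PII detection needs much more than regexes.
REDACTION_RULES: dict[str, re.Pattern[str]] = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace each match with a typed placeholder such as [REDACTED_EMAIL]."""
    for label, pattern in REDACTION_RULES.items():
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text


@dataclass(frozen=True)
class ArtifactEntry:
    """One manifest line: artifact location plus a digest of its redacted content."""

    path: str
    sha256: str
    redacted: bool


def manifest_entry(path: str, content: str) -> ArtifactEntry:
    clean = redact(content)
    digest = hashlib.sha256(clean.encode("utf-8")).hexdigest()
    return ArtifactEntry(path=path, sha256=digest, redacted=clean != content)


print(redact("Contact jane.doe@example.com or +1 555-123-4567."))
# Contact [REDACTED_EMAIL] or [REDACTED_PHONE].
```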
Goal: document and publish a first governance layer around approvals, risk levels, trust scopes, RBAC expectations, and policy-gated execution.
Expected public outputs:
- trust envelope schema,
- approval lifecycle,
- safety escalation matrix,
- policy compatibility tests.
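An approval lifecycle is essentially a small state machine with an explicit transition table. The sketch below assumes a risk threshold that forces human approval and a set of states (`pending`, `approved`, `rejected`, `expired`, `executed`); all of these names, and the `Approval` class itself, are hypothetical and not the planned ARIA schema.

```python
from __future__ import annotations

from enum import Enum


class ApprovalState(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    EXPIRED = "expired"
    EXECUTED = "executed"


# Assumed escalation threshold: these risk levels always route to a human.
HITL_RISK_LEVELS = frozenset({"high", "critical"})

# Assumed transition table; any transition not listed here is refused.
TRANSITIONS: dict[ApprovalState, frozenset[ApprovalState]] = {
    ApprovalState.PENDING: frozenset(
        {ApprovalState.APPROVED, ApprovalState.REJECTED, ApprovalState.EXPIRED}
    ),
    ApprovalState.APPROVED: frozenset({ApprovalState.EXECUTED, ApprovalState.EXPIRED}),
    ApprovalState.REJECTED: frozenset(),  # terminal
    ApprovalState.EXPIRED: frozenset(),  # terminal
    ApprovalState.EXECUTED: frozenset(),  # terminal
}


def requires_approval(risk: str) -> bool:
    return risk in HITL_RISK_LEVELS


class Approval:
    """Tracks one human-approval request and records every state change."""

    def __init__(self, action: str) -> None:
        self.action = action
        self.state = ApprovalState.PENDING
        self.history: list[ApprovalState] = [self.state]

    def transition(self, new_state: ApprovalState) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new_state.value}")
        self.state = new_state
        self.history.append(new_state)

    def can_execute(self) -> bool:
        # Only an explicitly approved request may be handed to execution.
        return self.state is ApprovalState.APPROVED


if requires_approval("high"):
    req = Approval("submit_application")
    req.transition(ApprovalState.APPROVED)
    print(req.can_execute())  # True
```

Keeping the history list alongside the current state gives the audit trail for free: every approval decision is reconstructable in order.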
Later public releases can then introduce MCP runtime selection, frontend control-plane previews, learning evaluation, and adaptive routing as separate, readable milestones.
Core docs:
- Architecture Overview
- Project Structure
- Event Model
- Safety and Guardrails
- Testing Strategy
- Phase Index
Key ADRs:
- ADR-001: Event Sourcing
- ADR-002: Brain/Hand Capability Contract
- ADR-006: HITL First-Class
- ADR-009: Safety Gates
ARIA is licensed under the GNU Affero General Public License v3.0 or later. See LICENSE, NOTICE, THIRD_PARTY_LICENSES.md, and LICENSE_COMPLIANCE.md.
The AGPL license is intentional because ARIA includes AGPL-governed vendor components and is designed for network-accessible agent runtimes.


