Skip to content

feat(billing): OSS metering foundation (usage signals + DI seams)#126

Open
yilu331 wants to merge 18 commits into
mainfrom
billing/metering-foundation
Open

feat(billing): OSS metering foundation (usage signals + DI seams)#126
yilu331 wants to merge 18 commits into
mainfrom
billing/metering-foundation

Conversation

@yilu331

@yilu331 yilu331 commented Jun 5, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds the OSS metering foundation that the (closed-source) billing layer builds on. This is the open-source half: it emits structured usage signals and exposes clean dependency-injection seams so a downstream deployment can attach billing/enforcement without the OSS core depending on any of it. With no overrides registered, behavior is unchanged — every hook is a no-op and metering is fully isolated from the request success path.

Three value signals are emitted as UsageEvents:

  1. Learning — extraction input tokens (input-anchored, model-agnostic cl100k_base count) + learnings generated. Metered only for profile/playbook generation; eval/reflection pipelines do not bill.
  2. Application — applied learnings (surfaced top-K) for production agents only; dashboard/non-agent callers and empty results emit nothing.
  3. Access — caller-type classification seam (default_get_caller_type, no-op "internal" in OSS).

Changes

  • billing_signals.py — single source of the platform-LLM rule + the canonical billing tokenizer (count_input_tokens).
  • billing_meter.py — emission helpers; the one source of usage-event shape.
  • usage_metrics.pyUsageEvent gains token/attribution/caller fields; record_usage_event swallows recorder failures so metering can never 500 a request.
  • llm/token_accounting.pyRunTokenTotals + sum_trace_tokens fold tool-loop trace tokens into run totals reachable by the generation service.
  • services/base_generation_service.pyEMITS_LEARNING_BILLING flag (true only on profile/playbook generation); emission isolated in try/except.
  • server/_auth.py / server/api.pydefault_get_caller_type + no-op default_billing_gate DI seams; create_app(get_caller_type=, get_billing_gate=); applied-learnings metering wired into the 5 search/read endpoints; get_agent_playbooks gains a 120/minute rate limit.

Isolation guarantees

  • No new dependency on any closed-source / enterprise code.
  • Every metering call site is wrapped so a failure logs and continues — the product response is never affected (covered by test_metering_failure_does_not_break_search_response).
  • All hooks default to no-ops; a deployment opts in via app.dependency_overrides.

Test plan

  • tests/server/test_billing_signals.py, tests/server/test_billing_meter.py — emission helpers + tokenizer (empty-string, encoding-drift guard, special-token handling).
  • tests/server/llm/test_token_accounting.py — None-coercion, empty/missing-trace guards, multi-turn summation.
  • tests/server/api_endpoints/test_applied_learnings_metering.py — per-endpoint surfaced-count metering across all 5 endpoints × {production-agent, dashboard, empty}.
  • tests/server/services/test_generation_billing_emission.py — learning emission gated to profile/playbook generation; eval pipeline does not bill.
  • 36 metering/billing tests pass; ruff + pyright clean; import reflexio verified.

Summary by CodeRabbit

  • New Features

    • Metering for learnings (generated and applied) with per-run token accounting, optional generation billing, and configurable per-line billing gates with overridable caller-type classification.
    • Usage events now include prompt/completion/input token counts, platform/caller context, and accept request/session correlation IDs for billing attribution; app factory can toggle mounting of data-plane routes.
  • Tests

    • Added comprehensive tests for applied‑learnings metering, token accounting, billing signal detection, generation billing emission, and app router mounting behavior.

@coderabbitai

coderabbitai Bot commented Jun 5, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds OSS billing primitives (token counting, token totals, billing-meter wrappers), extends UsageEvent, passes token totals from extractors to generation, emits learning/extraction usage events from opted-in services, and meters applied-learnings in search endpoints with tests.

Changes

OSS Billing Infrastructure

Layer / File(s) Summary
Usage event schema and billing helpers
reflexio/server/usage_metrics.py, reflexio/server/billing_meter.py, tests/server/test_billing_meter.py
UsageEvent gains optional token/platform/caller fields; record_usage_event accepts them; billing helpers (record_extraction_tokens, record_learnings_generated, record_applied_learnings) emit usage events with conditional no-op logic and are unit-tested.
Token accounting and platform signals
reflexio/server/llm/token_accounting.py, reflexio/server/billing_signals.py, tests/server/llm/test_token_accounting.py, tests/server/test_billing_signals.py
RunTokenTotals accumulates prompt/completion tokens with None coercion; sum_trace_tokens aggregates trace turns; count_input_tokens uses a canonical encoding; platform_llm_from_config infers platform-vs-BYO LLM from API key config.
Extraction outcome token propagation
reflexio/server/services/extraction/outcome.py, reflexio/server/services/playbook/playbook_extractor.py, reflexio/server/services/profile/profile_extractor.py
ExtractionOutcome includes optional token_totals; extractors compute and attach trace token totals to completed outcomes.
Generation service learning billing
reflexio/server/services/base_generation_service.py, reflexio/server/services/profile/profile_generation_service.py, reflexio/server/services/playbook/playbook_generation_service.py, tests/server/services/test_generation_billing_emission.py
BaseGenerationService adds EMITS_LEARNING_BILLING opt-in, captures per-run provider totals, reconstructs extractor input to count input tokens, and emits learnings_generated and extraction_tokens events; Profile/Playbook services opt in and integration tests cover emit/skip/opt-out.
API caller classification and applied-learnings metering
reflexio/server/_auth.py, reflexio/server/api.py, tests/server/api_endpoints/test_applied_learnings_metering.py
Adds default_get_caller_type() and cached default_billing_gate(line) for OSS, exposes create_app overrides (get_caller_type, get_billing_gate), injects caller_type and per-line billing gates into endpoints, and meters applied-learnings for production_agent callers while swallowing billing failures; endpoints updated to call _meter_applied_learnings.
create_app mount tests
tests/server/test_create_app.py
Tests that mount_data_plane toggles data-plane routes and that additional routers passed to create_app are mounted.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • ReflexioAI/reflexio#42: Both PRs extend UsageEvent and record_usage_event, which this PR relies on for billing/event emission.
  • ReflexioAI/reflexio#101: Overlaps on BaseGenerationService orchestration and extractor changes intersecting with token_totals and billing emission plumbing.

Poem

"I'm a rabbit with a ledger and a hop,
Counting tokens as I bop,
Learnings logged for agents bright,
OSS gates keep billing light,
Hooray — metrics hop into the night!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 65.05% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding OSS metering foundation with usage signals and dependency-injection seams for billing/authorization customization without enterprise code dependency.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch billing/metering-foundation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/server/api_endpoints/test_applied_learnings_metering.py (1)

41-43: ⚡ Quick win

Add one test path for the get_billing_gate override seam.

These tests cover get_caller_type, but not the identity-based default_billing_gate(...) override path that this PR introduces for enterprise enforcement. A small fixture extension here to pass get_billing_gate into create_app() and assert it fires on at least one "application" route and /api/publish_interaction would protect the core DI contract from silently regressing.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/server/api_endpoints/test_applied_learnings_metering.py` around lines
41 - 43, Add a test path that exercises the identity-based billing override by
extending the _client fixture to accept and pass a get_billing_gate callable
into create_app (similar to how get_caller_type is passed), then add at least
one test that uses _client(caller_type="application",
get_billing_gate=your_stub) to hit an "application" route and POST to
/api/publish_interaction and assert your_stub (the default_billing_gate
override) was invoked; update any test helpers that construct TestClient
(function _client and usages of create_app and TestClient) so the new parameter
propagates to the app.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/server/api_endpoints/test_applied_learnings_metering.py`:
- Around line 41-43: Add a test path that exercises the identity-based billing
override by extending the _client fixture to accept and pass a get_billing_gate
callable into create_app (similar to how get_caller_type is passed), then add at
least one test that uses _client(caller_type="application",
get_billing_gate=your_stub) to hit an "application" route and POST to
/api/publish_interaction and assert your_stub (the default_billing_gate
override) was invoked; update any test helpers that construct TestClient
(function _client and usages of create_app and TestClient) so the new parameter
propagates to the app.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 6cac5196-fa2c-4cad-8135-f48b90dc3a69

📥 Commits

Reviewing files that changed from the base of the PR and between 3f13f00 and 49c30c8.

📒 Files selected for processing (17)
  • reflexio/server/_auth.py
  • reflexio/server/api.py
  • reflexio/server/billing_meter.py
  • reflexio/server/billing_signals.py
  • reflexio/server/llm/token_accounting.py
  • reflexio/server/services/base_generation_service.py
  • reflexio/server/services/extraction/outcome.py
  • reflexio/server/services/playbook/playbook_extractor.py
  • reflexio/server/services/playbook/playbook_generation_service.py
  • reflexio/server/services/profile/profile_extractor.py
  • reflexio/server/services/profile/profile_generation_service.py
  • reflexio/server/usage_metrics.py
  • tests/server/api_endpoints/test_applied_learnings_metering.py
  • tests/server/llm/test_token_accounting.py
  • tests/server/services/test_generation_billing_emission.py
  • tests/server/test_billing_meter.py
  • tests/server/test_billing_signals.py

@yilu331 yilu331 force-pushed the billing/metering-foundation branch from 49c30c8 to 05fe010 Compare June 5, 2026 17:29

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
reflexio/server/api.py (1)

2976-2978: 💤 Low value

Loop variable _line underscore prefix is misleading.

The underscore prefix conventionally indicates an unused variable, but _line is used in the loop body. Consider renaming to line for clarity.

♻️ Suggested fix
     if get_billing_gate is not None:
-        for _line in ("application", "learnings_generated"):
-            app.dependency_overrides[default_billing_gate(_line)] = get_billing_gate(_line)
+        for line in ("application", "learnings_generated"):
+            app.dependency_overrides[default_billing_gate(line)] = get_billing_gate(line)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@reflexio/server/api.py` around lines 2976 - 2978, The loop uses a misleading
variable name `_line` (underscore suggests unused) while it's actually used;
rename `_line` to `line` in the loop that iterates over ("application",
"learnings_generated") and update its usages inside the block where you call
get_billing_gate(line) and default_billing_gate(line) so the code reads clearly;
references: get_billing_gate, default_billing_gate, and
app.dependency_overrides.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@reflexio/server/api.py`:
- Around line 2976-2978: The loop uses a misleading variable name `_line`
(underscore suggests unused) while it's actually used; rename `_line` to `line`
in the loop that iterates over ("application", "learnings_generated") and update
its usages inside the block where you call get_billing_gate(line) and
default_billing_gate(line) so the code reads clearly; references:
get_billing_gate, default_billing_gate, and app.dependency_overrides.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: a5f7a7c4-5830-4a48-a53c-146a5c77ce02

📥 Commits

Reviewing files that changed from the base of the PR and between 49c30c8 and 05fe010.

📒 Files selected for processing (17)
  • reflexio/server/_auth.py
  • reflexio/server/api.py
  • reflexio/server/billing_meter.py
  • reflexio/server/billing_signals.py
  • reflexio/server/llm/token_accounting.py
  • reflexio/server/services/base_generation_service.py
  • reflexio/server/services/extraction/outcome.py
  • reflexio/server/services/playbook/playbook_extractor.py
  • reflexio/server/services/playbook/playbook_generation_service.py
  • reflexio/server/services/profile/profile_extractor.py
  • reflexio/server/services/profile/profile_generation_service.py
  • reflexio/server/usage_metrics.py
  • tests/server/api_endpoints/test_applied_learnings_metering.py
  • tests/server/llm/test_token_accounting.py
  • tests/server/services/test_generation_billing_emission.py
  • tests/server/test_billing_meter.py
  • tests/server/test_billing_signals.py
✅ Files skipped from review due to trivial changes (1)
  • reflexio/server/services/profile/profile_generation_service.py
🚧 Files skipped from review as they are similar to previous changes (11)
  • reflexio/server/services/playbook/playbook_generation_service.py
  • reflexio/server/llm/token_accounting.py
  • reflexio/server/billing_signals.py
  • tests/server/test_billing_signals.py
  • tests/server/test_billing_meter.py
  • tests/server/llm/test_token_accounting.py
  • reflexio/server/_auth.py
  • reflexio/server/billing_meter.py
  • tests/server/services/test_generation_billing_emission.py
  • reflexio/server/services/extraction/outcome.py
  • reflexio/server/services/profile/profile_extractor.py

yilu331 added 15 commits June 6, 2026 08:58
…he generation service

- Add RunTokenTotals dataclass and sum_trace_tokens() in server/llm/token_accounting.py
- Add optional token_totals field to ExtractionOutcome (backward-compatible, last + defaulted)
- Thread trace tokens through ProfileExtractor and PlaybookExtractor into ExtractionOutcome
- Stash self._last_token_totals in BaseGenerationService._execute_extractor (mirrors _last_extractor_run_stats)
…otals fallback

Reset _last_token_totals to None at the same point _last_extraction_run_ids is
cleared, so a timeout or non-ExtractionOutcome exit path cannot leave the
previous run's totals to be billed against the next operation.

Add a comment on the `token_totals or self._last_resumable_trace` expression in
profile_extractor.py to explain why `or` is safe: a RunTokenTotals dataclass
instance is always truthy, so a real-but-zero total is never silently discarded.
…run skip charges nothing

- Add count_input_tokens(text) to billing_signals.py (cl100k_base canonical tokenizer).
- Add _record_billing_learning_events() and _extraction_input_text() to
  BaseGenerationService: emit extraction_tokens + learnings_generated on the
  success path only; the should_run skip path emits nothing on the learning line.
- Add test_generation_billing_emission.py verifying both emission and skip paths.
Add default_get_caller_type() to _auth.py (returns "internal"; no reflexio_ext
import). Extend create_app() with an optional get_caller_type param that wires
the override via dependency_overrides, mirroring the existing get_org_id pattern
(backward-compatible — existing callers unchanged).

Also add reflexio/server/tracing.py with configure_tracer() / profile_step()
so enterprise storage and api modules that already import this module can resolve
it (was a pre-existing missing file blocking enterprise imports).
Add ``default_billing_gate(line)`` factory to ``_auth.py`` — returns a stable
no-op FastAPI dependency via ``@cache`` so FastAPI's ``dependency_overrides``
keying works reliably across all endpoints.  Add ``get_billing_gate`` factory
param to ``create_app`` that overrides the no-op sentinel for each billing line
when supplied.

Wire gate dependencies on billable endpoints:
- Learning line (``"learnings_generated"``): ``/api/publish_interaction``
- Application line (``"application"``): ``/api/search``, ``/api/search_profiles``,
  ``/api/search_user_playbooks``, ``/api/search_agent_playbooks``,
  ``/api/get_agent_playbooks``

OSS keeps the no-op default; enterprise overrides via create_app factory param.
… cap

The endpoint was listed in _BILLABLE_ENDPOINT_NAMES but lacked a
@limiter.limit decorator, so slowapi's _check_request_limit never ran
for it and apply_qps_limits was dead code for this route.

Add @limiter.limit("120/minute") (consistent with the other read/search
siblings) and add the required request: Request parameter (renaming the
body param to payload), so the programmatic per-token QPS cap from
apply_qps_limits now takes effect.
…F016

- F009: fix I001 import order in playbook_extractor
- F003: drop always-None session_id kwarg from learning billing events (no session_id source on generation path)
- F016: rename _last_resumable_trace -> _last_resumable_token_totals
- F007: add unit tests for token_accounting + count_input_tokens
- F008: add per-endpoint applied-learnings metering tests (4 endpoints x 3 scenarios)
@yilu331 yilu331 force-pushed the billing/metering-foundation branch from 05fe010 to 6193e21 Compare June 6, 2026 08:59

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
reflexio/server/llm/token_accounting.py (1)

28-29: 💤 Low value

Optional: simplify int() wrapper.

The or 0 fallback already produces an int, so wrapping in int() is redundant. This is harmless and defensive, but if you prefer minimal code:

-        self.prompt_tokens += int(prompt_tokens or 0)
-        self.completion_tokens += int(completion_tokens or 0)
+        self.prompt_tokens += prompt_tokens or 0
+        self.completion_tokens += completion_tokens or 0
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@reflexio/server/llm/token_accounting.py` around lines 28 - 29, The two lines
updating token counters use redundant int() wrapping around the fallback "or 0";
update the assignments in the method where self.prompt_tokens and
self.completion_tokens are incremented so they simply add (prompt_tokens or 0)
and (completion_tokens or 0) respectively, removing the unnecessary int() calls
while still preserving the existing fallback behavior and attributes
self.prompt_tokens / self.completion_tokens.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@reflexio/server/llm/token_accounting.py`:
- Around line 28-29: The two lines updating token counters use redundant int()
wrapping around the fallback "or 0"; update the assignments in the method where
self.prompt_tokens and self.completion_tokens are incremented so they simply add
(prompt_tokens or 0) and (completion_tokens or 0) respectively, removing the
unnecessary int() calls while still preserving the existing fallback behavior
and attributes self.prompt_tokens / self.completion_tokens.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 52c2ec49-71ba-4063-8a3f-afb2badd49df

📥 Commits

Reviewing files that changed from the base of the PR and between 05fe010 and 6193e21.

📒 Files selected for processing (17)
  • reflexio/server/_auth.py
  • reflexio/server/api.py
  • reflexio/server/billing_meter.py
  • reflexio/server/billing_signals.py
  • reflexio/server/llm/token_accounting.py
  • reflexio/server/services/base_generation_service.py
  • reflexio/server/services/extraction/outcome.py
  • reflexio/server/services/playbook/playbook_extractor.py
  • reflexio/server/services/playbook/playbook_generation_service.py
  • reflexio/server/services/profile/profile_extractor.py
  • reflexio/server/services/profile/profile_generation_service.py
  • reflexio/server/usage_metrics.py
  • tests/server/api_endpoints/test_applied_learnings_metering.py
  • tests/server/llm/test_token_accounting.py
  • tests/server/services/test_generation_billing_emission.py
  • tests/server/test_billing_meter.py
  • tests/server/test_billing_signals.py
🚧 Files skipped from review as they are similar to previous changes (11)
  • reflexio/server/services/playbook/playbook_generation_service.py
  • tests/server/llm/test_token_accounting.py
  • reflexio/server/services/profile/profile_extractor.py
  • tests/server/test_billing_signals.py
  • reflexio/server/billing_signals.py
  • reflexio/server/services/profile/profile_generation_service.py
  • reflexio/server/services/playbook/playbook_extractor.py
  • reflexio/server/_auth.py
  • tests/server/test_billing_meter.py
  • reflexio/server/billing_meter.py
  • tests/server/api_endpoints/test_applied_learnings_metering.py

yilu331 added 3 commits June 6, 2026 17:04
…g + cleanups

- Add optional request_id/session_id to the 5 search request models so the
  Application-line usage events can actually carry per-request attribution
  (Pydantic extra='ignore' was silently dropping client-sent values).
- Remove the unreachable ExtractionOutcome branch in profile_extractor.
- Guard platform_llm_from_config against a non-mapping model_dump.
…okens

On a should-run-gated generation the interaction window was fetched twice — once
by the gate (_collect_scoped_interactions_for_precheck) and again by the billing
path (_extraction_input_text) purely to recompute billing_input_tokens. The
billing path now reuses the window the gate already fetched (stashed per run),
falling back to its own fetch on the bypass paths (force_extraction / skip-check
/ non-auto) where the gate doesn't run. Reusing the earlier window is also more
accurate (it's exactly what the extractor saw). The metered token count is
byte-identical — same query params, same formatting — guarded by an equivalence
test asserting reuse-path == refetch-path token counts.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant