feat(billing): OSS metering foundation (usage signals + DI seams)#126
feat(billing): OSS metering foundation (usage signals + DI seams)#126yilu331 wants to merge 18 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds OSS billing primitives (token counting, token totals, billing-meter wrappers), extends UsageEvent, passes token totals from extractors to generation, emits learning/extraction usage events from opted-in services, and meters applied-learnings in search endpoints with tests. ChangesOSS Billing Infrastructure
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/server/api_endpoints/test_applied_learnings_metering.py (1)
41-43: ⚡ Quick winAdd one test path for the
get_billing_gateoverride seam.These tests cover
get_caller_type, but not the identity-baseddefault_billing_gate(...)override path that this PR introduces for enterprise enforcement. A small fixture extension here to passget_billing_gateintocreate_app()and assert it fires on at least one"application"route and/api/publish_interactionwould protect the core DI contract from silently regressing.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/server/api_endpoints/test_applied_learnings_metering.py` around lines 41 - 43, Add a test path that exercises the identity-based billing override by extending the _client fixture to accept and pass a get_billing_gate callable into create_app (similar to how get_caller_type is passed), then add at least one test that uses _client(caller_type="application", get_billing_gate=your_stub) to hit an "application" route and POST to /api/publish_interaction and assert your_stub (the default_billing_gate override) was invoked; update any test helpers that construct TestClient (function _client and usages of create_app and TestClient) so the new parameter propagates to the app.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/server/api_endpoints/test_applied_learnings_metering.py`:
- Around line 41-43: Add a test path that exercises the identity-based billing
override by extending the _client fixture to accept and pass a get_billing_gate
callable into create_app (similar to how get_caller_type is passed), then add at
least one test that uses _client(caller_type="application",
get_billing_gate=your_stub) to hit an "application" route and POST to
/api/publish_interaction and assert your_stub (the default_billing_gate
override) was invoked; update any test helpers that construct TestClient
(function _client and usages of create_app and TestClient) so the new parameter
propagates to the app.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 6cac5196-fa2c-4cad-8135-f48b90dc3a69
📒 Files selected for processing (17)
reflexio/server/_auth.pyreflexio/server/api.pyreflexio/server/billing_meter.pyreflexio/server/billing_signals.pyreflexio/server/llm/token_accounting.pyreflexio/server/services/base_generation_service.pyreflexio/server/services/extraction/outcome.pyreflexio/server/services/playbook/playbook_extractor.pyreflexio/server/services/playbook/playbook_generation_service.pyreflexio/server/services/profile/profile_extractor.pyreflexio/server/services/profile/profile_generation_service.pyreflexio/server/usage_metrics.pytests/server/api_endpoints/test_applied_learnings_metering.pytests/server/llm/test_token_accounting.pytests/server/services/test_generation_billing_emission.pytests/server/test_billing_meter.pytests/server/test_billing_signals.py
49c30c8 to
05fe010
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
reflexio/server/api.py (1)
2976-2978: 💤 Low valueLoop variable
_lineunderscore prefix is misleading.The underscore prefix conventionally indicates an unused variable, but
_lineis used in the loop body. Consider renaming tolinefor clarity.♻️ Suggested fix
if get_billing_gate is not None: - for _line in ("application", "learnings_generated"): - app.dependency_overrides[default_billing_gate(_line)] = get_billing_gate(_line) + for line in ("application", "learnings_generated"): + app.dependency_overrides[default_billing_gate(line)] = get_billing_gate(line)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@reflexio/server/api.py` around lines 2976 - 2978, The loop uses a misleading variable name `_line` (underscore suggests unused) while it's actually used; rename `_line` to `line` in the loop that iterates over ("application", "learnings_generated") and update its usages inside the block where you call get_billing_gate(line) and default_billing_gate(line) so the code reads clearly; references: get_billing_gate, default_billing_gate, and app.dependency_overrides.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@reflexio/server/api.py`:
- Around line 2976-2978: The loop uses a misleading variable name `_line`
(underscore suggests unused) while it's actually used; rename `_line` to `line`
in the loop that iterates over ("application", "learnings_generated") and update
its usages inside the block where you call get_billing_gate(line) and
default_billing_gate(line) so the code reads clearly; references:
get_billing_gate, default_billing_gate, and app.dependency_overrides.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: a5f7a7c4-5830-4a48-a53c-146a5c77ce02
📒 Files selected for processing (17)
reflexio/server/_auth.pyreflexio/server/api.pyreflexio/server/billing_meter.pyreflexio/server/billing_signals.pyreflexio/server/llm/token_accounting.pyreflexio/server/services/base_generation_service.pyreflexio/server/services/extraction/outcome.pyreflexio/server/services/playbook/playbook_extractor.pyreflexio/server/services/playbook/playbook_generation_service.pyreflexio/server/services/profile/profile_extractor.pyreflexio/server/services/profile/profile_generation_service.pyreflexio/server/usage_metrics.pytests/server/api_endpoints/test_applied_learnings_metering.pytests/server/llm/test_token_accounting.pytests/server/services/test_generation_billing_emission.pytests/server/test_billing_meter.pytests/server/test_billing_signals.py
✅ Files skipped from review due to trivial changes (1)
- reflexio/server/services/profile/profile_generation_service.py
🚧 Files skipped from review as they are similar to previous changes (11)
- reflexio/server/services/playbook/playbook_generation_service.py
- reflexio/server/llm/token_accounting.py
- reflexio/server/billing_signals.py
- tests/server/test_billing_signals.py
- tests/server/test_billing_meter.py
- tests/server/llm/test_token_accounting.py
- reflexio/server/_auth.py
- reflexio/server/billing_meter.py
- tests/server/services/test_generation_billing_emission.py
- reflexio/server/services/extraction/outcome.py
- reflexio/server/services/profile/profile_extractor.py
…he generation service - Add RunTokenTotals dataclass and sum_trace_tokens() in server/llm/token_accounting.py - Add optional token_totals field to ExtractionOutcome (backward-compatible, last + defaulted) - Thread trace tokens through ProfileExtractor and PlaybookExtractor into ExtractionOutcome - Stash self._last_token_totals in BaseGenerationService._execute_extractor (mirrors _last_extractor_run_stats)
…otals fallback Reset _last_token_totals to None at the same point _last_extraction_run_ids is cleared, so a timeout or non-ExtractionOutcome exit path cannot leave the previous run's totals to be billed against the next operation. Add a comment on the `token_totals or self._last_resumable_trace` expression in profile_extractor.py to explain why `or` is safe: a RunTokenTotals dataclass instance is always truthy, so a real-but-zero total is never silently discarded.
…run skip charges nothing - Add count_input_tokens(text) to billing_signals.py (cl100k_base canonical tokenizer). - Add _record_billing_learning_events() and _extraction_input_text() to BaseGenerationService: emit extraction_tokens + learnings_generated on the success path only; the should_run skip path emits nothing on the learning line. - Add test_generation_billing_emission.py verifying both emission and skip paths.
Add default_get_caller_type() to _auth.py (returns "internal"; no reflexio_ext import). Extend create_app() with an optional get_caller_type param that wires the override via dependency_overrides, mirroring the existing get_org_id pattern (backward-compatible — existing callers unchanged). Also add reflexio/server/tracing.py with configure_tracer() / profile_step() so enterprise storage and api modules that already import this module can resolve it (was a pre-existing missing file blocking enterprise imports).
…isolate emit from success path
Add ``default_billing_gate(line)`` factory to ``_auth.py`` — returns a stable no-op FastAPI dependency via ``@cache`` so FastAPI's ``dependency_overrides`` keying works reliably across all endpoints. Add ``get_billing_gate`` factory param to ``create_app`` that overrides the no-op sentinel for each billing line when supplied. Wire gate dependencies on billable endpoints: - Learning line (``"learnings_generated"``): ``/api/publish_interaction`` - Application line (``"application"``): ``/api/search``, ``/api/search_profiles``, ``/api/search_user_playbooks``, ``/api/search_agent_playbooks``, ``/api/get_agent_playbooks`` OSS keeps the no-op default; enterprise overrides via create_app factory param.
… cap
The endpoint was listed in _BILLABLE_ENDPOINT_NAMES but lacked a
@limiter.limit decorator, so slowapi's _check_request_limit never ran
for it and apply_qps_limits was dead code for this route.
Add @limiter.limit("120/minute") (consistent with the other read/search
siblings) and add the required request: Request parameter (renaming the
body param to payload), so the programmatic per-token QPS cap from
apply_qps_limits now takes effect.
…F016 - F009: fix I001 import order in playbook_extractor - F003: drop always-None session_id kwarg from learning billing events (no session_id source on generation path) - F016: rename _last_resumable_trace -> _last_resumable_token_totals - F007: add unit tests for token_accounting + count_input_tokens - F008: add per-endpoint applied-learnings metering tests (4 endpoints x 3 scenarios)
05fe010 to
6193e21
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
reflexio/server/llm/token_accounting.py (1)
28-29: 💤 Low valueOptional: simplify
int()wrapper.The
or 0fallback already produces anint, so wrapping inint()is redundant. This is harmless and defensive, but if you prefer minimal code:- self.prompt_tokens += int(prompt_tokens or 0) - self.completion_tokens += int(completion_tokens or 0) + self.prompt_tokens += prompt_tokens or 0 + self.completion_tokens += completion_tokens or 0🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@reflexio/server/llm/token_accounting.py` around lines 28 - 29, The two lines updating token counters use redundant int() wrapping around the fallback "or 0"; update the assignments in the method where self.prompt_tokens and self.completion_tokens are incremented so they simply add (prompt_tokens or 0) and (completion_tokens or 0) respectively, removing the unnecessary int() calls while still preserving the existing fallback behavior and attributes self.prompt_tokens / self.completion_tokens.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@reflexio/server/llm/token_accounting.py`:
- Around line 28-29: The two lines updating token counters use redundant int()
wrapping around the fallback "or 0"; update the assignments in the method where
self.prompt_tokens and self.completion_tokens are incremented so they simply add
(prompt_tokens or 0) and (completion_tokens or 0) respectively, removing the
unnecessary int() calls while still preserving the existing fallback behavior
and attributes self.prompt_tokens / self.completion_tokens.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 52c2ec49-71ba-4063-8a3f-afb2badd49df
📒 Files selected for processing (17)
reflexio/server/_auth.pyreflexio/server/api.pyreflexio/server/billing_meter.pyreflexio/server/billing_signals.pyreflexio/server/llm/token_accounting.pyreflexio/server/services/base_generation_service.pyreflexio/server/services/extraction/outcome.pyreflexio/server/services/playbook/playbook_extractor.pyreflexio/server/services/playbook/playbook_generation_service.pyreflexio/server/services/profile/profile_extractor.pyreflexio/server/services/profile/profile_generation_service.pyreflexio/server/usage_metrics.pytests/server/api_endpoints/test_applied_learnings_metering.pytests/server/llm/test_token_accounting.pytests/server/services/test_generation_billing_emission.pytests/server/test_billing_meter.pytests/server/test_billing_signals.py
🚧 Files skipped from review as they are similar to previous changes (11)
- reflexio/server/services/playbook/playbook_generation_service.py
- tests/server/llm/test_token_accounting.py
- reflexio/server/services/profile/profile_extractor.py
- tests/server/test_billing_signals.py
- reflexio/server/billing_signals.py
- reflexio/server/services/profile/profile_generation_service.py
- reflexio/server/services/playbook/playbook_extractor.py
- reflexio/server/_auth.py
- tests/server/test_billing_meter.py
- reflexio/server/billing_meter.py
- tests/server/api_endpoints/test_applied_learnings_metering.py
…g + cleanups - Add optional request_id/session_id to the 5 search request models so the Application-line usage events can actually carry per-request attribution (Pydantic extra='ignore' was silently dropping client-sent values). - Remove the unreachable ExtractionOutcome branch in profile_extractor. - Guard platform_llm_from_config against a non-mapping model_dump.
…okens On a should-run-gated generation the interaction window was fetched twice — once by the gate (_collect_scoped_interactions_for_precheck) and again by the billing path (_extraction_input_text) purely to recompute billing_input_tokens. The billing path now reuses the window the gate already fetched (stashed per run), falling back to its own fetch on the bypass paths (force_extraction / skip-check / non-auto) where the gate doesn't run. Reusing the earlier window is also more accurate (it's exactly what the extractor saw). The metered token count is byte-identical — same query params, same formatting — guarded by an equivalence test asserting reuse-path == refetch-path token counts.
Summary
Adds the OSS metering foundation that the (closed-source) billing layer builds on. This is the open-source half: it emits structured usage signals and exposes clean dependency-injection seams so a downstream deployment can attach billing/enforcement without the OSS core depending on any of it. With no overrides registered, behavior is unchanged — every hook is a no-op and metering is fully isolated from the request success path.
Three value signals are emitted as
UsageEvents:cl100k_basecount) + learnings generated. Metered only for profile/playbook generation; eval/reflection pipelines do not bill.default_get_caller_type, no-op"internal"in OSS).Changes
billing_signals.py— single source of the platform-LLM rule + the canonical billing tokenizer (count_input_tokens).billing_meter.py— emission helpers; the one source of usage-event shape.usage_metrics.py—UsageEventgains token/attribution/caller fields;record_usage_eventswallows recorder failures so metering can never 500 a request.llm/token_accounting.py—RunTokenTotals+sum_trace_tokensfold tool-loop trace tokens into run totals reachable by the generation service.services/base_generation_service.py—EMITS_LEARNING_BILLINGflag (true only on profile/playbook generation); emission isolated in try/except.server/_auth.py/server/api.py—default_get_caller_type+ no-opdefault_billing_gateDI seams;create_app(get_caller_type=, get_billing_gate=); applied-learnings metering wired into the 5 search/read endpoints;get_agent_playbooksgains a120/minuterate limit.Isolation guarantees
test_metering_failure_does_not_break_search_response).app.dependency_overrides.Test plan
tests/server/test_billing_signals.py,tests/server/test_billing_meter.py— emission helpers + tokenizer (empty-string, encoding-drift guard, special-token handling).tests/server/llm/test_token_accounting.py— None-coercion, empty/missing-trace guards, multi-turn summation.tests/server/api_endpoints/test_applied_learnings_metering.py— per-endpoint surfaced-count metering across all 5 endpoints × {production-agent, dashboard, empty}.tests/server/services/test_generation_billing_emission.py— learning emission gated to profile/playbook generation; eval pipeline does not bill.ruff+pyrightclean;import reflexioverified.Summary by CodeRabbit
New Features
Tests