langpred

Predict your agent's behavior, then act on it before bad outcomes land. A drop-in for Langfuse: keep your SDK, point ingestion at langpred, and get pre-emptive intervention that Langfuse cannot offer because it only looks backward.

$ python examples/08_action_menu.py
ACTION 1: reject-upfront   →  cohort p90 $0.030 > budget $0.001. Don't start.
ACTION 2: route-at-start   →  cohort cost p90 = $0.030 → routing to sonnet
ACTION 3: alert webhook    →  POST https://hooks/slack on risk.loop_risk > 0.7
ACTION 4: scope-reduce     →  callback fired; agent shrinks max_tokens. Same model.
ACTION 5: hard kill        →  BudgetExceeded raised; agent loop bails out cleanly.

The moat: act before the bad outcome

	Langfuse	langpred
When you learn it cost $500	After the run	Step 3 of 50
What you can do about it	Adjust next quote	Stop, reduce scope, alert, reroute, refuse to start
Customer experience	Surprise bill	Bounded bill, calibrated ETA, fixed quote

Information without the ability to act is just expensive observability. langpred predicts and gives you a menu of pre-emptive actions for every outcome the prediction flags.

The five actions

Each one fits a different point in the trace and a different confidence level:

Action	When	What it does	KV cache
reject-upfront	Before trace starts	Decline the request; predicted cost or risk too high	n/a (no trace)
route-at-start	Before trace starts	Pick the cheapest viable model based on the cohort	n/a (no trace)
alert	Any step	POST a webhook (Slack / PagerDuty / yours) on a threshold	preserved
scope-reduce	Mid-trace	Signal the agent to shrink remaining work (max_tokens etc.)	preserved
kill	Any step (early is best)	SDK raises `BudgetExceeded`; agent loop exits	invalidated

⚠️ We don't ship mid-trace model downgrade. Switching models invalidates Anthropic prompt caching and breaks chain-of-thought coherence — the math usually loses. Use route-at-start instead; pick the model before any KV state has been built.

Code for each action

1. reject-upfront — don't even start

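from langpred.langfuse_compat import Langfuse

lp = Langfuse()   # same constructor as the Langfuse SDK
forecast = lp.forecast(trace_name="research_agent")
if forecast.cost.usd_total_p90 > customer.budget:   # `customer` and `refuse_quote` are your own code
    refuse_quote(reason=f"predicted cost ${forecast.cost.usd_total_p90:.2f}")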

2. route-at-start — pick the model upfront

forecast = lp.forecast(trace_name="research_agent")
model = "claude-haiku-4-5" if forecast.cost.usd_total_p90 < 0.01 \
   else "claude-sonnet-4-6" if forecast.cost.usd_total_p90 < 0.10 \
   else "claude-opus-4-7"
trace = lp.trace(name="research_agent")   # then run with `model`

3. alert — fire a webhook on any threshold

trace.alert_when("cost.usd_total_p90 > 1.00", webhook_url="https://hooks.slack.com/...")
trace.alert_when("risk.loop_risk > 0.70",    webhook_url="https://events.pagerduty.com/...")
trace.alert_when("time.remaining_seconds_p90 > 300", webhook_url="...")

Webhooks POST {trace_id, condition, value, threshold, prediction}. Re-fires at most once per 30s by default.
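The once-per-30s re-fire rule can be pictured as a small per-condition debounce. This is a minimal sketch, not langpred's implementation: `build_payload`, `AlertDebouncer`, and the injectable clock are hypothetical names introduced here. Only the five payload keys and the 30-second default come from the description above.

```python
import json
import time
from typing import Callable


def build_payload(trace_id: str, condition: str, value: float,
                  threshold: float, prediction: dict) -> str:
    """Serialize the five documented webhook fields as a JSON string."""
    return json.dumps({
        "trace_id": trace_id,
        "condition": condition,
        "value": value,
        "threshold": threshold,
        "prediction": prediction,
    })


class AlertDebouncer:
    """Hypothetical helper: allow one alert per (trace, condition) per window."""

    def __init__(self, min_interval_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval_s = min_interval_s
        self.clock = clock
        self._last_fired: dict[tuple[str, str], float] = {}

    def should_fire(self, trace_id: str, condition: str) -> bool:
        now = self.clock()
        key = (trace_id, condition)
        last = self._last_fired.get(key)
        if last is not None and now - last < self.min_interval_s:
            return False          # still inside the quiet window
        self._last_fired[key] = now
        return True
```

A receiver on your side would decode the body with `json.loads` and branch on `condition`; the debouncer illustrates why a threshold that stays breached does not flood the channel.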

4. scope-reduce — shrink the work, keep the model

trace.on_scope_reduce(lambda: agent.set_max_tokens(256))   # or skip optional steps
trace.set_budget(usd=0.50, on_exceed="scope_reduce")       # callback fires on breach

KV cache and chain-of-thought stay intact; the agent just does less.

5. kill — hard stop

with trace.set_budget(usd=0.50, on_exceed="kill"):
    agent.run()       # raises BudgetExceeded when predicted_total > $0.50
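What a clean bail-out might look like inside the agent loop, sketched with local stand-ins. The `BudgetExceeded` class, `check_budget`, and `run_agent` below are defined here for illustration only and are not imported from the langpred SDK, whose exception may carry different fields.

```python
class BudgetExceeded(Exception):
    """Local stand-in for the exception the SDK raises on a budget breach."""


def check_budget(predicted_total_usd: float, budget_usd: float) -> None:
    # Mirrors the documented rule: kill when the predicted total exceeds the budget.
    if predicted_total_usd > budget_usd:
        raise BudgetExceeded(
            f"predicted ${predicted_total_usd:.2f} > budget ${budget_usd:.2f}"
        )


def run_agent(predicted_totals: list[float], budget_usd: float) -> dict:
    """Run steps until done or until the budget check trips."""
    completed = 0
    try:
        for predicted in predicted_totals:   # one updated prediction per step
            check_budget(predicted, budget_usd)
            completed += 1                   # the real step's work would go here
        return {"status": "finished", "steps": completed}
    except BudgetExceeded as exc:
        # Clean exit: report partial progress instead of crashing the caller.
        return {"status": "killed", "steps": completed, "reason": str(exc)}
```

Catching the exception at the loop boundary lets the caller return partial results and a reason rather than an unhandled traceback.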

What it predicts — five dimensions, one round-trip

	answers
time	total / remaining seconds, next-step time, compute vs I/O split
cost	total / remaining USD, next-step USD, per-model breakdown
resources	tokens (prompt + completion), steps remaining, per-tool call counts
next action	distribution over `{generation, tool_call, end}`, top-k next tools, likely model
risk	`off-rails` · `loop` · `context-overflow` · `budget-overshoot` · `cost-spike`

All five from one trace.predict() call against a shared kNN cohort — internally consistent.

Migrate from Langfuse — one line

Zero code change — point your env var at us:

export LANGFUSE_HOST=http://localhost:7187

Your existing langfuse.Langfuse() calls now write to langpred; predictions appear immediately.

One import change — get the action menu in code:

- from langfuse import Langfuse
+ from langpred.langfuse_compat import Langfuse

Same constructor, same trace / span / generation / @observe / flush. New: lp.forecast(), trace.predict(), trace.alert_when(), trace.on_scope_reduce(), trace.set_budget().

✓ Validated against the real Langfuse SDK. tests/test_real_langfuse_integration.py runs the actual pip-installed langfuse package against a live langpred server, so CI catches upstream envelope changes.

Install — 30 seconds

git clone https://github.com/memovai/langpred && cd langpred
pip install -e ./server -e ./sdk-python
uvicorn langpred_server.main:app --port 7187 &
open http://localhost:7187/ui/
python examples/08_action_menu.py     # walks through all 5 actions live

SQLite by default. No Postgres, no Redis, no GPU.

Eight runnable examples

	shows
`01_migrate_from_langfuse.py`	The exact one-import diff
`02_budget_guard.py`	Hard kill mid-loop
`03_eta_in_ui.py`	Live ETA per step
`04_upfront_pricing.py`	Scout-then-quote
`05_next_action.py`	Predict next tool / model
`06_resource_forecast.py`	Full 5-dimension forecast
`07_real_langfuse_sdk.py`	Driven by the real Langfuse SDK
`08_action_menu.py`	All 5 pre-emptive actions in one script

How it works (30 seconds)

Every Langfuse event becomes part of a trajectory. For a partial trace we find the k=20 nearest finished trajectories (L2 distance over a 16-dimensional prefix vector, with the distance halved for trajectories that share the same trace.name) and aggregate them five ways at once:

final outcomes → quantile bands (time / cost / resources)
step at prefix_len+1 → next-action distribution
per-tool histograms → expected remaining tool calls
per-model cost split → upfront routing target
status field → off-rails risk

For pre-trace forecast() there is no prefix to match, so we skip the kNN step and aggregate the same-name cohort directly. Once a trace shape has 1000 finished traces, gradient-boosted quantile regressors are promoted automatically for the scalar dimensions, while the distribution-valued predictions stay on kNN. No GPU. Full architecture: DESIGN.md.
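A rough picture of the kNN cohort step, using only the standard library. This is a sketch under stated assumptions, not the code in the server: the record layout (`vec`, `name`, `cost_usd`, `next_action`), the function names, and the reading of "halved" as "divide the distance by two for same-name trajectories" are all guesses made for illustration. DESIGN.md is the authority on the real feature vector and weighting.

```python
import math
from collections import Counter


def nearest_cohort(prefix, name, finished, k=20):
    """Return the k finished trajectories closest to a partial trace.

    `finished` is a list of dicts with keys "vec", "name", "cost_usd",
    "next_action" (a hypothetical record layout chosen for this sketch).
    """
    def distance(traj):
        d = math.dist(prefix, traj["vec"])            # L2 over the prefix vector
        return d / 2 if traj["name"] == name else d   # assumed meaning of "halved"
    return sorted(finished, key=distance)[:k]


def quantile(values, q):
    """Linearly interpolated quantile of a non-empty list, q in [0, 1]."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def aggregate(cohort):
    """Turn one cohort into a cost band and a next-action distribution."""
    costs = [t["cost_usd"] for t in cohort]
    actions = Counter(t["next_action"] for t in cohort)
    total = sum(actions.values())
    return {
        "cost_p50": quantile(costs, 0.5),
        "cost_p90": quantile(costs, 0.9),
        "next_action": {a: n / total for a, n in actions.items()},
    }
```

Because every output is computed from the same cohort, the cost band and the next-action distribution cannot contradict each other, which is the consistency property the shared-cohort design is after.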

Status


Tests	28 / 28 green (+ optional real-Langfuse SDK integration when installed)
Cost MAE	$0.10 on synthetic benchmark
p90 coverage	79% against the nominal 90% (conformal calibration is on the roadmap)
Local UI	`http://localhost:7187/ui/` — trace list, detail timeline, prediction cards, budgets, forecast
Stack	FastAPI + SQLite + scikit-learn · Python ≥3.10 · no GPU
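For readers unfamiliar with the two accuracy rows, here is how such numbers are conventionally computed. The sketch assumes the standard definitions (mean absolute error, and the fraction of runs whose actual value fell at or below the predicted p90); the benchmark script may compute them differently, and both function names are made up for this example.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def upper_bound_coverage(predicted_p90, actual):
    """Fraction of runs whose actual value stayed at or below the predicted p90."""
    hits = sum(a <= p for p, a in zip(predicted_p90, actual))
    return hits / len(actual)
```

A perfectly calibrated p90 would give a coverage close to 0.90; a lower figure means the upper bounds are too tight and real runs overshoot them more often than intended.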

Roadmap: OTel ingest · TypeScript SDK · conformal calibration · optional proxy add-on for hard budget enforcement.

Next steps

Read the design: DESIGN.md — pain-point research, why kNN-then-GBM, why no mid-trace downgrade.
Run the action menu live: python examples/08_action_menu.py after starting the server.
Integrate: change one env var; predictions and the action menu both land immediately.

MIT — see LICENSE.
