Skip to content

security: atomic Durable Object rate limiter (completes GHSA-v7qc-7qh8-h69g fix)#549

Merged
schmug merged 2 commits into
mainfrom
claude/heuristic-mendel-580560
Jun 13, 2026
Merged

security: atomic Durable Object rate limiter (completes GHSA-v7qc-7qh8-h69g fix)#549
schmug merged 2 commits into
mainfrom
claude/heuristic-mendel-580560

Conversation

@schmug

@schmug schmug commented Jun 13, 2026

Copy link
Copy Markdown
Collaborator

Summary

Completes the fix for the confirmed rate-limiter race condition tracked in the private advisory GHSA-v7qc-7qh8-h69g by replacing the non-atomic Cache-API counter with a per-identity Durable Object atomic counter.

PR #539 shipped a partial mitigation — it awaits the Cache-API write before returning, closing only the intra-isolate window. This PR implements the complete fix: an atomic primitive that holds across isolates and colos.

The problem

checkRateLimitCache was a non-atomic read-modify-write on the Cloudflare Cache API: cache.matchcount++cache.put. The Cache API offers no atomic increment/CAS, so a concurrent burst under one identity could all read the same stale count and each write count + 1, letting the effective ceiling exceed the configured limit (free 10/60s, pro 60/3600s).

The fix

  • src/rate-limit-do.ts — new RateLimiterDO (SQLite-backed). One instance per identity bucket (ip:<x> / user:<id>) via getByName. The increment(limit, windowSec) RPC does the whole read-modify-write synchronously, so the DO's single-threaded execution serializes overlapping calls — no interleaving, no lost updates.
  • src/rate-limit.tscheckRateLimit(identity, config, namespace?) routes to the DO when the binding is present and falls back to the existing in-memory limiter when it isn't (self-host deploys that strip the binding, and the Node test pool). The racy Cache-API path and pendingWrite are removed.
  • src/index.ts — re-exports the DO class from the Worker entry (required for the binding to resolve class_name); middleware passes c.env.RATE_LIMITER.
  • wrangler.toml / src/env.tsRATE_LIMITER binding + v1 new_sqlite_classes migration.
  • vitest.config.ts / test/shims/ — Node-pool stub for the cloudflare:workers virtual module (the Node tests import src/index.ts, which now pulls in the DO base class; the Workers pool uses the real module).

Why not the native Rate Limiting binding? It returns only { success } (no remaining/resetAt for the X-RateLimit-* headers) and supports only 10s/60s periods, which breaks the pro 3600s tier. A Durable Object is the correct primitive.

Testing (TDD — failing concurrency test first)

test/integration/rate-limit-do.test.ts runs in the real workerd runtime:

  • Atomicity / no burst bypass: 30 simultaneous increments to one identity → the stored count reaches exactly 30 with no lost updates, and exactly limit are allowed.
  • Both tiers (free 10/60, pro 60/3600), X-RateLimit-* headers, resetAt stability within a window, ip: vs user: bucket isolation, and the window-roll reset branch (via runInDurableObject).

Gates (fresh run):

  • npm test1307 passed (78 files, incl. 8 new DO tests)
  • npm run typecheckpass (exit 0)
  • npm run lintpass (161 files, no diagnostics)
  • wrangler deploy --dry-run → binding recognized as env.RATE_LIMITER (RateLimiterDO), migration accepted

Acceptance criteria

  • Concurrency test fires N simultaneous checkRateLimit calls for one identity and asserts the stored count reaches N (no burst bypass)
  • Headers and both tiers behave as before for sequential requests
  • wrangler.toml has the DO binding + migration; typecheck passes against the generated binding type
  • Existing rate-limit tests still pass (unchanged — they force the in-memory path)

Notes

  • Touches CODEOWNERS-gated paths (wrangler.toml, rate limiting) — expects code-owner review; auto-merge intentionally not enabled.
  • The advisory GHSA-v7qc-7qh8-h69g remains private/unpublished.

Refs: GHSA-v7qc-7qh8-h69g · completes #539

Co-Authored-By: Claude Opus 4.8 (1M context) noreply@anthropic.com

schmug and others added 2 commits June 13, 2026 14:07
Replace the non-atomic Cache-API rate-limit counter with a per-identity
Durable Object. The Cache API has no atomic increment/CAS, so a concurrent
burst under one identity could each read the same stale count and write
count+1, letting the effective ceiling exceed the configured limit (free
10/60s, pro 60/3600s). The DO's single-threaded RPC serializes the
read-modify-write, closing the window across isolates and colos.

Completes the partial mitigation from PR #539, which only awaited the
Cache-API write to close the intra-isolate window.

- src/rate-limit-do.ts: new RateLimiterDO (SQLite-backed, synchronous
  read-modify-write); one instance per `ip:<x>` / `user:<id>` bucket.
- src/rate-limit.ts: checkRateLimit threads an optional DO namespace and
  routes to it; in-memory path kept as graceful fallback when the binding
  is absent (self-host, Node test pool). Cache-API path removed.
- src/index.ts: re-export the DO class from the entry module; middleware
  passes c.env.RATE_LIMITER.
- wrangler.toml / src/env.ts: RATE_LIMITER binding + v1 sqlite migration.
- test/integration/rate-limit-do.test.ts: N concurrent increments reach
  exactly N with no burst bypass; tiers, headers, isolation, window roll.
- vitest.config.ts / test/shims: Node-pool stub for cloudflare:workers.

Preserves RateLimitResult fields and X-RateLimit-* headers. Does NOT use
the native Rate Limiting binding (no remaining/resetAt; 60s-period cap
breaks the pro 3600s tier).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: schmug <38227427+schmug@users.noreply.github.com>
The threat model still described `/mcp` as having "no rate-limit middleware"
and being "unmetered", but every scan-triggering route already carries
`rateLimitMiddleware` (`/check`, `/api/check`, `/api/bulk-scan`, SSE, `/badge`,
`/mcp`, `/api/domain/*`). `/mx/:slug` is a static provider page that never
triggers a scan, so it correctly has no limiter.

- E2: note /mcp is rate-limited per-IP (anon bucket), bearer still not required.
- T9: reframe residual from "unmetered routes" to IP-rotation; list the metered
  routes; keep partially_mitigated (botnet IP rotation remains).
- Remove the resolved "/mcp rate limiting?" open question.
- Mark the "apply rateLimitMiddleware to every scan route" mitigation done.

No code change; documentation accuracy only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: schmug <38227427+schmug@users.noreply.github.com>
schmug added a commit that referenced this pull request Jun 13, 2026
…imit write-race fix (superseded by #549)

The Cache-API await fix here (GHSA-v7qc-7qh8-h69g) is only a partial mitigation; #549 fully supersedes it with an atomic Durable Object rate limiter and deletes the checkRateLimitCache path this touched. Revert the rate-limit.ts change, the rateLimitMiddleware hunk in index.ts, and the f30 test so this PR scopes cleanly to the per-analyzer caps that nothing else duplicates: DKIM selector count/length (GHSA-6fqp-4vhc-59mf) and MTA-STS policy body size (GHSA-p676-gc7j-96mx). Avoids a same-line conflict with #549's middleware rewrite.

Signed-off-by: schmug <38227427+schmug@users.noreply.github.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@schmug schmug merged commit 6847099 into main Jun 13, 2026
5 checks passed
@schmug schmug deleted the claude/heuristic-mendel-580560 branch June 13, 2026 18:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant