Skip to content

security: bound attacker-controlled scan fan-out and rate-limit write race#539

Open
schmug wants to merge 2 commits into
mainfrom
security/dos-and-ratelimit-hardening
Open

security: bound attacker-controlled scan fan-out and rate-limit write race#539
schmug wants to merge 2 commits into
mainfrom
security/dos-and-ratelimit-hardening

Conversation

@schmug

@schmug schmug commented Jun 13, 2026

Copy link
Copy Markdown
Collaborator

Summary

Hardens three DoS / race-condition findings surfaced by a deep security audit. Each is reachable by an unauthenticated request and turns one cheap, rate-limited request into disproportionate backend work. All three are tracked as private draft advisories (linked below).

Finding Advisory Fix
DKIM selector fan-out GHSA-6fqp-4vhc-59mf Cap custom selectors to 16 items × 63 chars in parseSelectors and the MCP parseSelectorsFromArray, before analyzeDkim fires one DNS lookup per selector.
MTA-STS unbounded body GHSA-p676-gc7j-96mx Read the policy via arrayBuffer() truncated to MAX_POLICY_BYTES (64KB), mirroring security-txt. redirect:"manual" posture unchanged.
Rate-limit counter write GHSA-v7qc-7qh8-h69g Await the Cache-API write before returning instead of deferring via waitUntil, closing the intra-isolate read-modify-write window.

Why

The per-IP rate limiter (10/IP/60s) bounds request count, not per-request work. Without these caps, a single allowed request fans out into thousands of DNS lookups (selector list), buffers an unbounded HTTP body (MTA-STS), or — under a concurrent burst — under-counts because the counter write was deferred.

Scope / non-goals

  • The rate-limiter fix closes the intra-isolate window. The Cache API has no atomic increment/CAS, so a cross-isolate/cross-colo race remains; a fully atomic counter (Durable Object or the native Rate Limiting binding) is out of scope and stays tracked in GHSA-v7qc-7qh8-h69g.
  • The MCP inputSchema now advertises maxItems/maxLength, but enforcement is server-side regardless (the schema was, and remains, advisory — handleToolCall does not validate against it).
  • Also gitignores .security-scans/ so point-in-time scan reports are never committed to this public repo.

Testing

  • 1305 passing, 0 failing (full vitest suite, 77 files)
  • tsc --noEmit clean · biome check clean
  • Adds regression tests: selector count + length caps (HTTP and MCP paths), MTA-STS body truncation (content past the cap is never parsed), and the awaited rate-limit write.

Review notes

Touches CODEOWNERS-gated paths (rate limiting, analyzer module, input validation) → requires code-owner approval. Auto-merge intentionally not enabled.

🤖 Generated with Claude Code

… race

Hardens three DoS / race-condition findings surfaced by a deep security
audit and tracked as private draft advisories. All three are reachable by
an unauthenticated request and turn one cheap, rate-limited request into
disproportionate backend work.

- DKIM selector fan-out (GHSA-6fqp-4vhc-59mf): parseSelectors and the MCP
  parseSelectorsFromArray now cap the custom selector list to 16 items and
  63 chars each (RFC 1035 label limit) before analyzeDkim fires one DNS
  lookup per selector. The MCP inputSchema advertises maxItems/maxLength to
  match, but enforcement is server-side regardless (the schema is advisory).

- MTA-STS unbounded body (GHSA-p676-gc7j-96mx): fetchPolicy now reads the
  policy via arrayBuffer() truncated to MAX_POLICY_BYTES (64KB), mirroring
  the ceiling security-txt already applies, so an attacker-controlled
  mta-sts.<domain> can't stream an unbounded body into the isolate. The
  redirect:"manual" posture (RFC 8461 §3.3) is unchanged.

- Rate-limit counter write (GHSA-v7qc-7qh8-h69g): the Cache-API write is
  awaited before returning instead of deferred via waitUntil, closing the
  intra-isolate read-modify-write window the deferral widened. A fully
  atomic cross-isolate counter (Durable Object / native binding) remains
  tracked in the advisory and is out of scope here.

Also gitignores .security-scans/ so point-in-time scan reports are never
committed to this public repo.

Tests: 1305 passing, 0 failing. Typecheck + biome lint clean. Adds
regression tests for each cap and for the awaited rate-limit write.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: schmug <38227427+schmug@users.noreply.github.com>
@cloudflare-workers-and-pages

Copy link
Copy Markdown

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Updated (UTC)
✅ Deployment successful!
View logs
dmarcheck 2379a8f Jun 13 2026, 04:54 PM

schmug added a commit that referenced this pull request Jun 13, 2026
Orchestrator-level DoS umbrella for GHSA-f828-8wf8-vqp2. scan()/scanStreaming()
previously awaited Promise.all across all analyzers with only a per-query 3s DNS
timeout — no aggregate cap on total DNS queries or wall-clock. A single request
combining a large DKIM selector list with a rua/ruf-stuffed _dmarc record on an
attacker-controlled domain could drive a large outbound-DNS burst and long
wall-clock on one rate-limit token.

Adds two backstops, both threaded through every analyzer:
- ScanBudget (src/dns/scan-budget.ts): one shared per-scan DNS-query pool that
  every queryTxt/queryMx/queryDoh draws from; exhaustion throws a DnsLookupError
  subclass so analyzers degrade to "could not verify" instead of crashing.
- Overall deadline: one AbortController + setTimeout; each settled analyzer is
  raced against it, degrading to its synthetic fallback on a breach. The budget
  also holds the signal, so no new query is issued past the deadline.

Defaults (DEFAULT_SCAN_LIMITS): 150 queries / 12s — generous for real
multi-analyzer scans, overridable via an optional `limits` param (tests).
Preserves the #378 per-analyzer settle contract and scanStreaming SSE semantics
(every protocol still streams exactly once). DnsLookupError moved to
src/dns/errors.ts so the budget can subclass it without depending on the DNS
client (which tests mock).

Refs GHSA-f828-8wf8-vqp2. Umbrella over the per-analyzer caps in #539.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: schmug <38227427+schmug@users.noreply.github.com>
…imit write-race fix (superseded by #549)

The Cache-API await fix here (GHSA-v7qc-7qh8-h69g) is only a partial mitigation; #549 fully supersedes it with an atomic Durable Object rate limiter and deletes the checkRateLimitCache path this touched. Revert the rate-limit.ts change, the rateLimitMiddleware hunk in index.ts, and the f30 test so this PR scopes cleanly to the per-analyzer caps that nothing else duplicates: DKIM selector count/length (GHSA-6fqp-4vhc-59mf) and MTA-STS policy body size (GHSA-p676-gc7j-96mx). Avoids a same-line conflict with #549's middleware rewrite.

Signed-off-by: schmug <38227427+schmug@users.noreply.github.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
schmug added a commit that referenced this pull request Jun 13, 2026
…8-h69g fix) (#549)

* security: atomic Durable Object rate limiter (GHSA-v7qc-7qh8-h69g)

Replace the non-atomic Cache-API rate-limit counter with a per-identity
Durable Object. The Cache API has no atomic increment/CAS, so a concurrent
burst under one identity could each read the same stale count and write
count+1, letting the effective ceiling exceed the configured limit (free
10/60s, pro 60/3600s). The DO's single-threaded RPC serializes the
read-modify-write, closing the window across isolates and colos.

Completes the partial mitigation from PR #539, which only awaited the
Cache-API write to close the intra-isolate window.

- src/rate-limit-do.ts: new RateLimiterDO (SQLite-backed, synchronous
  read-modify-write); one instance per `ip:<x>` / `user:<id>` bucket.
- src/rate-limit.ts: checkRateLimit threads an optional DO namespace and
  routes to it; in-memory path kept as graceful fallback when the binding
  is absent (self-host, Node test pool). Cache-API path removed.
- src/index.ts: re-export the DO class from the entry module; middleware
  passes c.env.RATE_LIMITER.
- wrangler.toml / src/env.ts: RATE_LIMITER binding + v1 sqlite migration.
- test/integration/rate-limit-do.test.ts: N concurrent increments reach
  exactly N with no burst bypass; tiers, headers, isolation, window roll.
- vitest.config.ts / test/shims: Node-pool stub for cloudflare:workers.

Preserves RateLimitResult fields and X-RateLimit-* headers. Does NOT use
the native Rate Limiting binding (no remaining/resetAt; 60s-period cap
breaks the pro 3600s tier).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: schmug <38227427+schmug@users.noreply.github.com>

* docs: correct stale THREAT_MODEL rate-limit claims (E2/T9)

The threat model still described `/mcp` as having "no rate-limit middleware"
and being "unmetered", but every scan-triggering route already carries
`rateLimitMiddleware` (`/check`, `/api/check`, `/api/bulk-scan`, SSE, `/badge`,
`/mcp`, `/api/domain/*`). `/mx/:slug` is a static provider page that never
triggers a scan, so it correctly has no limiter.

- E2: note /mcp is rate-limited per-IP (anon bucket), bearer still not required.
- T9: reframe residual from "unmetered routes" to IP-rotation; list the metered
  routes; keep partially_mitigated (botnet IP rotation remains).
- Remove the resolved "/mcp rate limiting?" open question.
- Mark the "apply rateLimitMiddleware to every scan route" mitigation done.

No code change; documentation accuracy only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: schmug <38227427+schmug@users.noreply.github.com>

---------

Signed-off-by: schmug <38227427+schmug@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant