diff --git a/CLAUDE.md b/CLAUDE.md
index 272c167..607583e 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -36,7 +36,7 @@ Live at dmarc.mx | Repo: github.com/schmug/dmarcheck
 - `src/views/` — HTML generation via template literals (styles.ts, scripts.ts, components.ts, html.ts, favicon.ts)
 - `components.ts` — `generateCreature(size, mood, partyHat?)` helper and `gradeToMood()` mapping
 - `markdown.ts` — markdown renderings served when `Accept: text/markdown` (landing, /check report, /scoring, /learn, /docs/api)
-- `src/rate-limit.ts` — Cache API-based rate limiter (10 req/IP/60s)
+- `src/rate-limit.ts` — per-identity rate limiter (free 10/60s, pro 60/3600s). Primary path is an atomic Durable Object counter (`src/rate-limit-do.ts` `RateLimiterDO`, bound as `RATE_LIMITER`); its single-threaded RPC serializes increments so a concurrent burst under one identity can't exceed the ceiling (GHSA-v7qc-7qh8-h69g — replaced a non-atomic Cache-API read-modify-write). `checkRateLimit(identity, config, namespace?)` falls back to the in-memory limiter when the binding is absent (self-host deploys, Node test pool)
 ## Agent discovery
diff --git a/THREAT_MODEL.md b/THREAT_MODEL.md
index 7a846b8..a11130a 100644
--- a/THREAT_MODEL.md
+++ b/THREAT_MODEL.md
@@ -53,7 +53,7 @@ flowchart LR
 end
 subgraph worker["dmarcheck Worker — trust boundary (Cloudflare edge)"]
- rl["Rate limiter / cache (E9)"]
+ rl["Rate limiter (E9)"]
 scan["Scan API + orchestrator (E1, E2)"]
 auth["Auth & session (E5)"]
 dash["Dashboard / history CRUD (E6)"]
@@ -93,14 +93,14 @@ flowchart LR
 | entry_point | description | trust_boundary | reachable_assets |
 |---|---|---|---|
 | E1 — Public scan API (`/check`, `/api/check`, `/api/check/stream`, `/badge`, `/mx/:slug`) | Attacker controls `?domain`, `?selectors`, `?format`, `Accept`; drives DNS lookups + HTML/JSON/CSV/SSE rendering | unauth HTTP → app logic; Worker → upstream DNS | grade integrity, service availability |
-| E2 — MCP handler (`POST /mcp` `scan_domain`) | Arbitrary JSON-RPC body; `domain`/`dkim_selectors` drive a full scan. No bearer requirement and **no rate-limit middleware** (contrast `/check`) | unauth HTTP → Worker → DNS/HTTP | service availability, grade integrity |
+| E2 — MCP handler (`POST /mcp` `scan_domain`) | Arbitrary JSON-RPC body; `domain`/`dkim_selectors` drive a full scan. No bearer requirement, but rate-limited per-IP by `rateLimitMiddleware` (same anon bucket as `/api/check`, `src/index.ts` `app.use("/mcp", …)`) | unauth HTTP → Worker → DNS/HTTP | service availability, grade integrity |
 | E3 — Analyzer outbound fetch (MTA-STS, security.txt, BIMI) | Scanned domain interpolated into upstream HTTPS URLs; MTA-STS uses `redirect: "manual"`, security.txt uses `redirect: "follow"` | Worker → attacker-named upstream HTTP | internal network, service integrity |
 | E4 — Outbound webhook dispatch | Fetches a Pro user's saved `webhook.url`; save path validates only `protocol === "https:"` | authenticated user → Worker outbound to arbitrary host | internal network, service integrity |
 | E5 — Auth & session (session cookie JWT, bearer API key, Cloudflare Access JWT) | HS256 session HMAC + exp; `dmk_` API key SHA-256 lookup; `jose` RS256 Access JWT (preview only, fail-closed) | unauth → authenticated identity | all authenticated assets |
 | E6 — Dashboard CRUD + history/bulk-scan APIs (D1, per-user) | Authenticated reads/writes scoped by `WHERE user_id = ?` / `getDomainByUserAndName` | authenticated session → another user's data | scan history, API keys, user/billing data |
 | E7 — Stripe webhook (`POST /webhooks/stripe`) | Raw-body HMAC-SHA256 verify, 5-min skew, event-id idempotency, then mutates subscription state | unauth internet → billing state mutation | subscription state, billing data |
 | E8 — HTML report rendering (`src/views/*`) | User/DNS-derived values interpolated into template-literal HTML | scan data → rendered HTML in a viewer's browser | viewer session, grade integrity |
-| E9 — Rate limiter / cache | Keyed on `CF-Connecting-IP` (`ip:`) or `user:`; Cache API store with in-memory fallback | spoofable identity / shared cache key | service availability |
+| E9 — Rate limiter | Keyed on `CF-Connecting-IP` (`ip:`) or `user:`; per-identity Durable Object atomic counter (`RateLimiterDO`, single-threaded RPC) with in-memory fallback when the binding is absent | spoofable identity | service availability |
 | E10 — CI/CD + deploy (GitHub Actions: ci, codeql, migrate, release, deploy-mta-sts; Cloudflare Git integration) | `pull_request` on a public repo; `main`-gated jobs hold prod D1 / deploy / release tokens | PR/main → CI runner → prod | infra tokens, prod D1, releases |
 | E11 — Autonomous-routine PR merge path | External routine identity opens + auto-merges PRs; CODEOWNERS + fail-closed gate | external automation → `main` | analyzers, orchestration, scoring, CI |
@@ -116,7 +116,7 @@ flowchart LR
 | T6 | Secret or PII exposure via logs or error responses | remote_unauth | E1, E5, E7 | secrets, user/billing data | high | possible | unmitigated | Sentry capture; no documented scrubbing audit | |
 | T7 | Billing privilege escalation (free → paid) via forged or replayed Stripe webhook | remote_unauth | E7 | subscription state | high | rare | partially_mitigated | raw-body HMAC-SHA256 verify, constant-time compare, 5-min skew, event-id idempotency | |
 | T8 | Supply-chain / CI compromise escalating to prod D1 write or deploy | supply_chain | E10 | prod D1, infra tokens, releases | high | rare | partially_mitigated | SHA-pinned actions, ubuntu-latest only, explicit `permissions:` blocks, secrets only on `main`-gated jobs | |
-| T9 | Rate-limit bypass → DNS amplification / scan abuse via unauthenticated, unmetered `/mcp` and non-`/check` scan routes | remote_unauth | E2, E9 | service availability, upstream DNS | medium | likely | partially_mitigated | `CF-Connecting-IP` keying on `/check`; XFF no longer trusted | #71, #123, #59 |
+| T9 | Rate-limit bypass → DNS amplification / scan abuse: an unauthenticated caller rotating source IPs earns a fresh per-IP bucket on each scan route | remote_unauth | E2, E9 | service availability, upstream DNS | medium | possible | partially_mitigated | every scan-triggering route carries `rateLimitMiddleware` (`/check`, `/api/check`, `/api/bulk-scan`, SSE `/api/check/stream`, `/badge`, `/mcp`, `/api/domain/*`); `CF-Connecting-IP` keying (XFF no longer trusted); per-identity Durable Object atomic counter closes the Cache-API read-modify-write burst-bypass window (GHSA-v7qc-7qh8-h69g). Residual: IP-rotation (botnet) still gets per-IP buckets | #71, #123, #59, GHSA-v7qc-7qh8-h69g |
 | T10 | Stored/reflected XSS via unescaped scan data rendered into the HTML report | remote_unauth | E8, E1 | viewer session, grade integrity | medium | possible | partially_mitigated | `esc()` on interpolated values; per-request CSP nonce + `strict-dynamic`; `default-src 'none'` | #59, #281, 0fc81e2 |
 | T11 | Denial of service via DNS resource exhaustion or scan-abort on attacker-controlled domains | remote_unauth | E1, E3 | service availability | medium | possible | partially_mitigated | SPF lookup-limit early-exit; per-analyzer failure isolation (one analyzer error can't abort the scan); `DnsLookupError` catch on external lookups | #90, #354 |
 | T12 | Login CSRF / OAuth-flow tampering | remote_unauth | E5 | user session | medium | rare | mitigated | OAuth `state` cookie (HttpOnly/Secure/SameSite=Lax) + strict callback match | #150 |
@@ -138,9 +138,6 @@ flowchart LR
 - **Webhook SSRF posture (T2):** Is the outbound-webhook feature intended to reach arbitrary user hosts, or should it enforce a public-IP/host allowlist and `redirect: "manual"`? Does the dispatch fetch currently follow redirects?
-- **`/mcp` rate limiting (T9):** Is the unauthenticated MCP scan path
-  intentionally exempt from `rateLimitMiddleware`, or an oversight? Same
-  question for `/badge` and `/mx/:slug`.
 - **Bot-identity split (T5):** Has #299 landed? Until the routine runs as a non-admin identity, the CODEOWNERS gate is advisory (admin bypasses the ruleset).
@@ -163,7 +160,7 @@ flowchart LR
 | mitigation | threat_ids | closes_class | effort |
 |---|---|---|---|
 | Enforce a public-host allowlist + `redirect: "manual"` + private-IP/DNS-rebinding guard on all server-side fetches built from user input | T2, T3 | partial | M |
-| Apply `rateLimitMiddleware` to every scan-triggering route (`/mcp`, `/badge`, `/mx/:slug`, SSE) — centralize "any route that performs a DNS scan is rate-limited" | T9, T11 | yes | S |
+| ✅ Done — `rateLimitMiddleware` applied to every scan-triggering route (`/check`, `/api/check`, `/api/bulk-scan`, SSE `/api/check/stream`, `/badge`, `/mcp`, `/api/domain/*`); `/mx/:slug` is a static provider page (no scan, no limiter needed) | T9, T11 | yes | S |
 | Centralize per-user row scoping in a query helper so no handler can issue an unscoped read/write of a tenant-owned table | T4 | yes | M |
 | Keep all HTML interpolation behind `esc()` and the CSP nonce; lint/block raw user input inside inline `