feat(js): adopt deno_url/web/crypto, enforce navigation deadline #204
Open
marcbachmann wants to merge 3 commits into
Replaces the regex-based URL polyfill in bootstrap.js with deno_url 0.207, plus deno_webidl 0.207 and deno_console 0.207 as required peer deps. deno_console is an inert library dep (deno_url customInspect imports createFilteredInspectProxy); it does not replace the existing op_console_msg-backed globalThis.console.

URL, URLSearchParams, and URLPattern now come from the same Rust url crate already used by op_fetch_url SSRF validation in ops.rs, so JS-side and Rust-side parsing agree byte-for-byte. The polyfill at bootstrap.js:1549 silently mishandled non-http schemes, percent-encoding, IDN, and IPv6 hosts.

Extension wiring lives in crates/obscura-js/src/deno_extensions.rs and is shared between the snapshot build and the runtime, so adding more deno extensions later only touches one file.

Snapshot size: 1.15 MB to 1.47 MB.
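The cases the old polyfill mishandled can be exercised with a few spec-level checks. The following is a minimal sketch, assuming a WHATWG-compliant `URL` global (as provided by deno_url, Node.js, or a browser); `describeUrl` is a hypothetical helper for illustration and is not part of this PR.

```javascript
// Hypothetical helper: parse `input` (optionally against `base`) and return
// the parts a regex-based polyfill typically gets wrong.
function describeUrl(input, base) {
  // Per the URL Standard, `base` is stringified, so a URL or Location
  // object is accepted as well as a plain string.
  const url = new URL(input, base);
  return {
    href: url.href,
    protocol: url.protocol,
    hostname: url.hostname,
    pathname: url.pathname,
  };
}

// Non-string base: a polyfill that calls base.match(...) throws here.
const relative = describeUrl('/foo', new URL('https://example.com/a/b?x=1'));

// IPv6 host with a port: the brackets are part of `hostname`.
const ipv6 = describeUrl('http://[::1]:8080/path');

// Non-http scheme: no host, the address lives in `pathname`.
const mail = describeUrl('mailto:someone@example.com');

// Percent-encoding: a space in the path is serialized as %20.
const encoded = describeUrl('https://example.com/a b');
```

Comparing these fields against the values produced by the Rust `url` crate for the same inputs would be one way to confirm that JS-side and Rust-side parsing agree.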
Implements the plan in ISSUE_deno_web.md. Removes ~250 LoC of hand-rolled polyfills in favour of deno_web 0.238 + deno_crypto 0.221 (V8/ICU-native, Chrome-equivalent).

Replaced: Blob, File, FileReader, TextEncoder, TextDecoder, Event, CustomEvent, MessageEvent, ErrorEvent, EventTarget, AbortController, AbortSignal, structuredClone, performance, atob, btoa, crypto (getRandomValues + randomUUID + subtle), ReadableStream, WritableStream, TransformStream, MessageChannel, MessagePort, DOMException, CompressionStream, plus the Streams' controllers/readers/writers and queuing strategies.

Why this matters for a scraping browser:

- Stealth: crypto.getRandomValues was Math.random()-backed. A real CSPRNG removes a glaring fingerprinting tell and lets pages do OAuth PKCE, WebAuthn, JWT verification properly. Blob, structuredClone, TextEncoder etc. now behave byte-for-byte like Chrome.
- Correctness: new Blob([uint8Array]).text() returned '[object Object]'; structuredClone(new Date()) returned a string; AbortController.abort() set a flag but never fired the event. All fixed.

Out of scope (intentional):

- fetch and XMLHttpRequest stay hand-rolled - they route through op_fetch_url which enforces SSRF policy, blocked-URL patterns, cookie jar, and the CDP Fetch domain interception. deno_fetch would bypass all of that.
- setTimeout / setInterval stay hand-rolled (microtask fast-fake) for the README's 51-85 ms page-load target. deno_web's 02_timers.js is loaded transitively for AbortSignal.timeout / performance / FileReader, but its global setTimeout/setInterval are deliberately NOT exposed.
- FormData stays a small JS stub (FormData lives in deno_fetch).
- EventTarget no longer aliases Node; DOM nodes do not extend the native EventTarget. AbortSignal and friends now correctly satisfy 'instanceof EventTarget' for the first time.
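The correctness points above are the kind of behaviour a page (or a bot-detection script) can probe in a few lines. A minimal sketch, assuming spec-compliant globals such as those in a modern browser or Node.js 18+; `probeWebApis` is a hypothetical name and only the synchronous checks are shown.

```javascript
// Hypothetical probe: returns one boolean per behaviour described above.
function probeWebApis() {
  // structuredClone must preserve Date and typed arrays rather than
  // round-tripping them through JSON.
  const clonedDate = structuredClone(new Date(0));
  const clonedBytes = structuredClone(new Uint8Array([1, 2, 3]));

  // abort() must dispatch an 'abort' event, not just flip a flag.
  const controller = new AbortController();
  let abortEvents = 0;
  controller.signal.addEventListener('abort', () => {
    abortEvents += 1;
  });
  controller.abort();

  return {
    dateSurvives: clonedDate instanceof Date && clonedDate.getTime() === 0,
    bytesSurvive:
      clonedBytes instanceof Uint8Array &&
      clonedBytes.length === 3 &&
      clonedBytes[2] === 3,
    abortEventFired: abortEvents === 1,
    signalAborted: controller.signal.aborted,
    signalIsEventTarget: controller.signal instanceof EventTarget,
  };
}

const report = probeWebApis();
```

Every field of `report` is expected to be `true` in a compliant runtime; a JSON-based `structuredClone` or a flag-only `abort()` would fail the first and third checks respectively.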
Three DOM-side compatibility patches:

- bootstrap.js Node.dispatchEvent uses Object.defineProperty to assign target/currentTarget because deno_web's Event has getter-only accessors.
- performance.timeOrigin is left to deno_web's StartTime (also a getter); the old polyfill assigned it directly per-runtime.
- CustomEvent.initCustomEvent re-attached on the prototype as a small polyfill (deno_web ships only the modern API; some legacy bundles still call createEvent + initCustomEvent - see issue h4ckf0r0day#41).

Tests (7 new, in runtime.rs#mod tests):

- test_blob_preserves_binary_data
- test_text_encoder_handles_surrogate_pair
- test_structured_clone_preserves_date_and_typed_array
- test_abort_controller_fires_abort_event
- test_crypto_get_random_values_has_entropy
- test_crypto_random_uuid_is_v4_format
- test_btoa_handles_non_ascii_via_textencoder

Snapshot size: 1.47 MB -> 2.27 MB (+800 KB), driven by deno_web's 18 ESM modules. Binary still well under the README's 70 MB target.
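The first patch works because an own property defined on the event instance shadows the getter-only accessor on `Event.prototype`. The sketch below illustrates the technique with the standard `Event` global; `dispatchWithTarget` and the `listeners` map are hypothetical stand-ins, not the code in bootstrap.js.

```javascript
// Hypothetical stand-in for a DOM-side dispatch path. `node` is any object
// with a `listeners` map from event type to an array of callbacks.
function dispatchWithTarget(node, event) {
  // Event.prototype.target and .currentTarget are getters without setters,
  // so `event.target = node` is ignored (or throws in strict mode).
  // Defining own data properties on the instance shadows the getters.
  for (const key of ['target', 'currentTarget']) {
    Object.defineProperty(event, key, { value: node, configurable: true });
  }
  for (const listener of node.listeners[event.type] ?? []) {
    listener.call(node, event);
  }
  return event;
}

const fakeNode = { listeners: { click: [] } };
let seenTarget = null;
fakeNode.listeners.click.push((e) => {
  seenTarget = e.target;
});

const dispatched = dispatchWithTarget(fakeNode, new Event('click'));
```

Marking the properties `configurable` lets a later dispatch phase redefine `currentTarget`, which a real implementation would need to do as the event moves between nodes.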
…xecution

`tokio::time::timeout` only fires at `.await` points. Any synchronous V8 work (script evaluation, module top-level code) holds the tokio executor thread for its entire duration, making the outer async timeout invisible to scripts that never yield. A page with heavy synchronous scripts could run arbitrarily past `--timeout` with no way to interrupt it.

Fix: add `navigation_deadline: Option<Instant>` to `Page` and thread it through every execution phase.

**Script phase**: each `execute_script_with_timeout` call receives the remaining budget as its hard ceiling. When the budget expires a watchdog thread fires `terminate_execution()` in a tight loop (every 10 ms) so that scripts with `try-catch` error-recovery handlers are still eventually terminated rather than absorbing a single termination call and continuing. Scripts are also skipped entirely once the deadline has passed, cutting the iteration short rather than starting work we know will be cancelled.

**Network fetch phase**: each parallel script fetch is wrapped in `tokio::time::timeout(remaining_budget, ...)` so a slow CDN response cannot by itself exhaust the navigation deadline; fetch failures are treated as absent scripts rather than errors.

**ES module phase**: V8's `terminate_execution` is catchable by JavaScript `try-catch`, and heavy modules with error-recovery paths run *longer* when disturbed than when left to complete naturally. A threshold guard skips any module when the remaining budget is below 15 s; the module would outlast the deadline regardless, so it is better to skip it cleanly than to start work that cannot be reliably stopped.

**Load-events / event-loop drain**: the DOMContentLoaded + load dispatch is capped at the remaining budget (min 50 ms to allow basic event handling). The idle event-loop drain is capped at min(500 ms, remaining) and now checks the deadline on *every* iteration, not only in the timeout branch.

**Error surface**: a new `PageError::NavigationTimedOut` variant is returned when `execute_scripts` exits because the deadline was reached, letting the CLI distinguish a timeout from a genuine navigation failure and produce an accurate "Timed out after Ns" message rather than silently returning a partially-rendered page.

Also switches `eval_module_with_timeout` from `run_event_loop` to `with_event_loop_promise`: the former waits for *all* pending work in the runtime to drain (blocking forever on a page with a live `setInterval`), while the latter resolves as soon as the module's top-level evaluation completes.
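The per-phase budgets described above reduce to a small amount of arithmetic on the time left before the deadline. The PR implements this in Rust on `Page`; the sketch below only mirrors the arithmetic in JavaScript, to stay in the same language as the other examples here. All function and constant names are hypothetical; the 15 s, 50 ms, and 500 ms values are the ones stated in the commit message.

```javascript
const MODULE_SKIP_THRESHOLD_MS = 15_000; // modules are skipped below this budget
const LOAD_EVENTS_FLOOR_MS = 50;         // minimum for DOMContentLoaded + load
const IDLE_DRAIN_CAP_MS = 500;           // upper bound for the idle drain

// Milliseconds left before `deadlineMs`, never negative.
// `null` models "no deadline configured" (Option::None on the Rust side).
function remainingBudget(deadlineMs, nowMs) {
  if (deadlineMs === null) return Infinity;
  return Math.max(0, deadlineMs - nowMs);
}

// Decide what each phase may do with the time that is left.
function planPhases(deadlineMs, nowMs) {
  const remaining = remainingBudget(deadlineMs, nowMs);
  return {
    // Scripts are skipped entirely once the deadline has passed.
    runScripts: remaining > 0,
    // Modules cannot be stopped reliably, so they need a large margin.
    runModules: remaining >= MODULE_SKIP_THRESHOLD_MS,
    // Load events get the remaining budget, but at least the floor.
    loadEventsBudgetMs: Math.max(LOAD_EVENTS_FLOOR_MS, remaining),
    // The idle drain gets min(500 ms, remaining).
    idleDrainBudgetMs: Math.min(IDLE_DRAIN_CAP_MS, remaining),
  };
}

const plentyOfTime = planPhases(30_000, 0);   // 30 s left
const almostOut = planPhases(10_000, 9_980);  // 20 ms left
const expired = planPhases(1_000, 2_000);     // deadline already passed
```

In this sketch a missing deadline yields an unbounded load-events budget; a real caller would presumably apply its own default ceiling in that case. Recomputing the plan before every phase, rather than once up front, is what keeps a slow early phase from silently consuming the budget of later ones.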
This branch replaces several broken / fingerprintable hand-rolled web API polyfills with their Deno counterparts, and adds hard deadline enforcement so a single page-supplied script can no longer hold the tokio executor past `--timeout`.

Three independent commits, each landable on its own.
Why
Obscura is intentionally lean — 30 MB memory, 70 MB binary, scoped to scraping and agent automation. That's a feature.
But several `crates/obscura-js/js/bootstrap.js` globals aren't lean; they're broken or fingerprintable:

| Global | Problem |
|---|---|
| `URL` | `new URL('/foo', window.location)` threw `TypeError: base.match is not a function` for any non-string `base`. Only handled `http(s)`. |
| `Blob` / `File` / `FormData` | `Blob` joined parts as strings: `new Blob([uint8Array]).text()` returned `"[object Object]"`. `fetch` file uploads silently corrupted. |
| `TextEncoder` / `TextDecoder` | |
| `AbortController` / `AbortSignal` | `abort()` set a flag but never fired the `'abort'` event. `AbortSignal.timeout()` didn't time out. |
| `structuredClone` | `JSON.parse(JSON.stringify(v))`. Silently corrupted `Date`, `Map`, `Set`, `RegExp`, `Uint8Array`, circular refs. |
| `ReadableStream` / `WritableStream` | `fetch().body`, `pipeTo` all failed. |
| `Event` / `EventTarget` | `globalThis.EventTarget = Node` collapsed two unrelated prototype chains. |
| `atob` / `btoa` | `TextEncoder`; wrong on multi-byte. |
| `performance` | `Date.now()`-based, no sub-millisecond resolution. |
| `crypto.getRandomValues` | `Math.random()`: detectable bias, no real entropy. |
| `crypto.randomUUID` | |

Each broken implementation is also a fingerprinting tell. Any bot-detection service that probes `new Blob([new Uint8Array([1,2,3])]).text()` or checks `structuredClone(new Date()) instanceof Date` flags us as non-Chrome. `deno_web` / `deno_url` / `deno_crypto` are V8/ICU-native and match Chrome's behavior byte-for-byte, the same direction as the existing stealth features.

What changes
1. `feat(js): adopt deno_url for WHATWG-compliant URL implementation`

Drops the regex-based `URL` polyfill. The deno_url implementation handles every scheme correctly, accepts non-string `base` arguments per spec, and follows the same WHATWG URL Standard that Chrome implements.

2. `feat(js): adopt deno_web + deno_crypto, drop remaining broken polyfills`

All-or-nothing: `Blob` ↔ `FormData` ↔ `Response` and `EventTarget` ↔ `AbortSignal` import each other internally, so they cannot be cherry-picked. `deno_crypto` is folded into this commit because it's standalone (~150 KB) and shares the same V8/extension wiring path.

Includes regression tests (`mod tests` in `runtime.rs`) pinning the behavioral contract that the polyfills broke: `Blob.text()` round-trips bytes, `TextEncoder` handles surrogate pairs, `structuredClone` preserves `Date` / `Uint8Array`, `AbortSignal` actually fires `'abort'`, `crypto.getRandomValues` produces real entropy.

3. `fix(browser,js): enforce navigation deadline through synchronous V8 execution`

`tokio::time::timeout` only fires at `.await` points. Any synchronous V8 work (script evaluation, module top-level code) holds the tokio executor thread for its entire duration, so pages with heavy synchronous scripts could run arbitrarily past `--timeout` with no way to stop them.

Adds `navigation_deadline: Option<Instant>` to `Page` and threads it through every execution phase:

- Script phase: `terminate_execution()` in a 10 ms loop once the budget is exhausted so that scripts with `try-catch` error-recovery handlers (which absorb a single termination) are still forced to stop.
- Network fetch phase: `tokio::time::timeout(remaining_budget, ...)`. A slow CDN response can no longer exhaust the deadline by itself.
- ES module phase: `terminate_execution` is catchable by JavaScript `try-catch`. A threshold guard skips any module whose remaining budget is too small to reliably stop, rather than starting work that cannot be terminated.
- Error surface: `PageError::NavigationTimedOut` variant. The CLI's outer `tokio::time::timeout` could only fire once V8 yielded; the deadline is now enforced at the point of each V8 call.

Also switches `eval_module_with_timeout` from `run_event_loop` to `with_event_loop_promise`: the former waits for all pending work in the runtime to drain (blocking indefinitely on a page with a live `setInterval`), while the latter resolves as soon as the module's top-level evaluation completes.

Out of scope (intentional)
- `fetch` and `XMLHttpRequest` stay hand-rolled. They route through `op_fetch_url`, which enforces SSRF policy, blocked-URL patterns, the cookie jar, and the CDP Fetch domain's request interception. Replacing with `deno_fetch` would bypass the security model.
- `setTimeout` / `setInterval` now come from `deno_web`. The hand-rolled timers in `bootstrap.js` are removed.
- `URL.createObjectURL` override is unchanged in shape; it still routes through the embedder-side `BlobUrlStore`.

Cost

- `crates/obscura-js/src/deno_extensions.rs` (BlobUrlStore + TimersPermission).
- `deno_core 0.350`.

Verification

- `cargo test --features stealth --workspace`: 253 passed, 0 failed.
- `cargo build --release --features stealth`: clean (only the pre-existing `RpcResponse` visibility warning in `obscura-mcp`, unrelated).
- `crates/obscura-js/src/runtime.rs::tests` lock the contract in place.