4 changes: 4 additions & 0 deletions Cargo.lock


1 change: 1 addition & 0 deletions Cargo.toml
@@ -606,6 +606,7 @@ human_bytes = "0.4.1"
html5ever = "0.27.0"
http = "1.1"
http-body = "1.0"
httparse = "1.10"
idna = "1.0"
ignore = "0.4.22"
image = "0.25.1"
4 changes: 4 additions & 0 deletions crates/http_proxy/Cargo.toml
@@ -13,7 +13,11 @@ path = "src/http_proxy.rs"

[dependencies]
anyhow.workspace = true
base64.workspace = true
futures.workspace = true
httparse.workspace = true
idna.workspace = true
log.workspace = true
percent-encoding.workspace = true
proxyvars.workspace = true
thiserror.workspace = true
58 changes: 48 additions & 10 deletions crates/http_proxy/src/http_proxy.rs
@@ -1,17 +1,55 @@
//! Hostname-allowlisting primitives for confining sandboxed network access.
//! In-process HTTP/HTTPS proxy that enforces a hostname allowlist.
//!
//! This crate grows over a short stack of PRs:
//! Spawned per terminal command from the parent process. The sandbox is
//! configured to permit network only to this proxy's port; everything the
//! sandboxed command tries to reach the network for has to come through here.
//!
//! - [`allowlist`]: the policy types ([`HostPattern`], [`Allowlist`]) that
//! decide which hosts a sandboxed command may reach.
//! - [`UpstreamProxy`]: parsing an upstream HTTP proxy from the environment
//! (`HTTPS_PROXY` / `NO_PROXY` etc.) to chain through.
//! - the proxy server itself (next): an in-process HTTP/HTTPS proxy that
//! enforces an [`Allowlist`] and is the only network egress a sandboxed
//! command is permitted.
//! The proxy:
//!
//! - Speaks HTTP CONNECT for HTTPS tunnels and HTTP forward proxying for
//!   plain HTTP. Other protocols cannot reach it (the Seatbelt rule limits
//! the sandboxed process to this one TCP destination, and this proxy only
//! speaks HTTP).
//! - Checks the destination hostname against an allowlist of exact hostnames
//! and leading-`*.` subdomain wildcards. Unless the allowlist allows any
//! host, IP-literal targets are denied, and hostnames whose DNS resolves
//! only into loopback / private / link-local space are denied too
//! (DNS-rebinding protection — the proxy runs outside the sandbox, so it
//! must not reopen the local network the Seatbelt rule closed off).
//! - Pins each TCP connection to the destination approved for its first
//! request: directly (to the vetted resolved addresses) or via a CONNECT
//! tunnel through an optional upstream HTTP proxy from the parent's
//! environment (`HTTPS_PROXY` / `HTTP_PROXY`), honoring `NO_PROXY`. Plain
//! HTTP is also tunneled when chaining, so keep-alive requests after the
//! first can never be routed to a different host by the upstream.
//! - Reports per-connection events (allowed, denied, completed) over an
//! mpsc supplied by the caller.
//!
//! ## Trust assumptions
//!
//! The proxy's sole client is model-driven code running inside the sandbox —
//! exactly the party the sandbox distrusts — and the proxy itself runs inside
//! the editor process. It therefore caps request header sizes and concurrent
//! connections, and bounds connect/handshake waits with timeouts, so a
//! malicious command can't exhaust the editor's memory, threads, or file
//! descriptors through it. Bandwidth is deliberately not capped; the
//! command's lifetime bounds it.
//!
//! ## "No proxy here" principle
//!
//! The agent and tools running inside the sandbox should not need to know
//! that a proxy is in front of them. The only response code the proxy
//! synthesizes itself is `511 Network Authentication Required`, used solely
//! for policy denials (with `Via:` and `Proxy-Status:` headers and a
//! plain-text body explaining the policy decision). Other failure modes
//! (upstream connection failure, malformed input from the client, etc.) are
//! handled by silently closing the connection — same behavior the client
//! would see from a direct network failure, no proxy fingerprint.

mod allowlist;
mod proxy;

pub use allowlist::{Allowlist, HostPattern, HostPatternError};
pub use proxy::UpstreamProxy;
pub use proxy::{
DenyReason, ProxyConfig, ProxyEvent, ProxyHandle, RequestMethod, RequestOutcome, UpstreamProxy,
};
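
The policy described in the crate docs above (exact hostnames, leading-`*.` wildcards, IP-literal denial, and denial of hosts that resolve only into loopback / private / link-local space) could be sketched roughly as follows. This is not the crate's `allowlist` module, which this diff does not show: `Pattern`, `Decision`, `is_forbidden_ip`, and `decide` are hypothetical names, the "allow any host" mode is left out, and whether `*.example.com` also matches the bare `example.com` is an assumption (here it does not).

```rust
use std::net::IpAddr;

/// Hypothetical stand-in for the crate's `HostPattern` (real type not shown here).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Pattern {
    Exact(String),
    /// `*.example.com`, stored as `example.com`. Assumed to match strict subdomains only.
    Subdomain(String),
}

impl Pattern {
    /// Lowercases the input; does no IDNA/punycode normalization.
    fn parse(raw: &str) -> Pattern {
        let raw = raw.to_ascii_lowercase();
        match raw.strip_prefix("*.") {
            Some(suffix) => Pattern::Subdomain(suffix.to_string()),
            None => Pattern::Exact(raw),
        }
    }

    fn matches(&self, host: &str) -> bool {
        let host = host.trim_end_matches('.').to_ascii_lowercase();
        match self {
            Pattern::Exact(exact) => host == *exact,
            // Require a non-empty label plus a dot before the suffix, so that
            // `evilexample.com` does not match `*.example.com`.
            Pattern::Subdomain(suffix) => host
                .strip_suffix(suffix.as_str())
                .is_some_and(|prefix| prefix.len() > 1 && prefix.ends_with('.')),
        }
    }
}

/// True for loopback, private, and link-local addresses.
fn is_forbidden_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => v4.is_loopback() || v4.is_private() || v4.is_link_local(),
        IpAddr::V6(v6) => {
            // Treat `::ffff:a.b.c.d` like the IPv4 address it wraps.
            if let Some(v4) = v6.to_ipv4_mapped() {
                return is_forbidden_ip(IpAddr::V4(v4));
            }
            let first = v6.segments()[0];
            v6.is_loopback()
                || (first & 0xfe00) == 0xfc00 // unique local, fc00::/7
                || (first & 0xffc0) == 0xfe80 // link-local, fe80::/10
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
enum Decision {
    Allow,
    DenyIpLiteral,
    DenyNotListed,
    DenyForbiddenResolution,
}

/// `resolved` is whatever DNS returned for `host`; resolution itself is out of scope.
fn decide(patterns: &[Pattern], host: &str, resolved: &[IpAddr]) -> Decision {
    let bare = host.trim_start_matches('[').trim_end_matches(']');
    if bare.parse::<IpAddr>().is_ok() {
        return Decision::DenyIpLiteral;
    }
    if !patterns.iter().any(|pattern| pattern.matches(host)) {
        return Decision::DenyNotListed;
    }
    // Denied only when every address is forbidden; an empty answer is also denied.
    if resolved.iter().all(|ip| is_forbidden_ip(*ip)) {
        return Decision::DenyForbiddenResolution;
    }
    Decision::Allow
}
```

In this sketch a caller would resolve the host once, pass the addresses to `decide`, and then connect only to the addresses that passed `is_forbidden_ip`, which is one way to read the "pins each TCP connection to the vetted resolved addresses" bullet.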
283 changes: 280 additions & 3 deletions crates/http_proxy/src/proxy.rs
@@ -1,7 +1,284 @@
//! The proxy module. For now it holds only the upstream-proxy configuration
//! type; the proxy server (listener, connection handling) lands in a later
//! PR.
//! The proxy itself: listener, connection handlers, upstream chaining.
//!
//! All synchronous, thread-per-connection. `ProxyHandle::spawn` binds a
//! `std::net::TcpListener` on `127.0.0.1:0` and returns once the listener
//! is bound and the listener thread has been spawned. Drop the handle to
//! shut everything down — the listener thread stops accepting new
//! connections; in-flight connection threads finish on their own when
//! either side closes.
//!
//! See the crate-level docs for trust assumptions and the "no proxy here"
//! principle.

mod connection;
mod upstream;

use crate::allowlist::Allowlist;
use anyhow::{Context, Result};
use futures::channel::mpsc;
use std::net::{Ipv4Addr, TcpListener, TcpStream};
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

/// Cap on concurrently handled connections. Each connection costs the
/// editor process two threads and two pump buffers; the cap keeps a
/// runaway (or malicious) sandboxed command from exhausting the editor's
/// thread/fd budget. Well above what parallel package managers open.
const MAX_CONCURRENT_CONNECTIONS: usize = 256;

pub use upstream::UpstreamProxy;

/// Configuration for spawning a proxy.
#[derive(Debug, Clone)]
pub struct ProxyConfig {
/// Hosts the proxy will allow to be reached.
pub allowlist: Allowlist,
/// Optional upstream HTTP proxy to chain through, with `NO_PROXY`-style
/// bypasses for hosts that should connect direct.
pub upstream: Option<UpstreamProxy>,
/// Where the proxy reports per-connection events. Use
/// [`mpsc::unbounded`] so connection threads (which are sync) never
/// block on send. The receiver is async-friendly so `gpui` / `tokio`
/// callers can poll it from their executor of choice.
pub events: mpsc::UnboundedSender<ProxyEvent>,
}

/// A request method seen by the proxy.
///
/// Either a CONNECT (HTTPS tunnel) or an HTTP forward request.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RequestMethod {
Connect,
Http(String),
}

impl RequestMethod {
pub fn as_str(&self) -> &str {
match self {
RequestMethod::Connect => "CONNECT",
RequestMethod::Http(method) => method.as_str(),
}
}
}

/// Outcome of a single connection's policy decision.
#[derive(Debug, Clone)]
pub enum RequestOutcome {
Allowed,
Denied { reason: DenyReason },
}

/// Why an attempted connection was denied.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DenyReason {
/// Hostname (in punycode form on the wire) wasn't in the allowlist.
HostNotInAllowlist { host: String },
/// CONNECT or HTTP request targeted an IP literal. Denied unless the
/// allowlist allows any host.
IpLiteralRejected { target: String },
/// The hostname resolved only to loopback / private / link-local
/// addresses, which the sandbox policy never reaches via the allowlist
/// (DNS-rebinding protection). Not applied when the allowlist allows
/// any host.
ResolvedToForbiddenIp { host: String },
}

impl DenyReason {
pub(crate) fn proxy_status_error(&self) -> &'static str {
match self {
DenyReason::HostNotInAllowlist { .. } => "destination_ip_prohibited",
DenyReason::IpLiteralRejected { .. } => "destination_ip_prohibited",
DenyReason::ResolvedToForbiddenIp { .. } => "destination_ip_prohibited",
}
}

pub(crate) fn human_explanation(&self) -> String {
match self {
DenyReason::HostNotInAllowlist { host } => {
format!("host '{host}' is not in this conversation's network allowlist")
}
DenyReason::IpLiteralRejected { target } => format!(
"target '{target}' is an IP literal; only hostnames are permitted by sandbox policy"
),
DenyReason::ResolvedToForbiddenIp { host } => format!(
"host '{host}' resolves only to loopback/private/link-local addresses, \
which sandbox policy blocks"
),
}
}
}

/// Events emitted by the proxy as it handles connections.
#[derive(Debug, Clone)]
pub enum ProxyEvent {
/// Sent once after the listener is bound. Always the first event for
/// a given proxy instance.
Ready { port: u16 },

/// Emitted at policy-decision time, before bytes flow to the upstream.
RequestAttempt {
host: String,
port: u16,
method: RequestMethod,
outcome: RequestOutcome,
},

/// Emitted after an `Allowed` connection finishes. Carries throughput
/// totals for diagnostics. Not emitted for denied connections.
RequestCompleted {
host: String,
port: u16,
method: RequestMethod,
bytes_to_remote: u64,
bytes_from_remote: u64,
duration_ms: u64,
},
}

/// Handle to a running proxy. Drop to stop the listener; in-flight
/// connection threads finish on their own as soon as either side closes.
pub struct ProxyHandle {
port: u16,
/// Listener thread sees this flip to `true` after `accept` returns and
/// then exits.
shutdown: Arc<AtomicBool>,
/// Joined on drop to make shutdown deterministic in tests; ignored if
/// the listener has already exited.
listener_thread: Option<thread::JoinHandle<()>>,
}

impl ProxyHandle {
/// Spawns the proxy: binds a listener on `127.0.0.1:0`, sends a `Ready`
/// event, spawns the listener thread, and returns. The bound port (see
/// [`ProxyHandle::port`]) is what callers should use for the
/// `HTTPS_PROXY`/`HTTP_PROXY` env vars and for the Seatbelt rule
/// narrowing `localhost:<port>`.
pub fn spawn(config: ProxyConfig) -> Result<ProxyHandle> {
let listener = TcpListener::bind((Ipv4Addr::LOCALHOST, 0))
.context("failed to bind proxy listener on 127.0.0.1:0")?;
let port = listener
.local_addr()
.context("failed to read proxy local addr")?
.port();

// Inform the parent that the proxy is ready before starting the accept
// loop. Sending on an unbounded channel never blocks; it fails only if
// the receiver was dropped, in which case nobody is listening anyway.
let _ = config.events.unbounded_send(ProxyEvent::Ready { port });

let shutdown = Arc::new(AtomicBool::new(false));
let runtime_state = Arc::new(RuntimeState {
allowlist: config.allowlist,
upstream: config.upstream,
events: config.events,
active_connections: AtomicUsize::new(0),
});

let listener_thread = thread::Builder::new()
.name("http-proxy-listener".to_string())
// Listener thread does almost nothing on its stack — accept,
// spawn, loop. 128 KiB is plenty.
.stack_size(128 * 1024)
.spawn({
let shutdown = shutdown.clone();
move || run_listener(listener, runtime_state, shutdown)
})
.context("failed to spawn proxy listener thread")?;

Ok(ProxyHandle {
port,
shutdown,
listener_thread: Some(listener_thread),
})
}

/// The bound port. Stable for the lifetime of this handle.
pub fn port(&self) -> u16 {
self.port
}
}

impl Drop for ProxyHandle {
fn drop(&mut self) {
self.shutdown.store(true, Ordering::SeqCst);
// The listener is blocked in `accept()`, and `std::net::TcpListener`
// offers no way to interrupt that call, so the flag alone cannot wake
// it. Connect to ourselves instead: `accept()` returns, the listener
// sees the shutdown flag and breaks out of its loop before spawning a
// worker, and the wake-up connection is simply dropped.
let _ = TcpStream::connect((Ipv4Addr::LOCALHOST, self.port));

if let Some(thread) = self.listener_thread.take() {
// Give the listener a chance to clean up. A join error means the
// listener thread panicked; there's nothing to recover, but it
// shouldn't pass unnoticed.
if thread.join().is_err() {
log::warn!("[http_proxy] listener thread panicked");
}
}
}
}

/// State shared across all connection threads for a single proxy instance.
pub(crate) struct RuntimeState {
pub(crate) allowlist: Allowlist,
pub(crate) upstream: Option<UpstreamProxy>,
pub(crate) events: mpsc::UnboundedSender<ProxyEvent>,
active_connections: AtomicUsize,
}

/// Decrements the active-connection count when a connection thread finishes
/// (normally or by panic).
struct ConnectionSlot(Arc<RuntimeState>);

impl Drop for ConnectionSlot {
fn drop(&mut self) {
self.0.active_connections.fetch_sub(1, Ordering::SeqCst);
}
}

fn run_listener(listener: TcpListener, state: Arc<RuntimeState>, shutdown: Arc<AtomicBool>) {
for stream in listener.incoming() {
if shutdown.load(Ordering::SeqCst) {
log::debug!("[http_proxy] listener stopping (shutdown signaled)");
break;
}
match stream {
Ok(stream) => {
let previous = state.active_connections.fetch_add(1, Ordering::SeqCst);
if previous >= MAX_CONCURRENT_CONNECTIONS {
state.active_connections.fetch_sub(1, Ordering::SeqCst);
log::warn!(
"[http_proxy] dropping connection: {MAX_CONCURRENT_CONNECTIONS} \
connections already active"
);
drop(stream);
continue;
}
let slot = ConnectionSlot(state.clone());
let state = state.clone();
let result = thread::Builder::new()
.name("http-proxy-conn".to_string())
// Connection workers do bidir copy with a 64 KiB buffer
// and a few syscall stack frames. 128 KiB is plenty.
.stack_size(128 * 1024)
.spawn(move || {
let _slot = slot;
if let Err(e) = connection::handle(stream, state) {
log::debug!("[http_proxy] connection handler error: {e}");
}
});
if let Err(e) = result {
log::warn!("[http_proxy] failed to spawn connection thread: {e}");
}
}
Err(e) => {
// EMFILE / per-process fd exhaustion is the realistic
// failure here. Log and keep going — accept errors are
// usually transient.
log::warn!("[http_proxy] accept failed: {e}");
}
}
}
}
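
The shutdown handshake used by `ProxyHandle` (set a flag, then connect to the listener's own port so the blocked `accept()` returns) can be seen in isolation in the std-only sketch below. `ListenerHandle` is a hypothetical miniature written for illustration, not the type from this PR; it drops every accepted stream instead of spawning connection workers.

```rust
use std::io;
use std::net::{Ipv4Addr, TcpListener, TcpStream};
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

/// Hypothetical miniature of `ProxyHandle`: owns the port, the flag, and the thread.
struct ListenerHandle {
    port: u16,
    shutdown: Arc<AtomicBool>,
    thread: Option<thread::JoinHandle<()>>,
}

impl ListenerHandle {
    fn spawn() -> io::Result<ListenerHandle> {
        // Port 0 asks the OS for any free port on the loopback interface.
        let listener = TcpListener::bind((Ipv4Addr::LOCALHOST, 0))?;
        let port = listener.local_addr()?.port();
        let shutdown = Arc::new(AtomicBool::new(false));
        let thread = thread::Builder::new()
            .name("sketch-listener".to_string())
            .spawn({
                let shutdown = shutdown.clone();
                move || {
                    for stream in listener.incoming() {
                        // Checked before anything else, so the wake-up
                        // connection made in `drop` never reaches a worker.
                        if shutdown.load(Ordering::SeqCst) {
                            break;
                        }
                        // A real proxy would hand `stream` to a worker here.
                        drop(stream);
                    }
                    // `listener` is dropped here, which closes the socket.
                }
            })?;
        Ok(ListenerHandle {
            port,
            shutdown,
            thread: Some(thread),
        })
    }
}

impl Drop for ListenerHandle {
    fn drop(&mut self) {
        self.shutdown.store(true, Ordering::SeqCst);
        // Wake the blocked `accept()`; the result does not matter.
        let _ = TcpStream::connect((Ipv4Addr::LOCALHOST, self.port));
        if let Some(thread) = self.thread.take() {
            // Joining makes shutdown deterministic: once `drop` returns,
            // the listening socket is closed.
            let _ = thread.join();
        }
    }
}
```

Because the flag is stored before the wake-up connection is made, the listener is guaranteed to observe `true` by the time it accepts that connection, so the join in `drop` cannot hang waiting for another client.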