[client] Categorize privileged tests behind a build tag and run them in Docker#6425
[client] Categorize privileged tests behind a build tag and run them in Docker#6425pappz wants to merge 8 commits into
Conversation
…ild tag Tests that need root or mutate host state (nftables/iptables/DNS, TUN/WireGuard interfaces, routes, eBPF, SSH/service install) are now gated behind a //go:build privileged tag. The default `go test ./client/...` runs as a non-root user with no sudo and leaves host networking untouched; mixed files were split so pure-logic tests stay in the default suite. A self-hosting ory/dockertest/v4 harness (client/testutil/privileged) runs the privileged suite inside a --privileged --cap-add=NET_ADMIN container via `make test-privileged`; a DOCKER_CI=true guard skips the spawn when already inside the container. Added `make test-unit` for the host-safe run.
The dockertest harness now reads two optional env vars when building the in-container `go test` command: PRIV_RUN adds a -run test-name filter and PRIV_PKGS overrides the package list. Both empty reproduce the full privileged suite, so CI and `make test-privileged` behave as before. Lets a developer run a single privileged test in the container, e.g.: PRIV_RUN=TestNftablesManager PRIV_PKGS=./client/firewall/nftables/... make test-privileged
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
🚧 Files skipped from review as they are similar to previous changes (4)
📝 WalkthroughWalkthroughThis PR adds a ChangesBuild tag infrastructure and CI configuration
Service lifecycle and server integration test migrations
DNS and SSH test migrations
Engine integration test migration
Firewall, interface, and socket test tagging
Routing test fixture extraction and reorganization
Docker test runner harness
Documentation and dependency updates
🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 12
🧹 Nitpick comments (9)
client/cmd/service_privileged_test.go (1)
50-54: ⚡ Quick winSwallowing all Status() errors can mask real issues.
All errors from
s.Status()are silently ignored and the loop continues polling. If the service truly doesn't exist or there's a persistent error (not transient), this will cause the test to timeout (10s for start, 5s for stop) instead of failing fast with a clear error message.Consider at least logging the error, or distinguishing between "service not found" (which might be transient during start/stop) and other errors.
🔍 Proposed enhancement to log errors
case <-ticker.C: status, err := s.Status() if err != nil { - // Continue polling on transient errors + // Continue polling on transient errors, but log for debugging + t.Logf("status check error (will retry): %v", err) continue }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/cmd/service_privileged_test.go` around lines 50 - 54, The loop currently swallows all errors from s.Status(), which can hide persistent failures; update the Status() error handling so transient "service not found" errors are allowed to continue polling but all other errors are reported and fail the test fast: in the polling loop around s.Status() check the error value and if it represents a not-found/transient condition (e.g., match the service-not-exist sentinel or error string/GRPC NotFound) then continue, otherwise call t.Fatalf (or t.Errorf + break) to surface the real error; additionally, at minimum call t.Logf/t.Errorf with the error before continuing so transient failures are visible in test output.client/testutil/privileged/runner_test.go (2)
155-164: 💤 Low valueConsider validating
PRIV_PKGSandPRIV_RUNbefore shell interpolation.Both environment variable values are interpolated directly into the shell command without sanitization. While the attack surface is limited (requires control of env vars on the test-running machine), the container has the repo mounted read-write, so malicious input could corrupt source files.
For defense in depth, consider rejecting values containing shell metacharacters or using a safer construction pattern.
Example validation
func sanitizeEnvValue(val string) (string, error) { // Reject shell metacharacters if strings.ContainsAny(val, ";|&$`'\"\\(){}[]<>") { return "", fmt.Errorf("invalid characters in value: %s", val) } return val, nil }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/testutil/privileged/runner_test.go` around lines 155 - 164, The buildTestScript function currently interpolates PRIV_PKGS and PRIV_RUN directly into shell strings (variables pkgs and runFilter); validate or sanitize these env values before use to prevent shell metacharacter injection: either reject values containing characters like ;|&$`'"\(){}[]<> or escape them safely (e.g., use a quoting/printf approach) and then construct pkgs and runFilter from the sanitized results, or avoid shell interpolation entirely by building the command with safe argument passing (e.g., using exec.Command-style semantics) so PRIV_PKGS and PRIV_RUN are never injected into an unescaped shell string.
43-43: ⚡ Quick winMissing exclusion for
/client/testutil/privilegedin package filter.The CI workflow excludes this package from the grep filter to avoid including the harness itself. While DOCKER_CI=true prevents recursion, adding the exclusion maintains consistency with CI and avoids unnecessary test discovery overhead.
Suggested fix
-const privilegedTestPackages = `go list -buildvcs=false ./... | grep -v -e /management -e /signal -e /relay -e /proxy -e /combined -e /client/ui -e /upload-server` +const privilegedTestPackages = `go list -buildvcs=false ./... | grep -v -e /management -e /signal -e /relay -e /proxy -e /combined -e /client/ui -e /upload-server -e /client/testutil/privileged`🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/testutil/privileged/runner_test.go` at line 43, The package filter constant privilegedTestPackages misses excluding the test harness itself; update the string value of privilegedTestPackages so the grep pattern also excludes the /client/testutil/privileged package (add an -e /client/testutil/privileged term to the existing list of -e exclusions) so the harness package is not included in test discovery.client/internal/engine_privileged_test.go (4)
35-38: 💤 Low value
t.Fatal()already stops execution;returnis redundant.In several locations,
t.Fatal(err)is immediately followed byreturn. SinceFatalcallsFailNow()which stops the current goroutine, thereturnstatements are unreachable.♻️ Simplify by removing redundant returns
Example for lines 35-38:
key, err := wgtypes.GeneratePrivateKey() if err != nil { t.Fatal(err) - return }Apply the same pattern to lines 41-44, 158-161, 205-208, 254-257, and 260-263.
Also applies to: 41-44, 158-161, 205-208, 254-257, 260-263
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/engine_privileged_test.go` around lines 35 - 38, Remove the redundant return statements that immediately follow t.Fatal(err) in client/internal/engine_privileged_test.go: locate each occurrence of t.Fatal(err) (e.g., the instance shown in the diff and the additional occurrences noted at lines 41-44, 158-161, 205-208, 254-257, 260-263) and delete the subsequent return since t.Fatal calls FailNow() and already stops execution.
138-138: 💤 Low valueRemove commented-out code.
This commented-out
time.Sleepshould either be removed or uncommented if it's needed for test stability.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/engine_privileged_test.go` at line 138, Remove the commented-out time.Sleep line ("// time.Sleep(250 * time.Millisecond)") from the test; if a pause is actually required for test stability, uncomment it instead and add a short explanatory comment and consider replacing the fixed sleep with a deterministic synchronization (e.g., wait for a condition or use a channel) inside the same test function where the commented line appears so the test remains stable without stray commented code.
125-125: ⚖️ Poor tradeoffConsider a more robust synchronization mechanism.
The 250ms sleep is used to wait for the SSH server to initialize, but this is brittle and may cause flakes on slower systems. Consider polling
engine.sshServerwith a timeout or adding a synchronization primitive to the engine's SSH server startup path.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/engine_privileged_test.go` at line 125, Replace the brittle time.Sleep(250 * time.Millisecond) in the test with a deterministic wait: poll the engine.sshServer (e.g., loop checking engine.sshServer != nil and that it is accepting connections) with a short sleep and an overall timeout, or add a synchronization primitive in the engine SSH startup (e.g., a ready channel or a WaitGroup that the test can wait on). Concretely, remove the direct Sleep call in the test, implement polling with a timeout around engine.sshServer readiness (or introduce Engine.StartSSH()/Engine.WaitForSSHReady() that signals when ready), and have the test wait on that signal instead of sleeping. Ensure the wait fails the test if the timeout elapses.
317-318: 💤 Low valueUnreachable
break loopaftert.Fatalf.
t.Fatalfstops test execution immediately, so thebreak loopon line 318 is never reached. Consider removing it for clarity.♻️ Proposed simplification
case <-timeoutChan: t.Fatalf("waiting for expected connections timeout after %s", timeout.String()) - break loop case <-ticker.C:🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/engine_privileged_test.go` around lines 317 - 318, Remove the unreachable "break loop" statement immediately following the t.Fatalf call in the test (the t.Fatalf(...) line already stops execution), so delete the redundant "break loop" to clean up the test and avoid dead code next to the failing call.client/internal/routemanager/systemops/systemops_bsd_privileged_test.go (1)
31-72: ⚡ Quick winConsider using
require.NoErrorfor immediate test termination on route failures.Lines 49 and 65 use
t.Errorf, allowing the test to continue after route add/remove failures. While this reports all concurrent failures, it can produce unclear output if many goroutines fail and may leave routes in an inconsistent state for subsequent assertions.♻️ Proposed refactor
go func(ip netip.Addr) { defer wg.Done() prefix := netip.PrefixFrom(ip, 32) - if err := r.addToRouteTable(prefix, nexthop); err != nil { - t.Errorf("Failed to add route for %s: %v", prefix, err) - } + require.NoError(t, r.addToRouteTable(prefix, nexthop), "Failed to add route for %s", prefix) }(baseIP)Apply the same pattern for the remove loop at lines 64-66.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/routemanager/systemops/systemops_bsd_privileged_test.go` around lines 31 - 72, TestConcurrentRoutes uses t.Errorf in concurrent goroutines which allows the test to continue after failures; change those checks to require.NoError so failures abort immediately. Replace t.Errorf("Failed to add route for %s: %v", prefix, err) with require.NoError(t, err, "add route %s", prefix) in the add loop and similarly replace the remove loop's t.Errorf with require.NoError(t, err, "remove route %s", prefix). Add the testify/require import if missing. Ensure references to r.addToRouteTable and r.removeFromRouteTable remain unchanged otherwise.client/internal/routemanager/systemops/systemops_routing_data_test.go (1)
75-83: 💤 Low valueConsider documenting UDP-only behavior or making the helper more flexible.
The helper always sets
UDP: trueand never setsTCP, which works for the current DNS-based test cases but may require updates if TCP routing tests are added later.💡 Optional: Add protocol parameter or comment
Option 1: Add a comment documenting the UDP-only behavior:
func createPacketExpectation(srcIP string, srcPort int, dstIP string, dstPort int) PacketExpectation { + // Creates UDP packet expectations for DNS-based routing tests return PacketExpectation{Option 2: Make it more flexible by adding a protocol parameter:
-func createPacketExpectation(srcIP string, srcPort int, dstIP string, dstPort int) PacketExpectation { +func createPacketExpectation(srcIP string, srcPort int, dstIP string, dstPort int, udp bool) PacketExpectation { return PacketExpectation{ SrcIP: net.ParseIP(srcIP), DstIP: net.ParseIP(dstIP), SrcPort: srcPort, DstPort: dstPort, - UDP: true, + UDP: udp, + TCP: !udp, } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/routemanager/systemops/systemops_routing_data_test.go` around lines 75 - 83, The helper createPacketExpectation currently hardcodes UDP: true which limits reuse; update it to accept a protocol parameter (e.g., proto string or isUDP bool) and set the PacketExpectation fields accordingly (set UDP true when proto == "udp", set TCP true when proto == "tcp", or set both/none as needed), or alternatively add a short comment above createPacketExpectation documenting that it intentionally creates UDP-only expectations; reference the createPacketExpectation function and ensure callers in tests pass the new parameter or are left valid after adding the comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@client/cmd/service_privileged_test.go`:
- Around line 79-82: The test mutates package-level globals configPath,
logLevel, and daemonAddr but never restores them; follow the same
save-and-restore pattern used for serviceName (lines 73-77): at the start of the
test capture originals (e.g., oldConfigPath, oldLogLevel, oldDaemonAddr), set
the globals to the temp values as now, and add a defer that restores configPath
= oldConfigPath, logLevel = oldLogLevel, and daemonAddr = oldDaemonAddr so other
tests are not affected.
In `@client/internal/dns/server_privileged_test.go`:
- Around line 315-324: The test currently asserts exact handler ID keys in
dnsServer.dnsMuxMap which will fail because UpdateDNSServer/newUpstreamResolver
create fresh handlers with different IDs; change the verification to match
handlers by stable properties: iterate testCase.expectedUpstreamMap values
(generated by generateDummyHandler/dummyHandler), and for each expected handler
find a corresponding actual handler in dnsServer.dnsMuxMap by comparing domain
(and if applicable priority/count or other stable fields) rather than ID, then
assert those properties are equal; keep references to dnsServer.dnsMuxMap,
generateDummyHandler/dummyHandler, UpdateDNSServer, and newUpstreamResolver to
locate the code.
In `@client/internal/engine_privileged_test.go`:
- Line 302: Update the failing test message string to correct grammar: replace
the text "not all peers was started" with "not all peers were started" in the
test that calls t.Fatal (search for the t.Fatal call in
engine_privileged_test.go to locate the exact assertion).
In `@client/internal/routemanager/systemops/systemops_linux_test.go`:
- Line 1: Add the missing linux build constraint to the file-level build tag so
this test file only builds on Linux: update the existing build tag line
"//go:build !android && privileged" to include "linux" (e.g. "//go:build linux
&& !android && privileged") so the Linux-only import
"github.com/vishvananda/netlink" and any Linux-specific code in the file are not
compiled on non-Linux platforms.
In `@client/internal/routemanager/systemops/systemops_windows_test.go`:
- Around line 1-2: The file's build tag currently only specifies "privileged"
and must be limited to Windows to avoid attempting to build/run PowerShell-based
tests on non-Windows platforms; replace the existing single build line with a
Windows+privileged constraint by adding the modern and legacy build tags: use
"//go:build privileged && windows" followed by a blank line and "// +build
privileged,windows" so the file only builds on Windows with the privileged tag
(no other code changes required).
In `@client/server/server_privileged_test.go`:
- Around line 228-232: Replace the call to log.Fatalf in the goroutine that runs
s.Serve(lis) with t.Errorf so a server startup error is reported to the test
harness instead of terminating the whole process; specifically, in the anonymous
goroutine that calls s.Serve(lis) change log.Fatalf("failed to serve: %v", err)
to t.Errorf("failed to serve: %v", err).
- Around line 62-119: The test TestConnectWithRetryRuns starts gRPC servers via
startSignal and startManagement but never stops them; update the test to capture
the returned *grpc.Server values and register t.Cleanup calls to stop them (call
server.GracefulStop() or server.Stop()) so the servers are shut down when the
test ends; specifically, assign the first return of startSignal and
startManagement to variables (e.g., sigSrv, mgmtSrv) and call t.Cleanup(func(){
sigSrv.GracefulStop(); mgmtSrv.GracefulStop() }) or Stop() to prevent
goroutine/port leaks.
- Around line 157-160: Remove the unreachable error check after creating the
in-memory event store: the assignment eventStore :=
&activity.InMemoryEventStore{} does not return an error, so delete the
subsequent if err != nil { return nil, "", err } block; ensure only the earlier
error handling for store.NewTestStoreFromSQL remains and no new variables are
introduced.
- Around line 219-222: The helper that currently calls net.Listen("tcp",
"localhost:0") should not call log.Fatalf; change the helper to return the error
to its caller instead (e.g., change its signature to return (net.Listener,
error)), replace the log.Fatalf("failed to listen: %v", err) with return nil,
err, and update all callers to handle the error (they can call t.Fatalf/t.Fatal
or fail the test as appropriate). Ensure net.Listen error handling in the helper
uses the new return path and that callers propagate or handle the error rather
than letting log.Fatalf exit the entire test process.
- Around line 205-209: The goroutine launching s.Serve currently calls
log.Fatalf which exits the entire test process; change that to report the error
on the test (use t.Errorf) so only this test fails and cleanup runs: inside the
anonymous goroutine replace log.Fatalf("failed to serve: %v", err) with
t.Errorf("failed to serve: %v", err) (and return from the goroutine after
reporting); ensure the goroutine closure correctly captures the test variable
(t) so the call to t.Errorf uses the same *testing.T from the test function.
In `@client/ssh/proxy/proxy_privileged_test.go`:
- Around line 184-187: TestSSHProxy_CommandQuoting currently runs Unix-only
commands and lacks a Windows skip; update the start of the
TestSSHProxy_CommandQuoting function to skip on Windows like
TestSSHProxy_Connect does by checking runtime.GOOS == "windows" and calling
t.Skip("skipping on Windows") (also keep the existing testing.Short() skip).
In `@client/testutil/privileged/runner_test.go`:
- Line 34: Rename the misspelled constant containerGoModache to
containerGoModCache and update all references to it (e.g., the usages on the
lines noted in the review: the occurrences currently referencing
containerGoModache around lines 81 and 89) so they use the corrected name
containerGoModCache; ensure any tests or helper functions that import or refer
to this constant compile after the rename.
---
Nitpick comments:
In `@client/cmd/service_privileged_test.go`:
- Around line 50-54: The loop currently swallows all errors from s.Status(),
which can hide persistent failures; update the Status() error handling so
transient "service not found" errors are allowed to continue polling but all
other errors are reported and fail the test fast: in the polling loop around
s.Status() check the error value and if it represents a not-found/transient
condition (e.g., match the service-not-exist sentinel or error string/GRPC
NotFound) then continue, otherwise call t.Fatalf (or t.Errorf + break) to
surface the real error; additionally, at minimum call t.Logf/t.Errorf with the
error before continuing so transient failures are visible in test output.
In `@client/internal/engine_privileged_test.go`:
- Around line 35-38: Remove the redundant return statements that immediately
follow t.Fatal(err) in client/internal/engine_privileged_test.go: locate each
occurrence of t.Fatal(err) (e.g., the instance shown in the diff and the
additional occurrences noted at lines 41-44, 158-161, 205-208, 254-257, 260-263)
and delete the subsequent return since t.Fatal calls FailNow() and already stops
execution.
- Line 138: Remove the commented-out time.Sleep line ("// time.Sleep(250 *
time.Millisecond)") from the test; if a pause is actually required for test
stability, uncomment it instead and add a short explanatory comment and consider
replacing the fixed sleep with a deterministic synchronization (e.g., wait for a
condition or use a channel) inside the same test function where the commented
line appears so the test remains stable without stray commented code.
- Line 125: Replace the brittle time.Sleep(250 * time.Millisecond) in the test
with a deterministic wait: poll the engine.sshServer (e.g., loop checking
engine.sshServer != nil and that it is accepting connections) with a short sleep
and an overall timeout, or add a synchronization primitive in the engine SSH
startup (e.g., a ready channel or a WaitGroup that the test can wait on).
Concretely, remove the direct Sleep call in the test, implement polling with a
timeout around engine.sshServer readiness (or introduce
Engine.StartSSH()/Engine.WaitForSSHReady() that signals when ready), and have
the test wait on that signal instead of sleeping. Ensure the wait fails the test
if the timeout elapses.
- Around line 317-318: Remove the unreachable "break loop" statement immediately
following the t.Fatalf call in the test (the t.Fatalf(...) line already stops
execution), so delete the redundant "break loop" to clean up the test and avoid
dead code next to the failing call.
In `@client/internal/routemanager/systemops/systemops_bsd_privileged_test.go`:
- Around line 31-72: TestConcurrentRoutes uses t.Errorf in concurrent goroutines
which allows the test to continue after failures; change those checks to
require.NoError so failures abort immediately. Replace t.Errorf("Failed to add
route for %s: %v", prefix, err) with require.NoError(t, err, "add route %s",
prefix) in the add loop and similarly replace the remove loop's t.Errorf with
require.NoError(t, err, "remove route %s", prefix). Add the testify/require
import if missing. Ensure references to r.addToRouteTable and
r.removeFromRouteTable remain unchanged otherwise.
In `@client/internal/routemanager/systemops/systemops_routing_data_test.go`:
- Around line 75-83: The helper createPacketExpectation currently hardcodes UDP:
true which limits reuse; update it to accept a protocol parameter (e.g., proto
string or isUDP bool) and set the PacketExpectation fields accordingly (set UDP
true when proto == "udp", set TCP true when proto == "tcp", or set both/none as
needed), or alternatively add a short comment above createPacketExpectation
documenting that it intentionally creates UDP-only expectations; reference the
createPacketExpectation function and ensure callers in tests pass the new
parameter or are left valid after adding the comment.
In `@client/testutil/privileged/runner_test.go`:
- Around line 155-164: The buildTestScript function currently interpolates
PRIV_PKGS and PRIV_RUN directly into shell strings (variables pkgs and
runFilter); validate or sanitize these env values before use to prevent shell
metacharacter injection: either reject values containing characters like
;|&$`'"\(){}[]<> or escape them safely (e.g., use a quoting/printf approach) and
then construct pkgs and runFilter from the sanitized results, or avoid shell
interpolation entirely by building the command with safe argument passing (e.g.,
using exec.Command-style semantics) so PRIV_PKGS and PRIV_RUN are never injected
into an unescaped shell string.
- Line 43: The package filter constant privilegedTestPackages misses excluding
the test harness itself; update the string value of privilegedTestPackages so
the grep pattern also excludes the /client/testutil/privileged package (add an
-e /client/testutil/privileged term to the existing list of -e exclusions) so
the harness package is not included in test discovery.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3fb90dcf-9449-43b0-8baf-650f7ae7c8a7
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (38)
.github/workflows/golang-test-linux.ymlMakefileclient/cmd/service_privileged_test.goclient/cmd/service_test.goclient/firewall/iptables/manager_linux_test.goclient/firewall/iptables/router_linux_test.goclient/firewall/nftables/manager_linux_test.goclient/firewall/nftables/router_linux_test.goclient/iface/iface_test.goclient/iface/wgproxy/proxy_linux_test.goclient/iface/wgproxy/proxy_seed_test.goclient/iface/wgproxy/redirect_test.goclient/internal/dns/server_privileged_test.goclient/internal/dns/server_test.goclient/internal/engine_privileged_test.goclient/internal/engine_test.goclient/internal/routemanager/manager_test.goclient/internal/routemanager/systemops/rt_tables_linux_test.goclient/internal/routemanager/systemops/systemops_bsd_privileged_test.goclient/internal/routemanager/systemops/systemops_bsd_test.goclient/internal/routemanager/systemops/systemops_dialer_test.goclient/internal/routemanager/systemops/systemops_generic_test.goclient/internal/routemanager/systemops/systemops_isvpnroute_test.goclient/internal/routemanager/systemops/systemops_linux_test.goclient/internal/routemanager/systemops/systemops_routing_data_linux_test.goclient/internal/routemanager/systemops/systemops_routing_data_test.goclient/internal/routemanager/systemops/systemops_unix_test.goclient/internal/routemanager/systemops/systemops_windows_test.goclient/internal/routemanager/systemops/v6route_linux_test.goclient/server/server_privileged_test.goclient/server/server_test.goclient/ssh/client/client_privileged_test.goclient/ssh/client/client_test.goclient/ssh/proxy/proxy_privileged_test.goclient/ssh/proxy/proxy_test.goclient/testutil/privileged/runner_test.godocs/testing-privileged.mdgo.mod
💤 Files with no reviewable changes (6)
- client/ssh/client/client_test.go
- client/ssh/proxy/proxy_test.go
- client/cmd/service_test.go
- client/internal/engine_test.go
- client/internal/routemanager/systemops/systemops_bsd_test.go
- client/internal/dns/server_test.go
| tempDir := t.TempDir() | ||
| configPath = fmt.Sprintf("%s/netbird-test-config.json", tempDir) | ||
| logLevel = "info" | ||
| daemonAddr = fmt.Sprintf("unix://%s/netbird-test.sock", tempDir) |
There was a problem hiding this comment.
Restore mutated global variables in test cleanup.
The test modifies package-level globals configPath, logLevel, and daemonAddr but doesn't restore them after the test completes. If other tests in this package rely on these defaults, they could be affected by this test's mutations.
Apply the same save-and-restore pattern used for serviceName (lines 73-77).
🔧 Proposed fix to restore globals
tempDir := t.TempDir()
+ originalConfigPath := configPath
+ originalLogLevel := logLevel
+ originalDaemonAddr := daemonAddr
configPath = fmt.Sprintf("%s/netbird-test-config.json", tempDir)
logLevel = "info"
daemonAddr = fmt.Sprintf("unix://%s/netbird-test.sock", tempDir)
+ defer func() {
+ configPath = originalConfigPath
+ logLevel = originalLogLevel
+ daemonAddr = originalDaemonAddr
+ }()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| tempDir := t.TempDir() | |
| configPath = fmt.Sprintf("%s/netbird-test-config.json", tempDir) | |
| logLevel = "info" | |
| daemonAddr = fmt.Sprintf("unix://%s/netbird-test.sock", tempDir) | |
| tempDir := t.TempDir() | |
| originalConfigPath := configPath | |
| originalLogLevel := logLevel | |
| originalDaemonAddr := daemonAddr | |
| configPath = fmt.Sprintf("%s/netbird-test-config.json", tempDir) | |
| logLevel = "info" | |
| daemonAddr = fmt.Sprintf("unix://%s/netbird-test.sock", tempDir) | |
| defer func() { | |
| configPath = originalConfigPath | |
| logLevel = originalLogLevel | |
| daemonAddr = originalDaemonAddr | |
| }() |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@client/cmd/service_privileged_test.go` around lines 79 - 82, The test mutates
package-level globals configPath, logLevel, and daemonAddr but never restores
them; follow the same save-and-restore pattern used for serviceName (lines
73-77): at the start of the test capture originals (e.g., oldConfigPath,
oldLogLevel, oldDaemonAddr), set the globals to the temp values as now, and add
a defer that restores configPath = oldConfigPath, logLevel = oldLogLevel, and
daemonAddr = oldDaemonAddr so other tests are not affected.
| go func() { | ||
| if err = s.Serve(lis); err != nil { | ||
| log.Fatalf("failed to serve: %v", err) | ||
| } | ||
| }() |
There was a problem hiding this comment.
Replace log.Fatalf with t.Errorf in test goroutine.
Using log.Fatalf at line 207 will terminate the entire test process (via os.Exit(1)) if the gRPC server fails, rather than just failing the current test. This prevents proper cleanup and makes test failures harder to debug.
In test helpers, errors should be reported via t.Errorf or returned to the caller.
⚠️ Proposed fix using t.Errorf
mgmtProto.RegisterManagementServiceServer(s, mock)
go func() {
if err = s.Serve(lis); err != nil {
- log.Fatalf("failed to serve: %v", err)
+ t.Errorf("management server failed: %v", err)
}
}()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@client/server/server_privileged_test.go` around lines 205 - 209, The
goroutine launching s.Serve currently calls log.Fatalf which exits the entire
test process; change that to report the error on the test (use t.Errorf) so only
this test fails and cleanup runs: inside the anonymous goroutine replace
log.Fatalf("failed to serve: %v", err) with t.Errorf("failed to serve: %v", err)
(and return from the goroutine after reporting); ensure the goroutine closure
correctly captures the test variable (t) so the call to t.Errorf uses the same
*testing.T from the test function.
| go func() { | ||
| if err = s.Serve(lis); err != nil { | ||
| log.Fatalf("failed to serve: %v", err) | ||
| } | ||
| }() |
There was a problem hiding this comment.
Replace log.Fatalf with t.Errorf in signal server goroutine.
Using log.Fatalf at line 230 will terminate the entire test process if the gRPC server fails. Use t.Errorf to report the error without killing the test process.
⚠️ Proposed fix
go func() {
if err = s.Serve(lis); err != nil {
- log.Fatalf("failed to serve: %v", err)
+ t.Errorf("signal server failed: %v", err)
}
}()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@client/server/server_privileged_test.go` around lines 228 - 232, Replace the
call to log.Fatalf in the goroutine that runs s.Serve(lis) with t.Errorf so a
server startup error is reported to the test harness instead of terminating the
whole process; specifically, in the anonymous goroutine that calls s.Serve(lis)
change log.Fatalf("failed to serve: %v", err) to t.Errorf("failed to serve: %v",
err).
| func TestSSHProxy_CommandQuoting(t *testing.T) { | ||
| if testing.Short() { | ||
| t.Skip("Skipping integration test in short mode") | ||
| } |
There was a problem hiding this comment.
Missing Windows skip will cause test failures.
TestSSHProxy_Connect correctly skips Windows (lines 42-44), but this test does not and uses Unix-only /bin/sh -c commands that will fail on Windows.
Proposed fix
func TestSSHProxy_CommandQuoting(t *testing.T) {
if testing.Short() {
t.Skip("Skipping integration test in short mode")
}
+
+ if runtime.GOOS == "windows" {
+ t.Skip("Skipping on Windows - test uses /bin/sh commands")
+ }
sshClient, cleanup := setupProxySSHClient(t)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@client/ssh/proxy/proxy_privileged_test.go` around lines 184 - 187,
TestSSHProxy_CommandQuoting currently runs Unix-only commands and lacks a
Windows skip; update the start of the TestSSHProxy_CommandQuoting function to
skip on Windows like TestSSHProxy_Connect does by checking runtime.GOOS ==
"windows" and calling t.Skip("skipping on Windows") (also keep the existing
testing.Short() skip).
Splitting privileged tests into *_privileged_test.go left their shared helpers in the untagged files, so in the default (no-tag) build they had no callers and golangci-lint flagged them as unused. Moved the privileged-only helpers into the privileged files next to their callers (generateDummyHandler; createEngine/startSignal/startManagement/getConnectedPeers/ getPeers + kaep/kasp; (*mockDaemon).setJWTToken). Annotated the shared routing-test fixtures that must stay untagged for cross-platform compilation with //nolint:unused (systemops_bsd expected* vars, ensureIPv6DefaultRoute on bsd/windows, loopbackIfaceWindows), matching the existing linux variant.
Release artifactsBuilt for PR head
GHCR images (amd64)
This comment is updated by the Release workflow. Artifact links expire according to the workflow retention policy. |
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
client/internal/engine_privileged_test.go (1)
329-329:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix grammatical error in test failure message.
"not all peers was started" should be "not all peers were started".
📝 Proposed fix
if len(engines) != numPeers { - t.Fatal("not all peers was started") + t.Fatal("not all peers were started") }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/engine_privileged_test.go` at line 329, Fix the grammatical error in the test failure message by replacing the string "not all peers was started" with "not all peers were started" in the test that calls t.Fatal (update the t.Fatal invocation's message to use "were" instead of "was").client/ssh/proxy/proxy_privileged_test.go (1)
188-191:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMissing Windows skip will cause test failures.
TestSSHProxy_Connectcorrectly skips Windows (lines 46-48), but this test does not and uses Unix-only/bin/sh -ccommands that will fail on Windows.Proposed fix
func TestSSHProxy_CommandQuoting(t *testing.T) { if testing.Short() { t.Skip("Skipping integration test in short mode") } + + if runtime.GOOS == "windows" { + t.Skip("Skipping on Windows - test uses /bin/sh commands") + } sshClient, cleanup := setupProxySSHClient(t)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/ssh/proxy/proxy_privileged_test.go` around lines 188 - 191, TestSSHProxy_CommandQuoting currently runs Unix-only shell commands and will fail on Windows; add the same Windows skip used in TestSSHProxy_Connect by checking runtime.GOOS == "windows" at the start of TestSSHProxy_CommandQuoting and calling t.Skip(...) (or equivalent) to skip the test on Windows so it only runs on Unix-like systems.
🧹 Nitpick comments (1)
client/internal/engine_privileged_test.go (1)
423-428: SimplifyNewEngineassignment in privileged test
NewEnginereturns only*Engine(func NewEngine(...) *Engine), soe, err := NewEngine(...), nilalways setserrtonil; prefere := NewEngine(...); return e, nil.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/engine_privileged_test.go` around lines 423 - 428, The test incorrectly unpacks NewEngine as if it returned (engine, error): change the assignment "e, err := NewEngine(...), nil" to only capture the returned *Engine (e := NewEngine(...)) and then return e, nil from the test helper; update any surrounding references that expect err to be set (remove or rely on the explicit nil return) so that NewEngine (which returns *Engine) is used correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@client/internal/engine_privileged_test.go`:
- Line 440: In client/internal/engine_privileged_test.go replace the fatal
logger calls in the test server goroutines so they don't exit the whole test
process: at anchor site (client/internal/engine_privileged_test.go lines
440-440) change the signal server's log.Fatalf("failed to listen: %v", err) to
log.Errorf("gRPC signal server stopped: %v", err); at sibling site
(client/internal/engine_privileged_test.go lines 449-449) change the signal
server's log.Fatalf("failed to serve: %v", err) to log.Errorf("gRPC signal
server stopped: %v", err); and at sibling site
(client/internal/engine_privileged_test.go lines 539-539) change the management
server's log.Fatalf("failed to serve: %v", err) to log.Errorf("gRPC management
server stopped: %v", err).
---
Duplicate comments:
In `@client/internal/engine_privileged_test.go`:
- Line 329: Fix the grammatical error in the test failure message by replacing
the string "not all peers was started" with "not all peers were started" in the
test that calls t.Fatal (update the t.Fatal invocation's message to use "were"
instead of "was").
In `@client/ssh/proxy/proxy_privileged_test.go`:
- Around line 188-191: TestSSHProxy_CommandQuoting currently runs Unix-only
shell commands and will fail on Windows; add the same Windows skip used in
TestSSHProxy_Connect by checking runtime.GOOS == "windows" at the start of
TestSSHProxy_CommandQuoting and calling t.Skip(...) (or equivalent) to skip the
test on Windows so it only runs on Unix-like systems.
---
Nitpick comments:
In `@client/internal/engine_privileged_test.go`:
- Around line 423-428: The test incorrectly unpacks NewEngine as if it returned
(engine, error): change the assignment "e, err := NewEngine(...), nil" to only
capture the returned *Engine (e := NewEngine(...)) and then return e, nil from
the test helper; update any surrounding references that expect err to be set
(remove or rely on the explicit nil return) so that NewEngine (which returns
*Engine) is used correctly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 08e6d92d-26df-407b-9695-08613d2d6664
📒 Files selected for processing (9)
client/internal/dns/server_privileged_test.goclient/internal/dns/server_test.goclient/internal/engine_privileged_test.goclient/internal/engine_test.goclient/internal/routemanager/systemops/systemops_bsd_test.goclient/internal/routemanager/systemops/v6route_bsd_test.goclient/internal/routemanager/systemops/v6route_windows_test.goclient/ssh/proxy/proxy_privileged_test.goclient/ssh/proxy/proxy_test.go
💤 Files with no reviewable changes (3)
- client/internal/dns/server_test.go
- client/ssh/proxy/proxy_test.go
- client/internal/engine_test.go
✅ Files skipped from review due to trivial changes (2)
- client/internal/routemanager/systemops/v6route_windows_test.go
- client/internal/routemanager/systemops/v6route_bsd_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- client/internal/dns/server_privileged_test.go
- client/internal/routemanager/systemops/systemops_bsd_test.go
|
|
||
| lis, err := net.Listen("tcp", "localhost:0") | ||
| if err != nil { | ||
| log.Fatalf("failed to listen: %v", err) |
There was a problem hiding this comment.
Replace log.Fatalf with non-fatal logging in test server goroutines. All three goroutines use log.Fatalf, which calls os.Exit(1) and terminates the entire test process (including parallel tests). In test helpers, log errors without exiting or return them to the caller.
client/internal/engine_privileged_test.go#L440-L440: Replacelog.Fatalf("failed to listen: %v", err)in the signal server goroutine withlog.Errorf("gRPC signal server stopped: %v", err).client/internal/engine_privileged_test.go#L449-L449: Replacelog.Fatalf("failed to serve: %v", err)in the signal server goroutine withlog.Errorf("gRPC signal server stopped: %v", err).client/internal/engine_privileged_test.go#L539-L539: Replacelog.Fatalf("failed to serve: %v", err)in the management server goroutine withlog.Errorf("gRPC management server stopped: %v", err).
📍 Affects 1 file
client/internal/engine_privileged_test.go#L440-L440(this comment)client/internal/engine_privileged_test.go#L449-L449client/internal/engine_privileged_test.go#L539-L539
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@client/internal/engine_privileged_test.go` at line 440, In
client/internal/engine_privileged_test.go replace the fatal logger calls in the
test server goroutines so they don't exit the whole test process: at anchor site
(client/internal/engine_privileged_test.go lines 440-440) change the signal
server's log.Fatalf("failed to listen: %v", err) to log.Errorf("gRPC signal
server stopped: %v", err); at sibling site
(client/internal/engine_privileged_test.go lines 449-449) change the signal
server's log.Fatalf("failed to serve: %v", err) to log.Errorf("gRPC signal
server stopped: %v", err); and at sibling site
(client/internal/engine_privileged_test.go lines 539-539) change the management
server's log.Fatalf("failed to serve: %v", err) to log.Errorf("gRPC management
server stopped: %v", err).
The host-safe unit run dropped sudo but two privileged test groups were never tagged, and the Docker privileged job silently never ran the suite: - Gate the ssh/server PrivilegeDropper command-construction tests behind the privileged tag (they require root to target a different UID); split them into executor_unix_privileged_test.go. - Tag sharedsock raw-socket tests privileged (need CAP_NET_RAW). - Fix the Docker job command: nested single quotes around the build tags closed the sh -c wrapper early, dropping the go list package set and the privileged tag, so go test ran on the empty repo root. Use double quotes. Make the self-hosting harness usable from a dev Mac: - Build it on darwin as well as linux; it only drives Docker. - Resolve the active docker context endpoint into DOCKER_HOST when the default /var/run/docker.sock is absent (Docker Desktop, Colima, OrbStack). - Rename the misspelled containerGoModache constant to containerGoModCache.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@client/ssh/server/executor_unix_privileged_test.go`:
- Around line 63-66: Replace the incorrect no-op assertion that checks for
"--interactive" with a check for the actual flag used by the executor; in the
test (referencing CreateExecutorCommand and the cmd variable) change
assert.NotContains(t, cmd.Args, "--interactive") to assert.NotContains(t,
cmd.Args, "--pty") so the test verifies the absence of the pty flag when command
mode is expected.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 200de882-b12f-4238-80ec-69b900c97812
📒 Files selected for processing (5)
.github/workflows/golang-test-linux.ymlclient/ssh/server/executor_unix_privileged_test.goclient/ssh/server/executor_unix_test.goclient/testutil/privileged/runner_test.gosharedsock/sock_linux_test.go
💤 Files with no reviewable changes (1)
- client/ssh/server/executor_unix_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- .github/workflows/golang-test-linux.yml
- client/testutil/privileged/runner_test.go
| // Verify no command mode (command is empty so no --cmd flag) | ||
| assert.NotContains(t, cmd.Args, "--cmd") | ||
| assert.NotContains(t, cmd.Args, "--interactive") | ||
| } |
There was a problem hiding this comment.
Assert the actual flag name used by the executor (--pty), not --interactive.
At Line 65, assert.NotContains(t, cmd.Args, "--interactive") is a no-op for regression safety because CreateExecutorCommand doesn’t emit --interactive at all. This should validate --pty instead.
Suggested fix
- assert.NotContains(t, cmd.Args, "--interactive")
+ assert.NotContains(t, cmd.Args, "--pty")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Verify no command mode (command is empty so no --cmd flag) | |
| assert.NotContains(t, cmd.Args, "--cmd") | |
| assert.NotContains(t, cmd.Args, "--interactive") | |
| } | |
| // Verify no command mode (command is empty so no --cmd flag) | |
| assert.NotContains(t, cmd.Args, "--cmd") | |
| assert.NotContains(t, cmd.Args, "--pty") | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@client/ssh/server/executor_unix_privileged_test.go` around lines 63 - 66,
Replace the incorrect no-op assertion that checks for "--interactive" with a
check for the actual flag used by the executor; in the test (referencing
CreateExecutorCommand and the cmd variable) change assert.NotContains(t,
cmd.Args, "--interactive") to assert.NotContains(t, cmd.Args, "--pty") so the
test verifies the absence of the pty flag when command mode is expected.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
|
riccardomanfrin
left a comment
There was a problem hiding this comment.
In the PR description I see
go build -tags 'devcert privileged' ./... compiles on linux/darwin/bsd/windows.
but the tag there has no effect right? cause build is skipping test files
Can you add a common place for the existing tags (devcert doc is very deep in the repo struct and away from privilged doc)
"Privileged suite: runs the privileged-tagged tests" <- is privileged running ONLY the privilged or is it not excluding them (maybe it's worth to explain if it's additive in the md doc)



Describe your changes
Previously the whole suite ran under
sudoso the privileged subset was implicit and the unit tests couldn't run host-safe locally, on either Linux or macOS.Root / system-mutating client tests (nftables/iptables/DNS, TUN/WireGuard interfaces, routes, eBPF, SSH/service install) are now gated behind a
//go:build privilegedtag. Mixed files were split so pure-logic tests stay in the default suite.go test ./client/...(andmake test-unit) now runs as a non-root user with no sudo and leaves host networking untouched.ory/dockertest/v4harness (client/testutil/privileged) runs the privileged suite inside a--privileged --cap-add=NET_ADMINcontainer viamake test-privileged. ADOCKER_CI=trueguard prevents container recursion.PRIV_RUN/PRIV_PKGSenv vars narrow the container run to a single test/package.Test plan
go test -tags devcert ./client/...passes as non-root, no sudo.go build -tags 'devcert privileged' ./...compiles on linux/darwin/bsd/windows.make test-privilegedruns the suite in the container.Issue ticket number and link
Stack
Checklist
Documentation
Select exactly one:
Docs PR URL (required if "docs added" is checked)
Paste the PR link from https://github.com/netbirdio/docs here:
https://github.com/netbirdio/docs/pull/__
Summary by CodeRabbit
privilegedbuild-tag split for host-mutating/privileged tests and expanded privileged integration coverage (DNS, SSH, routing/system ops, engine, server, firewall).go testviasudo, and to better scope privileged package runs for container testing.test-unitandtest-privileged.