Skip to content

fix(ios): prevent extension crash from late Go callbacks during network flapping#142

Open
pappz wants to merge 1 commit into
mainfrom
fix/t808-ios-network-change-listener-race
Open

fix(ios): prevent extension crash from late Go callbacks during network flapping#142
pappz wants to merge 1 commit into
mainfrom
fix/t808-ios-network-change-listener-race

Conversation

@pappz

@pappz pappz commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Problem

On wifi↔cellular path flapping the packet-tunnel extension crash-loops: it connects briefly, then is killed and relaunched every ~6–12 s, so no IP / device name renders and iOS never activates the VPN. Customer-reported, multiple iOS devices, build 7 / 0.2.0, iOS 26.5.

Crash signature from the device .ips reports: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS at 0x28, com.apple.main-thread, in the gomobile bridge.

Root cause (code-derived)

⚠️ Not yet confirmed against a symbolicated build-7 crash — the build-7 dSYM was not obtainable (the GH Actions runner deletes the .xcarchive after upload; ASC shows "Includes Symbols: Yes" but offers no download). This fix is derived from the code path + three OLD symbolicated extension crashes (0.0.9 / 0.0.10) that all crash on the same onNetworkChanged path.

The Go engine retains NetworkChangeListener and DNSManager for the lifetime of the SDK client they were created with. When the adapter is swapped/torn down (profile switch, or a restartClient() triggered by flapping), the old client can still fire a late onNetworkChanged / applyDns into the now-stale listener, which dereferences a tunnelManager whose owning provider has already been deallocated → use-after-free → 0x28. The listener/adapter swap had no synchronization against in-flight Go callbacks.

Fix

Make the Go-held listeners null-safe and explicitly detachable.

  • NetworkChangeListener.swift / DNSManager.swift: tunnelManager is now optional; every Go callback (onNetworkChanged, setInterfaceIP, setInterfaceIPv6, applyDns) runs on a serial queue behind a guard isValid, let tunnelManager — late/in-flight callbacks become no-ops instead of crashing. Added invalidate() (serialized on the same queue).
  • NetBirdAdapter.swift: added invalidateListeners(). Deliberately NOT called from stop()/restart — the same adapter is reused there and must keep delivering callbacks; only invalidated when the adapter is actually discarded.
  • PacketTunnelProvider.swift: invalidate the outgoing adapter's listeners before replacing it at the two reconfigure sites (startTunnel, handleAppMessage).

Risk / things to watch

  • Callbacks now run via callbackQueue.sync — fine today (no reentrant call back onto the queue; setTunnelNetworkSettings completion is async, so no deadlock), but a future reentrant caller would deadlock. Flagging for reviewer awareness.
  • Main regression risk is functional, not a new crash: confirm that after a restart route/DNS callbacks still arrive, and after a profile switch the new adapter's listener works.

Testing — NOT yet done (needs macOS/CI; no swiftc on the dev box)

  • Builds on CI
  • No crash under wifi↔cellular flapping (Network Link Conditioner / toggle interfaces)
  • Connection + routes + DNS recover after flapping
  • Works after a profile switch

Follow-up suggestion

Add an actions/upload-artifact step for $RUNNER_TEMP/NetBird.xcarchive/dSYMs in build-upload.yml so future extension crashes symbolicate without an ASC dSYM hunt.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced stability during VPN profile switching by properly detaching network and DNS callbacks
    • Improved callback serialization to prevent issues when switching profiles or during tunnel state transitions

…rk flapping

The packet tunnel extension crash-loops (EXC_BAD_ACCESS, KERN_INVALID_ADDRESS
at 0x28) on wifi<->cellular path flapping: it connects briefly, then the
extension is killed and relaunched every ~6-12s, so no IP/device name renders
and iOS never activates the VPN (T-808).

Root cause (code-derived; not yet confirmed against a symbolicated build-7
crash because no dSYM was available): the Go engine retains NetworkChangeListener
and DNSManager for the lifetime of the SDK client they were created with. When
the adapter is swapped/torn down (profile switch, or a restart triggered by
flapping) the old client can still fire a late onNetworkChanged/applyDns into
the now-stale listener, which dereferences a tunnel manager whose owning
provider has already been deallocated -> use-after-free.

Fix: make the Go-held listeners null-safe and explicitly detachable.
- NetworkChangeListener / DNSManager: tunnelManager is now optional; every Go
  callback runs on a serial queue behind a guard (isValid + non-nil
  tunnelManager). Late/in-flight callbacks become no-ops instead of crashing.
  Added invalidate() to detach, serialized on the same queue.
- NetBirdAdapter: added invalidateListeners(). Deliberately NOT called from
  stop()/restart (the same adapter is reused there and must keep delivering
  callbacks) -- only when the adapter is actually discarded.
- PacketTunnelProvider: invalidate the outgoing adapter's listeners before
  replacing it at the two reconfigure sites (startTunnel, handleAppMessage).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds lifecycle invalidation to DNSManager and NetworkChangeListener by introducing serial callbackQueue dispatching, an isValid flag, and an invalidate() method on each. NetBirdAdapter gains invalidateListeners() to call both, and PacketTunnelProvider invokes it before replacing the adapter in startTunnel and handleAppMessage.

Changes

Go callback lifecycle safety

Layer / File(s) Summary
DNSManager and NetworkChangeListener invalidation guards
NetbirdKit/DNSManager.swift, NetbirdKit/NetworkChangeListener.swift
Both listeners gain a serial callbackQueue, an isValid flag, and an optional tunnelManager. A new invalidate() method synchronously sets isValid = false and nils the tunnel manager. All callback entry points (applyDns, onNetworkChanged, setInterfaceIP, setInterfaceIPv6) now run inside callbackQueue.sync and early-return if invalid or tunnelManager is nil.
NetBirdAdapter.invalidateListeners() and PacketTunnelProvider teardown
NetbirdNetworkExtension/NetBirdAdapter.swift, NetbirdNetworkExtension/PacketTunnelProvider.swift
NetBirdAdapter.invalidateListeners() calls invalidate() on both listeners. PacketTunnelProvider calls adapter?.invalidateListeners() in startTunnel before adapter replacement when config differs, and in handleAppMessage before adapter recreation.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • netbirdio/ios-client#82: Modifies NetworkChangeListener.onNetworkChanged route parsing and empty-notification handling — the same callback function that this PR wraps in a callbackQueue.sync guard.

Suggested reviewers

  • mlsmaycon
  • lixmal

Poem

🐇 Hop hop, the callbacks must not stray,
When old adapters are tossed away.
A queue and a flag guard the den,
Late Go callbacks drop — never again!
invalidate() seals the burrow tight,
No stale tunnel manager wakes at night. 🌙

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The pull request title accurately and concisely describes the main change: preventing extension crashes caused by late Go callbacks during network flapping, which directly addresses the root cause identified in the PR objectives.
Description check ✅ Passed The pull request description is comprehensive and well-structured, covering problem statement, root cause, fix details, risk assessment, and testing plans.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/t808-ios-network-change-listener-race

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@pappz pappz marked this pull request as ready for review June 18, 2026 08:42
@pappz pappz changed the title fix(ios): prevent extension crash from late Go callbacks during network flapping (T-808) fix(ios): prevent extension crash from late Go callbacks during network flapping Jun 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant