fix: use shared system trust store sds secret#9357
✅ Deploy Preview for cerulean-figolla-1f9435 ready!
Codecov Report
❌ Patch coverage is
Additional details and impacted files

| Coverage Diff | main | #9357 | +/- |
|---|---|---|---|
| Coverage | 75.17% | 75.17% | -0.01% |
| Files | 252 | 252 | |
| Lines | 41049 | 41071 | +22 |
| Hits | 30860 | 30875 | +15 |
| Misses | 8094 | 8097 | +3 |
| Partials | 2095 | 2099 | +4 |
177ceb3 to 9305124
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9305124dfa
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
    if !hasSystemTrustStore && route.Destination != nil {
        for _, ds := range route.Destination.Settings {
            if ds.TLS != nil && ds.TLS.UseSystemTrustStore {
Emit the shared trust-store secret for all TLS users
When `WellKnownCACertificates: System` is used only by a TLS upstream that is not an HTTP/TCP route destination (for example, OpenTelemetry access logs/tracing or an EnvoyExtensionPolicy ExtProc backend), this new scan never sets `GlobalResources.UseSystemTrustStore` because it only walks `route.Destination`. However, `buildValidationContext` now points every such TLS config at `system_ca_certificates`, so no SDS secret is emitted and Envoy receives a cluster with a missing validation-context secret.
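One way to read this suggestion is as a scan that covers every place a TLS upstream can appear, not only route destinations. The sketch below illustrates that idea with simplified stand-in types; `XdsIR`, its field names, and `needsSharedTrustStoreSecret` are hypothetical and do not mirror the real IR structs or the `scanXdsIR` code in this PR.

```go
package main

import "fmt"

// TLSConfig carries only the field this sketch needs (stand-in type).
type TLSConfig struct {
	UseSystemTrustStore bool
}

// DestinationSetting is a simplified stand-in for one upstream backend.
type DestinationSetting struct {
	Name string
	TLS  *TLSConfig
}

// Destination groups the settings of one route, telemetry, or extension backend.
type Destination struct {
	Settings []*DestinationSetting
}

// XdsIR is a hypothetical, flattened view of the IR: route destinations plus
// the other locations where a TLS upstream may be configured.
type XdsIR struct {
	RouteDestinations     []*Destination
	AccessLogDestinations []*Destination
	TracingDestination    *Destination
	ExtProcDestinations   []*Destination
}

// usesSystemTrustStore reports whether any setting of d opts into the system trust store.
func usesSystemTrustStore(d *Destination) bool {
	if d == nil {
		return false
	}
	for _, ds := range d.Settings {
		if ds != nil && ds.TLS != nil && ds.TLS.UseSystemTrustStore {
			return true
		}
	}
	return false
}

// needsSharedTrustStoreSecret walks every destination kind, so a TLS user that
// is not a route destination still triggers emission of the shared secret.
func needsSharedTrustStoreSecret(ir *XdsIR) bool {
	if ir == nil {
		return false
	}
	var all []*Destination
	all = append(all, ir.RouteDestinations...)
	all = append(all, ir.AccessLogDestinations...)
	all = append(all, ir.TracingDestination) // may be nil; the helper handles that
	all = append(all, ir.ExtProcDestinations...)
	for _, d := range all {
		if usesSystemTrustStore(d) {
			return true
		}
	}
	return false
}

func main() {
	// Only the tracing backend uses the system trust store; no route does.
	ir := &XdsIR{
		TracingDestination: &Destination{
			Settings: []*DestinationSetting{
				{Name: "otel-collector", TLS: &TLSConfig{UseSystemTrustStore: true}},
			},
		},
	}
	fmt.Println(needsSharedTrustStoreSecret(ir)) // prints: true
}
```

Collecting all destinations into one slice keeps the detection logic in a single loop, so a newly added destination kind only needs one more `append`.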
@@ -0,0 +1 @@
Improved resource utilization for `BackendTLSPolicy` using `WellKnownCACertificates: System`. Previously, each policy created a separate SDS secret referencing the system CA bundle, resulting in one inotify watch per policy. All policies now share a single SDS secret.
Mark the xDS secret rename as breaking
This entry is under performance_improvements only, but the patch changes generated xDS names/content: clusters now reference system_ca_certificates instead of the per-policy CA secret and the old per-cluster Secret resources are removed. Per the release notes policy for xDS changes that can break EnvoyPatchPolicies/extension servers, this needs a breaking_changes fragment too; otherwise users relying on patching the old secret names will not get the upgrade warning.
Pull request overview
This PR updates the xDS translation pipeline to avoid generating a per-backend SDS secret when upstream TLS validation uses the system trust store, instead having all such clusters reference a single shared SDS secret (system_ca_certificates) to reduce per-policy inotify watchers.
Changes:
- Switch upstream TLS validation to reference a shared SDS secret when `UseSystemTrustStore` is enabled (no more per-cluster secret emission).
- Add global resource emission for the shared system trust store SDS secret.
- Update/extend golden testdata to reflect shared-secret behavior (including a new multi-backend system-truststore case).
Required fixes (blocking):
- `internal/gatewayapi/globalresources.go`: `scanXdsIR` only detects `UseSystemTrustStore` on HTTP/TCP route destinations. Other xDS IR destinations (notably AccessLog OpenTelemetry/ALS and Tracing) can also carry `tls.useSystemTrustStore: true`; with this PR they will reference `system_ca_certificates` but won't trigger global secret emission, producing invalid xDS (clusters referencing a missing secret).
- `release-notes/current/performance_improvements/9357-system-trust-store-single-sds-secret.md`: This modifies existing generated xDS resource names/content (secrets), which can break existing EnvoyPatchPolicies/extension servers that patch the previous per-policy secret names; it should be called out as a breaking change (and typically placed under `release-notes/current/breaking_changes/`).
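The shared-secret behavior summarized above can be sketched roughly as follows. This is an illustration with plain structs, not the translator code from this PR: `Backend`, `Cluster`, `Secret`, and `translate` are hypothetical names, and the CA bundle path is an assumed placeholder for whatever `cert.SystemCertPath` resolves to.

```go
package main

import "fmt"

const (
	// Secret name taken from the PR discussion.
	sharedSecretName = "system_ca_certificates"
	// Assumed placeholder for the system CA bundle location.
	systemCAPath = "/etc/ssl/certs/ca-certificates.crt"
)

// Backend is a simplified upstream definition.
type Backend struct {
	Name                string
	UseSystemTrustStore bool
}

// Cluster records which SDS secret (if any) its validation context references.
type Cluster struct {
	Name                 string
	ValidationSecretName string
}

// Secret is a file-backed validation-context secret.
type Secret struct {
	Name          string
	TrustedCAFile string
}

// translate points every system-trust-store cluster at the same secret name
// and emits that secret at most once, regardless of how many backends use it.
func translate(backends []Backend) ([]Cluster, []Secret) {
	clusters := make([]Cluster, 0, len(backends))
	needShared := false
	for _, b := range backends {
		c := Cluster{Name: b.Name}
		if b.UseSystemTrustStore {
			c.ValidationSecretName = sharedSecretName
			needShared = true
		}
		clusters = append(clusters, c)
	}
	var secrets []Secret
	if needShared {
		secrets = append(secrets, Secret{Name: sharedSecretName, TrustedCAFile: systemCAPath})
	}
	return clusters, secrets
}

func main() {
	clusters, secrets := translate([]Backend{
		{Name: "backend-a", UseSystemTrustStore: true},
		{Name: "backend-b", UseSystemTrustStore: true},
		{Name: "backend-c"},
	})
	fmt.Printf("%d clusters, %d secret(s)\n", len(clusters), len(secrets)) // prints: 3 clusters, 1 secret(s)
}
```

With one secret per file path, the number of file watches no longer grows with the number of policies, which is the resource saving the PR describes.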
Reviewed changes
Copilot reviewed 22 out of 40 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| release-notes/current/performance_improvements/9357-system-trust-store-single-sds-secret.md | Adds release note for shared system trust store SDS secret (needs breaking-change callout). |
| internal/xds/translator/translator.go | Updates secret emission + validation-context secret naming for system trust store. |
| internal/xds/translator/globalresources.go | Introduces shared secret name constant and emits shared system trust store secret as a global resource. |
| internal/ir/xds.go | Adds GlobalResources.UseSystemTrustStore flag to drive shared secret emission. |
| internal/gatewayapi/globalresources.go | Sets GlobalResources.UseSystemTrustStore based on IR scan (scan currently incomplete). |
| internal/xds/translator/testdata/in/xds-ir/websocket-backend-force-http1-upstream.yaml | Sets globalResources.useSystemTrustStore: true in IR input. |
| internal/xds/translator/testdata/in/xds-ir/tcproute-mtls.yaml | Sets globalResources.useSystemTrustStore: true in IR input. |
| internal/xds/translator/testdata/in/xds-ir/httproute-with-tls-and-http.yaml | Sets globalResources.useSystemTrustStore: true in IR input. |
| internal/xds/translator/testdata/in/xds-ir/http-route-with-tls-system-truststore.yaml | Sets globalResources.useSystemTrustStore: true in IR input. |
| internal/xds/translator/testdata/in/xds-ir/http-route-multiple-system-truststore.yaml | Adds new IR input case with multiple backends using system trust store. |
| internal/xds/translator/testdata/in/xds-ir/http-route-dynamic-resolver-with-host-rewriting.yaml | Sets globalResources.useSystemTrustStore: true in IR input. |
| internal/xds/translator/testdata/in/xds-ir/backend-tls-settings.yaml | Sets globalResources.useSystemTrustStore: true in IR input. |
| internal/xds/translator/testdata/out/xds-ir/websocket-backend-force-http1-upstream.secrets.yaml | Updates expected secrets to single system_ca_certificates entry. |
| internal/xds/translator/testdata/out/xds-ir/websocket-backend-force-http1-upstream.clusters.yaml | Updates expected cluster secret refs to system_ca_certificates. |
| internal/xds/translator/testdata/out/xds-ir/tcproute-mtls.secrets.yaml | Updates expected secrets to include shared system trust store secret. |
| internal/xds/translator/testdata/out/xds-ir/tcproute-mtls.clusters.yaml | Updates expected cluster secret refs to system_ca_certificates. |
| internal/xds/translator/testdata/out/xds-ir/httproute-with-tls-and-http.secrets.yaml | Updates expected secrets to shared system trust store secret. |
| internal/xds/translator/testdata/out/xds-ir/httproute-with-tls-and-http.clusters.yaml | Updates expected cluster secret refs to system_ca_certificates. |
| internal/xds/translator/testdata/out/xds-ir/http-route-with-tls-system-truststore.secrets.yaml | Updates expected secrets to shared system trust store secret. |
| internal/xds/translator/testdata/out/xds-ir/http-route-with-tls-system-truststore.clusters.yaml | Updates expected cluster secret refs to system_ca_certificates. |
| internal/xds/translator/testdata/out/xds-ir/http-route-multiple-system-truststore.secrets.yaml | New expected secrets output for multi-backend shared secret case. |
| internal/xds/translator/testdata/out/xds-ir/http-route-multiple-system-truststore.routes.yaml | New expected routes output for multi-backend case. |
| internal/xds/translator/testdata/out/xds-ir/http-route-multiple-system-truststore.listeners.yaml | New expected listeners output for multi-backend case. |
| internal/xds/translator/testdata/out/xds-ir/http-route-multiple-system-truststore.endpoints.yaml | New expected endpoints output for multi-backend case. |
| internal/xds/translator/testdata/out/xds-ir/http-route-multiple-system-truststore.clusters.yaml | New expected clusters output for multi-backend shared secret case. |
| internal/xds/translator/testdata/out/xds-ir/http-route-dynamic-resolver-with-host-rewriting.secrets.yaml | Updates expected secrets to shared system trust store secret. |
| internal/xds/translator/testdata/out/xds-ir/http-route-dynamic-resolver-with-host-rewriting.clusters.yaml | Updates expected cluster secret refs to system_ca_certificates. |
| internal/xds/translator/testdata/out/xds-ir/backend-tls-settings.secrets.yaml | Updates expected secrets to dedupe system trust store into one shared secret. |
| internal/xds/translator/testdata/out/xds-ir/backend-tls-settings.clusters.yaml | Updates expected cluster secret refs to system_ca_certificates. |
| internal/xds/translator/testdata/out/extension-xds-ir/http-route-extension-translate-error.secrets.yaml | Adds expected shared system trust store secret output in extension error case. |
| internal/gatewayapi/testdata/tcproute-with-backendtlspolicy.out.yaml | Updates expected xDS IR to include globalResources.useSystemTrustStore: true. |
| internal/gatewayapi/testdata/httproute-with-tls-and-http.out.yaml | Updates expected xDS IR to include globalResources.useSystemTrustStore: true. |
| internal/gatewayapi/testdata/httproute-rule-with-non-service-backends-and-websocket-app-protocols.out.yaml | Updates expected xDS IR to include globalResources.useSystemTrustStore: true. |
| internal/gatewayapi/testdata/httproute-dynamic-resolver.out.yaml | Updates expected xDS IR to include globalResources.useSystemTrustStore: true. |
| internal/gatewayapi/testdata/httproute-dynamic-resolver-host-rewriting.out.yaml | Updates expected xDS IR to include globalResources.useSystemTrustStore: true. |
| internal/gatewayapi/testdata/gateway-tls-frontend-backend.out.yaml | Updates expected xDS IR to include globalResources.useSystemTrustStore: true. |
| internal/gatewayapi/testdata/envoyproxy-tls-settings.out.yaml | Updates expected xDS IR to include globalResources.useSystemTrustStore: true. |
| internal/gatewayapi/testdata/backendtlspolicy-system-truststore.out.yaml | Updates expected xDS IR to include globalResources.useSystemTrustStore: true. |
| internal/gatewayapi/testdata/backendtlspolicy-serviceimport-target.out.yaml | Updates expected xDS IR to include globalResources.useSystemTrustStore: true. |
| internal/gatewayapi/testdata/backend-tls-settings.out.yaml | Updates expected xDS IR to include globalResources.useSystemTrustStore: true. |
    if !hasSystemTrustStore {
        for _, tcp := range xdsIR.TCP {
            for _, route := range tcp.Routes {
                if route.Destination == nil {
                    continue
Signed-off-by: Guy Daich <guy.daich@sap.com>
9305124 to 7d1c60c
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d9fa8c423c
    if existing == nil {
        return nil
    }
    vc := existing.GetValidationContext()
    if vc == nil || vc.GetTrustedCa().GetFilename() != cert.SystemCertPath {
Reject missing or mutated shared trust-store secret
When an EnvoyPatchPolicy or post-translation extension renames/removes the generated system_ca_certificates secret, every cluster that used WellKnownCACertificates: System still references that SDS name, but this validation returns success because findXdsSecret no longer finds it. If the patch leaves the filename intact while adding other fields to the validation context, it also passes. In those patch/extension cases translation succeeds with clusters pointing at a missing or globally altered trust-store secret instead of surfacing the reserved-name error.
    func validateSystemTrustStoreSecret(tCtx *types.ResourceVersionTable) error {
        existing := findXdsSecret(tCtx, SystemTrustStoreSecretName)
        if existing == nil {
            return nil
        }
        vc := existing.GetValidationContext()
        if vc == nil || vc.GetTrustedCa().GetFilename() != cert.SystemCertPath {
            return fmt.Errorf("secret name %q is reserved for the system trust store and cannot be used by other resources", SystemTrustStoreSecretName)
        }
        return nil
    }
What this PR does / why we need it:
To support dynamic reload of the system trust store, an SDS secret must be created that points to the filesystem location of the CA certificates. Currently, EG creates an SDS secret for each BTLSP, and each such SDS secret causes Envoy to establish an inotify file watcher.
This change uses a shared SDS secret for the `WellKnownCACertificates: System` case to reduce file-watch utilization.
Which issue(s) this PR fixes:
Fixes #
PR Checklist
- […] `git commit -s`). See DCO: Sign your work.
- […] `/api`), the API was discussed and agreed before the implementation. The API change can be in a separate PR, or in the same PR, but the API must be agreed before implementation. N/A if this PR does not contain API changes.
- […] `make generate gen-check`, `make lint`, and the unit-test/coverage build pass. (Flaky e2e failures are not considered breakages, but `gen-check`, `lint`, and coverage MUST pass.)
- […] `release-notes/current/<section>/<pr-number>-<slug>.md` (see `release-notes/current/README.md` for sections and naming). N/A if this PR does not contain non-trivial changes.
- […] `make gen-check` and committed the result if API/helm charts/modules changed.
- […] `release-notes/current/breaking_changes/`.