Skip to content

fix: use shared system trust store sds secret#9357

Draft
guydc wants to merge 2 commits into
envoyproxy:mainfrom
guydc:feat-dedupe-truststore-sds
Draft

fix: use shared system trust store sds secret#9357
guydc wants to merge 2 commits into
envoyproxy:mainfrom
guydc:feat-dedupe-truststore-sds

Conversation

@guydc

@guydc guydc commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

What this PR does / why we need it:
To support dynamic reload of the system trust store, an sds secret must be created pointing to the filesystem location of the CA certificates. Currently, EG creates an SDS secret for each BTLSP. Each such sds secret causes envoy to establish a inotify file watcher.

This change uses a shared sds secert for the WellKnownCACertificates: System case, to reduce filewatch utilization.

Which issue(s) this PR fixes:

Fixes #


PR Checklist

  • Authorship & ownership: Coding agents / AI assistants are welcome, but I have reviewed every change, understand how and why it works, can explain and maintain it, and take full responsibility for this PR. I have not submitted generated output I do not understand.
  • DCO: All commits are signed off (git commit -s). See DCO: Sign your work.
  • API agreed first: If this PR contains API changes (changes under /api), the API was discussed and agreed before the implementation. The API change can be in a separate PR, or in the same PR, but the API must be agreed before implementation. N/A if this PR does not contain API changes.
  • Required checks pass: make generate gen-check, make lint, and the unit-test/coverage build pass. (Flaky e2e failures are not considered breakages, but gen-check, lint, and coverage MUST pass.)
  • Tests added/updated: New/changed code is covered by appropriate tests. N/A if this PR does not contain code changes.
  • Docs: User-facing changes update the docs, either in this PR or a follow-up PR. N/A if this PR does not contain user-facing changes.
  • Release notes: For any non-trivial change, added a release-note fragment under release-notes/current/<section>/<pr-number>-<slug>.md (see release-notes/current/README.md for sections and naming). N/A if this PR does not contain non-trivial changes.
  • Generated files committed: Ran make gen-check and committed the result if API/helm charts/modules changed.
  • Scope & compatibility: The PR is reasonably scoped (no unrelated changes) and preserves backward compatibility, or any breaking change is called out above and documented in release-notes/current/breaking_changes/.
  • Codex review: Requested a Codex review and addressed all of its comments.
  • Copilot review: Requested a Copilot review and addressed all of its comments.

@netlify

netlify Bot commented Jun 25, 2026

Copy link
Copy Markdown

Deploy Preview for cerulean-figolla-1f9435 ready!

Name Link
🔨 Latest commit d9fa8c4
🔍 Latest deploy log https://app.netlify.com/projects/cerulean-figolla-1f9435/deploys/6a3ed4bdad67c10008faa188
😎 Deploy Preview https://deploy-preview-9357--cerulean-figolla-1f9435.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@codecov

codecov Bot commented Jun 25, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 77.77778% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.17%. Comparing base (39850aa) to head (d9fa8c4).

Files with missing lines Patch % Lines
internal/xds/translator/translator.go 60.00% 3 Missing and 3 partials ⚠️
internal/xds/translator/globalresources.go 90.47% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9357      +/-   ##
==========================================
- Coverage   75.17%   75.17%   -0.01%     
==========================================
  Files         252      252              
  Lines       41049    41071      +22     
==========================================
+ Hits        30860    30875      +15     
- Misses       8094     8097       +3     
- Partials     2095     2099       +4     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@guydc guydc force-pushed the feat-dedupe-truststore-sds branch 2 times, most recently from 177ceb3 to 9305124 Compare June 26, 2026 16:52
@guydc guydc requested a review from Copilot June 26, 2026 16:53
@guydc

guydc commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

@codex

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9305124dfa

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/gatewayapi/globalresources.go Outdated
Comment on lines +124 to +126
if !hasSystemTrustStore && route.Destination != nil {
for _, ds := range route.Destination.Settings {
if ds.TLS != nil && ds.TLS.UseSystemTrustStore {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Emit the shared trust-store secret for all TLS users

When WellKnownCACertificates: System is used only by a TLS upstream that is not an HTTP/TCP route destination—for example OpenTelemetry access logs/tracing or an EnvoyExtensionPolicy ExtProc backend—this new scan never sets GlobalResources.UseSystemTrustStore because it only walks route.Destination. However buildValidationContext now points every such TLS config at system_ca_certificates, so no SDS secret is emitted and Envoy receives a cluster with a missing validation-context secret.

Useful? React with 👍 / 👎.

@@ -0,0 +1 @@
Improved resource utilization for `BackendTLSPolicy` using `WellKnownCACertificates: System`. Previously, each policy created a separate SDS secret referencing the system CA bundle, resulting in one inotify watch per policy. All policies now share a single SDS secret.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Mark the xDS secret rename as breaking

This entry is under performance_improvements only, but the patch changes generated xDS names/content: clusters now reference system_ca_certificates instead of the per-policy CA secret and the old per-cluster Secret resources are removed. Per the release notes policy for xDS changes that can break EnvoyPatchPolicies/extension servers, this needs a breaking_changes fragment too; otherwise users relying on patching the old secret names will not get the upgrade warning.

Useful? React with 👍 / 👎.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the xDS translation pipeline to avoid generating a per-backend SDS secret when upstream TLS validation uses the system trust store, instead having all such clusters reference a single shared SDS secret (system_ca_certificates) to reduce per-policy inotify watchers.

Changes:

  • Switch upstream TLS validation to reference a shared SDS secret when UseSystemTrustStore is enabled (no more per-cluster secret emission).
  • Add global resource emission for the shared system trust store SDS secret.
  • Update/extend golden testdata to reflect shared-secret behavior (including a new multi-backend system-truststore case).

Required fixes (blocking):

  • internal/gatewayapi/globalresources.go: scanXdsIR only detects UseSystemTrustStore on HTTP/TCP route destinations. Other xDS IR destinations (notably AccessLog OpenTelemetry/ALS and Tracing) can also carry tls.useSystemTrustStore: true; with this PR they will reference system_ca_certificates but won’t trigger global secret emission, producing invalid xDS (clusters referencing a missing secret).
  • release-notes/current/performance_improvements/9357-system-trust-store-single-sds-secret.md: This modifies existing generated xDS resource names/content (secrets), which can break existing EnvoyPatchPolicies/extension servers that patch the previous per-policy secret names; it should be called out as a breaking change (and typically placed under release-notes/current/breaking_changes/).

Reviewed changes

Copilot reviewed 22 out of 40 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
release-notes/current/performance_improvements/9357-system-trust-store-single-sds-secret.md Adds release note for shared system trust store SDS secret (needs breaking-change callout).
internal/xds/translator/translator.go Updates secret emission + validation-context secret naming for system trust store.
internal/xds/translator/globalresources.go Introduces shared secret name constant and emits shared system trust store secret as a global resource.
internal/ir/xds.go Adds GlobalResources.UseSystemTrustStore flag to drive shared secret emission.
internal/gatewayapi/globalresources.go Sets GlobalResources.UseSystemTrustStore based on IR scan (scan currently incomplete).
internal/xds/translator/testdata/in/xds-ir/websocket-backend-force-http1-upstream.yaml Sets globalResources.useSystemTrustStore: true in IR input.
internal/xds/translator/testdata/in/xds-ir/tcproute-mtls.yaml Sets globalResources.useSystemTrustStore: true in IR input.
internal/xds/translator/testdata/in/xds-ir/httproute-with-tls-and-http.yaml Sets globalResources.useSystemTrustStore: true in IR input.
internal/xds/translator/testdata/in/xds-ir/http-route-with-tls-system-truststore.yaml Sets globalResources.useSystemTrustStore: true in IR input.
internal/xds/translator/testdata/in/xds-ir/http-route-multiple-system-truststore.yaml Adds new IR input case with multiple backends using system trust store.
internal/xds/translator/testdata/in/xds-ir/http-route-dynamic-resolver-with-host-rewriting.yaml Sets globalResources.useSystemTrustStore: true in IR input.
internal/xds/translator/testdata/in/xds-ir/backend-tls-settings.yaml Sets globalResources.useSystemTrustStore: true in IR input.
internal/xds/translator/testdata/out/xds-ir/websocket-backend-force-http1-upstream.secrets.yaml Updates expected secrets to single system_ca_certificates entry.
internal/xds/translator/testdata/out/xds-ir/websocket-backend-force-http1-upstream.clusters.yaml Updates expected cluster secret refs to system_ca_certificates.
internal/xds/translator/testdata/out/xds-ir/tcproute-mtls.secrets.yaml Updates expected secrets to include shared system trust store secret.
internal/xds/translator/testdata/out/xds-ir/tcproute-mtls.clusters.yaml Updates expected cluster secret refs to system_ca_certificates.
internal/xds/translator/testdata/out/xds-ir/httproute-with-tls-and-http.secrets.yaml Updates expected secrets to shared system trust store secret.
internal/xds/translator/testdata/out/xds-ir/httproute-with-tls-and-http.clusters.yaml Updates expected cluster secret refs to system_ca_certificates.
internal/xds/translator/testdata/out/xds-ir/http-route-with-tls-system-truststore.secrets.yaml Updates expected secrets to shared system trust store secret.
internal/xds/translator/testdata/out/xds-ir/http-route-with-tls-system-truststore.clusters.yaml Updates expected cluster secret refs to system_ca_certificates.
internal/xds/translator/testdata/out/xds-ir/http-route-multiple-system-truststore.secrets.yaml New expected secrets output for multi-backend shared secret case.
internal/xds/translator/testdata/out/xds-ir/http-route-multiple-system-truststore.routes.yaml New expected routes output for multi-backend case.
internal/xds/translator/testdata/out/xds-ir/http-route-multiple-system-truststore.listeners.yaml New expected listeners output for multi-backend case.
internal/xds/translator/testdata/out/xds-ir/http-route-multiple-system-truststore.endpoints.yaml New expected endpoints output for multi-backend case.
internal/xds/translator/testdata/out/xds-ir/http-route-multiple-system-truststore.clusters.yaml New expected clusters output for multi-backend shared secret case.
internal/xds/translator/testdata/out/xds-ir/http-route-dynamic-resolver-with-host-rewriting.secrets.yaml Updates expected secrets to shared system trust store secret.
internal/xds/translator/testdata/out/xds-ir/http-route-dynamic-resolver-with-host-rewriting.clusters.yaml Updates expected cluster secret refs to system_ca_certificates.
internal/xds/translator/testdata/out/xds-ir/backend-tls-settings.secrets.yaml Updates expected secrets to dedupe system trust store into one shared secret.
internal/xds/translator/testdata/out/xds-ir/backend-tls-settings.clusters.yaml Updates expected cluster secret refs to system_ca_certificates.
internal/xds/translator/testdata/out/extension-xds-ir/http-route-extension-translate-error.secrets.yaml Adds expected shared system trust store secret output in extension error case.
internal/gatewayapi/testdata/tcproute-with-backendtlspolicy.out.yaml Updates expected xDS IR to include globalResources.useSystemTrustStore: true.
internal/gatewayapi/testdata/httproute-with-tls-and-http.out.yaml Updates expected xDS IR to include globalResources.useSystemTrustStore: true.
internal/gatewayapi/testdata/httproute-rule-with-non-service-backends-and-websocket-app-protocols.out.yaml Updates expected xDS IR to include globalResources.useSystemTrustStore: true.
internal/gatewayapi/testdata/httproute-dynamic-resolver.out.yaml Updates expected xDS IR to include globalResources.useSystemTrustStore: true.
internal/gatewayapi/testdata/httproute-dynamic-resolver-host-rewriting.out.yaml Updates expected xDS IR to include globalResources.useSystemTrustStore: true.
internal/gatewayapi/testdata/gateway-tls-frontend-backend.out.yaml Updates expected xDS IR to include globalResources.useSystemTrustStore: true.
internal/gatewayapi/testdata/envoyproxy-tls-settings.out.yaml Updates expected xDS IR to include globalResources.useSystemTrustStore: true.
internal/gatewayapi/testdata/backendtlspolicy-system-truststore.out.yaml Updates expected xDS IR to include globalResources.useSystemTrustStore: true.
internal/gatewayapi/testdata/backendtlspolicy-serviceimport-target.out.yaml Updates expected xDS IR to include globalResources.useSystemTrustStore: true.
internal/gatewayapi/testdata/backend-tls-settings.out.yaml Updates expected xDS IR to include globalResources.useSystemTrustStore: true.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread internal/gatewayapi/globalresources.go Outdated
Comment on lines +137 to +141
if !hasSystemTrustStore {
for _, tcp := range xdsIR.TCP {
for _, route := range tcp.Routes {
if route.Destination == nil {
continue
@@ -0,0 +1 @@
Improved resource utilization for `BackendTLSPolicy` using `WellKnownCACertificates: System`. Previously, each policy created a separate SDS secret referencing the system CA bundle, resulting in one inotify watch per policy. All policies now share a single SDS secret.
Signed-off-by: Guy Daich <guy.daich@sap.com>
@guydc guydc force-pushed the feat-dedupe-truststore-sds branch from 9305124 to 7d1c60c Compare June 26, 2026 19:36
@guydc guydc requested a review from Copilot June 26, 2026 20:29
@guydc

guydc commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

@codex

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d9fa8c423c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +93 to +97
if existing == nil {
return nil
}
vc := existing.GetValidationContext()
if vc == nil || vc.GetTrustedCa().GetFilename() != cert.SystemCertPath {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject missing or mutated shared trust-store secret

When an EnvoyPatchPolicy or post-translation extension renames/removes the generated system_ca_certificates secret, every cluster that used WellKnownCACertificates: System still references that SDS name, but this validation returns success because findXdsSecret no longer finds it. If the patch leaves the filename intact while adding other fields to the validation context, it also passes. In those patch/extension cases translation succeeds with clusters pointing at a missing or globally altered trust-store secret instead of surfacing the reserved-name error.

Useful? React with 👍 / 👎.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 23 changed files in this pull request and generated 1 comment.

Comment on lines +91 to +101
func validateSystemTrustStoreSecret(tCtx *types.ResourceVersionTable) error {
existing := findXdsSecret(tCtx, SystemTrustStoreSecretName)
if existing == nil {
return nil
}
vc := existing.GetValidationContext()
if vc == nil || vc.GetTrustedCa().GetFilename() != cert.SystemCertPath {
return fmt.Errorf("secret name %q is reserved for the system trust store and cannot be used by other resources", SystemTrustStoreSecretName)
}
return nil
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants