[release-4.21] OCPBUGS-XXXXX: Backport noOLM / Sail Library to release-4.21#1442
[release-4.21] OCPBUGS-XXXXX: Backport noOLM / Sail Library to release-4.21#1442gcs278 wants to merge 11 commits into
Conversation
|
Skipping CI for Draft Pull Request. |
|
@gcs278: This pull request references NE-2471 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.z" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
c647f7b to
b47ed5e
Compare
|
No longer pursuing this |
|
@gcs278: Closed this PR. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Some new information makes this backport attractive again. |
|
@gcs278: Reopened this PR. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
b47ed5e to
af43e28
Compare
530e487 to
96ad9e5
Compare
|
/test ? |
|
i manually pullled in #1444 for now - because we need to bump to istio 1.28.5, and might as well bump the GWAPI CRDs /test e2e-aws-operator-techpreview |
9b7956c to
9df9dba
Compare
| # with permissions that CIO itself holds. | ||
| # | ||
| # TODO - Future Improvements: | ||
| # 1. Consider removing 'escalate' and instead explicitly grant istiod the exact |
There was a problem hiding this comment.
we need to consider if we want to implement the fix for this escalate before the backport, or if we will backport the fix later. Anyway it would be good to keep a track on it
There was a problem hiding this comment.
Yea I agree. I wouldn't block the backports on it, but we can keep an eye.
| ) | ||
|
|
||
| // ensureOSSMtoSailLibraryMigration handles the upgrade migration from OLM-based | ||
| // Istio installation (4.21) to Helm-based installation via Sail Library (4.22). |
There was a problem hiding this comment.
nit: (maybe you fixed on a followup commit) - this comment is now wrong, given we are doing the migration also on 4.21
There was a problem hiding this comment.
ah ignore it, it is already fixed. I will first review commit by commit and then the big one picture
| // Build a zap development logger. | ||
| zapLogger, err := zap.NewDevelopment(zap.AddCallerSkip(1), zap.AddStacktrace(zap.FatalLevel)) | ||
| // Build a zap development logger with INFO level. | ||
| config := zap.NewDevelopmentConfig() |
There was a problem hiding this comment.
just a request for @rhamini3 and @melvinjoseph86 when doing the verify, take a look into CIO logs to see if they are being too verbose. I remember we have set this because sail library was issuing a debug message every X seconds, but I think this does not break anything
There was a problem hiding this comment.
Good point - verification will need to verify all of the bug fixes we are pulling it along with this including this one.
There was a problem hiding this comment.
sounds good, let me know when its ready @gcs278
| metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
| ) | ||
|
|
||
| const ( |
There was a problem hiding this comment.
more a side comment, we will also backport the orphan subscription condition IIRC right? But this should be a future plan
There was a problem hiding this comment.
Yep I think so - @aswinsuryan FYI. It would be really nice to have the alert/condition at the same time we merge this, but I wouldn't block these backports over it.
| state.unmanagedGatewayAPICRDNames = extension.UnmanagedGatewayAPICRDNames | ||
| } | ||
|
|
||
| if r.config.GatewayAPIControllerEnabled && r.config.MarketplaceEnabled && r.config.OperatorLifecycleManagerEnabled { |
There was a problem hiding this comment.
I think on this version you still need the guard of r.config.GatewayAPIControllerEnabled, don't you?
There was a problem hiding this comment.
Good catch - yes, technically it wouldn't be an issue as this feature gate is now GA and the result of dropping this is always running this Gateway API logic, but best to be consistent.
| k8s.io/client-go v0.34.3 | ||
| k8s.io/utils v0.0.0-20250820121507-0af2bda4dd1d | ||
| sigs.k8s.io/controller-runtime v0.22.5 | ||
| sigs.k8s.io/gateway-api v1.4.1 |
There was a problem hiding this comment.
more like a nitpick'ish, but even you doing replace on the bottom, usually people come to go.mod (myself included) trying to figure out which Gateway API was used and will not roll down to the end of the file.
Maybe for these backports, add a comment that this may not be the real version and it may be replaced on the bottom of the file?
There was a problem hiding this comment.
Happy to do that, it may trigger more merge conflicts in backports of the go.mod for future backports, but backports to the go.mod always conflict anyways.
Cherry-picked from: ed2eb36 openshift#1354 Conflicts resolved: - pkg/operator/controller/status/controller.go: Took incoming noOLM logic (useOLM/useSailLibrary, conditional subscription listing) but wrapped in existing 4.21 GatewayAPIEnabled guard. Restored GatewayAPIControllerEnabled guard that was present in the original condition but dropped during cherry-pick.
…rRole Cherry-picked from: a758d83 openshift#1354
Cherry-picked from: 9c4d792 openshift#1354 Conflicts resolved: - test/e2e/gateway_api_test.go: Kept 4.21 gatewayAPIControllerEnabled guard, added gatewayAPIWithoutOLMEnabled conditionals inside it. Kept xcrdNames alongside new istioCRDNames. Removed testCRDNames declaration since it was subsequently added to release-4.21 via PR openshift#1446. - Removed references to testGatewayAPIInfrastructureAnnotations, testGatewayAPIInternalLoadBalancer, and testGatewayOpenshiftConditions which were added in separate PRs not present on release-4.21.
Cherry-picked from: 955a5c0 openshift#1354
Cherry-picked from: 6d2c6c8 openshift#1402
Cherry-picked from: 43c978a openshift#1404 Conflicts resolved: - go.mod: Aslak's development branch introduced transitive dependencies that diverged from release-4.21. Reset go.mod to the release-4.21 baseline, then added only the openshift-service-mesh/sail-operator replace directive and bumped gateway-api to v1.4.1.
Cherry-picked from: b1bbbb7 openshift#1404
6a8043c to
12879f2
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Updates from @rikatz review:
|
|
/testwith openshift/cluster-ingress-operator/release-4.21/e2e-aws-operator-techpreview openshift/api#2873 |
|
/testwith openshift/cluster-ingress-operator/release-4.21/e2e-aws-ovn-techpreview openshift/api#2873 openshift/origin#31232 |
|
@gcs278: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
Comfirmed passing No-OLM Origin TP Tests here: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/multi-pr-openshift-cluster-ingress-operator-1442-openshift-api-2873-openshift-origin-31232-e2e-aws-ovn-techpreview/2062289410233208832 And confirmed passing pre-submit TP tests here: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/multi-pr-openshift-cluster-ingress-operator-1442-openshift-api-2873-e2e-aws-operator-techpreview/2062284169832042496 |
|
@gcs278: No Jira issue is referenced in the title of this pull request. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Sorry I think we forgot to assign Aswin to this one |
|
@gcs278: GitHub didn't allow me to assign the following users: aswinsuryan. Note that only openshift members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
| }); err != nil { | ||
| return reconcile.Result{}, fmt.Errorf("failed to list gateway classes: %w", err) | ||
| } | ||
| if countActiveGatewayClasses(&gatewayClassList, gatewayClass.Name) == 0 { |
There was a problem hiding this comment.
Could there be a race here, where a new GatewayClass adds its finalizer after countActiveGatewayClasses() returns zero but before Uninstall() completes, causing the new class to require an Istio instance that was just removed?
Summary
Backport of the noOLM / Sail Library installation path (NE-2286, shipped in 4.22) to release-4.21. This resolves several fundamental OLM bugs that have no viable OLM-based workaround — most critically OCPBUGS-86778, which blocks all OSSM z-stream upgrades and prevents shipping CVE fixes.
This PR is intended to merge with the
GatewayAPIWithoutOLMfeature gate disabled, making it a no-op on merge. The goal is to subsequently enable the gate by default (via openshift/api) to activate the Sail Library path and resolve the OLM issues.Cherry-picked PRs
istio_sail_installer.go,istio_olm.gorefactor,migration.go,status.go, CRD manifests, Sail Library RBAC manifestsNote: #1393 (OCPBUGS-79667: Use feature-gate annotation for Sail Library RBAC) was also a dependency but is being skipped because CVO on this release does not support the
release.openshift.io/feature-gateannotation (openshift/cluster-version-operator#1273 was not backported). As a result, the Sail Library RBAC manifests use therelease.openshift.io/feature-setannotation and a separate PR will be needed to remove this annotation before promoting the feature gate to GA.Versioning
This backport does not bump the Gateway API CRDs (remain at v1.3.0) or the Istio version (remains at v1.27.3) for the noOLM code path. When the
GatewayAPIWithoutOLMfeature gate is enabled, the Sail Library will install Istio using the same v1.27.3 version that the OLM path currently uses. This works because the vendored Sail Library (OSSM 3.3.1) still supports Istio 1.27.3.The GWAPI CRD bump to v1.4.1 and Istio version bump to v1.28.5 will follow separately via #1444, allowing us to validate the noOLM path independently from the version changes.
When noOLM shipped in 4.22, the OLM and noOLM versions were already aligned at 3.3.1, so version separation was not needed. On 4.21, the OLM path is on 3.2.0 — keeping both paths at the same Istio version avoids introducing conditional logic or separate deployment manifests in the backport.
Conflicts resolved
pkg/operator/operator.go: AddedGatewayAPIWithoutOLMgate alongside existing 4.21 gates (GatewayAPI,GatewayAPIController,RouteExternalCertificate)pkg/operator/controller/status/controller.go: Took incoming noOLM logic (useOLM/useSailLibrary, conditional subscription listing) but wrapped in existing 4.21GatewayAPIEnabledguardtest/e2e/gateway_api_test.go: Kept 4.21gatewayAPIControllerEnabledguard, addedgatewayAPIWithoutOLMEnabledconditionals inside for Sail Library vs OLM test selection. KeptxcrdNamesalongside newistioCRDNames. Removed references totestGatewayAPIInfrastructureAnnotations,testGatewayAPIInternalLoadBalancer, andtestGatewayOpenshiftConditionswhich were added in separate PRs not present on release-4.21.go.mod/ vendor**: Addedreplacedirectives foropenshift/api(fork with gate) andsail-operator(downstream fork withpkg/install)pkg/operator/controller/canary/daemonset.go(OCPBUGS-79467: Change default log level from DEBUG to INFO #1402 commit 2): Skipped — references canary cert hash variables not present on 4.21Rollout Plan
Phase 1 — Land code (gate OFF)
Phase 2 — TechPreview soak
Phase 3 — GA promotion
Follow-up
Go Dependency Updates
Transitive dependency changes
The sail-operator (OSSM 3.3.1) brings in new transitive dependencies for Helm chart rendering (
helm.sh/helm/v3), Istio utility libraries (istio.io/istio/pkg/log,pkg/ptr,pkg/slices,pkg/util/sets), and their dependency chains. These are all indirect — vendored but not imported by CIO code directly. k8s modules received a patch bump (0.34.1 → 0.34.3) fromgo mod tidy. Both are low risk.controller-runtime (pinned: v0.22.5 → v0.21.0)
The sail-operator requires controller-runtime v0.22.5, but we pin back to v0.21.0 — the version CIO's own code was built and tested against on 4.21. CIO's core controller logic (client, cache, manager, controller wiring) is unchanged and continues to run against the same controller-runtime it shipped with. The sail library's install package only uses basic
client.Clientoperations (New,Get,Create,Update) andpkg/log— all unchanged since controller-runtime v0.1. No other vendored dependency calls controller-runtime APIs.On 4.21, this pin is not strictly required since 4.21 is already on k8s 0.34 and a patch bump poses no compatibility risk. However, on 4.20 and 4.19 the pin is essential because controller-runtime 0.22 would force a k8s minor version bump, causing incompatibilities with the frozen openshift ecosystem packages (client-go, library-go). Pinning here maintains a consistent approach across all three backport branches.
gateway-api (pinned: v1.4.1 → v1.3.0)
The sail-operator pulls in gateway-api v1.4.1, but we pin back to v1.3.0 (the original 4.21 version). The CRD manifests shipped in this release are v1.3.0, and the Go types are forward-compatible. Pinning keeps the vendored types aligned with the CRDs installed on the cluster.
Verification
go build ./pkg/operator/controller/gatewayclass/...compilesgo test ./pkg/operator/controller/gatewayclass/...passes🤖 Generated with Claude Code