fix(cli): print clear timeout message when argocd app wait times out#28274
Open
pncloud wants to merge 7 commits into
Open
fix(cli): print clear timeout message when argocd app wait times out#28274pncloud wants to merge 7 commits into
pncloud wants to merge 7 commits into
Conversation
…d files Add a new --file/-f flag to the argocd app list command that filters applications affected by a given list of changed files. For each application, the filter evaluates: - The application source path - The argocd.argoproj.io/manifest-generate-paths annotation (if set) This is useful in monorepo setups where --repo returns too many applications and users need to find only apps affected by specific changed files (e.g. in a CI pipeline). Also adds a FilterByFiles function to util/argo/argo.go and corresponding unit tests. Signed-off-by: pncloud <pmn3232@gmail.com>
Signed-off-by: pncloud <pmn3232@gmail.com>
…filter tests - Iterate over all app sources (GetSources) instead of only the first source (GetSource) so multi-source apps are correctly matched - Add unit test for multi-source app matching in TestFilterByFiles - Add TestFilterByPathAndFiles to verify behavior when --path and --file flags are used together Signed-off-by: pncloud <pmn3232@gmail.com>
…Files Sources without a path (e.g. Helm chart sources using spec.source.chart) would cause AppFilesHaveChanged to return true for every file since refreshPaths would be empty. Skip such sources to avoid false positives. Added unit test to verify Helm chart sources are correctly skipped. Signed-off-by: pncloud <pmn3232@gmail.com>
…rce apps FilterByRepo, FilterByRepoP, and FilterByPath all used GetSource() which only returns the first source for multi-source applications. This caused multi-source apps to be incorrectly excluded from filtered results when the matching repo/path was on any source other than the first. Fix all three functions to iterate GetSources() and break on first match, consistent with the approach used in FilterByFiles. Add multi-source sub-tests to TestFilterByRepo, TestFilterByRepoP, and TestFilterByPath to cover the second-source match case. Signed-off-by: pncloud <pmn3232@gmail.com>
When argocd app wait --timeout N exceeds its limit, a context
cancellation propagates through the gRPC stream and can surface as the
cryptic fatal error:
level=fatal msg="rpc error: code = Canceled desc = context canceled"
This is confusing because users don't know whether the command timed out
or something else went wrong.
Two fixes:
1. In printFinalStatus, skip the app refresh if the context is already
canceled. This prevents a race between the AfterFunc timeout handler
setting refresh=false and the watch loop setting refresh=true, which
previously caused the refresh Get call to fail with context.Canceled
and crash the process before the timed-out error could be returned.
2. In the app wait Run function, detect context cancellation / deadline
exceeded errors (both stdlib and gRPC-wrapped) and replace them with
a clear message:
level=fatal msg="timed out (Ns) waiting for app X to match the expected conditions"
Add TestIsContextCanceledErr unit tests covering nil, context.Canceled,
context.DeadlineExceeded, gRPC Canceled, gRPC DeadlineExceeded, and
unrelated error cases.
Fixes argoproj#14705
Signed-off-by: pncloud <pmn3232@gmail.com>
✅ Preview Environment deployed on Bunnyshell
See: Environment Details | Pipeline Logs Available commands (reply to this comment):
|
Bundle ReportBundle size has no change ✅ |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #28274 +/- ##
=========================================
Coverage ? 64.83%
=========================================
Files ? 425
Lines ? 59132
Branches ? 0
=========================================
Hits ? 38336
Misses ? 17234
Partials ? 3562 ☔ View full report in Codecov by Harness. |
ppapapetrou76
left a comment
Contributor
There was a problem hiding this comment.
The code changes are not aligned with the PR title
Can you please clarify the purpose of this PR?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
When
argocd app wait --timeout Nexceeds its limit, a context cancellation propagates through the gRPC stream and can surface as:This is confusing — users don't know if the command timed out or something else went wrong.
Fixes #14705
Root Cause
Two issues:
Race condition in
printFinalStatus: The AfterFunc timeout handler setsrefresh = falsebefore callingcancel(), but the watch loop can setrefresh = trueconcurrently. When the context is then canceled andprintFinalStatusruns, it attemptsappClient.Get(ctx, ...)with an already-canceled context → crashes withcontext canceledbefore the clean timeout error can be returned.No fallback message for context-canceled errors at the call site: Even without the race, if a context-canceled error reaches
errors.CheckError, the user sees the raw gRPC error string.Changes
cmd/argocd/commands/app.go:printFinalStatus, skip the app refresh ifctx.Err() != nil(context already canceled)isContextCanceledErrhelper (detects both stdlib and gRPC-wrapped context errors)app waitRun function, callisContextCanceledErrand emit a clear message:cmd/argocd/commands/app_test.go: AddTestIsContextCanceledErrwith 6 sub-tests (nil,context.Canceled,context.DeadlineExceeded, gRPC Canceled, gRPC DeadlineExceeded, unrelated error)