fix(cli): print clear timeout message when argocd app wait times out#28274

Open
pncloud wants to merge 7 commits into argoproj:master from pncloud:fix/app-wait-timeout-message-14705

@pncloud pncloud commented Jun 12, 2026

Summary

When argocd app wait --timeout N exceeds its limit, a context cancellation propagates through the gRPC stream and can surface as:

level=fatal msg="rpc error: code = Canceled desc = context canceled"

This is confusing: users cannot tell whether the command timed out or something else went wrong.

Fixes #14705

Root Cause

Two issues:

  1. Race condition in printFinalStatus: The AfterFunc timeout handler sets refresh = false before calling cancel(), but the watch loop can set refresh = true concurrently. When the context is then canceled and printFinalStatus runs, it calls appClient.Get(ctx, ...) with an already-canceled context, so the process exits with context canceled before the clean timeout error can be returned.

  2. No fallback message for context-canceled errors at the call site: Even without the race, if a context-canceled error reaches errors.CheckError, the user sees the raw gRPC error string.

Changes

  • cmd/argocd/commands/app.go:

    • In printFinalStatus, skip the app refresh if ctx.Err() != nil (context already canceled)
    • Add isContextCanceledErr helper (detects both stdlib and gRPC-wrapped context errors)
    • In the app wait Run function, call isContextCanceledErr and emit a clear message:
      level=fatal msg="timed out (1800s) waiting for app "my-app" to match the expected conditions"
      
  • cmd/argocd/commands/app_test.go: Add TestIsContextCanceledErr with 6 sub-tests (nil, context.Canceled, context.DeadlineExceeded, gRPC Canceled, gRPC DeadlineExceeded, unrelated error)

pncloud added 7 commits June 11, 2026 12:03
…d files

Add a new --file/-f flag to the argocd app list command that filters
applications affected by a given list of changed files.

For each application, the filter evaluates:
- The application source path
- The argocd.argoproj.io/manifest-generate-paths annotation (if set)

This is useful in monorepo setups where --repo returns too many
applications and users need to find only apps affected by specific
changed files (e.g. in a CI pipeline).

Also adds a FilterByFiles function to util/argo/argo.go and
corresponding unit tests.

Signed-off-by: pncloud <pmn3232@gmail.com>
Signed-off-by: pncloud <pmn3232@gmail.com>
…filter tests

- Iterate over all app sources (GetSources) instead of only the first
  source (GetSource) so multi-source apps are correctly matched
- Add unit test for multi-source app matching in TestFilterByFiles
- Add TestFilterByPathAndFiles to verify behavior when --path and
  --file flags are used together

Signed-off-by: pncloud <pmn3232@gmail.com>
…Files

Sources without a path (e.g. Helm chart sources using spec.source.chart)
would cause AppFilesHaveChanged to return true for every file since
refreshPaths would be empty. Skip such sources to avoid false positives.

Added unit test to verify Helm chart sources are correctly skipped.

Signed-off-by: pncloud <pmn3232@gmail.com>
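
The false positive described in the commit above can be sketched as follows. This is a deliberately simplified prefix match, not the logic of AppFilesHaveChanged; `affectedByFiles` and its parameters are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// affectedByFiles reports whether any changed file lies under one of the
// source paths. Empty paths (e.g. Helm chart sources) are skipped, since an
// empty prefix would otherwise match every file.
func affectedByFiles(sourcePaths, changedFiles []string) bool {
	for _, p := range sourcePaths {
		if p == "" {
			continue
		}
		// Normalize to a trailing slash so "apps/foo" does not match "apps/foobar".
		prefix := strings.TrimSuffix(p, "/") + "/"
		for _, f := range changedFiles {
			if strings.HasPrefix(f, prefix) {
				return true
			}
		}
	}
	return false
}

func main() {
	changed := []string{"apps/foo/values.yaml"}
	fmt.Println(affectedByFiles([]string{""}, changed))         // chart-only source
	fmt.Println(affectedByFiles([]string{"apps/foo"}, changed)) // path source
}
```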
…rce apps

FilterByRepo, FilterByRepoP, and FilterByPath all used GetSource() which
only returns the first source for multi-source applications. This caused
multi-source apps to be incorrectly excluded from filtered results when
the matching repo/path was on any source other than the first.

Fix all three functions to iterate GetSources() and break on first match,
consistent with the approach used in FilterByFiles.

Add multi-source sub-tests to TestFilterByRepo, TestFilterByRepoP, and
TestFilterByPath to cover the second-source match case.

Signed-off-by: pncloud <pmn3232@gmail.com>
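
The iterate-and-break pattern described in the commit above might look like the following sketch. The `source` and `app` types are simplified, hypothetical stand-ins for the Argo CD application types, and `filterByRepo` is not the actual FilterByRepo implementation.

```go
package main

import "fmt"

// source and app are simplified, hypothetical stand-ins for the Argo CD
// ApplicationSource and Application types.
type source struct {
	RepoURL string
	Path    string
}

type app struct {
	Name    string
	Sources []source
}

// filterByRepo keeps every app that has at least one source in repo.
func filterByRepo(apps []app, repo string) []app {
	var out []app
	for _, a := range apps {
		for _, s := range a.Sources {
			if s.RepoURL == repo {
				out = append(out, a)
				break // first match is enough; avoid adding the app twice
			}
		}
	}
	return out
}

func main() {
	apps := []app{
		{Name: "single", Sources: []source{{RepoURL: "https://example.com/a.git"}}},
		{Name: "multi", Sources: []source{
			{RepoURL: "https://example.com/b.git"},
			{RepoURL: "https://example.com/a.git"}, // match on the second source
		}},
	}
	for _, a := range filterByRepo(apps, "https://example.com/a.git") {
		fmt.Println(a.Name)
	}
}
```

Checking only the first source would drop the "multi" app here, which is the exclusion the commit describes.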
When argocd app wait --timeout N exceeds its limit, a context
cancellation propagates through the gRPC stream and can surface as the
cryptic fatal error:

  level=fatal msg="rpc error: code = Canceled desc = context canceled"

This is confusing because users don't know whether the command timed out
or something else went wrong.

Two fixes:

1. In printFinalStatus, skip the app refresh if the context is already
   canceled. This prevents a race between the AfterFunc timeout handler
   setting refresh=false and the watch loop setting refresh=true, which
   previously caused the refresh Get call to fail with context.Canceled
   and crash the process before the timed-out error could be returned.

2. In the app wait Run function, detect context cancellation / deadline
   exceeded errors (both stdlib and gRPC-wrapped) and replace them with
   a clear message:
     level=fatal msg="timed out (Ns) waiting for app X to match the expected conditions"

Add TestIsContextCanceledErr unit tests covering nil, context.Canceled,
context.DeadlineExceeded, gRPC Canceled, gRPC DeadlineExceeded, and
unrelated error cases.

Fixes argoproj#14705

Signed-off-by: pncloud <pmn3232@gmail.com>
@pncloud pncloud requested review from a team as code owners June 12, 2026 18:27
@bunnyshell

bunnyshell Bot commented Jun 12, 2026

✅ Preview Environment deployed on Bunnyshell

Component | Endpoint
argocd | https://argocd-uwezjs.bunnyenv.com/
argocd-ttyd | https://argocd-web-cli-uwezjs.bunnyenv.com/

See: Environment Details | Pipeline Logs

Available commands (reply to this comment):

  • 🔴 /bns:stop to stop the environment
  • 🚀 /bns:deploy to redeploy the environment
  • /bns:delete to remove the environment

@codecov

codecov Bot commented Jun 12, 2026

Bundle Report

Bundle size has no change ✅

@codecov

codecov Bot commented Jun 12, 2026

Codecov Report

❌ Patch coverage is 84.90566% with 8 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (master@f6ade14). Learn more about missing BASE report.

Files with missing lines | Patch % | Lines
cmd/argocd/commands/app.go | 61.90% | 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##             master   #28274   +/-   ##
=========================================
  Coverage          ?   64.83%           
=========================================
  Files             ?      425           
  Lines             ?    59132           
  Branches          ?        0           
=========================================
  Hits              ?    38336           
  Misses            ?    17234           
  Partials          ?     3562           


@ppapapetrou76 ppapapetrou76 left a comment

Contributor

The code changes are not aligned with the PR title.
Can you please clarify the purpose of this PR?

Development

Successfully merging this pull request may close these issues.

Improve timeout message for argocd app wait
