Skip to content

[ci] Resolve slow github polling and hook handling errors#15493

Open
cjllanwarne wants to merge 11 commits into
hail-is:mainfrom
cjllanwarne:cjl_ci_github_broken
Open

[ci] Resolve slow github polling and hook handling errors#15493
cjllanwarne wants to merge 11 commits into
hail-is:mainfrom
cjllanwarne:cjl_ci_github_broken

Conversation

@cjllanwarne
Copy link
Copy Markdown
Collaborator

@cjllanwarne cjllanwarne commented May 21, 2026

Change Description

Two main changes:

  • Switch away from a sequence of per-PR per-statuscheck polling which takes ages and runs into an internal CI timeout, and instead do a single graphQL poll that gets all PR review and check information in a single request
  • Fix a no-batch-yet bug in the github hook handler which was causing errors to be thrown in logs and status reporting to be delayed in the system.

Additionally, sets up a test framework in CI that can mock out GH to test the polling logic, and provide default environment variables during test runs so that we don't fail because of "global config missing" errors.

Security Assessment

  • This change potentially impacts the Hail Batch instance as deployed by Broad Institute in GCP

Impact Rating

  • This change has a low security impact

Impact Description

Small internal changes in how we poll github and how we manage internal state when we get state update hook calls.

Appsec Review

  • Required: The impact has been assessed and approved by appsec

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the CI service’s GitHub integration to reduce polling latency by batching PR review/check retrieval into a single (paginated) GraphQL query, and fixes an error case in the deploy status endpoint when no deploy batch exists yet.

Changes:

  • Replace per-PR GraphQL polling with _fetch_pr_github_data, a batched GraphQL query that fetches reviewDecision + statusCheckRollup contexts for all open PRs at once (with pagination support).
  • Refactor PR GitHub state application into PR._apply_github_data and update WatchedBranch’s GitHub update flow to use the batched results (including additional summary logging).
  • Add unit tests and pytest configuration for async tests; adjust deploy status reporting to avoid calling failure-log collection when no deploy batch exists.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
ci/unit-test/test_github_unit.py Adds unit tests for batched GraphQL polling and for applying GitHub review/check results onto PR state.
ci/unit-test/pytest.ini Enables pytest asyncio auto mode so async tests run without explicit markers/fixtures.
ci/unit-test/conftest.py Provides safe env/config defaults and mocks global-config secret reading for local/unit-test execution.
ci/ci/github.py Implements batched GraphQL PR polling and refactors PR status application; updates WatchedBranch GitHub update to use the new batching.
ci/ci/ci.py Fixes deploy status endpoint to avoid errors when deploy batch/failure info isn’t available yet.
Comments suppressed due to low confidence (1)

ci/ci/github.py:472

  • CheckRun.conclusion from the GitHub GraphQL API can be null for in-progress checks. Passing that null through to github_status will raise ValueError and can break the watched-branch update loop. Consider treating a null conclusion as a pending status (or explicitly mapping null -> GithubStatus.PENDING) before calling github_status.
                if (typename := check["__typename"]) == "StatusContext":
                    last_known_github_status[check["context"]] = github_status(check["state"])
                elif typename == "CheckRun":
                    last_known_github_status[check["name"]] = github_status(check["conclusion"])
                else:

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ci/ci/github.py
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment thread ci/unit-test/conftest.py Outdated
Comment on lines +21 to +37
if not os.path.exists('/global-config'):
# Patch gear.cloud_config.read_config_secret so that ci.environment's module-level
# get_global_config() call (and the subsequent get_gcp_config() call) succeed locally.
_fake_global_config = {
'cloud': 'gcp',
'docker_prefix': 'gcr.io/hail-vdc',
'docker_root_image': 'ubuntu:22.04',
'domain': 'hail.is',
'kubernetes_server_url': 'https://k8s.example.com',
'default_namespace': 'default',
# Fields required by GCPConfig.from_global_config
'batch_gcp_regions': json.dumps(['us-central1']),
'gcp_region': 'us-central1',
'gcp_project': 'hail-vdc',
'gcp_zone': 'us-central1-a',
}
unittest.mock.patch('gear.cloud_config.read_config_secret', return_value=_fake_global_config).start()
Comment thread ci/ci/github.py Outdated
Comment on lines +687 to +693
async def _fetch_pr_github_data(
gh: _GitHubGraphQL,
owner: str,
repo_name: str,
pr_numbers: List[int],
) -> Dict[int, tuple]:
"""Fetch review decisions and status check rollup for all PRs in a single batched GraphQL query.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skipping this one

kush-chandra
kush-chandra previously approved these changes Jun 1, 2026
@cjllanwarne cjllanwarne force-pushed the cjl_ci_github_broken branch from 9acd8d6 to 9eea584 Compare June 1, 2026 18:07
@cjllanwarne cjllanwarne requested a review from kush-chandra June 1, 2026 19:01
@cjllanwarne cjllanwarne dismissed kush-chandra’s stale review June 1, 2026 19:14

needs a timeout fixup

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants