
chore(ci): modernize Codecov config for the coverage initiative [DNM]#19138

Draft
yihua wants to merge 2 commits into apache:master from yihua:codecov-infra


yihua (Contributor) commented Jul 2, 2026

[DNM] Do not merge yet. This branch carries a temporary no-op comment in Base64CodecUtil.java to trigger CI so the updated Codecov config produces a live per-PR report; it will be reverted before merge. The individual per-class ignores from the prior config that still resolve (standalone main() tools and legacy JSON helpers) are kept excluded for consistency; only entries for since-deleted classes and stale paths were dropped.

Describe the issue this Pull Request addresses

.codecov.yml has drifted since the module layout changed. Its ignore list points at pre-2021 paths and files that no longer exist (hudi-hive-sync/, com/uber/hoodie/..., hudi-client/src/main/...), its flags block uses names (hudicli, hudispark, ...) that no longer match what CI uploads, per-PR comments are disabled, and there are no components or carryforward. Meanwhile Codecov already receives coverage on every PR and commit, so the config, not the pipeline, is what needs fixing.

Note on generated code: the JaCoCo report produced on Azure counts generated Avro/Thrift/Protobuf/ANTLR classes, but those have no source committed to git, so they are already absent from the Codecov report. No JaCoCo/Maven exclusion is needed to keep them out of the Codecov number; only the source-backed, non-production paths below need ignoring. The one generated Avro model path (hudi-common/src/main/java/org/apache/hudi/avro/model/**) is still ignored defensively.

Summary and Changelog

Modernize .codecov.yml so per-PR coverage is accurate and actionable:

  • ignore: remove stale/nonexistent entries; ignore non-production code that Codecov still counts today: hudi-examples, packaging, the hudi-integ-test harness, the vendored HoodieHadoopFSUtils.scala (owned by Spark upstream), and the incubating hudi-platform-service (the metaserver, which is at 0% coverage and not shipped; a comment in the config marks the entry for removal if the module is brought into scope).
  • flag_management: carryforward: true so a flag's coverage is retained when its CI job is skipped by the path filter, instead of dropping to zero on partial runs. Replaces the dead flags block.
  • component_management: per-ownership-area components (hudi-common, hudi-client, hudi-flink, hudi-spark-datasource, hudi-utilities, hudi-cli, hudi-hadoop, hudi-sync, hudi-io, hudi-timeline-service, hudi-cloud, hudi-kafka-connect) so each area has a coverage number on every PR.
  • comment: post a per-PR summary (project, components, patch delta); previously comment: false.
  • status: project and patch are informational (report, do not fail the build). They can be flipped to enforcing per component as coverage climbs, without gating unrelated PRs now.
  • Documents the Codecov per-PR flow in scripts/jacoco/README.md.
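For readers unfamiliar with Codecov's schema, here is a minimal sketch of what the `flag_management` and `component_management` entries could look like. It is not copied from this PR's `.codecov.yml`: the component paths are assumptions about the module layout (for example, that hudi-flink maps to `hudi-flink-datasource/**` and hudi-cloud to `hudi-aws/**` plus `hudi-gcp/**`), and only four of the twelve components are shown.

```yaml
# Sketch only: component paths are assumed, not taken from the PR.
flag_management:
  default_rules:
    # Reuse a flag's last uploaded coverage when its CI job is skipped
    # by the path filter, instead of reporting the flag as missing.
    carryforward: true

component_management:
  individual_components:
    - component_id: hudi-common
      name: hudi-common
      paths:
        - "hudi-common/**"
    - component_id: hudi-client
      name: hudi-client
      paths:
        - "hudi-client/**"
    - component_id: hudi-flink
      name: hudi-flink
      paths:
        - "hudi-flink-datasource/**"   # assumed module directory
    - component_id: hudi-cloud
      name: hudi-cloud
      paths:
        - "hudi-aws/**"                # assumed: cloud modules grouped together
        - "hudi-gcp/**"
    # The remaining components (hudi-spark-datasource, hudi-utilities, hudi-cli, ...)
    # would follow the same shape.
```

Components filter the merged report by path, independent of which flag (CI job) uploaded the data, which is why each ownership area can get its own number on every PR even though flags are organized by job.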

Validated with curl -X POST --data-binary @.codecov.yml https://codecov.io/validate (Valid).

Impact

CI reporting only; no production code, no build change. Contributors will start seeing a Codecov comment on PRs and per-component coverage. The headline number rises modestly once the integration-test harness and examples stop counting.
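The Codecov report on this PR (base 2912bf6, head eaec940) gives a sense of scale. Codecov computes coverage as hits over all coverable lines (hits, misses, and partials), and by default rounds down to two decimals:

$$
\text{coverage} = \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}}
$$

$$
\text{base: } \frac{100360}{100360 + 41811 + 8495} = \frac{100360}{150666} \approx 0.666109 \rightarrow 66.61\%
$$

$$
\text{head: } \frac{99650}{99650 + 37979 + 8443} = \frac{99650}{146072} \approx 0.682198 \rightarrow 68.21\%
$$

Of the 4594 coverable lines that left the report, 3832 were misses and only 710 were hits, so most of the +1.60 point change comes from removing largely untested, non-production code from the denominator. The report also lists indirect changes in 146 files, so not every line of the difference is attributable to the ignore list.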

Risk Level

low

CI-config only. Coverage statuses are informational, so nothing new can fail a build.

Documentation Update

scripts/jacoco/README.md gains a "Per-PR coverage on Codecov" section.

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

Refresh .codecov.yml so per-PR coverage is accurate and actionable:

- ignore: drop stale entries pointing at pre-2021 paths and files that no
  longer exist; ignore non-production code that Codecov still counts today
  (hudi-examples, packaging, the hudi-integ-test harness, the vendored
  HoodieHadoopFSUtils, and the incubating hudi-platform-service). Generated
  code (Avro/Thrift/Protobuf/ANTLR) has no source in git and is already
  absent from the report, so it needs no exclusion.
- flag_management: carry a flag's coverage forward when its CI job is skipped
  by the path filter, so a partial run does not report a false drop. Replaces
  the old flags block whose names no longer matched the uploads.
- component_management: report coverage per ownership area so each coverage
  subtask can read its own number on every PR.
- comment: post a per-PR summary (project, components, patch); previously off.
- status: project and patch statuses are informational for now (report, do
  not block); can be flipped to enforcing per component as coverage climbs.
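A minimal sketch of informational statuses and a per-PR comment in Codecov's YAML schema. The values are illustrative, not copied from this PR's `.codecov.yml`; in particular the comment `layout` string is one plausible choice.

```yaml
# Sketch only: values are illustrative, not taken from the PR.
coverage:
  status:
    project:
      default:
        informational: true   # report project coverage, never fail the check
    patch:
      default:
        informational: true   # report patch coverage, never fail the check

comment:
  # Sections of the per-PR comment: summary header, coverage diff, flags, components, files.
  layout: "header, diff, flags, components, files"
  behavior: default
  require_changes: false
```

Making a status enforcing later would amount to dropping `informational: true` and giving the status a `target`.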

Also document the Codecov per-PR flow in scripts/jacoco/README.md.

hudi-agent (Contributor) left a comment


⚠️ 🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the docs update! This adds a helpful "Per-PR coverage on Codecov" section to scripts/jacoco/README.md covering the flag uploads, .codecov.yml behavior, and a local coverage command. The script reference and local mvn example both check out against the repo; the main concern is that the listed flag names don't match what bot.yml actually uploads. Please have a Hudi committer or PMC member confirm before merging.

Comment thread scripts/jacoco/README.md
via `scripts/jacoco/generate_merged_coverage_report.sh`, and uploads `jacoco-report.xml` to
Codecov under a flag (`spark-java-tests`, `spark-scala-tests`, `utilities`,
`common-and-other-modules`, `spark-client-hadoop-common`, `hadoop-mr-java-client`,
`integration-tests`).

Contributor

🤖 This flag list doesn't match what .github/workflows/bot.yml actually uploads. The workflow uploads a spark-core-tests flag (bot.yml line 1027) which is missing here, and there is no integration-tests flag uploaded anywhere in bot.yml. The full set uploaded today is: spark-client-hadoop-common, utilities, common-and-other-modules, spark-java-tests, spark-scala-tests, hadoop-mr-java-client, spark-core-tests. Could you update the list to match (drop integration-tests, add spark-core-tests)? Note the same discrepancy exists in the flag_management comment in .codecov.yml (lines 71-73), so it'd be worth fixing both together.

⚠️ AI-generated; verify before applying. React 👍/👎 to flag quality.

Contributor Author

Checked against current apache/master .github/workflows/bot.yml: the uploaded flags are spark-client-hadoop-common, utilities, common-and-other-modules, spark-java-tests, spark-scala-tests, hadoop-mr-java-client, and integration-tests (the last at bot.yml line 1432). There is no spark-core-tests flag in bot.yml. The list here matches, so leaving it as-is.

Comment thread scripts/jacoco/README.md Outdated
ownership area (hudi-common, hudi-client, hudi-spark-datasource, hudi-flink, ...); and
`flag_management` carries a flag's coverage forward when its job is skipped by the path filter,
so a partial run does not report a false drop.
- Generated code (Avro, Thrift, Protobuf, ANTLR) has no source committed to git, so it never

Contributor

🤖 This states generated code "needs no exclusion," but .codecov.yml does keep one generated-code exclusion: hudi-common/src/main/java/org/apache/hudi/avro/model/** (with a comment that it's kept for safety in case sources are ever committed). A reader who cross-references the config may find this slightly contradictory. It might help to note that the one committed-generated Avro path is still ignored defensively, so the statement and the config line up.

⚠️ AI-generated; verify before applying. React 👍/👎 to flag quality.

Contributor Author

Fixed: the doc now says generated code (Thrift/Protobuf/ANTLR) needs no exclusion, and notes the one committed generated path (Avro model classes) is still ignored defensively, so the doc and config agree.

- .codecov.yml: restore the individual Java-class ignores from the prior config
  for the classes that still exist (standalone main()-style tools and legacy JSON
  helpers), so their exclusion is unchanged; entries for since-deleted classes and
  stale paths are left dropped.
- scripts/jacoco/README.md: clarify that the committed Avro model path is still
  ignored defensively, so the doc and config agree.
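A minimal sketch of the resulting `ignore` block in Codecov's YAML schema. The paths come from the module names in the PR description and the Avro model path quoted in the review thread; the exact globs in the real `.codecov.yml` may differ, the `HoodieHadoopFSUtils.scala` glob is an assumption, and the per-class entry is a hypothetical placeholder rather than a real file.

```yaml
# Sketch only: globs may differ from the PR's .codecov.yml.
ignore:
  - "hudi-examples/**"                # example applications, not shipped code
  - "packaging/**"                    # packaging modules
  - "hudi-integ-test/**"              # integration-test harness
  - "hudi-platform-service/**"        # incubating metaserver; remove if brought into scope
  - "**/HoodieHadoopFSUtils.scala"    # vendored from Spark upstream (assumed glob)
  # Generated Avro model classes, ignored defensively:
  - "hudi-common/src/main/java/org/apache/hudi/avro/model/**"
  # Per-class ignores (standalone main() tools, legacy JSON helpers) are listed
  # one file per entry, e.g. (hypothetical placeholder, not a real path):
  # - "<module>/src/main/java/<package>/SomeStandaloneTool.java"
```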
- Base64CodecUtil: temporary no-op comment to trigger CI/Codecov upload (DNM, revert before merge).
yihua changed the title from "chore(ci): modernize Codecov config for the coverage initiative" to "[DNM] chore(ci): modernize Codecov config for the coverage initiative" Jul 2, 2026
yihua marked this pull request as draft July 2, 2026 05:35
yihua changed the title from "[DNM] chore(ci): modernize Codecov config for the coverage initiative" to "chore(ci): modernize Codecov config for the coverage initiative [DNM]" Jul 2, 2026
codecov-commenter commented Jul 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.21%. Comparing base (2912bf6) to head (eaec940).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #19138      +/-   ##
============================================
+ Coverage     66.61%   68.21%   +1.60%     
+ Complexity    30378    30180     -198     
============================================
  Files          2683     2543     -140     
  Lines        150666   146072    -4594     
  Branches      18964    18606     -358     
============================================
- Hits         100360    99650     -710     
+ Misses        41811    37979    -3832     
+ Partials       8495     8443      -52     
Components Coverage Δ
hudi-common 79.62% <ø> (+<0.01%) ⬆️
hudi-client 79.44% <ø> (-0.01%) ⬇️
hudi-flink 63.28% <ø> (ø)
hudi-spark-datasource 53.81% <ø> (+0.25%) ⬆️
hudi-utilities 70.48% <ø> (-0.01%) ⬇️
hudi-cli 15.26% <ø> (ø)
hudi-hadoop 64.96% <ø> (ø)
hudi-sync 68.74% <ø> (ø)
hudi-io 79.42% <ø> (ø)
hudi-timeline-service 83.46% <ø> (+0.29%) ⬆️
hudi-cloud 63.85% <ø> (ø)
hudi-kafka-connect 53.20% <ø> (ø)
Flag Coverage Δ
common-and-other-modules 45.11% <ø> (+0.58%) ⬆️
hadoop-mr-java-client 43.84% <ø> (+0.06%) ⬆️
integration-tests 13.98% <ø> (+0.03%) ⬆️
spark-client-hadoop-common 48.07% <ø> (-0.01%) ⬇️
spark-java-tests 47.73% <ø> (+0.08%) ⬆️
spark-scala-tests 44.47% <ø> (+0.21%) ⬆️
utilities 36.69% <ø> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
...a/org/apache/hudi/common/util/Base64CodecUtil.java 75.00% <ø> (ø)

... and 146 files with indirect coverage changes


github-actions (bot) added the size:M label (PR with lines of changes in (100, 300]) Jul 2, 2026
hudi-bot (Collaborator) commented Jul 2, 2026

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

4 participants