chore(ci): modernize Codecov config for the coverage initiative [DNM] #19138
yihua wants to merge 2 commits into
Refresh .codecov.yml so per-PR coverage is accurate and actionable:

- ignore: drop stale entries pointing at pre-2021 paths and files that no longer exist; ignore non-production code that Codecov still counts today (hudi-examples, packaging, the hudi-integ-test harness, the vendored HoodieHadoopFSUtils, and the incubating hudi-platform-service). Generated code (Avro/Thrift/Protobuf/ANTLR) has no source in git and is already absent from the report, so it needs no exclusion.
- flag_management: carry a flag's coverage forward when its CI job is skipped by the path filter, so a partial run does not report a false drop. Replaces the old flags block whose names no longer matched the uploads.
- component_management: report coverage per ownership area so each coverage subtask can read its own number on every PR.
- comment: post a per-PR summary (project, components, patch); previously off.
- status: project and patch statuses are informational for now (report, do not block); they can be flipped to enforcing per component as coverage climbs.

Also document the Codecov per-PR flow in scripts/jacoco/README.md.
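The `ignore` and `flag_management` bullets above could translate into `.codecov.yml` roughly as follows. This is a sketch, not the PR's actual file: the glob patterns are assumptions, and the vendored `HoodieHadoopFSUtils.scala` is matched by file name because its directory is not given here. Only the key names (`ignore`, `flag_management`, `default_rules`, `carryforward`) are Codecov's own.

```yaml
# Sketch only: path globs are assumptions, not copied from the PR.
ignore:
  - "hudi-examples/**"              # example code, not shipped
  - "packaging/**"                  # bundle/packaging modules
  - "hudi-integ-test/**"            # integration-test harness
  - "**/HoodieHadoopFSUtils.scala"  # vendored from Spark upstream (location assumed)
  - "hudi-platform-service/**"      # incubating metaserver, unshipped

flag_management:
  default_rules:
    # Keep a flag's last known coverage when its CI job is skipped
    # by the path filter, instead of reporting it as a drop.
    carryforward: true
```

With `carryforward` set in `default_rules`, every uploaded flag inherits it, so the flag names themselves do not need to be listed for this behavior.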
hudi-agent left a comment
Thanks for the docs update! This adds a helpful "Per-PR coverage on Codecov" section to scripts/jacoco/README.md covering the flag uploads, .codecov.yml behavior, and a local coverage command. The script reference and local mvn example both check out against the repo; the main concern is that the listed flag names don't match what bot.yml actually uploads. Please have a Hudi committer or PMC member confirm before merging.
> via `scripts/jacoco/generate_merged_coverage_report.sh`, and uploads `jacoco-report.xml` to
> Codecov under a flag (`spark-java-tests`, `spark-scala-tests`, `utilities`,
> `common-and-other-modules`, `spark-client-hadoop-common`, `hadoop-mr-java-client`,
> `integration-tests`).
🤖 This flag list doesn't match what .github/workflows/bot.yml actually uploads. The workflow uploads a spark-core-tests flag (bot.yml line 1027) which is missing here, and there is no integration-tests flag uploaded anywhere in bot.yml. The full set uploaded today is: spark-client-hadoop-common, utilities, common-and-other-modules, spark-java-tests, spark-scala-tests, hadoop-mr-java-client, spark-core-tests. Could you update the list to match (drop integration-tests, add spark-core-tests)? Note the same discrepancy exists in the flag_management comment in .codecov.yml (lines 71-73), so it'd be worth fixing both together.
Checked against current apache/master .github/workflows/bot.yml: the uploaded flags are spark-client-hadoop-common, utilities, common-and-other-modules, spark-java-tests, spark-scala-tests, hadoop-mr-java-client, and integration-tests (the last at bot.yml line 1432). There is no spark-core-tests flag in bot.yml. The list here matches, so leaving it as-is.
> ownership area (hudi-common, hudi-client, hudi-spark-datasource, hudi-flink, ...); and
> `flag_management` carries a flag's coverage forward when its job is skipped by the path filter,
> so a partial run does not report a false drop.
> - Generated code (Avro, Thrift, Protobuf, ANTLR) has no source committed to git, so it never
🤖 This states generated code "needs no exclusion," but .codecov.yml does keep one generated-code exclusion: hudi-common/src/main/java/org/apache/hudi/avro/model/** (with a comment that it's kept for safety in case sources are ever committed). A reader who cross-references the config may find this slightly contradictory. It might help to note that the one committed-generated Avro path is still ignored defensively, so the statement and the config line up.
Fixed: the doc now says generated code (Thrift/Protobuf/ANTLR) needs no exclusion, and notes the one committed generated path (Avro model classes) is still ignored defensively, so the doc and config agree.
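The one defensive exclusion amounts to a single `ignore` glob. The path below is the one quoted in the review comment; the comment text inside the fragment is illustrative:

```yaml
ignore:
  # Avro model classes are generated. They are normally absent from the
  # report because no source is committed, but the path stays ignored
  # defensively in case generated sources are ever checked in.
  - "hudi-common/src/main/java/org/apache/hudi/avro/model/**"
```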
- .codecov.yml: restore the individual Java-class ignores from the prior config for the classes that still exist (standalone main()-style tools and legacy JSON helpers), so their exclusion is unchanged; entries for since-deleted classes and stale paths are left dropped.
- scripts/jacoco/README.md: clarify that the committed Avro model path is still ignored defensively, so the doc and config agree.
- Base64CodecUtil: temporary no-op comment to trigger CI/Codecov upload (DNM, revert before merge).
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #19138 +/- ##
============================================
+ Coverage 66.61% 68.21% +1.60%
+ Complexity 30378 30180 -198
============================================
Files 2683 2543 -140
Lines 150666 146072 -4594
Branches 18964 18606 -358
============================================
- Hits 100360 99650 -710
+ Misses 41811 37979 -3832
+ Partials 8495 8443 -52
Flags with carried forward coverage won't be shown.
Describe the issue this Pull Request addresses
`.codecov.yml` has drifted since the module layout changed. Its `ignore` list points at pre-2021 paths and files that no longer exist (`hudi-hive-sync/`, `com/uber/hoodie/...`, `hudi-client/src/main/...`), its `flags` block uses names (`hudicli`, `hudispark`, ...) that no longer match what CI uploads, per-PR comments are disabled, and there are no components or carryforward. Meanwhile Codecov already receives coverage on every PR and commit, so the config, not the pipeline, is what needs fixing.

Note on generated code: the JaCoCo report on Azure counts generated Avro/Thrift/Protobuf/ANTLR classes, but those have no source committed to git, so they are already absent from the Codecov report. No JaCoCo/Maven exclusion is needed to keep them out of the Codecov number; only the source-backed, non-production paths below need ignoring.
Summary and Changelog
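Modernize `.codecov.yml` so per-PR coverage is accurate and actionable:

- `ignore`: drop stale entries pointing at pre-2021 paths and files that no longer exist, and ignore non-production code that Codecov still counts today: `hudi-examples`, `packaging`, the `hudi-integ-test` harness, the vendored `HoodieHadoopFSUtils.scala` (owned by Spark upstream), and the incubating `hudi-platform-service` (metaserver, 0% and unshipped; a comment marks it for removal if taken into scope).
- `flag_management`: `carryforward: true` so a flag's coverage is retained when its CI job is skipped by the path filter, instead of dropping to zero on partial runs. Replaces the dead `flags` block.
- `component_management`: report coverage per ownership area so each coverage subtask can read its own number on every PR.
- `comment`: post a per-PR summary (project, components, patch); previously `comment: false`.
- `status`: `project` and `patch` are `informational` (report, do not fail the build). They can be flipped to enforcing per component as coverage climbs, without gating unrelated PRs now.
- Document the Codecov per-PR flow in `scripts/jacoco/README.md`.

Validated with `curl -X POST --data-binary @.codecov.yml https://codecov.io/validate` (Valid).

Impact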
CI reporting only; no production code, no build change. Contributors will start seeing a Codecov comment on PRs and per-component coverage. The headline number rises modestly once the integration-test harness and examples stop counting; the Codecov report on this PR shows project coverage moving from 66.61% to 68.21% (+1.60%).
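The per-PR comment and per-component coverage described above map onto Codecov's `comment` and `component_management` keys. The sketch below is illustrative only: the component IDs are hypothetical, the path globs assume each component maps to the module directory of the same name, and only three of the ownership areas are shown. It also assumes Codecov's `components` entry in the comment `layout`, which adds a per-component table to the PR comment.

```yaml
# Illustrative sketch: component IDs and paths are assumptions.
comment:
  # Sections of the PR comment; "components" lists per-component coverage.
  layout: "header, diff, flags, components, files"
  behavior: default

component_management:
  individual_components:
    - component_id: hudi_common            # hypothetical ID
      name: hudi-common
      paths:
        - "hudi-common/**"
    - component_id: hudi_client            # hypothetical ID
      name: hudi-client
      paths:
        - "hudi-client/**"
    - component_id: hudi_spark_datasource  # hypothetical ID
      name: hudi-spark-datasource
      paths:
        - "hudi-spark-datasource/**"
```

Each component gets its own row in the comment, so the owner of a coverage subtask can read one number without working it out from the whole-project total.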
Risk Level
low
CI-config only. Coverage statuses are informational, so nothing new can fail a build.
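In Codecov YAML, "report, do not fail" corresponds to `informational: true` on each status. A minimal sketch, assuming the statuses use Codecov's `default` name; the PR's actual file may define them differently:

```yaml
coverage:
  status:
    project:
      default:
        # Report whole-project coverage on the PR, but never fail the check.
        informational: true
    patch:
      default:
        # Same for coverage of the lines changed by the PR.
        informational: true
```

To make a status enforcing later, drop `informational` (or set it to `false`) and give it a `target`, such as `auto` to compare against the base commit.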
Documentation Update
`scripts/jacoco/README.md` gains a "Per-PR coverage on Codecov" section.

Contributor's checklist