160 changes: 112 additions & 48 deletions .codecov.yml
@@ -15,68 +15,132 @@

# For more configuration details:
# https://docs.codecov.io/docs/codecov-yaml

# Check if this file is valid by running in bash:
#
# Validate this file:
# curl -X POST --data-binary @.codecov.yml https://codecov.io/validate

codecov:
  # Wait for the parallel test jobs to finish uploading before evaluating status
  # and posting the PR comment, so a partial upload does not report a false drop.
  notify:
    wait_for_ci: true

coverage:
  precision: 2
  round: down
  range: "50...100"
  status:
    # Statuses are informational for now: they surface the delta on each PR but do
    # not fail the build. Flip a component to blocking (drop `informational`, set a
    # `target`/`threshold`) once its coverage has climbed, so the initiative does
    # not gate unrelated PRs mid-flight.
    project:
      default:
        informational: true
    patch:
      default:
        informational: true

# Ignoring Paths
# --------------
# which folders/files to ignore
# Generated code (Avro/Thrift/Protobuf/ANTLR) is produced at build time, has no
# source in git, and is therefore already absent from the Codecov report; it needs
# no entry here. Apart from the defensive Avro model entry, the paths below are
# non-generated code that should not count toward coverage.
ignore:
  - "hudi-common/src/main/java/org/apache/hudi/avro/model/*"
  # Generated Avro model classes (kept for safety in case sources are ever committed).
  - "hudi-common/src/main/java/org/apache/hudi/avro/model/**"
  # Example and quickstart applications are illustrative, not production.
  - "hudi-examples/**"
  # Integration-test harness (test DAGs, fixtures, generators), not production behavior.
  - "hudi-integ-test/**"
  # Packaging and bundle shims: assembly config, no testable logic.
  - "packaging/**"
  # Vendored copy of Spark's HadoopFSUtils; owned upstream, not Hudi logic.
  - "hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/HoodieHadoopFSUtils.scala"
  # Metaserver is an incubating component with no shipped coverage. Remove this line
  # if it is taken into scope for the coverage initiative.
  - "hudi-platform-service/**"
  # Standalone main()-style tools and legacy JSON/payload helpers carried over from the
  # previous config; these are exercised by integration tests, not unit tests. Kept
  # excluded for consistency (entries for since-deleted classes have been dropped).
  - "hudi-common/src/main/java/org/apache/hudi/avro/MercifulJsonConverter.java"
  - "hudi-common/src/main/java/org/apache/hudi/common/HoodieJsonPayload"
  - "hudi-common/src/main/java/org/apache/hudi/common/HoodieJsonPayload.java"
  - "hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCleaner.java"
  - "hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactionAdminTool.java"
  - "hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java"
  - "hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotCopier.java"
  - "hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieWithTimelineServer.java"
  - "hudi-utilities/src/main/java/org/apache/hudi/utilities/UpgradePayloadFromUberToApache.java"
  - "hudi-utilities/src/main/java/org/apache/hudi/utilities/perf/TimelineServerPerf.java"
  - "hudi-utilities/src/main/java/org/apache/hudi/utilities/HDFSParquetImporter.java"
  - "hudi-utilities/src/main/java/org/apache/hudi/utilities/HiveIncrementalPuller.java"
  - "hudi-utilities/src/main/java/org/apache/hudi/utilities/adhoc/UpgradePayloadFromUberToApache.java"
  - "hudi-client/src/main/java/org/apache/hudi/metrics/JmxMetricsReporter.java"
  - "hudi-client/src/main/java/org/apache/hudi/metrics/JmxReporterServer.java"
  - "hudi-client/src/main/java/org/apache/hudi/metrics/MetricsGraphiteReporter.java"
  - "hudi-hadoop-mr/src/main/java/com/uber/hoodie/hadoop/HoodieInputFormat.java"
  - "hudi-hadoop-mr/src/main/java/com/uber/hoodie/hadoop/realtime/HoodieRealtimeInputFormat.java"

comment: false
# Post a coverage summary on every PR so reviewers can see the per-component and
# per-patch delta (used to enforce the coverage-quality bar for the initiative).
comment:
  layout: "condensed_header, diff, components, flags, files"
  behavior: default
  require_changes: false

# Carry the last known coverage forward for a flag when its CI job is skipped by the
# path filter on a given PR, so a partial run does not zero out that flag's coverage.
# Flags uploaded by .github/workflows/bot.yml: spark-client-hadoop-common, utilities,
# common-and-other-modules, spark-java-tests, spark-scala-tests, hadoop-mr-java-client,
# integration-tests.
flag_management:
  default_rules:
    carryforward: true

flags:
  hudicli:
    paths:
      - hudi-cli/src/main/
  hudiclient:
    paths:
      - hudi-client/src/main/
  hudicommon:
    paths:
      - hudi-common/src/main/
  hudiexamples:
    paths:
      - hudi-examples/src/main/
  hudihadoopmr:
    paths:
      - hudi-hadoop-mr/src/main/
  hudihivesync:
    paths:
      - hudi-hive-sync/src/main/
  hudiintegtest:
    paths:
      - hudi-integ-test/src/main/
  hudispark:
    paths:
      - hudi-spark/src/main/
  huditimelineservice:
    paths:
      - hudi-timeline-service/src/main/
  hudiutilities:
    paths:
      - hudi-utilities/src/main/
# Per-ownership-area coverage, so each coverage subtask can read its own number on
# every PR. Paths align with the module layout and the ENG-44401 subtask breakdown.
component_management:
  individual_components:
    - component_id: hudi-common
      name: hudi-common
      paths:
        - "hudi-common/**"
    - component_id: hudi-client
      name: hudi-client
      paths:
        - "hudi-client/hudi-client-common/**"
        - "hudi-client/hudi-spark-client/**"
        - "hudi-client/hudi-java-client/**"
    - component_id: hudi-flink
      name: hudi-flink
      paths:
        - "hudi-flink-datasource/**"
        - "hudi-client/hudi-flink-client/**"
    - component_id: hudi-spark-datasource
      name: hudi-spark-datasource
      paths:
        - "hudi-spark-datasource/**"
    - component_id: hudi-utilities
      name: hudi-utilities
      paths:
        - "hudi-utilities/**"
    - component_id: hudi-cli
      name: hudi-cli
      paths:
        - "hudi-cli/**"
    - component_id: hudi-hadoop
      name: hudi-hadoop
      paths:
        - "hudi-hadoop-common/**"
        - "hudi-hadoop-mr/**"
    - component_id: hudi-sync
      name: hudi-sync
      paths:
        - "hudi-sync/**"
    - component_id: hudi-io
      name: hudi-io
      paths:
        - "hudi-io/**"
    - component_id: hudi-timeline-service
      name: hudi-timeline-service
      paths:
        - "hudi-timeline-service/**"
    - component_id: hudi-cloud
      name: hudi-cloud
      paths:
        - "hudi-aws/**"
        - "hudi-gcp/**"
        - "hudi-azure/**"
    - component_id: hudi-kafka-connect
      name: hudi-kafka-connect
      paths:
        - "hudi-kafka-connect/**"
@@ -29,6 +29,8 @@
*/
public final class Base64CodecUtil {

// DNM: temporary no-op comment to trigger CI so the updated Codecov config uploads a report. Revert before merge.

/**
* Decodes data from the input string into using the encoding scheme.
*
35 changes: 34 additions & 1 deletion scripts/jacoco/README.md
@@ -107,4 +107,37 @@ Published Artifacts
- `merge_jacoco_exec_files.sh`: merges multiple JaCoCo execution data files in multiple modules.
- `merge_jacoco_job_files.sh`: merges multiple JaCoCo execution data files from multiple Azure pipeline jobs.
- `generate_jacoco_coverage_report.sh`: generates the JaCoCo code coverage report by taking the execution data file,
source files and class files.

## Per-PR coverage on Codecov

In addition to the aggregated Azure report described above, coverage is uploaded to
[Codecov](https://app.codecov.io/gh/apache/hudi) on every pull request and every commit to master.
This is the canonical per-PR view.

- Each test job in `.github/workflows/bot.yml` runs with the JaCoCo agent, builds a merged report
via `scripts/jacoco/generate_merged_coverage_report.sh`, and uploads `jacoco-report.xml` to
Codecov under a flag (`spark-java-tests`, `spark-scala-tests`, `utilities`,
`common-and-other-modules`, `spark-client-hadoop-common`, `hadoop-mr-java-client`,
`integration-tests`).

Contributor

🤖 This flag list doesn't match what `.github/workflows/bot.yml` actually uploads. The workflow uploads a `spark-core-tests` flag (`bot.yml` line 1027) which is missing here, and there is no `integration-tests` flag uploaded anywhere in `bot.yml`. The full set uploaded today is: `spark-client-hadoop-common`, `utilities`, `common-and-other-modules`, `spark-java-tests`, `spark-scala-tests`, `hadoop-mr-java-client`, `spark-core-tests`. Could you update the list to match (drop `integration-tests`, add `spark-core-tests`)? Note the same discrepancy exists in the `flag_management` comment in `.codecov.yml` (lines 71-73), so it'd be worth fixing both together.

⚠️ AI-generated; verify before applying. React 👍/👎 to flag quality.

Contributor Author

Checked against current apache/master `.github/workflows/bot.yml`: the uploaded flags are `spark-client-hadoop-common`, `utilities`, `common-and-other-modules`, `spark-java-tests`, `spark-scala-tests`, `hadoop-mr-java-client`, and `integration-tests` (the last at `bot.yml` line 1432). There is no `spark-core-tests` flag in `bot.yml`. The list here matches, so leaving it as-is.

- `.codecov.yml` configures reporting: `ignore` drops non-production code (examples, packaging,
the integration-test harness, an incubating module); `component_management` reports coverage per
ownership area (hudi-common, hudi-client, hudi-spark-datasource, hudi-flink, ...); and
`flag_management` carries a flag's coverage forward when its job is skipped by the path filter,
so a partial run does not report a false drop.
- Generated code (Avro, Thrift, Protobuf, ANTLR) is produced at build time and has no source
committed to git, so it never appears in the Codecov report and needs no exclusion. The Avro
model path is still ignored defensively in `.codecov.yml` in case sources are ever committed.
- Codecov posts a summary comment on each PR with the project, per-component, and per-patch
coverage delta. Statuses are informational (they do not fail the build); to gate a component
against regression later, remove `informational` and set a `target`/`threshold` for it.
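
The flag list in the first bullet can drift from the workflow over time. A minimal sketch for
checking it, assuming each Codecov upload step in the workflow passes its flag on a
`flags: <name>` input line (the helper name `list_uploaded_flags` is hypothetical, not an
existing script in this repo):

```bash
#!/usr/bin/env bash
# List the distinct Codecov flags that a workflow file uploads.
# Assumption: each upload step declares its flag as a `flags: <name>` line;
# comma-separated values on one line are split into separate flags.
list_uploaded_flags() {
  grep -E '^[[:space:]]*flags:' "$1" \
    | sed -E 's/^[[:space:]]*flags:[[:space:]]*//' \
    | tr -d "\"'" \
    | tr ',' '\n' \
    | sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//' \
    | sort -u
}

# Example:
# list_uploaded_flags .github/workflows/bot.yml
```

Comparing this output with the flags named above and in the `flag_management` comment in
`.codecov.yml` shows whether either list needs updating.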

To read coverage for a single module locally, run its tests with the JaCoCo agent and open the
generated `target/site/jacoco*/index.html`, for example:

```bash
mvn test -pl hudi-common -Punit-tests -Djacoco.skip=false
```

Instruction coverage is the headline metric; branch coverage (also in the report) shows whether
both sides of each conditional are exercised, which is what catches untested error and config paths.
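
Both metrics can also be read without opening the HTML report. The sketch below assumes the
module's JaCoCo report step also writes a `jacoco.csv` in the standard JaCoCo column layout
next to `index.html`; the helper name `jacoco_summary` and the example path are illustrative
and not part of this repo's scripts:

```bash
#!/usr/bin/env bash
# Summarize instruction and branch coverage from a JaCoCo CSV report.
# Assumption: standard JaCoCo CSV layout, where columns 4-7 are
# INSTRUCTION_MISSED, INSTRUCTION_COVERED, BRANCH_MISSED, BRANCH_COVERED.
jacoco_summary() {
  awk -F',' '
    NR > 1 { im += $4; ic += $5; bm += $6; bc += $7 }   # skip the header row
    END {
      if (ic + im > 0) printf "instruction: %.2f%%\n", 100 * ic / (ic + im)
      if (bc + bm > 0) printf "branch: %.2f%%\n", 100 * bc / (bc + bm)
    }
  ' "$1"
}

# Example (hypothetical path; the report directory depends on the module's JaCoCo setup):
# jacoco_summary hudi-common/target/site/jacoco/jacoco.csv
```

A large gap between the two numbers is a hint that the tests run through the code but take
only one side of its conditionals.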