Reading a table partitioned on a nested column fails with "Illegal character in: <field>"

**Describe the problem you faced**

Reading a Hudi table that is partitioned on a **nested column** (a partition field whose name is a dotted path, e.g. `nested_record.level`) fails on the batch read path with:

```
org.apache.hudi.internal.schema.HoodieSchemaException: Illegal character in: nested_record.level
	at org.apache.hudi.HoodieSchemaConversionUtils$.convertStructTypeToHoodieSchema(HoodieSchemaConversionUtils.scala:143)
	at org.apache.spark.sql.execution.datasources.parquet.HoodieFileGroupReaderBasedFileFormat.buildReaderWithPartitionValues(HoodieFileGroupReaderBasedFileFormat.scala:269)
	at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(...)
```

`HoodieFileGroupReaderBasedFileFormat` is the only batch reader, so the table cannot be read at all once it is partitioned on a nested field.

**Root cause**

When a partition field is "mandatory", `HoodieFileGroupReaderBasedFileFormat#buildReaderWithPartitionValues` adds it to the `StructType` it converts into a `HoodieSchema` (a top-level Avro field) so it can be read from the data file. For a nested partition column the field name is a dotted path (`nested_record.level`), which is not a valid Avro field name, so `convertStructTypeToHoodieSchema` throws before the read can start.
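The failure can be illustrated outside Hudi with a minimal sketch of the Avro naming rule (names start with `[A-Za-z_]` and continue with `[A-Za-z0-9_]`). The helper `is_valid_avro_name` below is hypothetical and only mirrors that rule; it is not the check Hudi or Avro actually runs.

```python
import re

# Avro spec naming rule: first character [A-Za-z_], the rest [A-Za-z0-9_].
# Hypothetical helper for illustration; not Hudi's or Avro's own code.
_AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_avro_name(name: str) -> bool:
    """Return True if `name` is usable as a single Avro field name."""
    return _AVRO_NAME.fullmatch(name) is not None


# The dotted partition path is rejected because '.' is not an allowed character,
# while each individual path segment is a valid name on its own.
dotted_ok = is_valid_avro_name("nested_record.level")
root_ok = is_valid_avro_name("nested_record")
leaf_ok = is_valid_avro_name("level")
```

Here `dotted_ok` is false while `root_ok` and `leaf_ok` are true, which matches the "Illegal character" error being raised only for the full dotted path.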

A nested partition column is never a flat top-level column in the data file: its value is materialized from the partition path, and its root field (`nested_record`) is already read via the normal data schema. So it should not be converted into a top-level Avro field at all. Hudi already takes the root-level field name for nested mandatory columns elsewhere (`HoodieBaseRelation#appendMandatoryColumns` via `HoodieAvroUtils.getRootLevelFieldName`); the file-group reader path does not.
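As a rough sketch of the idea (not the actual patch), mandatory fields could be reduced to their root-level names before the schema is built, so a dotted path never becomes a top-level field. Both function names below are hypothetical, and taking the segment before the first dot is an assumption about what "root-level field name" means here.

```python
from typing import Iterable, List


def root_level_field_name(field_path: str) -> str:
    """Return the segment before the first dot (assumed meaning of 'root level')."""
    return field_path.split(".", 1)[0]


def top_level_fields_to_read(data_fields: Iterable[str],
                             mandatory_fields: Iterable[str]) -> List[str]:
    """Merge mandatory fields into the data fields by root name.

    Order is preserved and a root that is already present is not added twice.
    """
    result = list(data_fields)
    seen = set(result)
    for field in mandatory_fields:
        root = root_level_field_name(field)
        if root not in seen:
            seen.add(root)
            result.append(root)
    return result


# The nested partition column contributes nothing new: its root is already read.
nested_case = top_level_fields_to_read(["id", "nested_record"], ["nested_record.level"])
# A flat mandatory partition column is still appended as before.
flat_case = top_level_fields_to_read(["id", "nested_record"], ["region"])
```

With this approach `nested_case` stays `["id", "nested_record"]`, so no field named `nested_record.level` reaches the schema conversion, while `flat_case` still gains `region`.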

**To Reproduce**

1. Write a COW table partitioned on a nested column (partition field path `nested_record.level`).
2. Run `spark.read.format("hudi").load(path).filter("nested_record.level = 'INFO'")` (or any other read that makes the partition column mandatory).
3. The query fails with `HoodieSchemaException: Illegal character in: nested_record.level`.
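The steps above might be scripted in PySpark roughly as follows. This is a sketch under assumptions: the table name, the `id` and `message` columns, and the minimal option set are illustrative and not taken from the original report, and a given Hudi version may need extra write options (for example a key generator or an ordering field). It expects a `SparkSession` that already has the Hudi bundle on its classpath.

```python
def hudi_write_options(table_name: str) -> dict:
    """Write options for a COW table partitioned on a nested field (assumed minimal set)."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
        "hoodie.datasource.write.recordkey.field": "id",  # assumed key column
        "hoodie.datasource.write.partitionpath.field": "nested_record.level",
    }


def reproduce(spark, path: str):
    """Write a small table, then run the filtered read from step 2."""
    from pyspark.sql import Row  # imported lazily so the helper above needs no Spark

    rows = [
        Row(id=1, nested_record=Row(level="INFO", message="started")),
        Row(id=2, nested_record=Row(level="ERROR", message="failed")),
    ]
    df = spark.createDataFrame(rows)
    (df.write.format("hudi")
        .options(**hudi_write_options("nested_partition_repro"))
        .mode("overwrite")
        .save(path))

    # Filtering on the nested partition column is the read described in step 2.
    return (spark.read.format("hudi")
            .load(path)
            .filter("nested_record.level = 'INFO'")
            .collect())
```

Calling `reproduce(spark, "/tmp/nested_partition_repro")` on an affected build is where the exception from step 3 would be expected to appear.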

This surfaces in Apache XTable's HUDI -> ICEBERG conversion when the source Hudi table is partitioned on a nested column.

**Expected behavior**

Reading a table partitioned on a nested column should succeed; the partition value should be materialized from the partition path like any other appended partition field.

**Environment Description**

* Hudi version: 1.x (master)
* Spark version: 3.4
* Running on Docker? : no


Reading a table partitioned on a nested column fails with "Illegal character in: <field>" #19122
