Apache Iceberg Rust version
Current main at ac92ec9 (version 0.10.0).
Describe the bug
When an INSERT INTO ... SELECT ... WHERE false (or any other insert that produces zero rows) is executed through the DataFusion integration, IcebergCommitExec returns an empty RecordBatch and still creates a new table snapshot.
DataFusion expects an INSERT statement to always produce a single-row, single-column result containing the affected row count (0 in this case). Returning an empty batch violates this contract and can confuse downstream consumers.
Additionally, creating a snapshot for an insert that wrote no data files is unnecessary and pollutes table history.
To Reproduce
- Create an Iceberg table through the DataFusion catalog.
- Execute INSERT INTO catalog.my_table SELECT * FROM (VALUES (1, 'a')) AS t(foo1, foo2) WHERE false.
- Collect the result batches.
- Observe that the returned batch has zero rows instead of one row with count 0.
- Load the table metadata and observe that a snapshot is created even though no data files were written.
Relevant code:
- crates/integrations/datafusion/src/physical_plan/commit.rs returns RecordBatch::new_empty(...) when data_files is empty.
- The commit path does not short-circuit before starting a transaction for empty inserts.
Expected behavior
- IcebergCommitExec should return a single-row RecordBatch with a UInt64 count column set to 0 when no data files are produced.
- No new snapshot should be created for an empty insert.
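The two expectations above can be sketched as a small standalone model. This is not the iceberg-rust or Arrow API: DataFileStub, CommitOutcome, and commit_insert are hypothetical names used only for illustration, and the count column is modeled as a plain Vec<u64> rather than a RecordBatch. It assumes the affected row count is the sum of the per-file record counts.

```rust
/// Hypothetical stand-in for a written data file; only the row count matters here.
#[derive(Debug, Clone, PartialEq)]
pub struct DataFileStub {
    pub record_count: u64,
}

/// What the commit step reports back to the caller in this simplified model.
#[derive(Debug, PartialEq)]
pub struct CommitOutcome {
    /// Values of the single count column, one entry per result row.
    pub count_rows: Vec<u64>,
    /// Whether a transaction was committed, which would create a new snapshot.
    pub snapshot_created: bool,
}

/// Models the proposed behavior: an empty insert short-circuits before any
/// transaction is started, yet still yields exactly one row holding 0.
pub fn commit_insert(data_files: &[DataFileStub]) -> CommitOutcome {
    if data_files.is_empty() {
        // Nothing was written: report a count of 0 and leave table history untouched.
        return CommitOutcome {
            count_rows: vec![0],
            snapshot_created: false,
        };
    }

    // Assumption: the affected row count is the sum of per-file record counts.
    let total: u64 = data_files.iter().map(|f| f.record_count).sum();
    CommitOutcome {
        count_rows: vec![total],
        snapshot_created: true,
    }
}
```

In this model, commit_insert(&[]) returns one count row and no snapshot, so both paths return the same one-row shape and a caller that reads the first row never has to special-case an empty result.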
Willingness to contribute
I can contribute a fix for this bug independently.