DataFusion empty INSERT returns empty batch and creates unnecessary snapshot #2713

Description

@u70b3

Apache Iceberg Rust version

Current main at ac92ec9 (version 0.10.0).

Describe the bug

When an INSERT INTO ... SELECT ... WHERE false (or any other insert that produces zero rows) is executed through the DataFusion integration, IcebergCommitExec returns an empty RecordBatch and still creates a new table snapshot.

DataFusion expects an INSERT statement to always produce a single-row, single-column result containing the affected row count (0 in this case). Returning an empty batch violates this contract, so downstream consumers that read the count from the result find no row to read.

Additionally, creating a snapshot for an insert that wrote no data files is unnecessary and pollutes table history.

To Reproduce

  1. Create an Iceberg table through the DataFusion catalog.
  2. Execute INSERT INTO catalog.my_table SELECT * FROM (VALUES (1, 'a')) AS t(foo1, foo2) WHERE false.
  3. Collect the result batches.
  4. Observe that the returned batch has zero rows instead of one row with count 0.
  5. Load the table metadata and observe that a snapshot is created even though no data files were written.

Relevant code:

  • crates/integrations/datafusion/src/physical_plan/commit.rs returns RecordBatch::new_empty(...) when data_files is empty.
  • The commit path does not short-circuit before starting a transaction for empty inserts.

Expected behavior

  • IcebergCommitExec should return a single-row RecordBatch with a UInt64 count column set to 0 when no data files are produced.
  • No new snapshot should be created for an empty insert.
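
The expected behavior can be illustrated with a minimal, std-only model of the short-circuit. This is a sketch, not the actual IcebergCommitExec code: `WrittenFile`, `TableModel`, `CommitResult`, and `commit` are hypothetical stand-ins introduced here for illustration, and the `Vec<u64>` models the single-column UInt64 count batch rather than a real Arrow RecordBatch.

```rust
/// Hypothetical stand-in for a data file produced by the write step.
/// This is NOT the iceberg-rust `DataFile` type.
#[derive(Debug, Clone)]
pub struct WrittenFile {
    pub path: String,
    pub record_count: u64,
}

/// Hypothetical stand-in for table state: only the list of snapshot ids.
#[derive(Debug, Default)]
pub struct TableModel {
    pub snapshots: Vec<u64>,
}

/// Models the result of the commit step. `count_rows` stands in for a
/// single-column UInt64 batch and must always hold exactly one value.
#[derive(Debug, PartialEq, Eq)]
pub struct CommitResult {
    pub count_rows: Vec<u64>,
}

/// Commits the written files and reports the affected row count.
pub fn commit(table: &mut TableModel, files: &[WrittenFile]) -> CommitResult {
    if files.is_empty() {
        // Short-circuit: no transaction is started and no snapshot is added,
        // but the caller still receives one row with count 0.
        return CommitResult { count_rows: vec![0] };
    }

    // Non-empty insert: sum the record counts and record one new snapshot.
    let total: u64 = files.iter().map(|f| f.record_count).sum();
    let next_id = table.snapshots.len() as u64 + 1;
    table.snapshots.push(next_id);

    CommitResult { count_rows: vec![total] }
}
```

In this model, calling `commit` with an empty slice leaves `snapshots` untouched and yields a one-element count of 0, while a non-empty slice adds exactly one snapshot. A real fix would presumably apply the same early return before the transaction is built and construct the one-row count batch with Arrow instead.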

Willingness to contribute

I can contribute a fix for this bug independently.
