fix(transaction): don't let user properties override computed summary metrics #2726
base: main
@@ -404,8 +404,13 @@ impl<'a> SnapshotProducer<'a> {
         let previous_snapshot = table_metadata.current_snapshot();

-        let mut additional_properties = summary_collector.build();
-        additional_properties.extend(self.snapshot_properties.clone());
+        // User-supplied snapshot properties are applied first, then the computed
+        // metrics overwrite any colliding keys. This matches iceberg-java
+        // (`SnapshotProducer.summary`), where computed `added-*`/`total-*` values
+        // are written after user properties so a user cannot shadow them with a
+        // bad (or merely wrong) value that would corrupt the snapshot summary.
+        let mut additional_properties = self.snapshot_properties.clone();
+        additional_properties.extend(summary_collector.build());
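The reordering relies on how `HashMap::extend` resolves duplicate keys: each inserted pair replaces any existing entry, so whichever map is extended last wins. A minimal standalone sketch of that behavior, assuming plain `HashMap<String, String>` maps and a hypothetical `merge_summary` helper that is not part of iceberg-rust:

```rust
use std::collections::HashMap;

/// Hypothetical helper (illustration only): start from the user-supplied
/// properties, then let the computed metrics overwrite any colliding keys.
fn merge_summary(
    user_properties: &HashMap<String, String>,
    computed_metrics: HashMap<String, String>,
) -> HashMap<String, String> {
    let mut merged = user_properties.clone();
    // `extend` inserts every pair and replaces existing values, so the
    // computed metrics win whenever a key appears in both maps.
    merged.extend(computed_metrics);
    merged
}
```

With this order, a user-supplied `added-data-files` value is replaced by the computed one, while unrelated user keys pass through unchanged.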
Comment on lines +407 to +413
Contributor
It is still possible for a user to supply bad values, in which case the new snapshot's value would be 0, as it won't be populated. Should we try to avoid this, for instance by dropping those values? As a fix, I think this change is fine, but I am wondering whether we need to follow up here.
Member
Author
You're right, and it's worth being precise about the residual gap. I checked iceberg-java for comparison. Proactively dropping any user property whose key collides with a reserved metric name would go a step beyond Java and close the gap fully. I'd prefer to keep that as a follow-up rather than widen this PR; agreed it's worth doing. I can open an issue to track it.
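One possible shape for that follow-up, sketched under assumptions: the prefix list below is taken from the `added-*`/`total-*` wording in the diff and the totals/added/removed wording in this thread. It is illustrative, not the actual set of reserved Iceberg summary keys, and both function names are hypothetical.

```rust
use std::collections::HashMap;

// Assumed, illustrative prefixes; the real reserved set may differ.
const RESERVED_PREFIXES: [&str; 3] = ["added-", "removed-", "total-"];

/// Hypothetical check: does this key look like a computed summary metric?
fn is_reserved_metric_key(key: &str) -> bool {
    RESERVED_PREFIXES.iter().any(|prefix| key.starts_with(prefix))
}

/// Hypothetical filter: drop user properties that collide with reserved
/// metric names before they are merged into the snapshot summary.
fn drop_reserved_properties(properties: HashMap<String, String>) -> HashMap<String, String> {
    properties
        .into_iter()
        .filter(|(key, _)| !is_reserved_metric_key(key))
        .collect()
}
```

Filtering up front would mean a colliding user value never reaches the summary, even for a metric the collector does not populate in a given commit.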

         let summary = Summary {
             operation: snapshot_produce_operation.operation(),
My bad, I introduced this bug. I wasn't aware that users can overwrite the computed deltas.
The original intent was that at this point in the code, we should be the ones who wrote these values, hence the `expect` rather than parsing and error handling.

I think we're still open to bad values being passed in, so avoiding `expect` sounds reasonable, although I wonder if we should just avoid the possibility of passing values matching totals/added/removed altogether.

Related: one of the things I wanted to do is move total updates into the same part of the code as delta calculations; then we wouldn't need to deserialize values we just wrote in the first place. Not for this PR though!
No worries, and thanks for the context on the original `expect`. You're right that the intent (we just wrote these, so they should be parsable) was sound; it only broke once user properties could reach the same keys. Tolerating the bad value here is a cheap safety net regardless, since a stale or garbage value can also arrive from a previous snapshot's summary.

Agreed that collapsing total computation into the same place as the delta calculation (so we never round-trip through strings) is the cleaner long-term shape. Happy to leave that as a separate follow-up, as you suggest.
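To make the "tolerate the bad value" idea concrete, here is a minimal sketch of reading a metric from a string-valued summary without `expect`. It assumes a fallback of 0 for missing or unparsable values, in line with the 0 mentioned earlier in the thread; the helper name `metric_or_zero` is hypothetical and the PR's actual handling may differ.

```rust
use std::collections::HashMap;

/// Hypothetical helper: read a numeric metric from a summary map, falling
/// back to 0 (an assumed default) instead of panicking when the value is
/// missing or not a valid unsigned integer.
fn metric_or_zero(summary: &HashMap<String, String>, key: &str) -> u64 {
    summary
        .get(key)
        // `parse` returns Err for garbage or negative input; `.ok()` turns
        // that into None so the fallback below applies.
        .and_then(|raw| raw.parse::<u64>().ok())
        .unwrap_or(0)
}
```

A silent fallback trades a panic for a possibly understated total, so logging the rejected value may be worth considering alongside it.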