Add project-context.md for datastream-to-spanner#3902
610c543 to 5595f80
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@             Coverage Diff              @@
##               main    #3902      +/-   ##
============================================
+ Coverage     53.73%   61.93%   +8.20%
+ Complexity     6743     2961    -3782
============================================
  Files          1087      532     -555
  Lines         66794    32141   -34653
  Branches       7478     3515    -3963
============================================
- Hits          35890    19908   -15982
+ Misses        28477    11206   -17271
+ Partials       2427     1027    -1400
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces documentation and architectural diagrams for the datastream-to-spanner Dataflow template. The goal is to provide a centralized source of truth regarding the pipeline's design, technical constraints, and operational guidelines to assist future development and maintenance.
Code Review
This pull request adds comprehensive architectural documentation for the Datastream to Spanner Dataflow template, including a Graphviz DOT file, its corresponding SVG diagram, and a detailed project context markdown file. Feedback on the architecture diagram points out a misleading label 'Write To DLQ2' and suggests renaming it to 'Write To Severe DLQ' to correctly represent that severe errors are routed to a subdirectory of the main DLQ rather than a separate resource.
WriteSpanner [label="Write events to Cloud Spanner"];
WriteFiltered [label="Write Filtered Events"];
WriteDLQRetry [label="Write To DLQ\n(Retryable Errors)"];
WriteDLQSevere [label="Write To DLQ2\n(Severe Errors)"];
The label Write To DLQ2 is misleading because there is no separate DLQ2 resource or parameter configured in the pipeline. Both retryable and severe errors are written to the same Dead Letter Queue (DLQ) directory under different subdirectories (/tmp_retry/ and /tmp_severe/). Labeling this as Write To Severe DLQ is more accurate and avoids confusion. Please regenerate the architecture.svg file after applying this change to keep them in sync.
WriteDLQSevere [label="Write To Severe DLQ\n(Severe Errors)"];
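The directory layout this comment describes can be sketched as follows. This is a minimal Python illustration, not the template's actual code: `dlq_path` and its parameters are hypothetical names, and only the `/tmp_retry/` and `/tmp_severe/` subdirectories come from the review comment.

```python
import posixpath

# Subdirectory names taken from the review comment.
RETRY_SUBDIR = "tmp_retry"
SEVERE_SUBDIR = "tmp_severe"


def dlq_path(dlq_directory: str, severe: bool) -> str:
    """Return the DLQ subdirectory for a failed event (hypothetical helper).

    Both kinds of error share one DLQ directory; only the
    subdirectory differs, so no separate "DLQ2" resource is involved.
    """
    subdir = SEVERE_SUBDIR if severe else RETRY_SUBDIR
    return posixpath.join(dlq_directory.rstrip("/"), subdir) + "/"
```

With a single configured directory, the two outputs differ only in their last path segment, which is why a "Severe DLQ" label may describe the node more accurately than "DLQ2".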
* DLQ retry logic (both `retryDLQ` and `retryAllDLQ` modes) handles data integrity on errors. Modifying it must be done carefully to prevent infinite loops or skipped events.
* **Fatal Errors:** Unexpected/fatal errors (like type conversion failures) should not be endlessly retried. Ensure any new exceptions are properly routed to the severe DLQ bucket.
* **Version Overflow:** Be mindful of edge cases in version ordering (e.g. if the Oracle SCN exceeds limits and restarts at zero). Ensure comparisons in `ChangeEventSequence` remain robust against edge case overflows.
* **Data Size Limits:** Datastream enforces a 3MB size limit per Change Event. Ensure no individual rows exceed this.
This seems to be incorrect; we already have tests for 10MB rows.
b/521743991
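The "Fatal Errors" guidance in the excerpt above can be sketched as a small routing function. This is a hedged Python illustration only: `RetryableError`, `FatalError`, `Route`, `route_failed_event`, and the `max_retries` bound are all hypothetical names and assumptions, not the template's actual classes or parameters.

```python
from enum import Enum


class Route(Enum):
    RETRY = "retry"    # goes back to the retryable DLQ subdirectory
    SEVERE = "severe"  # goes to the severe DLQ subdirectory


# Hypothetical stand-ins for the pipeline's own exception types.
class RetryableError(Exception):
    pass


class FatalError(Exception):
    pass


def route_failed_event(error: Exception, retry_count: int, max_retries: int) -> Route:
    """Decide where a failed event goes (assumed policy, for illustration)."""
    # Anything not explicitly marked retryable is treated as severe, so a
    # newly introduced exception type cannot end up in an endless retry loop.
    if not isinstance(error, RetryableError):
        return Route.SEVERE
    # Bound the retries so a persistent failure eventually leaves the loop.
    if retry_count >= max_retries:
        return Route.SEVERE
    return Route.RETRY
```

Defaulting unknown exceptions to the severe route is one way to satisfy the guidance that new exceptions must not be retried endlessly; the real retry modes may apply different rules.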