Add project-context.md for datastream-to-spanner #3902

Open
aasthabharill wants to merge 2 commits into GoogleCloudPlatform:main from aasthabharill:datastream-to-spanner-context

Conversation

@aasthabharill

b/521743991

@aasthabharill aasthabharill force-pushed the datastream-to-spanner-context branch from 610c543 to 5595f80 Compare June 9, 2026 12:11
@codecov

codecov Bot commented Jun 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.93%. Comparing base (3b2def5) to head (a114714).
⚠️ Report is 64 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3902      +/-   ##
============================================
+ Coverage     53.73%   61.93%   +8.20%     
+ Complexity     6743     2961    -3782     
============================================
  Files          1087      532     -555     
  Lines         66794    32141   -34653     
  Branches       7478     3515    -3963     
============================================
- Hits          35890    19908   -15982     
+ Misses        28477    11206   -17271     
+ Partials       2427     1027    -1400     
| Components | Coverage Δ |
|---|---|
| spanner-templates | 80.24% <ø> (+7.40%) ⬆️ |
| spanner-import-export | ∅ <ø> (∅) |
| spanner-live-forward-migration | 90.15% <ø> (+9.20%) ⬆️ |
| spanner-live-reverse-replication | 83.93% <ø> (+6.84%) ⬆️ |
| spanner-bulk-migration | 92.57% <ø> (+1.46%) ⬆️ |
| gcs-spanner-dv | 90.37% <ø> (+4.61%) ⬆️ |
see 654 files with indirect coverage changes

@aasthabharill aasthabharill added the addition New feature or request label Jun 9, 2026
Comment thread v2/datastream-to-spanner/project-context.md Outdated
@aasthabharill aasthabharill requested a review from darshan-sj June 10, 2026 04:46
@aasthabharill aasthabharill marked this pull request as ready for review June 10, 2026 04:46
@aasthabharill aasthabharill requested a review from a team as a code owner June 10, 2026 04:46
@aasthabharill aasthabharill requested a review from sm745052 June 10, 2026 04:46
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces documentation and architectural diagrams for the datastream-to-spanner Dataflow template. The goal is to provide a centralized source of truth regarding the pipeline's design, technical constraints, and operational guidelines to assist future development and maintenance.

Highlights

  • Project Documentation: Added a comprehensive project-context.md file to provide AI agents and developers with architectural insights, coding standards, and best practices for the datastream-to-spanner template.
  • Architecture Visualization: Included an updated architecture diagram in both .dot and .svg formats to reflect the current data flow and component interactions.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request adds comprehensive architectural documentation for the Datastream to Spanner Dataflow template, including a Graphviz DOT file, its corresponding SVG diagram, and a detailed project context markdown file. Feedback on the architecture diagram points out a misleading label 'Write To DLQ2' and suggests renaming it to 'Write To Severe DLQ' to correctly represent that severe errors are routed to a subdirectory of the main DLQ rather than a separate resource.

WriteSpanner [label="Write events to Cloud Spanner"];
WriteFiltered [label="Write Filtered Events"];
WriteDLQRetry [label="Write To DLQ\n(Retryable Errors)"];
WriteDLQSevere [label="Write To DLQ2\n(Severe Errors)"];

medium

The label `Write To DLQ2` is misleading because there is no separate DLQ2 resource or parameter configured in the pipeline. Both retryable and severe errors are written to the same Dead Letter Queue (DLQ) directory, under different subdirectories (`/tmp_retry/` and `/tmp_severe/`). Labeling this node `Write To Severe DLQ` is more accurate and avoids confusion. Please regenerate the architecture.svg file after applying this change so that it stays in sync with the .dot source.

    WriteDLQSevere [label="Write To Severe DLQ\n(Severe Errors)"];
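
The directory layout described in this comment can be sketched as follows. This is a minimal Python illustration rather than the template's own code: the helper `dlq_path` and the bucket `gs://example-bucket/dlq` are hypothetical, and only the subdirectory names `tmp_retry` and `tmp_severe` come from the comment above.

```python
# Subdirectory names are taken from the review comment; everything else
# in this sketch (function name, bucket path) is an assumption.
RETRY_SUBDIR = "tmp_retry"
SEVERE_SUBDIR = "tmp_severe"


def dlq_path(dlq_root: str, severe: bool) -> str:
    """Return the subdirectory of the single DLQ root for a failed event.

    Both error classes share one root directory; there is no second
    DLQ resource, only a different subdirectory.
    """
    subdir = SEVERE_SUBDIR if severe else RETRY_SUBDIR
    # Plain string joining keeps the "gs://" scheme intact.
    return dlq_root.rstrip("/") + "/" + subdir + "/"


if __name__ == "__main__":
    root = "gs://example-bucket/dlq"  # hypothetical DLQ directory
    print(dlq_path(root, severe=False))
    print(dlq_path(root, severe=True))
```

Both calls resolve under the same root, which is why a label that suggests a second queue ("DLQ2") does not match the layout.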

* DLQ retry logic (both `retryDLQ` and `retryAllDLQ` modes) handles data integrity on errors. Modifying it must be done carefully to prevent infinite loops or skipped events.
* **Fatal Errors:** Unexpected/fatal errors (like type conversion failures) should not be endlessly retried. Ensure any new exceptions are properly routed to the severe DLQ bucket.
* **Version Overflow:** Be mindful of edge cases in version ordering (e.g. if the Oracle SCN exceeds limits and restarts at zero). Ensure comparisons in `ChangeEventSequence` remain robust against edge case overflows.
* **Data Size Limits:** Datastream enforces a 3MB size limit per Change Event. Ensure no individual rows exceed this.

this seems to be incorrect, we already have tests for 10MB rows
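
For the quoted bullets on DLQ retry logic and fatal errors, a minimal Python sketch of the routing rule may help reviewers reason about infinite loops. It is an illustration under stated assumptions, not the template's implementation: the exception classes, the `FailedEvent` type, the `route` function, and the `MAX_RETRIES` cap are all hypothetical names introduced here.

```python
from dataclasses import dataclass

MAX_RETRIES = 3  # hypothetical cap; the real pipeline's limit is not stated here


class TypeConversionError(Exception):
    """Hypothetical stand-in for a fatal, non-retryable failure."""


class TransientWriteError(Exception):
    """Hypothetical stand-in for a failure that may succeed on retry."""


@dataclass
class FailedEvent:
    payload: dict
    error: Exception
    retry_count: int = 0


def route(event: FailedEvent) -> str:
    """Return "severe" or "retry" for a failed event."""
    # Fatal errors are never retried, so they cannot loop forever.
    if isinstance(event.error, TypeConversionError):
        return "severe"
    # Retryable errors are bounded: once the cap is reached the event is
    # moved to the severe queue instead of being dropped or retried again.
    if event.retry_count >= MAX_RETRIES:
        return "severe"
    return "retry"
```

Every event ends in exactly one of the two outcomes, so a change to the retry path can be checked against two properties: no event is retried without bound, and no event is silently skipped.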

Labels

addition New feature or request size/L
