feat: parallelize parquet load by row group #24808

Open
iamlinjunhong wants to merge 2 commits into matrixorigin:4.0-dev from iamlinjunhong:d4-24254

@iamlinjunhong
Contributor

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #24254

What this PR does / why we need it:

Plan Parquet LOAD around file and row-group fanout, carry shard metadata through ExternalScan, and keep S3 prefetch behavior bounded for sharded readers.

Harden unsupported option handling, add Parquet profile stats, and cover compile/runtime/BVT regressions for schema, conversion, rollback, and parallel-load paths.
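
To illustrate the file and row-group fanout described above, here is a minimal sketch of how a planner might cut each Parquet file into row-group shards that independent readers can consume. It is not the code in this PR: the `Shard` type, the `PlanShards` function, and the `maxPerShard` parameter are hypothetical names chosen for illustration, and the row-group counts are assumed to have been read from each file's footer beforehand.

```go
package main

import "fmt"

// Shard is a hypothetical unit of work: a contiguous range of row groups
// within a single Parquet file. It stands in for the shard metadata that
// the PR carries through ExternalScan; the real fields may differ.
type Shard struct {
	File          string
	FirstRowGroup int // inclusive
	LastRowGroup  int // exclusive
}

// PlanShards splits every file's row groups into contiguous shards holding
// at most maxPerShard row groups each. rowGroups maps a file name to its
// row-group count (assumed to come from the Parquet footer).
func PlanShards(files []string, rowGroups map[string]int, maxPerShard int) []Shard {
	if maxPerShard < 1 {
		maxPerShard = 1 // guard against a zero or negative shard size
	}
	var shards []Shard
	for _, f := range files {
		n := rowGroups[f]
		for start := 0; start < n; start += maxPerShard {
			end := start + maxPerShard
			if end > n {
				end = n // the last shard of a file may be smaller
			}
			shards = append(shards, Shard{File: f, FirstRowGroup: start, LastRowGroup: end})
		}
	}
	return shards
}

func main() {
	files := []string{"a.parquet", "b.parquet"}
	counts := map[string]int{"a.parquet": 5, "b.parquet": 2}
	for _, s := range PlanShards(files, counts, 2) {
		fmt.Printf("%s [%d,%d)\n", s.File, s.FirstRowGroup, s.LastRowGroup)
	}
}
```

Under these assumptions, a file with five row groups and a shard size of two yields three shards, so a single large file can be loaded by several readers instead of one.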

@qodo-code-review

Qodo reviews are paused for this user.


Add the missing DATE32 Parquet resource used by load_data_parquet.sql so the 4.0-dev BVT can load the cherry-picked date32-to-DATETIME case instead of failing on a missing file.

Labels

size/XXL Denotes a PR that changes 2000+ lines

4 participants