Skip to content

feat(datalink): GC/reclamation for pinned datalink CAS#24835

Draft
ck89119 wants to merge 19 commits into
matrixorigin:mainfrom
ck89119:issue-24743
Draft

feat(datalink): GC/reclamation for pinned datalink CAS#24835
ck89119 wants to merge 19 commits into
matrixorigin:mainfrom
ck89119:issue-24743

Conversation

@ck89119

@ck89119 ck89119 commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #24743

What this PR does / why we need it:

Depends on #24641 (introduces the datalink CAS). This PR is stacked on that branch, so the diff currently includes #24641's commits; it will be rebased onto main once #24641 merges, leaving only this issue's changes.

Adds garbage collection / reclamation for the per-account, content-addressed store (CAS) datalink_cas/<accountID>/<hash> that backs pinned datalinks (datalink_pin, #24641). Without it, pinned blobs are retained forever once their referencing datalink values are gone (rows deleted, table dropped, snapshot expired, account dropped), leaking storage in the SHARED file service.

Two mechanisms:

  1. DROP ACCOUNT cleanup — on DROP ACCOUNT, the account's entire datalink_cas/<accountID>/ prefix is reclaimed (best-effort; a failure never aborts the drop, the sweep is the backstop). pkg/frontend/authenticate.go, pkg/datalink/cas.go.

  2. Reference-aware periodic sweep (pkg/datalink/casgc/) — an hourly cron task that, per account, collects the content hashes referenced by live datalink columns and live snapshots, then reclaims unreferenced CAS blobs. Because the file service exposes no object mtime, the grace period is a two-pass mark-and-delete with in-process state: a blob is deleted only after staying an orphan across passes for at least GraceWindow (default 24h, configurable; sweep interval 1h). The sweep is candidate-driven (it skips snapshot scans when nothing is collectable), and a singleton Sweeper preserves the grace-window state across cron firings.

Snapshot safety: CAS blobs live in the SHARED file service, not in MVCC data, so snapshots/PITR do not pin them. The sweep therefore treats hashes referenced by any live snapshot as live, preventing dangling datalink references after a restore.

Tests

  • Unit: CAS primitives, reference discovery, two-pass sweep (grace window / candidate-driven / re-reference reset / cross-account / SweepAll fault-tolerance), and a mock-based SQL reference layer. Coverage: pkg/datalink 76.6%, pkg/datalink/casgc 84.0%.
  • BVT: func_datalink_cas_gc — pin content into a table, DROP ACCOUNT with pinned blobs present, recreate a same-named account with a clean namespace (11/11 pass).
  • make and make static-check clean.

🤖 Generated with Claude Code

ck89119 and others added 19 commits May 27, 2026 17:21
datalink snapshots only versioned the reference string, not the bytes of the
referenced file, so historical state was not byte-reproducible after an
out-of-band overwrite (issue matrixorigin#24555).

Add an opt-in datalink_pin(datalink) builtin that freezes the bytes currently
referenced by a datalink into an immutable content-addressed store (CAS) backed
by the SHARED file service, and returns a datalink carrying ?contenthash=<sha256>.
The read path serves a pinned datalink directly from the CAS (a missing object
errors out instead of falling back to the live, possibly overwritten file), so
historical snapshots stay byte-reproducible. Reading the SHARED service uses a
plain FileService rather than GetForETL, since SHARED may be a LocalFS that does
not implement ETLFileService in standalone.

Default (un-pinned) datalink behavior is unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Size guard: refuse to pin objects larger than MaxBlobLen (mirroring
  load_file), accounting for offset/size, to avoid loading a huge object fully
  into memory and OOMing the CN. Streaming hash/copy is a follow-up.
- Strip offset/size case-insensitively when emitting the pinned URL, so a
  mixed-case ?Offset=/?Size= no longer survives and wrongly re-slices the CAS
  object on read.
- Derive Datalink.ContentHash from the parsed MoPath instead of re-parsing the
  query, keeping it consistent with ParseDatalink for duplicate/mixed-case
  contenthash params.
- BVT: add the core overwrite regression — pin a value, repoint the stage to
  new content, and assert the pinned datalink still reads the original bytes
  while the live datalink reads the new bytes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…24555

# Conflicts:
#	pkg/sql/plan/function/function_id.go
#	pkg/sql/plan/function/function_id_test.go
In compose multi-CN environments the SHARED file service includes an
internal directory prefix (e.g. server/data/) in its error messages,
making the path differ from the CAS key used in standalone mode.

Intercept ErrFileNotFound in the pinned read/stat paths and re-wrap
with NewFileNotFound(d.MoPath) so the message always uses the CAS key
(datalink_cas/<h2>/<hash>) regardless of how the file service stores
objects internally.  The BVT result file already expects this form.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Review feedback (XuPeng-SH): the content-addressed datalink store was
globally addressable by raw SHA-256, making a contenthash a bearer
capability — any caller knowing a hash could read pinned bytes without
the original path/stage authorization, and identical bytes were
deduplicated across tenants (cross-account visibility by hash).

Scope every CAS object by the calling account:
- CAS layout becomes datalink_cas/<accountID>/<h2>/<hash>.
- The accountID is resolved from the trusted execution context
  (defines.GetAccountId), never from the datalink URL, so it cannot be
  forged by editing query params.
- Reads reconstruct the key under the caller's account, so a contenthash
  pinned by one account is not readable by another and there is no
  cross-account dedup visibility.

Also key the pinned-not-found error on the content hash rather than the
internal CAS storage path: the path embeds the file-service layout
(server/data/ prefix in compose) and the account id, both
non-deterministic; the hash is what the user supplied.

Tests: per-account CAS isolation, account-scoped parse, cross-account
read isolation (UT), and a cross-account BVT case where a second account
cannot read the sys account's pinned content.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…d writes

Review feedback (XuPeng-SH):

1. datalink_pin() treated an already-pinned URL as idempotent after only
   validating the contenthash format, without checking the CAS object exists in
   the caller's account namespace. A contenthash minted by another account was
   returned unchanged and failed only later at read time. pinDatalink now calls
   CASExists for the caller's account: it returns the URL unchanged only when the
   blob is present, otherwise it strips the contenthash, re-reads the live source
   and re-pins for the current account. The re-pinned live bytes must still hash
   to the declared contenthash (verified before CASPut), so an unavailable version
   errors out rather than silently pinning different bytes.

2. Pinned datalinks read from account-scoped CAS keys, but save_file() still
   built a generic writer from MoPath, which for a pinned value is the internal
   CAS key rather than the original external URL. Datalink.NewWriter now rejects
   writes for contenthash datalinks: pinned CAS blobs are immutable.

Tests: NewWriter/save_file pinned-write rejection, per-account idempotency,
re-pin from live when the blob is absent, and re-pin rejection on changed live
content (UT); plus a BVT case asserting save_file on a pinned datalink errors.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Introduce package-level function variables for sqlexec.RunTxnWithSqlContext
and sqlexec.RunSql in production.go so tests can stub them with gostub.
Add production_test.go covering listAccountIDs, datalinkColumns, scanColumn,
liveSnapshots, refsForAccount, and DatalinkCASGCTaskMetadata, raising package
coverage from 43.3% to 84.0%.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@matrix-meow matrix-meow added the size/XXL Denotes a PR that changes 2000+ lines label Jun 4, 2026
@mergify mergify Bot added the kind/feature label Jun 4, 2026
@ck89119 ck89119 marked this pull request as draft June 4, 2026 03:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/feature size/XXL Denotes a PR that changes 2000+ lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants