48 changes: 48 additions & 0 deletions GENERATION.md
@@ -0,0 +1,48 @@
# MobilityDuck generation — the canonical per-binding generator policy

This document is the contract for how MobilityDuck is generated, under the ecosystem-wide
per-binding generator policy.

## The policy (ecosystem-wide)

Every MobilityDB language/surface binding is a **pure projection of the MEOS-API catalog**,
and **each binding owns its own generator, in its own repo**, in a canonical layout. The
single source of truth is the **catalog** (`MEOS-API/output/meos-idl.json`, generated from
the MEOS C headers). A binding is an independent, plug-and-play module that owns its
generation.

Each binding repo satisfies the same invariants:

- an in-repo generator;
- its own `tools/pin/compose-order.txt`;
- a vendored, pinned catalog;
- a thin language projection (language-neutral decisions live in the catalog);
- full automation toward a zero-hand-written surface (generate-then-retire; the last
  green-CI version is the equivalence probe).

## Current state and the canonical target

MobilityDuck's UDF layer is **hand-written C++** today. The canonical target is to **generate
the DuckDB scalar UDFs from `meos-idl.json`** via an in-repo generator
(`tools/codegen_duck_udfs.py`). The generated code is organized **by `@ingroup` group**: one
registration unit per group, the same structure as the MEOS reference manual and the
JMEOS/Spark generators. Each family is gated by `#ifdef MEOS_ENABLE_<FAMILY>`, emitted from
catalog metadata. MEOS values are marshalled in-process as DuckDB `BLOB` (`BlobToTemporal` /
`TemporalToBlob`, and the per-family `BlobTo<X>` helpers), and every emitted body asserts the
per-thread MEOS-init guard.
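
The grouping and family gating described above might look roughly like the following sketch.
It is illustrative only: the catalog keys (`name`, `ingroup`, `family`), the group and function
names, the `NPOINT` family, and the `REGISTER_UDF` macro are all assumptions made for the
example, not the real `meos-idl.json` schema or the output of `tools/codegen_duck_udfs.py`.

```python
from collections import defaultdict

# Hypothetical catalog excerpt. The real meos-idl.json schema is not shown in this
# document, so the "name", "ingroup", and "family" keys are assumptions.
CATALOG = {
    "functions": [
        {"name": "example_accessor_b", "ingroup": "example_group_accessor", "family": None},
        {"name": "example_accessor_a", "ingroup": "example_group_accessor", "family": "NPOINT"},
        {"name": "example_constructor", "ingroup": "example_group_constructor", "family": None},
    ]
}


def group_by_ingroup(functions):
    """Bucket catalog functions by @ingroup group, with deterministic ordering."""
    groups = defaultdict(list)
    for fn in functions:
        groups[fn["ingroup"]].append(fn)
    return {g: sorted(fns, key=lambda f: f["name"]) for g, fns in sorted(groups.items())}


def emit_unit(group, functions):
    """Emit one registration unit; family-specific functions get an #ifdef gate."""
    lines = [f"// registration unit: @ingroup {group}"]
    for fn in functions:
        family = fn.get("family")
        if family:
            lines.append(f"#ifdef MEOS_ENABLE_{family}")
        # REGISTER_UDF is a placeholder macro, not an existing DuckDB or MobilityDuck API.
        lines.append(f"REGISTER_UDF({fn['name']});  // placeholder macro")
        if family:
            lines.append("#endif")
    return "\n".join(lines)


units = {g: emit_unit(g, fns) for g, fns in group_by_ingroup(CATALOG["functions"]).items()}
print(units["example_group_accessor"])
```

Sorting groups and functions keeps the emitted file stable across runs, which makes diffs
against a previously generated surface easier to review.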

## Generate-then-retire — the green-CI version is the probe

The hand-written UDFs are replaced **family by family, never wipe-first**:

1. generate the full surface, build the extension green;
2. **prove generated ⊇ hand** against the **last green-CI version** (the equivalence probe),
   using `scripts/parity-audit.py`, the full sqllogictest suite, and the BerlinMOD benchmark;
3. retire the hand registrations for that family (drop the coexistence prefix);
4. repeat for the next family.

End state: zero hand-registered scalar UDFs. Only non-generatable hand code survives, each
piece justified (type registration, casts, aggregates, table functions). As the generated
surface lands, most of the hand feature PRs in `tools/pin/compose-order.txt` become moot.
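
The "generated ⊇ hand" condition is a per-family set-inclusion check. The sketch below shows
only that relation; it is not the logic of `scripts/parity-audit.py`, which this document does
not reproduce. The family names and signature strings are hypothetical.

```python
def parity_gaps(hand, generated):
    """Map each family to the hand-written signatures the generated surface lacks."""
    gaps = {}
    for family, signatures in hand.items():
        missing = signatures - generated.get(family, set())
        if missing:
            gaps[family] = sorted(missing)
    return gaps


def retirable_families(hand, generated):
    """A family can be retired only when generated is a superset of hand for it."""
    gaps = parity_gaps(hand, generated)
    return sorted(family for family in hand if family not in gaps)


# Hypothetical surfaces: signature strings keyed by family name.
hand = {
    "family_a": {"fn_one(BLOB) -> DOUBLE", "fn_two(BLOB, BLOB) -> BOOLEAN"},
    "family_b": {"fn_three(BLOB) -> BLOB"},
}
generated = {
    "family_a": {"fn_one(BLOB) -> DOUBLE", "fn_two(BLOB, BLOB) -> BOOLEAN", "fn_extra(BLOB) -> BLOB"},
    "family_b": set(),
}

print(parity_gaps(hand, generated))
print(retirable_families(hand, generated))
```

Extra generated functions do not block retirement; only hand-written signatures missing from
the generated side do, which matches the family-by-family, never-wipe-first order above.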

## Pinning

The MEOS surface (vcpkg portfile + the vendored `meos-idl.json`) is pinned to a MobilityDB
`ecosystem-pin-*`. That pin is the *catalog/surface* input; MobilityDuck's own
`tools/pin/compose-order.txt` governs *this repo's* PR accumulation. See that file for the
composing set and the disposition of every open PR.
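
The manifest uses one entry per line in the form `<PR#> <head-branch> # role`, with `#`-prefixed
comment lines and a `base <branch>` note in some role comments. A minimal reader for that format,
plus a check that a base folds before its head, could be sketched as follows. The `Entry` type
and function names are hypothetical, and the sketch ignores the `?` (unconfirmed) marker because
its exact placement is not specified.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    pr: int
    head: str
    role: str
    base: Optional[str]  # branch named after "base" in the role comment, if any


ENTRY_RE = re.compile(r"^(\d+)\s+(\S+)\s*(?:#\s*(.*))?$")
BASE_RE = re.compile(r"\bbase\s+([\w./-]+)")


def parse_manifest(text):
    """Parse manifest entries, skipping blank lines and '#' comment lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = ENTRY_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognized manifest line: {raw!r}")
        role = match.group(3) or ""
        base = BASE_RE.search(role)
        entries.append(Entry(int(match.group(1)), match.group(2), role,
                             base.group(1) if base else None))
    return entries


def order_violations(entries):
    """Entries whose base branch is listed in the manifest but appears after them."""
    position = {entry.head: index for index, entry in enumerate(entries)}
    return [entry for index, entry in enumerate(entries)
            if entry.base in position and position[entry.base] > index]


sample = """\
# WAVE 2
193 feat/parity-th3index # full H3 cell index API (base fix/bump-meos-pin)
198 feat/parity-h3-static-prefilter # base feat/parity-th3index
173 fix/span-bins-bind-copy-unique-ptr-conv
"""
entries = parse_manifest(sample)
print([(e.pr, e.base) for e in entries])
print(order_violations(entries))
```

Bases that are not themselves heads in the manifest (for example `fix/bump-meos-pin` in the
sample) are left unchecked, since the manifest gives no position for them.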
93 changes: 93 additions & 0 deletions tools/pin/compose-order.txt
@@ -0,0 +1,93 @@
# USER-APPROVED-PIN-WRITE — creating MobilityDuck's committed pin manifest (user 2026-06-25,
# per-binding generator policy rollout). New file in the MobilityDuck repo, NOT a mutation
# of MobilityDB's pin tooling.
#
# MobilityDuck pin — THE canonical, dependency-ordered fold manifest (per-binding policy).
#
# Derived from the LIVE base<-head DAG (gh pr list, verified this turn), organized into the
# established waves. Within a stack, the BASE folds before its HEAD. Edit THIS file (reviewed)
# — never re-derive ad hoc. (policy: generator-per-binding-canonical-policy, binding-compose-order-manifest)
#
# NOTE: MobilityDuck is today HAND-WRITTEN C++ UDFs (the 30 PRs below are hand features). The
# catalog-driven generator (`tools/codegen_duck_udfs.py`) is the planned migration — see
# GENERATION.md; it is NOT yet on GitHub. These hand PRs dissolve as generation lands
# (generate-then-retire). base = current origin/main.
#
# Format: <PR#> <head-branch> # role. '?' = membership/order UNCONFIRMED.

# ── WAVE 0 — PIN FOUNDATION (the MEOS pin bump everything rebases on) ──
197 fix/bump-meos-pin-clean # pin vcpkg MEOS to the canonical ecosystem SHA + meosType->MeosType
# (the clean pin bump; the `fix/bump-meos-pin` base branch of the stacks below)
190 fix/worker-thread-linkage-crashes # worker-thread + linkage crash fixes (head: fix/meos-error-longjmp / #149)
185 fix/stbox-from-hexwkb-nul-terminate # stboxFromHexWKB past-buffer read (#170)
173 fix/span-bins-bind-copy-unique-ptr-conv
188 fix/normalize-tnumber-mult-to-mul # mult -> mul naming
165 cleanup/retire-geodetic-stbox-workarounds

# ── WAVE 1 — EXTENDED TYPES (port stack; fold bottom-up: tcbuffer<-tnpoint<-tpose<-trgeometry<-pointcloud<-tpcpoint<-tpcpatch) ──
150 feat/tnpoint-port-core # base feat/tcbuffer-port-core
151 feat/tpose-port-core # base feat/tnpoint-port-core
153 feat/trgeometry-port-core # base feat/tpose-port-core
154 feat/pointcloud-vcpkg-enabler # base feat/trgeometry-port-core
155 feat/tpcpoint-port-core # base feat/pointcloud-vcpkg-enabler
156 feat/tpcpatch-port-core # base feat/tpcpoint-port-core

# ── WAVE 2 — PARITY (cell-index/json/trig stack: th3index<-h3-prefilter<-{setset,asmfjson,expandspace,trig}<-pin-11c<-tjsonb) ──
193 feat/parity-th3index # full H3 cell index API (base fix/bump-meos-pin)
198 feat/parity-h3-static-prefilter # base feat/parity-th3index
192 feat/setset-spatial-join-udfs # base feat/parity-h3-static-prefilter
194 feat/asmfjson-number-temporal-types # base feat/parity-h3-static-prefilter
195 feat/expandspace-tgeo-overloads # base feat/parity-h3-static-prefilter
199 feat/parity-tfloat-trig # base feat/parity-h3-static-prefilter
200 feat/parity-pin-11c # base feat/parity-tfloat-trig (a pin bump inside the parity stack)
201 feat/parity-tjsonb # base feat/parity-pin-11c
203 feat/parity-tquadbin # QUADBIN/TQUADBIN cell index (main-based; bumps pin)

# ── WAVE 3 — GEOGRAPHY ──
168 doc/geography-boundary-design
169 feat/register-geography-logicaltype
174 feat/geography-io-udfs
175 feat/geography-casts
176 feat/geography-operations
177 feat/geography-test-matrix

# ── WAVE 4 — FEATURES ──
205 feat/parity-raster-quadbin # rasterTileValueQuadbin + trajectoryQuadbins (Raquet sampling)

# ── WAVE 5 — MULTI-VERSION / CI ──
166 feat/multi-duckdb-version-foundation
167 ci/multi-duckdb-version-matrix
186 build/fast-extension-iteration
189 ci/parallel-extension-build
170 ci/exclude-osx-arm64-pending-hex-wkb
171 ci/probe-mingw-build
172 ci/probe-musl-build
184 fix/icu-resilience-version-agnostic
182 fix/meos-tz-init-resilient
181 fix/tgeogpoint-test-tz-pinned

# ── WAVE 6 — DIAGNOSTICS / FIXES ──
162 diag/hex-wkb-odd-length
180 feat/hex-wkb-diag
163 fix/drop-range-todos

# ── WAVE 7 — DOCS ──
158 feat/edge-to-cloud-quickstart-rebased
159 doc/reviewer-guide-rebased
160 consolidate/pr-coordination-and-tz-lint-rebased
179 feat/iceberg-polaris-readiness
183 docs/duckdb-version-alignment-geoparquet

# ── WAVE 8 — BENCHMARK (evidence vehicle; not a deliverable) ──
196 feat/berlinmod-canonical-queries
202 consolidate/berlinmod-canonical-queries

# ── PIN BUMPS (pin mechanics, fold near WAVE 0 at assembly) ──
204 feat/pin-2026-06-18b # bump pinned MEOS to ecosystem-pin-2026-06-18b

# ════════════════════════════════════════════════════════════════════════════════════
# The deep parity/pin bumps inside the stacks (#200 pin-11c, #203/#204) are pin MECHANICS, not
# features; at assembly time the binding pins to the LATEST ecosystem pin once (GENERATION.md).
# The catalog generator (tools/codegen_duck_udfs.py) is the endgame: once generated ⊇ hand is
# proven, most of these hand feature PRs are mooted (generate-then-retire).
# ════════════════════════════════════════════════════════════════════════════════════
29 changes: 29 additions & 0 deletions tools/regen-from-pin.sh
@@ -0,0 +1,29 @@
#!/usr/bin/env bash
# regen-from-pin.sh — regenerate the MobilityDuck UDFs from the MEOS catalog (per GENERATION.md).
#
# Usage: tools/regen-from-pin.sh <pin>
# env: CATALOG = path to meos-idl.json produced by MEOS-API run.py (required)
# MEOS_HEADERS = the installed MEOS headers dir for the pin-gate (required by the generator)
#
# NOTE: the catalog generator `tools/codegen_duck_udfs.py` is the target (GENERATION.md); it
# lands generate-then-retire alongside the hand-written UDFs. Until it is in the tree this
# script documents the invocation. Invoked standalone, or by tools/ecosystem-generate.sh.
set -euo pipefail
PIN="${1:?usage: regen-from-pin.sh <pin>}"
CATALOG="${CATALOG:?set CATALOG to the meos-idl.json from MEOS-API run.py}"
HERE="$(cd "$(dirname "$0")/.." && pwd)"
GEN="$HERE/tools/codegen_duck_udfs.py"

if [ ! -f "$GEN" ]; then
  echo "NOTE: $GEN not present yet (the catalog generator lands generate-then-retire — see GENERATION.md)."
  echo "      Once it is in the tree, this script regenerates src/generated/generated_temporal_udfs.cpp."
  exit 0
fi

# generator CLI: codegen_duck_udfs.py <catalog> <out.cpp> <headers-dir>
python3 "$GEN" "$CATALOG" "$HERE/src/generated/generated_temporal_udfs.cpp" "${MEOS_HEADERS:?set MEOS_HEADERS to the installed pin headers}"

# Build-verify the extension (the in-repo fast build target). A failed build is reported as a
# warning on stderr and is non-fatal: the regenerated source has already been written.
( cd "$HERE" && cmake --build build/release --target mobilityduck_loadable_extension ) \
  || echo "WARN: MobilityDuck extension build returned non-zero" >&2
echo "[mobilityduck] regenerated UDFs from catalog at pin $PIN"