From 30a040ccb47d0939ffa655c013b6597b55e881e8 Mon Sep 17 00:00:00 2001 From: "Motowidlo, Jaroslaw" Date: Thu, 14 May 2026 17:01:16 +0200 Subject: [PATCH] docs(convert-app): add conversion prompt, reference files and user guide - Add convert-app.prompt.md orchestrator in .github/prompts/ - Add 9 reference files in .github/skills/dlstreamer-coding-agent/convert-app/ - Add user guide page docs/user-guide/dev_guide/convert_app.md - Add "What Shapes the Converted Application" section to user guide - Register convert_app in dev_guide_index.md toctree --- .github/prompts/convert-app.prompt.md | 178 +++++++ .../assets/cmake-app-template.cmake | 89 ++++ .../convert-app/cmake-and-deliverables.md | 143 ++++++ .../convert-app/conversion-bootstrap.md | 198 ++++++++ .../convert-app/deprecation-discovery.md | 71 +++ .../convert-app/documentation-spec.md | 330 +++++++++++++ .../convert-app/final-audit-checklist.md | 77 +++ .../convert-app/model-sourcing.md | 228 +++++++++ .../convert-app/pipeline-implementation.md | 449 ++++++++++++++++++ .../convert-app/runsh-pitfalls.md | 56 +++ .../convert-app/validation-protocol.md | 153 ++++++ docs/user-guide/dev_guide/coding_agent.md | 53 +++ docs/user-guide/dev_guide/convert_app.md | 128 +++++ docs/user-guide/dev_guide/dev_guide_index.md | 3 + 14 files changed, 2156 insertions(+) create mode 100644 .github/prompts/convert-app.prompt.md create mode 100644 .github/skills/dlstreamer-coding-agent/assets/cmake-app-template.cmake create mode 100644 .github/skills/dlstreamer-coding-agent/convert-app/cmake-and-deliverables.md create mode 100644 .github/skills/dlstreamer-coding-agent/convert-app/conversion-bootstrap.md create mode 100644 .github/skills/dlstreamer-coding-agent/convert-app/deprecation-discovery.md create mode 100644 .github/skills/dlstreamer-coding-agent/convert-app/documentation-spec.md create mode 100644 .github/skills/dlstreamer-coding-agent/convert-app/final-audit-checklist.md create mode 100644 
.github/skills/dlstreamer-coding-agent/convert-app/model-sourcing.md create mode 100644 .github/skills/dlstreamer-coding-agent/convert-app/pipeline-implementation.md create mode 100644 .github/skills/dlstreamer-coding-agent/convert-app/runsh-pitfalls.md create mode 100644 .github/skills/dlstreamer-coding-agent/convert-app/validation-protocol.md create mode 100644 docs/user-guide/dev_guide/convert_app.md diff --git a/.github/prompts/convert-app.prompt.md b/.github/prompts/convert-app.prompt.md new file mode 100644 index 000000000..98a627880 --- /dev/null +++ b/.github/prompts/convert-app.prompt.md @@ -0,0 +1,178 @@ +--- +description: Convert the provided application into a native Intel DL Streamer / OpenVINO C++ application. +--- + +# Goal + +Convert the provided application into a native Intel DL Streamer / OpenVINO C++ +application. + +# How this prompt is organized + +This prompt is intentionally short. It is a **sequential orchestrator** — each +step links to a single, focused reference file under +[`.github/skills/dlstreamer-coding-agent/convert-app/`](../skills/dlstreamer-coding-agent/convert-app/). +Read the linked reference **at the moment you start a given step** (not all +at once at the beginning) so the relevant rules stay in active attention +while you work on that step. 
+ +| # | Step | Reference | +|---|---|---| +| 0 | Pipeline construction skill (always available) | [SKILL.md](../skills/dlstreamer-coding-agent/SKILL.md) | +| 1 | Scaffold from scratch (source + output dir + env) | [conversion-bootstrap.md](../skills/dlstreamer-coding-agent/convert-app/conversion-bootstrap.md) | +| 2 | Build the per-run deprecation deny-list | [deprecation-discovery.md](../skills/dlstreamer-coding-agent/convert-app/deprecation-discovery.md) | +| 3 | Review the source app + build the functional block inventory | (this file, §Step 3) | +| 4 | Plan model substitutions + 1-to-1 element mapping | [model-sourcing.md](../skills/dlstreamer-coding-agent/convert-app/model-sourcing.md) | +| 5 | Implement (pipeline, probes, paths, encoders, traceability) | [pipeline-implementation.md](../skills/dlstreamer-coding-agent/convert-app/pipeline-implementation.md) + [cmake-and-deliverables.md](../skills/dlstreamer-coding-agent/convert-app/cmake-and-deliverables.md) | +| 6 | Verify correctness (clean-shell runs + auto-fix loop) | [validation-protocol.md](../skills/dlstreamer-coding-agent/convert-app/validation-protocol.md) + [runsh-pitfalls.md](../skills/dlstreamer-coding-agent/convert-app/runsh-pitfalls.md) | +| 7 | Document the conversion | [documentation-spec.md](../skills/dlstreamer-coding-agent/convert-app/documentation-spec.md) | +| 8 | Final compliance audit (last gate before reporting completion) | [final-audit-checklist.md](../skills/dlstreamer-coding-agent/convert-app/final-audit-checklist.md) | + +# Target platform + +Intel hardware — iGPU / dGPU (Arc, Iris Xe), CPU, or NPU. Default to `GPU` +with CPU fallback unless the user specifies otherwise. 
+ +# Checkpoint protocol (mandatory liveness signal) + +After completing each numbered step below, the agent MUST emit a one-line +checkpoint message of the form +`Checkpoint N/8: ` AND make at least one tool +call as part of that checkpoint (typically `manage_todo_list` to flip the step +from `in-progress` to `completed` and the next from `not-started` to +`in-progress`). + +If a step is naturally large (e.g. step 5 Implement), emit additional +intra-step checkpoints (`Checkpoint 5/8a`, `5/8b`, …) every time a major +sub-artifact is created (CMakeLists, main `.cpp`, `run.sh`, `export_models.sh`). + +**Never go more than ~3 minutes of wall-clock time without a tool call or a +checkpoint message.** This guarantees a visible liveness signal lands in the +chat for every step, so the user can distinguish a working agent from a +stalled one even during a single long turn. If a substep is genuinely +long-running (e.g. `cmake --build`, `omz_downloader`), break it up: emit a +short `Checkpoint N/8x: — starting` before launching the command, +then a follow-up checkpoint with the outcome when it returns. + +# Steps + +## Step 1 — Scaffold from scratch + +Execute the procedure in +[conversion-bootstrap.md](../skills/dlstreamer-coding-agent/convert-app/conversion-bootstrap.md): + +- Resolve workspace root. +- Idempotent source clone. +- Create a **fresh numbered output directory** (`_NNN/`). +- Announce the directory name to the user up-front. +- Apply the mandatory env-setup recipe (§4.1) in full, run the verification + (§4.2), and if anything reports `MISSING:` apply the registry-cache + recovery (§4.3) before proceeding. + +All subsequent file writes happen inside the newly-created numbered directory. +The exact same env-setup recipe MUST also be wired verbatim into the generated +`run.sh` (see [cmake-and-deliverables.md](../skills/dlstreamer-coding-agent/convert-app/cmake-and-deliverables.md) §4). 
+ +## Step 2 — Run the upstream deprecation discovery + +Execute the grep procedure in +[deprecation-discovery.md](../skills/dlstreamer-coding-agent/convert-app/deprecation-discovery.md) +and build the per-run deny-list. Keep it handy as the authoritative deny-list +for steps 4–6. + +## Step 3 — Review the source app + build the functional block inventory + +Review the cloned source application to understand its functionality, +structure, and generated output. Then build a **functional block inventory** +— a numbered list of every distinct: + +- inference stage (PGIE, SGIE, secondary detectors/classifiers), +- pre/post-processing step, +- tracking, +- analytics, +- visualization element + +…together with what each block contributes to the visible output (bounding +boxes, labels, counters, overlay text). This inventory is the contract for +step 4 — every block listed here MUST receive a dedicated counterpart in the +converted pipeline. + +## Step 4 — Plan the conversion + +Map **every** functional block from the step-3 inventory to a corresponding +DL Streamer element or OpenVINO API. The mapping MUST be **strictly 1-to-1** +— simplifications, merges, and stage drops are NOT permitted. Example: an +N-stage cascade in the source app (e.g. detect → detect → classify) becomes +an N-stage cascade in the conversion using the equivalent DL Streamer +elements (e.g. `gvadetect` → `gvadetect` on upstream ROIs → `gvaclassify`). +The only exception is blocks explicitly listed under *Scope → Exclude* in +[cmake-and-deliverables.md](../skills/dlstreamer-coding-agent/convert-app/cmake-and-deliverables.md). + +For model substitutions, precision, OCR language defaults, and device +mapping, follow +[model-sourcing.md](../skills/dlstreamer-coding-agent/convert-app/model-sourcing.md). 
+ +The agent MUST make all element-mapping decisions **autonomously** based on +DL Streamer documentation, installed element properties +(`gst-inspect-1.0 `), sample code, and the coding-agent skill — +**without** asking the user for confirmation. If multiple elements could +satisfy a block, pick the best match (highest performance, lowest deprecation +risk, closest semantic equivalence) and document the choice with a one-line +justification in the README. + +Cross-check every chosen element/property against the deny-list built in +step 2. + +## Step 5 — Implement + +Implement the conversion in C++ following +[pipeline-implementation.md](../skills/dlstreamer-coding-agent/convert-app/pipeline-implementation.md) +end-to-end (see its §§1–10 for the full set of rules). + +Generate the build system and wrapper per +[cmake-and-deliverables.md](../skills/dlstreamer-coding-agent/convert-app/cmake-and-deliverables.md): +`CMakeLists.txt` (with mandatory adjustments), `run.sh` (with `--help`, +`--sink display|file|fake`, headless auto-detection, pre-flight resource +audit), and `export_models.sh` if applicable. + +Cross-check every element, property, file format, and metadata API against +the deny-list from step 2. + +## Step 6 — Verify correctness + +Run the full validation protocol in +[validation-protocol.md](../skills/dlstreamer-coding-agent/convert-app/validation-protocol.md): + +1. Capture baseline outputs (if the original can be run on this host). +2. Build (`cmake -S . -B build && cmake --build build -j$(nproc)`). +3. Clean-shell `./run.sh` runs (`env -i …`) for all five invocations + (`--help`, defaults, `--sink fake`, `--sink file`, `--sink invalid`) with + the **auto-fix loop**. +4. Compare against baseline (if available). +5. Benchmark. +6. Deprecated-API scan (dynamic grep against the step-2 deny-list). 
+ +When a clean-shell run fails, map the error to +[runsh-pitfalls.md](../skills/dlstreamer-coding-agent/convert-app/runsh-pitfalls.md), +apply the fix, and **re-run every test** — fixes for one bug frequently +expose the next. + +## Step 7 — Document + +Generate the README **strictly** following +[documentation-spec.md](../skills/dlstreamer-coding-agent/convert-app/documentation-spec.md) — see its §1 +(required README sections) and §3 (Conversion Notes sub-sections) for the +full contract. + +Source traceability comments in the C++ code are mandatory and are enforced +by [pipeline-implementation.md](../skills/dlstreamer-coding-agent/convert-app/pipeline-implementation.md) +§9 — not by the README itself. + +## Step 8 — Final requirements compliance audit + +Walk through the checklist in +[final-audit-checklist.md](../skills/dlstreamer-coding-agent/convert-app/final-audit-checklist.md) +and verify every requirement with a concrete artifact or test result. Emit +the filled checklist (with `[x]` / `[ ]` and a one-line note per item) in the +final user-facing summary message. Any remaining `[ ]` items block completion +— fix them and re-audit. diff --git a/.github/skills/dlstreamer-coding-agent/assets/cmake-app-template.cmake b/.github/skills/dlstreamer-coding-agent/assets/cmake-app-template.cmake new file mode 100644 index 000000000..d3b4c60ce --- /dev/null +++ b/.github/skills/dlstreamer-coding-agent/assets/cmake-app-template.cmake @@ -0,0 +1,89 @@ +# ============================================================================== +# Copyright (C) 2026 Intel Corporation +# +# SPDX-License-Identifier: MIT +# ============================================================================== +# +# CMakeLists.txt template for a standalone DL Streamer / OpenVINO C++ application. +# +# Usage: +# 1. Copy this file as `CMakeLists.txt` into your application directory. +# 2. Replace `{{TARGET_NAME}}` with the application name (no spaces). +# 3. 
Remove sections marked OPTIONAL if your app does not need them +# (OpenCV, custom compile flags). +# 4. Place all `.cpp` and `.h` files in the same directory as this CMakeLists.txt +# (they are picked up via GLOB). +# +# Build: +# mkdir build && cd build +# cmake .. +# make -j$(nproc) +# ============================================================================== + +cmake_minimum_required(VERSION 3.20) + +set(TARGET_NAME "{{TARGET_NAME}}") +project(${TARGET_NAME} CXX) + +# --- Required dependencies ---------------------------------------------------- +find_package(PkgConfig REQUIRED) + +pkg_check_modules(GSTREAMER gstreamer-1.0>=1.16 REQUIRED) +pkg_check_modules(GSTVIDEO gstreamer-video-1.0>=1.16 REQUIRED) +pkg_check_modules(GSTANALYTICS gstreamer-analytics-1.0>=1.16 REQUIRED) +pkg_check_modules(GLIB2 glib-2.0 REQUIRED) + +# --- OPTIONAL: OpenCV (remove this block if your app does not use OpenCV) ----- +find_package(OpenCV OPTIONAL_COMPONENTS core imgproc) + +# --- DL Streamer install layout (default Intel DL Streamer install paths) ----- +set(DLSTREAMER_INSTALL_PREFIX /opt/intel/dlstreamer) +set(DLSTREAMER_INCLUDE_DIRS ${DLSTREAMER_INSTALL_PREFIX}/include) +set(GSTREAMER_INCLUDE_DIR ${DLSTREAMER_INSTALL_PREFIX}/gstreamer/include/gstreamer-1.0) + +link_directories( + ${DLSTREAMER_INSTALL_PREFIX}/lib + ${DLSTREAMER_INSTALL_PREFIX}/Release/lib + ${DLSTREAMER_INSTALL_PREFIX}/gstreamer/lib + /usr/lib/x86_64-linux-gnu +) + +# --- Sources ------------------------------------------------------------------ +file(GLOB MAIN_SRC *.cpp) +file(GLOB MAIN_HEADERS *.h) + +add_executable(${TARGET_NAME} ${MAIN_SRC} ${MAIN_HEADERS}) + +set_target_properties(${TARGET_NAME} PROPERTIES CXX_STANDARD 23) + +# --- OPTIONAL: silence noisy GStreamer enum-conversion warnings --------------- +include(CheckCXXCompilerFlag) +check_cxx_compiler_flag(-Wno-deprecated-enum-enum-conversion HAVE_DEPRECATED_ENUM_CONVERSION) +if(HAVE_DEPRECATED_ENUM_CONVERSION) + target_compile_options(${TARGET_NAME} 
PRIVATE -Wno-deprecated-enum-enum-conversion) +endif() + +# --- Include directories ------------------------------------------------------ +target_include_directories(${TARGET_NAME} + PRIVATE + ${DLSTREAMER_INCLUDE_DIRS} + ${DLSTREAMER_INCLUDE_DIRS}/dlstreamer/gst + ${GSTREAMER_INCLUDE_DIR} + ${GSTREAMER_INCLUDE_DIRS} + ${GSTVIDEO_INCLUDE_DIRS} + ${GSTANALYTICS_INCLUDE_DIRS} + ${GLIB2_INCLUDE_DIRS} + ${OpenCV_INCLUDE_DIRS} # OPTIONAL — remove if not using OpenCV +) + +# --- Link libraries ----------------------------------------------------------- +target_link_libraries(${TARGET_NAME} + PUBLIC + dlstreamer_gst_meta + PRIVATE + ${GLIB2_LIBRARIES} + ${GSTREAMER_LIBRARIES} + ${GSTVIDEO_LIBRARIES} + ${GSTANALYTICS_LIBRARIES} + ${OpenCV_LIBS} # OPTIONAL — remove if not using OpenCV +) diff --git a/.github/skills/dlstreamer-coding-agent/convert-app/cmake-and-deliverables.md b/.github/skills/dlstreamer-coding-agent/convert-app/cmake-and-deliverables.md new file mode 100644 index 000000000..82936b5a4 --- /dev/null +++ b/.github/skills/dlstreamer-coding-agent/convert-app/cmake-and-deliverables.md @@ -0,0 +1,143 @@ +# Build System & Deliverables + +Used during **implementation (step 5)** and produced as part of the output of +[`convert-app.prompt.md`](../../../prompts/convert-app.prompt.md). + +## Build Instructions + +Generate `CMakeLists.txt` from the +[CMake Application Template](../assets/cmake-app-template.cmake) — copy the +file into the application directory and substitute `{{TARGET_NAME}}` with the +converted app's name. Remove sections marked `OPTIONAL` if the app does not +need them (e.g. OpenCV). + +### CMake template adjustments (mandatory) + +- **`Release/lib` link directory** — the template's `link_directories()` block + already lists `${DLSTREAMER_INSTALL_PREFIX}/Release/lib`. On many DL Streamer + installations, `libdlstreamer_gst_meta.so` and other runtime libraries live + under `/opt/intel/dlstreamer/Release/lib/`, not `/opt/intel/dlstreamer/lib/`. 
+ Without this path, the linker fails with `cannot find -ldlstreamer_gst_meta`. + **Keep `${DLSTREAMER_INSTALL_PREFIX}/Release/lib`** in the + `link_directories()` block when adapting the template. + +- **OpenCV is REQUIRED when using `GVA::VideoFrame`** — the template declares + `find_package(OpenCV OPTIONAL_COMPONENTS ...)`. However, the + `dlstreamer/gst/videoanalytics/video_frame.h` header (which provides + `GVA::VideoFrame`, `GVA::RegionOfInterest`, `GVA::Tensor`) transitively + includes OpenCV headers. If the converted app uses any GVA C++ API for + metadata access (which is the recommended pattern for tensor data + extraction), OpenCV MUST be changed from `OPTIONAL` to `REQUIRED`: + + ```cmake + find_package(OpenCV REQUIRED COMPONENTS core imgproc) + ``` + + Without this, compilation fails with missing `cv::Mat` or + `opencv2/core.hpp` errors. + +For a real-world example of the same pattern in production use, see +[`samples/gstreamer/cpp/draw_face_attributes/CMakeLists.txt`](../../../../samples/gstreamer/cpp/draw_face_attributes/CMakeLists.txt). + +## Deliverables + +The converted application directory MUST contain: + +### 1. `CMakeLists.txt` + +Generated from the template above with the mandatory adjustments applied. + +### 2. Source files (`.cpp` / `.h`) + +The converted C++ application. Reference implementation to study first: +[`samples/gstreamer/cpp/draw_face_attributes/`](../../../../samples/gstreamer/cpp/draw_face_attributes/) +— a working example of a native DL Streamer / OpenVINO C++ application that +defines the expected code structure, `CMakeLists.txt` layout, and coding +conventions to follow. + +### 3. `README.md` + +See [`documentation-spec.md`](./documentation-spec.md) for the full +specification (required sections, Pipeline Comparison diagram spec, Conversion +Notes tables, traceability comments). + +### 4. `run.sh` + +A thin wrapper that invokes the built binary with sensible default arguments +so the user can run the app with a single command (`./run.sh`). 
The script +MUST: + +- **Check that `build/` exists**; if not, instruct the user to build + first. +- **Use environment variables for overridable inputs** (`INPUT`, `MODEL`, + `DEVICE`) with sensible defaults. Do **not** reuse env var names owned by + `setup_dls_env.sh` (notably `MODELS_PATH` — use a unique app-specific name + of the form `_MODELS_PATH`, e.g. `LPR_MODELS_PATH` for the LPR app, + `PEOPLE_MODELS_PATH` for a people-detection app, etc.). See + [`runsh-pitfalls.md`](./runsh-pitfalls.md). +- **Forward extra CLI args** (`"$@"`) to the binary so power users can override + anything. + +- **Detect whether a display is available** — check `$DISPLAY` and + `$WAYLAND_DISPLAY` (and on Windows, the equivalent presence of an + interactive desktop session). If neither is set (headless / SSH session / + CI), the script MUST automatically force the application into file-output / + no-display mode (e.g. add `--no-display` or set the equivalent env var) and + print a one-line notice to the user explaining that the display was not + found and the output is being written to the file sink instead. The user + must always be able to override this auto-detection by exporting `DISPLAY` + (or by passing an explicit flag through `"$@"`). + +- **Provide a `--help` / `-h` flag** that prints, on stdout and with exit + status `0`: + - A one-line synopsis + (`Usage: ./run.sh [--help] [--sink display|file|fake] [extra binary flags...]`) + - The list of supported environment variables (`INPUT`, `DEVICE`, `OUTPUT`, + `_MODELS_PATH`, plus any app-specific ones) with defaults and meanings. + - The list of wrapper-level flags (at minimum `--help`, `--sink`) with + examples. + - A pointer telling the user that any further flag is forwarded to the + underlying binary, and that `./build/ --help` lists every + binary-level flag. + +- **Support a `--sink` selector** (or equivalent env var, + e.g. `SINK=display|file|fake`) that switches the output backend among three + modes: + 1. 
`display` — render to an on-screen window (e.g. `autovideosink` / + `gvawatermark` + display sink). Subject to headless auto-detection. + 2. `file` — encode and write to a file (`OUTPUT` path), no window. + 3. `fake` — discard frames via `fakesink` (useful for benchmarking and CI; + produces no file and no window). + + The default mode is `display` when a display is present, otherwise `file`. + The script MUST validate the chosen value and fail fast with a clear error + message on an unknown sink. Document each mode and its default in the + README's `Run` and `Command-Line Arguments` sections. + +See [`runsh-pitfalls.md`](./runsh-pitfalls.md) for the full list of bugs to +prevent (env var clobbering, plugin path, kmssink, audio track stream +selection, GPU↔CPU transfer, transparent watermark, encoder fallback, etc.). + +### 5. `export_models.sh` (if applicable) + +When models must be downloaded/converted from upstream sources, ship an +executable script that performs the download + conversion idempotently. The +script MUST: + +- Skip work when the target files already exist. +- Print clear progress messages. +- Exit non-zero on any failure. +- Be invoked automatically by `run.sh` (or surface a clear error pointing the + user to it) when the model files are missing at the default path. + +## Scope reminder + +Convert **only** the inference and media pipeline logic. **Exclude**: + +- Model training or fine-tuning code. +- GUI / visualization frameworks not available in DL Streamer (Qt, OpenCV + HighGUI windows, etc. — replace with `gvawatermark + autovideosink` or file + sink). +- Cloud-specific APIs (AWS, GCP, Azure SDKs). +- CUDA-specific extensions with no OpenVINO equivalent (flag these explicitly + in the documentation). 
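Illustrative sketch (not part of the patch): the `--sink` rules described in `cmake-and-deliverables.md` above (default of `display` when `DISPLAY` or `WAYLAND_DISPLAY` is set, otherwise `file`, and a fast failure on unknown values) could be expressed roughly as follows. The function name `resolve_sink`, the exit status `2`, and the wording of the notice and error messages are assumptions made for illustration; the patch does not define them.

```bash
# Hypothetical helper: print the effective sink for run.sh on stdout.
# $1 is the requested sink: "display", "file", "fake", or empty for the default.
resolve_sink() {
    local requested="${1:-}"
    local has_display=0

    # A display counts as present when either variable is non-empty.
    if [ -n "${DISPLAY:-}" ] || [ -n "${WAYLAND_DISPLAY:-}" ]; then
        has_display=1
    fi

    case "$requested" in
        ""|display)
            if [ "$has_display" -eq 1 ]; then
                echo display
            else
                # Headless session: fall back to the file sink and say so.
                echo "NOTICE: no display found; writing output to the file sink." >&2
                echo file
            fi
            ;;
        file|fake)
            echo "$requested"
            ;;
        *)
            echo "ERROR: unknown sink '$requested' (expected display|file|fake)." >&2
            return 2
            ;;
    esac
}
```

A wrapper would typically call it as `SINK="$(resolve_sink "${SINK:-}")" || exit 1` before assembling the binary's arguments.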
diff --git a/.github/skills/dlstreamer-coding-agent/convert-app/conversion-bootstrap.md b/.github/skills/dlstreamer-coding-agent/convert-app/conversion-bootstrap.md new file mode 100644 index 000000000..63701d7f2 --- /dev/null +++ b/.github/skills/dlstreamer-coding-agent/convert-app/conversion-bootstrap.md @@ -0,0 +1,198 @@ +# Conversion Bootstrap — Source Acquisition & Environment Setup + +Used by **step 1** of [`convert-app.prompt.md`](../../../prompts/convert-app.prompt.md). + +## 1. Workspace root resolution + +The prompt MUST be runnable end-to-end starting from an empty workspace. The agent +never assumes any prior state. + +- **Workspace root** = the parent directory of the `dlstreamer` repository in the + user's open workspace (i.e. `dirname(/dlstreamer)`). +- All checkout / output happens directly under this root, **never inside + `dlstreamer/`**. + +## 2. Idempotent source clone + +If the user provides a remote URL (e.g. GitHub repository): + +```bash +if [ ! -d "/" ]; then + git clone "/" +fi +``` + +- If the directory exists, **reuse it** — do not re-clone, do not delete. +- Record the source URL and resolved commit hash (`git rev-parse HEAD`) in the + README so the conversion is reproducible. + +## 3. Fresh numbered output directory (per invocation) + +Every invocation of `/convert-app` MUST create a **brand-new** output directory. +Previous conversions are never overwritten or modified. + +Naming scheme: + +- **Base name**: `` derived deterministically from the source + (e.g. `deepstream_lpr_app` → `deepstream_lpr_app_dls`). +- **Suffix**: 3-digit iteration `_NNN` where `NNN` is the smallest integer ≥ 1 + such that `/_NNN/` does not yet exist. 
+ Examples: `deepstream_lpr_app_dls_001`, `deepstream_lpr_app_dls_002`, … + +Procedure: + +```bash +# Scan workspace root for existing numbered dirs, pick next index (10# forces base 10 so that 008 is not parsed as octal) +max=$(ls -d /_[0-9][0-9][0-9] 2>/dev/null \ + | sed -E 's/.*_([0-9]{3})$/\1/' | sort -n | tail -1) +next=$(printf "%03d" $(( 10#${max:-0} + 1 ))) +mkdir -p "/_${next}/" +``` + +- All deliverables go into this newly-created numbered directory **only**. +- Do not touch the previous numbered directories. +- **Report the chosen directory name to the user up-front**, e.g. + > "Creating conversion #003 in `deepstream_lpr_app_dls_003/`" + +## 4. DL Streamer runtime prerequisite + +A working DL Streamer installation **with environment variables exported** is +required to build, run, and validate the converted application. Before starting +the conversion, the agent MUST follow the **mandatory env-setup recipe** below +in order. Skipping any step has historically caused the conversion to abort +mid-flight with `No such element 'gvadetect'`. + +### 4.1 Mandatory env-setup recipe (apply in order) + +Apply this recipe both in the **agent's own shell** (before running any +`gst-inspect-1.0` verification) AND verbatim in the generated `run.sh` (see +[cmake-and-deliverables.md](./cmake-and-deliverables.md) §4 and the *Known +`run.sh` pitfalls* table in [runsh-pitfalls.md](./runsh-pitfalls.md)). + +```bash +# Step 1 — Detect install +if [[ ! -d /opt/intel/dlstreamer ]]; then + echo "ERROR: DL Streamer not found at /opt/intel/dlstreamer." >&2 + # Direct user to the install guide and exit non-zero + exit 1 +fi + +# Step 2 — Source the upstream env script with set -u relaxed +# setup_dls_env.sh references unset vars (e.g. GST_PLUGIN_FEATURE_RANK) and +# crashes under `set -u`. Wrap the source in set +u / set -u. 
+set +u +source /opt/intel/dlstreamer/scripts/setup_dls_env.sh +set -u + +# Step 3 — Add Release/lib to GST_PLUGIN_PATH and LD_LIBRARY_PATH +# setup_dls_env.sh does NOT add the Release/lib directory, but on most +# installs that is exactly where libgstvideoanalytics.so (which provides +# gvadetect/gvaclassify/gvawatermark/gvatrack) actually lives. +# Without this, gst-inspect-1.0 gvadetect returns "No such element". +export GST_PLUGIN_PATH="/opt/intel/dlstreamer/Release/lib:${GST_PLUGIN_PATH:-}" +export LD_LIBRARY_PATH="/opt/intel/dlstreamer/opencv/lib:/opt/intel/dlstreamer/Release/lib:${LD_LIBRARY_PATH:-}" + +# Step 4 — Deprioritize kmssink (avoids negotiation errors on remote/SSH/CI) +export GST_PLUGIN_FEATURE_RANK="kmssink:NONE,${GST_PLUGIN_FEATURE_RANK:-}" +``` + +**User-facing notice (mandatory when the agent applies §4.1 in its own shell +because the env was not already exported)** — the agent MUST explicitly inform +the user that the fix is per-session and tell them how to make it permanent: + +> "DL Streamer was installed but its environment was not exported in this +> shell. I sourced `setup_dls_env.sh` (plus the `Release/lib` / `kmssink:NONE` +> additions above) for the current session. To make this permanent, add the +> same `source` line — followed by the `export GST_PLUGIN_PATH=…`, +> `export LD_LIBRARY_PATH=…`, and `export GST_PLUGIN_FEATURE_RANK=…` lines — +> to your `~/.bashrc` (or `~/.zshrc` on zsh)." + +The generated `run.sh` applies the full recipe at every launch, so the user +does NOT need the shell-rc change to run the converted app — only for ad-hoc +`gst-inspect-1.0` / `gst-launch-1.0` commands or building other DL Streamer +apps in fresh shells. + +### 4.2 Mandatory verification (the only acceptable evidence of success) + +After applying the recipe, the agent MUST verify each of the GVA elements the +converted pipeline will actually use. 
Do NOT only check `gvadetect` — a partial +plugin load can leave `gvadetect` working while `gvaclassify` or `gvatrack` is +silently missing. + +```bash +for e in gvadetect gvaclassify gvawatermark gvatrack gvafpscounter gvapython; do + gst-inspect-1.0 "$e" >/dev/null 2>&1 && echo "OK: $e" || echo "MISSING: $e" +done +``` + +All required elements MUST report `OK:` before proceeding to step 2 of the +prompt. If any reports `MISSING:`, apply the recovery procedure in §4.3 below. + +### 4.3 Registry-cache recovery (if verification still fails) + +A corrupted `~/.cache/gstreamer-1.0/registry.x86_64.bin` can persist failed +discovery state across shells. Symptoms: + +- `gst-inspect-1.0 gvadetect` returns "No such element" even after applying + §4.1 in full. +- `gst-plugin-scanner` prints `CRITICAL **: Couldn't set __plugin__ attribute` + or `TypeError: PyModule_AddObjectRef()` (a broken `libgstpython.so` aborted + registry rebuild on a previous run, leaving a stale partial cache). +- `gst-inspect-1.0 --gst-plugin-load=/opt/intel/dlstreamer/Release/lib/libgstvideoanalytics.so gvadetect` + **does** print the factory details (proves the `.so` itself is fine — only + auto-discovery is broken). + +Recovery: + +```bash +# 1. Delete the corrupted registry cache +rm -rf ~/.cache/gstreamer-1.0 + +# 2. Re-apply the §4.1 recipe in a clean shell (env -i …) to ensure the +# registry is rebuilt with the correct GST_PLUGIN_PATH on the first try +env -i HOME="$HOME" PATH="/usr/local/bin:/usr/bin:/bin" TERM="${TERM:-xterm}" \ + bash -lc ' + set +u; source /opt/intel/dlstreamer/scripts/setup_dls_env.sh; set -u + export GST_PLUGIN_PATH="/opt/intel/dlstreamer/Release/lib:${GST_PLUGIN_PATH:-}" + export LD_LIBRARY_PATH="/opt/intel/dlstreamer/opencv/lib:/opt/intel/dlstreamer/Release/lib:${LD_LIBRARY_PATH:-}" + gst-inspect-1.0 gvadetect >/dev/null 2>&1 && echo OK || echo STILL_BROKEN + ' + +# 3. 
If STILL_BROKEN — the libgstpython.so plugin shipped with DL Streamer is +# incompatible with the system Python. Move it aside as a non-destructive +# workaround and rebuild the registry. Inform the user of the move. +# Only do this if the conversion does NOT rely on `gvapython`. +PY_PLUGIN=/opt/intel/dlstreamer/gstreamer/lib/gstreamer-1.0/libgstpython.so +if [[ -f "$PY_PLUGIN" ]]; then + sudo mv "$PY_PLUGIN" "${PY_PLUGIN}.disabled-by-agent" + rm -rf ~/.cache/gstreamer-1.0 + # re-verify +fi +``` + +If §4.3 step 3 is reached, the agent MUST: + +- Surface the change to the user (the inline quoted notice above, including + the exact `sudo mv ... ${PY_PLUGIN}` restore command) AND record the + workaround in the README under `Conversion Notes → Environment workarounds`. +- If the converted pipeline DOES need `gvapython`, do NOT apply step 3 — + instead block the conversion and ask the user to fix the Python plugin + (mismatch between DLS-shipped Python and system Python). + +### 4.4 Install-from-scratch (only if `/opt/intel/dlstreamer` is missing) + +Follow the official guide for the user's platform: + +- [Installation Guide Index](../../../../docs/user-guide/get_started/install/install_guide_index.md) +- [Ubuntu](../../../../docs/user-guide/get_started/install/install_guide_ubuntu.md) +- [Ubuntu on WSL2](../../../../docs/user-guide/get_started/install/install_guide_ubuntu_wsl2.md) +- [Windows](../../../../docs/user-guide/get_started/install/install_guide_windows.md) + +Then return to §4.1 and proceed. + +### 4.5 Documentation requirement + +The converted app's README MUST reproduce the install-guide reference(s) and +the §4.1 env-setup recipe under `Prerequisites`, and record any §4.3 step 3 +workaround under `Conversion Notes → Environment workarounds`. See +[`documentation-spec.md`](./documentation-spec.md) §1 for the README layout. 
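Illustrative sketch (not part of the patch): the numbered-directory procedure from §3 of `conversion-bootstrap.md` can also be written without parsing `ls` output. The sketch follows the max-plus-one rule of the documented procedure and strips leading zeros so that a suffix such as `008` is never read as an octal literal. The function name `next_run_index` and its two arguments (workspace root and base name) are hypothetical.

```bash
# Hypothetical helper: print the next 3-digit suffix for <root>/<base>_NNN.
next_run_index() {
    local root="$1" base="$2" max=0 dir n

    for dir in "$root/$base"_[0-9][0-9][0-9]; do
        # Skips non-directories and the literal pattern left when nothing matches.
        [ -d "$dir" ] || continue
        n="${dir##*_}"            # e.g. "008"
        n="${n#0}"; n="${n#0}"    # strip up to two leading zeros, giving "8"
        if [ "$n" -gt "$max" ]; then
            max="$n"
        fi
    done

    printf '%03d\n' "$((max + 1))"
}
```

For example, `mkdir -p "$root/${base}_$(next_run_index "$root" "$base")"` would create the fresh output directory for the current run.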
diff --git a/.github/skills/dlstreamer-coding-agent/convert-app/deprecation-discovery.md b/.github/skills/dlstreamer-coding-agent/convert-app/deprecation-discovery.md new file mode 100644 index 000000000..56388a712 --- /dev/null +++ b/.github/skills/dlstreamer-coding-agent/convert-app/deprecation-discovery.md @@ -0,0 +1,71 @@ +# Deprecated APIs — DO NOT USE + +Used by **step 2** of [`convert-app.prompt.md`](../../../prompts/convert-app.prompt.md). + +The agent MUST NOT introduce any DL Streamer or GStreamer API/element/file format +that the official documentation marks as **deprecated**, **legacy**, +**discontinued**, **obsolete**, or **"will be removed"**. This applies to new +code generated during conversion regardless of whether the original app used a +now-deprecated mechanism — the conversion is the opportunity to modernize, not +to mirror legacy choices. + +Before using ANY element, property, file format, or metadata API, the agent +MUST discover the current set of deprecations directly from the upstream +documentation — never from a hard-coded list (which would go stale). + +## Discovery procedure + +1. **Scan the docs and skills for deprecation notices** at the start of every + conversion run: + + ```bash + grep -rniB1 -A3 --include='*.md' \ + -E '\b(deprecat\w*|discontinu\w*|obsolete|legacy|will be removed|no longer supported|end[- ]of[- ]life)\b' \ + /dlstreamer/docs \ + /dlstreamer/.github/skills \ + | grep -viE 'CHANGELOG|third[-_]party|node_modules' + ``` + +2. **Build a per-run deprecation table** from the grep output. For each hit + extract: + - the deprecated symbol/element/file/API name, + - the source file + line, + - (from the surrounding paragraph) the documented replacement. + +3. **Treat that table as authoritative** for the current conversion. The agent + MUST re-run this discovery on every `/convert-app` invocation; do not cache + results from a previous run. + +4. 
**Forbid every discovered item in newly-generated code.** Pick the documented + replacement from the same paragraph the deprecation notice lives in. If the + replacement is not stated explicitly, follow the link the deprecation notice + points to and read it to determine the correct modern API. + +5. **Record the resulting table in the README** under + `Conversion Notes → Deprecation discovery (run NNN)` so the user can audit + which version of the docs the conversion was made against. + +## Enforcement + +1. If a deprecated API is the **only** technically viable path for a given + conversion (extremely rare), the agent MUST: + - Document the constraint in the README under + `Conversion Notes → Deprecated API usage justification`, naming the exact + API, the deprecation source file + line found in step 1, the replacement + that was attempted, and the concrete reason it failed. + - Add a `TODO:` comment in the source code at the call site referencing that + README section. + - Surface the constraint in the user-facing summary message at the end of + the run. + +2. When converting a source application that itself relies on a now-deprecated + DL Streamer pattern (e.g. an old sample shipped with a `model_proc/*.json`), + the agent MUST modernize it during the port — do NOT carry the legacy + artifact forward "because the upstream sample uses it". + +## Final scan (validation) + +The final dynamic-grep scan that blocks the conversion if any deprecated +construct survives in `/` is specified in +[`validation-protocol.md`](./validation-protocol.md) §6. The agent runs that +scan during step 6, using the per-run table built above as the regex source. 
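The table-building step of the discovery procedure could be sketched roughly as below. This is only an illustration under stated assumptions: the names `parse_grep_hits`, `Hit`, and `to_markdown` are hypothetical, the keyword pattern mirrors the discovery grep with stem wildcards, and only the `path:line:text` match lines that GNU grep prints with `-n` are considered (the `path-line-text` context lines and `--` group separators from `-B1 -A3` are skipped). The documented replacement still has to be read from the surrounding paragraph.

```python
import re
from typing import NamedTuple

# Assumption: same keyword stems as the discovery grep, with \w* so that
# "deprecated" / "deprecation" / "discontinued" are matched as whole words.
KEYWORDS = re.compile(
    r"\b(deprecat\w*|discontinu\w*|obsolete|legacy|will be removed|"
    r"no longer supported|end[- ]of[- ]life)\b",
    re.IGNORECASE,
)
# grep -n prints matches as "path:lineno:text"; context lines use "-" instead.
MATCH_LINE = re.compile(r"^(?P<path>[^:]+):(?P<line>\d+):(?P<text>.*)$")


class Hit(NamedTuple):
    path: str
    line: int
    keyword: str
    text: str


def parse_grep_hits(grep_output: str) -> list[Hit]:
    """Keep only real match lines that contain a deprecation keyword."""
    hits = []
    for raw in grep_output.splitlines():
        m = MATCH_LINE.match(raw)
        if m is None:
            continue  # context line or "--" separator
        kw = KEYWORDS.search(m["text"])
        if kw is None:
            continue  # looked like a match line but has no keyword
        hits.append(Hit(m["path"], int(m["line"]), kw.group(0).lower(), m["text"].strip()))
    return hits


def to_markdown(hits: list[Hit]) -> str:
    """Render a draft table; the replacement column is filled in by hand."""
    rows = ["| Notice text | Source file:line | Documented replacement |", "|---|---|---|"]
    for h in hits:
        text = h.text.replace("|", "\\|")  # keep pipes from breaking the table
        rows.append(f"| {text} | {h.path}:{h.line} | (read from surrounding paragraph) |")
    return "\n".join(rows)
```

Paths that themselves contain a colon are not handled, so the output is best treated as a draft to review rather than the final README table.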
diff --git a/.github/skills/dlstreamer-coding-agent/convert-app/documentation-spec.md b/.github/skills/dlstreamer-coding-agent/convert-app/documentation-spec.md new file mode 100644 index 000000000..b8aa90662 --- /dev/null +++ b/.github/skills/dlstreamer-coding-agent/convert-app/documentation-spec.md @@ -0,0 +1,330 @@ +# Documentation Specification + +Used by **step 7 (Document)** of [`convert-app.prompt.md`](../../../prompts/convert-app.prompt.md). + +This file is the single source of truth for what the converted app's +documentation MUST contain. The agent MUST generate the README based on the +[README Template](../assets/README-template.md), adapted for C++, with **all** +the sections and tables specified below. + +## 1. Required README sections (in order) + +1. **Title + one-paragraph synopsis** — what the app does, what hardware it + targets. +2. **Source application** — original repo URL, commit hash from + `git rev-parse HEAD`, brief description of the source app's behavior. +3. **Prerequisites** — DL Streamer install, drivers (GPU/NPU), required + models, any system packages, and the **full env-setup recipe** + reproduced verbatim from + [`conversion-bootstrap.md`](./conversion-bootstrap.md) §4.1. The README + MUST include all four steps of that recipe (install detection, + `set +u` / `source setup_dls_env.sh` / `set -u`, prepending + `Release/lib` to both `GST_PLUGIN_PATH` and `LD_LIBRARY_PATH` plus + `opencv/lib`, and `GST_PLUGIN_FEATURE_RANK=kmssink:NONE,...`) — not + only the bare `source setup_dls_env.sh` line, which on its own does + not produce a working environment on most installs. +4. **Build** + + ```bash + cmake -S . -B build && cmake --build build -j$(nproc) + ``` + + This MUST be the exact command line documented in the README — it matches + the build invocation enforced by + [`validation-protocol.md`](./validation-protocol.md) §2 and the + [`final-audit-checklist.md`](./final-audit-checklist.md). Do not document a + different recipe (e.g. 
`mkdir build && cd build && cmake .. && make`) — + that would diverge from what the validation step actually runs. + +5. **Run** — exact command line with all arguments and a working example + invocation, e.g. + + ```bash + ./build/ --input videos/sample.mp4 --model models/.xml --device GPU + ``` + + Also document the `run.sh` wrapper invocations: + - `./run.sh` (defaults) + - `./run.sh --sink display|file|fake` + - `./run.sh --help` + +6. **Command-Line Arguments** — table describing every CLI flag of the binary + AND every flag/env var of the `run.sh` wrapper. Columns: name, type, + default, description. +7. **Expected Output** — what the user should see (annotated video, JSON file, + console logs, FPS counter line). +8. **Pipeline Comparison** — Mermaid diagram (spec in §2 below). +9. **Conversion Notes** — sub-sections detailed in §3 below. +10. **Observed Output** — clean-shell test results (spec in §4 below). + +> **Section-numbering convention.** The `§N.M` headings inside §2, §3, and §4 +> below number sub-rules **within this spec**, not the section numbers in the +> generated README. In the README, every numbered sub-section MUST be +> renumbered to match its parent README section from §1: e.g. the Pipeline +> Comparison sub-headings become `### 8.1`, `### 8.2`, `### 8.3`; the +> Conversion Notes sub-headings become `### 9.1` … `### 9.9`. + +## 2. Pipeline Comparison diagram (mandatory) + +Two Mermaid flowcharts that show the converted (DL Streamer) and the reference +(source) pipelines. The diagrams go in the README under a dedicated +`## 8. Pipeline Comparison` section (per the §1 ordering), **placed before** +`## 9. Conversion Notes`. + +### 2.1 Layout rules (MANDATORY) + +These layout rules exist so the diagrams render legibly on GitHub (which has +a fixed Mermaid viewport width) and so the two pipelines can be compared by +eye: + +1. **Two separate ```` ```mermaid ```` blocks**, not one block with two + `subgraph`s side-by-side. 
GitHub's Mermaid renderer auto-lays-out sibling + subgraphs **horizontally**, which produces unreadably small nodes on real + pipelines. Two top-level blocks are stacked vertically by the surrounding + Markdown — the desired layout. +2. **DL Streamer block first**, **DeepStream / source block second**. The + converted target is the primary subject of the document; the source is the + reference. Use sub-section headings `### 8.1 DL Streamer / OpenVINO (this + port)` and `### 8.2 (source)`. +3. **Each block uses `flowchart LR`** so element flow is **left → right**. + Do **not** use `flowchart TB` for either block. +4. **Group elements into numbered logical stage boxes** (`subgraph S1["S1 — + …"]`, …`Sn`). The stage boxes MUST be identical and identically numbered + in both diagrams so that "stage N in DLS" ↔ "stage N in source" is + unambiguous. Recommended canonical stages for a vision pipeline (omit any + that do not apply to the specific app): + + | Stage | Purpose | + |---|---| + | **S1** | Source / decode | + | **S2** | Primary detection (PGIE) | + | **S3** | Tracking | + | **S4** | Secondary detection (SGIE0) | + | **S5** | Classification / OCR / SGIE1 | + | **S6** | Overlay / metrics / OSD | + | **S7** | Encode / sink | + +5. **Add a Markdown stage-by-stage mapping table** (sub-section `### 8.3 + Stage-by-stage mapping` in the README) immediately after the second diagram. One row per + stage box, columns: `Stage | DLS element(s) | Source element(s) | Notes`. + This table is the textual counterpart of the diagrams and must always + agree with them. +6. Place the **colour legend** in a separate Markdown table beneath the + stage-mapping table, not as Mermaid nodes inside either diagram (legend + nodes inflate the viewport and shrink the real pipeline). + +### 2.2 Colour coding + +Use colours to classify each node. 
Apply colours by `classDef` + `class …` +statements at the end of each Mermaid block (not inline `style` directives, +which fight with `classDef`): + +- **Green** (`fill:#1f7a1f,color:#fff,stroke:#0a3`) — functionally equivalent + stages present in both pipelines. +- **Red** (`fill:#9a2222,color:#fff,stroke:#c33`) — stages present in the + reference pipeline but absent (N/A) in the converted pipeline, or vice + versa. Only appears in the diagram that actually contains the element. +- **Orange** (`fill:#a55a00,color:#fff,stroke:#c70`) — stages that exist in + both pipelines but differ significantly in implementation (e.g. different + model architecture, different inference region, merged functionality). +- **Blue** (`fill:#1864ab,color:#fff,stroke:#36b`) — stages that are new in + the converted pipeline with no direct counterpart in the reference + (e.g. `vapostproc` for VA → system-memory transfer). + +### 2.3 Required structure inside each diagram + +- Each leaf node label MUST name the **concrete element/plugin** and, where + meaningful, the model file and the inference region (e.g. + `gvadetect<br/>vehicle-detection-0200<br/>full-frame`). +- Connect nodes with arrows (`-->`) showing data flow **between** stage boxes + as well as inside multi-element stage boxes (e.g. + `vapostproc --> gvawatermark --> gvafpscounter` inside `S6`). +- Both diagrams MUST cover the same stage set (boxes with no element in one + pipeline may either be omitted or rendered with a single placeholder node + classed `red`). + +### 2.4 Anti-patterns (do NOT do these) + +- ❌ A single ```` ```mermaid ```` block with `flowchart TB` and two sibling + subgraphs that each set `direction LR` — GitHub renders the two subgraphs + horizontally, defeating rule 1. +- ❌ Subgraphs with `direction TB` inside a `flowchart LR` outer — element + flow becomes top-to-bottom inside each block, violating rule 3. +- ❌ Mixing the colour legend with pipeline nodes inside the Mermaid block. +- ❌ Element nodes without the concrete element/plugin name (e.g. labelling a + node simply "OCR" instead of `gvaclassify<br/>ch_PP-OCRv4_rec_infer`). +- ❌ Reordering the two diagrams so the source pipeline appears first — the + converted target is always primary. + +### 2.5 Minimal skeleton + +````markdown +## 8. Pipeline comparison + +Both pipelines flow **left → right**. The two diagrams are rendered as +separate Mermaid blocks stacked vertically (DL Streamer on top — the converted +target — source pipeline below as the reference). Each element is grouped +into a numbered **logical stage box** (S1 … Sn); identical stage numbers in +the two diagrams identify functional counterparts. + +### 8.1 DL Streamer / OpenVINO (this port) + +```mermaid +flowchart LR + subgraph S1["S1 — Source / decode"] + L1[uridecodebin3] + end + subgraph S2["S2 — Vehicle detection"] + L2["gvadetect<br/>vehicle-detection-0200<br/>full-frame"] + end + %% …S3 … Sn… + L1 --> L2 + %% … + classDef green fill:#1f7a1f,color:#fff,stroke:#0a3,stroke-width:1px + classDef blue fill:#1864ab,color:#fff,stroke:#36b,stroke-width:1px + class L1,L2 green +``` + +### 8.2 NVIDIA DeepStream (source) + +```mermaid +flowchart LR + subgraph S1["S1 — Source / decode"] + DS1[uridecodebin] --> DS2[nvstreammux] + end + subgraph S2["S2 — Vehicle detection"] + DS3["PGIE<br/>trafficcamnet"] + end + %% …S3 … Sn… + DS2 --> DS3 + %% … + classDef green fill:#1f7a1f,color:#fff,stroke:#0a3,stroke-width:1px + classDef red fill:#9a2222,color:#fff,stroke:#c33,stroke-width:1px + class DS1,DS3 green + class DS2 red +``` + +### 8.3 Stage-by-stage mapping + +| Stage | DL Streamer (this port) | (source) | Notes | +|---|---|---|---| +| **S1** Source / decode | `uridecodebin3 caps=video/x-raw(ANY)` | `uridecodebin` → `nvstreammux` | DLS has no muxer — single-stream pipeline. | +| **S2** Vehicle detection | `gvadetect vehicle-detection-0200` (full-frame) | PGIE `trafficcamnet` | OpenVINO SSD replaces TAO model. | +| … | … | … | … | + +**Legend** + +| Colour | Meaning | +|---|---| +| 🟩 **green** | Functional equivalent present in both pipelines. | +| 🟧 **orange** | Element retained but behaviour differs (scope, region, etc.). | +| 🟦 **blue** | New DLS-only element required by the Intel stack. | +| 🟥 **red** | Source-only element with no DLS counterpart — intentionally omitted. | +```` + +## 3. Conversion Notes — required sub-sections + +The `## 9. Conversion Notes` section MUST contain the following sub-sections +(any that do not apply may be marked "N/A — "). In the README, the +sub-section numbers below MUST be re-prefixed with `9.` to match the parent +section number from §1 — e.g. `### 9.1 Model substitutions`, +`### 9.2 Label file indexing`, … `### 9.9 Open issues / TODOs`. + +### 3.1 Model substitutions + +Table listing every model used by the source app and its Intel-compatible +replacement. Columns: + +| Original model | Framework | Task | Replacement | Source URL | License | Character set / language | Precision | Equivalence rationale | Trade-offs | + +The `Character set / language` column is **mandatory for OCR / text-recognition +models** (LPR, scene text, etc.). State explicitly: dictionary size, languages +supported (e.g. "English (A–Z, 0–9, CTC blank)"), and a note if the model is +**not** suited for inputs outside that language.
See +[`model-sourcing.md`](./model-sourcing.md) for the rules. + +### 3.2 Label file indexing + +For every SSD-style detection model, document the verified mapping: + +``` +class_id 0 = background → labels line 0 = "background" +class_id 1 = vehicle → labels line 1 = "vehicle" +class_id 2 = license-plate → labels line 2 = "license-plate" +``` + +### 3.3 Inference mode selection + +For every secondary detector that was a candidate for `inference-region=roi-list`, +record the A/B test result: + +- Detection counts (full-frame vs. roi-list). +- Maximum confidence in each mode. +- Chosen mode and rationale. + +See [`pipeline-implementation.md`](./pipeline-implementation.md) §3. + +### 3.4 Inference device choice + +Why `GPU` / `CPU` / `NPU` was chosen as the default, what fallback is wired in, +and how the user can override it. + +### 3.5 Hardware encoder fallback + +The runtime encoder priority list used by the C++ code +(`vah264enc` → `vah264lpenc` → `qsvh264enc` → `openh264enc`), and which +encoder was actually selected during the validation runs. + +### 3.6 Deprecation discovery (run NNN) + +Table built from the per-run deprecation scan (see +[`deprecation-discovery.md`](./deprecation-discovery.md)): + +| Deprecated symbol/element/file/API | Source file:line | Documented replacement | + +### 3.7 Deprecated API usage justification (only if applicable) + +If a deprecated API was the only viable path, document for each occurrence: + +- Exact API used. +- Deprecation source file + line found in the discovery scan. +- Replacement that was attempted and the concrete reason it failed. +- Code call site reference (file:line) where the matching `TODO:` comment + lives. + +### 3.8 Excluded features (from the source app) + +List any feature dropped per the *Scope → Exclude* rules (GUI frameworks, +cloud SDKs, CUDA-only extensions, training code) and what — if anything — +replaces it. 
+ +### 3.9 Open issues / TODOs + +Anything intentionally left unfinished, with a pointer to the relevant +`TODO:` comments in the code. + +## 4. Observed Output — clean-shell test results + +For **each** of the five clean-shell invocations from +[`validation-protocol.md`](./validation-protocol.md) §3a, fill the following +table. Columns: exact clean-shell command line (`env -i HOME="$HOME" PATH=… +bash -lc 'cd && ./run.sh '`), exit code, FPS / artifact +snippet (the final `FpsCounter(overall …)` line is ideal), and any pitfall +from [`runsh-pitfalls.md`](./runsh-pitfalls.md) that was actually triggered +(with the fix applied). + +| Invocation | Exit code | FPS / artifact | Pitfall triggered (if any) | +|---|---|---|---| +| `./run.sh --help` | 0 | usage printed | — | +| `./run.sh` (defaults) | 0 | 55 fps, lpr_output.mp4 | — | +| `./run.sh --sink fake` | 0 | 172 fps | — | +| `./run.sh --sink file` | 0 | lpr_output.mp4 (1.2 MB) | — | +| `./run.sh --sink invalid` | 2 | "ERROR: unknown sink: invalid" | — | + +## 5. Source traceability comments (in the C++ code, not in README) + +Every logically distinct section of the converted C++ code MUST carry a +`/* --- Ref: : --- */` comment +that traces it back to the reference application. Full rule and examples in +[`pipeline-implementation.md`](./pipeline-implementation.md) §9. This is +enforced at code-review time, not in the README. diff --git a/.github/skills/dlstreamer-coding-agent/convert-app/final-audit-checklist.md b/.github/skills/dlstreamer-coding-agent/convert-app/final-audit-checklist.md new file mode 100644 index 000000000..13dbb5686 --- /dev/null +++ b/.github/skills/dlstreamer-coding-agent/convert-app/final-audit-checklist.md @@ -0,0 +1,77 @@ +# Final Requirements Compliance Audit + +Used as the **last step before reporting completion** in +[`convert-app.prompt.md`](../../../prompts/convert-app.prompt.md). + +The agent MUST walk through the checklist below and verify that every +requirement has been fulfilled. 
For each item, confirm with a concrete +artifact or test result — not just "I think I did this". If any item fails, +fix it and re-verify before proceeding. + +The agent MUST emit the filled checklist (with `[x]` for pass, `[ ]` for +fail, and a one-line note per item) in the final user-facing summary message. +Any remaining `[ ]` items block completion. + +## Deliverables + +- [ ] `CMakeLists.txt` exists and builds without errors (`cmake --build` exit 0). +- [ ] Source files (`.cpp` / `.h`) exist and compile cleanly. +- [ ] `README.md` exists and is fully compliant with + [`documentation-spec.md`](./documentation-spec.md) §1 (required + sections) and §3 (Conversion Notes sub-sections). +- [ ] `run.sh` exists, is executable (`-x`), and meets every requirement in + [`cmake-and-deliverables.md`](./cmake-and-deliverables.md) §"4. `run.sh`" + (incl. `--help`, `--sink display|file|fake` selector + invalid-value + error, headless auto-detection, pre-flight resource audit) AND wires + the full env-setup recipe from + [`conversion-bootstrap.md`](./conversion-bootstrap.md) §4.1 verbatim + with `gst-inspect-1.0 OK:` for every GVA element the pipeline uses + (§4.2). +- [ ] `export_models.sh` (if applicable) exists, is executable, and downloads + models successfully when the models target directory is empty. + +## Resource integrity + +- [ ] Default input video path resolves to an existing file (no broken symlinks). +- [ ] All model files (`*.xml` + `*.bin`) exist at the default models path. +- [ ] Any label files, dictionaries, or configs referenced at runtime exist. + +## Clean-shell runs (from [`validation-protocol.md`](./validation-protocol.md) §3a) + +- [ ] `env -i … ./run.sh --help` → exit 0, usage printed. +- [ ] `env -i … ./run.sh` (defaults) → exit 0, non-zero FPS. +- [ ] `env -i … ./run.sh --sink fake` → exit 0, non-zero FPS. +- [ ] `env -i … ./run.sh --sink file` → exit 0, output file created. +- [ ] `env -i … ./run.sh --sink invalid` → exit non-zero, error message. 
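The exit-code part of the five clean-shell items above could be checked mechanically along these lines. It is a sketch, not the skill's actual tooling: `CHECKS`, `audit`, and `clean_shell_runner` are hypothetical names, the `PATH` value has to be supplied by the caller because the protocol does not fix it here, and the FPS and output-file conditions are deliberately left to manual inspection.

```python
import os
import shlex
import subprocess
from typing import Callable, NamedTuple


class Check(NamedTuple):
    args: str               # arguments appended to ./run.sh
    expect_zero_exit: bool  # True: must exit 0; False: must exit non-zero


# The five clean-shell invocations from the checklist above.
CHECKS = [
    Check("--help", True),
    Check("", True),
    Check("--sink fake", True),
    Check("--sink file", True),
    Check("--sink invalid", False),
]


def audit(run: Callable[[str], int]) -> list[str]:
    """Return one checklist line per invocation, based on exit codes only."""
    lines = []
    for check in CHECKS:
        code = run(check.args)
        ok = (code == 0) if check.expect_zero_exit else (code != 0)
        label = f"./run.sh {check.args}".strip()
        lines.append(f"- [{'x' if ok else ' '}] `{label}` -> exit {code}")
    return lines


def clean_shell_runner(app_dir: str, path_value: str) -> Callable[[str], int]:
    """Build a runner that executes run.sh under `env -i` (caller supplies PATH)."""
    def run(args: str) -> int:
        inner = f"cd {shlex.quote(app_dir)} && ./run.sh {args}"
        cmd = [
            "env", "-i",
            f"HOME={os.environ.get('HOME', '')}",
            f"PATH={path_value}",
            "bash", "-lc", inner,
        ]
        return subprocess.run(cmd).returncode
    return run
```

Passing the runner in as a callable keeps the pass/fail logic testable without a DL Streamer install; the resulting lines can be pasted into the final summary alongside the manually verified FPS and artifact notes.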
+ +## Deprecated API compliance + +- [ ] Dynamic deprecation scan returns empty (no deprecated constructs in + output). See [`validation-protocol.md`](./validation-protocol.md) §6. + +## Functional coverage + +- [ ] Every functional block from the source-app inventory has a **dedicated, + separate** counterpart element in the converted pipeline. No stages have + been merged or dropped — the converted pipeline has the same number of + inference stages as the original. +- [ ] The visual output of the converted app matches the source in kind: same + categories of bounding boxes, labels, overlays, and counters are + rendered (though model accuracy may differ). For example, if the + original draws vehicle bounding boxes AND plate bounding boxes, the + converted app MUST also draw both. + +## Documentation completeness + +See [`documentation-spec.md`](./documentation-spec.md) for the full README +specification. Confirm with one checkbox: + +- [ ] README is fully compliant with `documentation-spec.md` §1 (required + sections), §2 (Pipeline Comparison diagram), §3 (Conversion Notes + sub-sections, incl. model substitutions table with + `Character set / language` and `Domain match` columns where + applicable), and §4 (Observed Output for all five clean-shell + invocations). Source traceability comments + (`/* --- Ref: … --- */`) are present in every logically distinct C++ + section per + [`pipeline-implementation.md`](./pipeline-implementation.md) §9. diff --git a/.github/skills/dlstreamer-coding-agent/convert-app/model-sourcing.md b/.github/skills/dlstreamer-coding-agent/convert-app/model-sourcing.md new file mode 100644 index 000000000..398f6eaa8 --- /dev/null +++ b/.github/skills/dlstreamer-coding-agent/convert-app/model-sourcing.md @@ -0,0 +1,228 @@ +# Model Sourcing & Conversion + +Used during **planning (step 4)** of [`convert-app.prompt.md`](../../../prompts/convert-app.prompt.md). 
+ +Convert all AI models to OpenVINO IR format **before** implementing the C++ +pipeline. + +## Precision preference + +Prefer **FP16** over FP32 for all models unless the model explicitly requires +FP32 precision (e.g. output tensors with integer indices that lose meaning under +half-precision quantization, or documented accuracy degradation). FP16 halves +memory bandwidth, enables GPU inference at full throughput on Intel GPUs (which +natively execute FP16), and reduces model load time — with negligible accuracy +loss for the vast majority of detection and classification models. + +When both FP16 and FP32 variants are available (e.g. from `omz_downloader` or +HuggingFace), always select FP16. Document any exception (model kept at FP32) +with a one-line justification in the README's model substitution table. + +For detailed `ovc` usage, framework-specific recipes (ONNX, PyTorch, TensorRT, +PaddlePaddle, HuggingFace), precision options (FP32 / FP16 / INT8), and +ready-to-use export script templates, follow the +[Model Preparation Reference](../references/model-preparation.md). + +## Model Sourcing Strategy + +The agent MUST find an Intel-compatible model for **every** model used by the +source application. Apply this strategy in order: + +1. **Direct conversion** — if the original model is available in an open + framework format (ONNX, PyTorch `.pt`, TensorFlow `.pb`, PaddlePaddle), + convert it directly with `ovc`. Preserve original weights and architecture. + +2. **Functional equivalent from a curated source** — if the original model + cannot be converted (encrypted, proprietary, no open weights), search for a + **functionally equivalent** pre-converted model in this priority order (skip + any source flagged by the per-run deprecation discovery): + 1. [Hugging Face Hub](https://huggingface.co/) — use `optimum-cli export openvino` + 2. [Ultralytics models](https://github.com/ultralytics/ultralytics) — YOLO family, use `download_ultralytics_models.py` + 3. 
DL Streamer sample models — see `samples/download_public_models.sh` + +3. **Document the substitution** — in the README, list every model used by the + source app and the chosen Intel-compatible replacement, including: + - Original model name + framework + task + - Replacement model name + source URL + license + - Functional equivalence rationale (same task, similar accuracy class, same + input/output schema) + - Any accuracy or capability trade-offs the user should be aware of + +## Detector model — domain alignment with input content (mandatory) + +Functional task match (e.g. "this is a license-plate detector") is **not +sufficient** when picking a detector model. The agent MUST also verify that +the model's **training-domain distribution** matches the input video the +converted app will actually receive. A model trained on a narrow scenario +(e.g. frontal barrier/toll-booth view) silently fails on out-of-distribution +content (parking surveillance, dashcam, traffic) — the pipeline runs, reports +detections, even draws bounding boxes for the vehicles it does find, but +recall on the secondary task collapses (typical observation: <10 % vs the +trained scenario's >80 %). + +This is **not the same** as the model–inference-mode A/B test in +[`pipeline-implementation.md` §3](./pipeline-implementation.md) (which only +decides between `full-frame` and `roi-list`). Domain mismatch cannot be fixed +by toggling `inference-region` — both modes degrade together because the input +distribution itself is out-of-domain. + +### Discovery procedure (apply before committing to a detector model) + +1. **Read the model card / OMZ description** for any scenario-restrictive + wording. 
Reject defaults if the description contains any of: + - `"barrier"`, `"gate"`, `"toll"`, `"checkpoint"` — frontal close-up only + - `"surveillance"`, `"ceiling"`, `"top-down"` — high mounting angle only + - `"dashcam"`, `"in-vehicle"`, `"driver-facing"` — vehicle-mounted POV only + - `"document"`, `"scan"` — paginated/scanned input only + + Cross-check against the input video's actual scenario. A model labelled + e.g. `"optimized for license-plate recognition at a barrier"` is the + wrong default for an out-of-domain input clip (e.g. a parking-lot + surveillance recording). + +2. **Check the model's input resolution vs the input video resolution.** For + a detector with input 300×300 used full-frame on 1920×1080, every object + smaller than ~50×50 px in the source becomes ≤ 8×8 px at the network input + — below the minimum receptive field of most SSD/YOLO detection heads. + Concretely: target object pixel size on the input network MUST be ≥ 16×16 + for SSD-MobileNet, ≥ 20×20 for YOLOv5/8 nano. If not, either: + - Pick a higher-input-resolution model (e.g. YOLO at 640×640), OR + - Switch to a true cropped-cascade in a probe (extract upstream ROI, + up-scale, re-inject as appsrc) — document the added complexity. + +3. **Run a quick recall sanity check** at a deliberately low threshold + (≥ 0.01) on a representative input clip: + ```bash + gst-launch-1.0 -q filesrc location= ! decodebin ! videoconvert ! \ + gvadetect model= threshold=0.01 ! \ + gvametaconvert format=json ! \ + gvametapublish file-format=json-lines file-path=/tmp/probe.jsonl ! 
\ + fakesink + # Bucket by confidence + python3 -c "import json,collections; b=collections.Counter(); \ + [b.update([round(o['detection']['confidence'],1)]) for l in open('/tmp/probe.jsonl') \ + if l.startswith('{') for o in json.loads(l).get('objects',[]) \ + if o['detection']['label_id']==]; \ + print(sorted(b.items()))" + ``` + **Decision rule** — if ≥ 70 % of detections fall in the `[0.0, 0.1)` bucket + AND fewer than 5 % land at `≥ 0.5`, the model is out-of-domain for this + input. Pick a different model before continuing. + +4. **Preferred fallbacks for detector-domain mismatch** (priority order): + 1. **Generic high-recall detector that already covers the target class as + a sub-class.** For any detector whose source-app role is to find a + broad super-category (e.g. "vehicle", "person", "animal"), the first + fallback is a public detector trained on a large, diverse, multi-class + dataset that includes the target class (e.g. a COCO-trained YOLO + variant). Map the relevant fine-grained sub-classes to the source + app's umbrella label via the `labels-file` (e.g. for a "vehicle" + umbrella, remap COCO classes `car` / `bus` / `truck` → `vehicle`). + This is almost always a better default than a scenario-locked detector + whose training set was restricted to a single mounting angle / camera + view. + 2. **Task-specific community detectors** from curated hubs (e.g. Hugging + Face), exportable via `optimum-cli export openvino`, that were trained + on mixed-angle / mixed-scale datasets. + 3. **Public fine-tunes** of the same architecture family on broader + public datasets — e.g. for YOLO variants, `download_ultralytics_models.py` + then `ovc`. + 4. **Original source-app model** if it is the only domain-matching option + AND it is exportable to ONNX/OpenVINO — preserve the original weights + with `ovc`, only swap the inference runtime. 
+ + **Mandatory A/B before defaulting to a narrow-domain model** — whenever + any of the scenario-restrictive keywords listed in step 1 of the + *Discovery procedure* above (e.g. `barrier`, `gate`, `toll`, + `surveillance`, `dashcam`, …) appears in the source app's chosen + detector's name or description, the agent MUST run the confidence-bucket + sanity check from step 3 of the *Discovery procedure* on **both** the + source's default model AND the generic fallback from priority 1, and pick + the higher-recall option. Record both numbers in the README's model + substitution table. + +5. **README documentation requirement** — when the chosen model's training + domain does not fully match the input video, the README's model + substitution table MUST include a `Domain match` column with one of + `match` / `partial` / `mismatch (accepted)`. For `partial` and + `mismatch (accepted)`, the Observed Output section MUST include the + confidence-bucket histogram from step 3 and the *Conversion Notes* MUST + list at least one suggested alternative model from step 4 that would + close the gap. Column semantics and table layout are specified in + [`documentation-spec.md`](./documentation-spec.md) §3.1. + +## Default language for OCR / text-recognition models + +For any model that performs **OCR, text recognition, license-plate recognition +(LPR), scene-text recognition, or any other character-classification task with +a language-specific character set**, the default target language is **English +(Latin alphabet, ASCII letters A–Z + digits 0–9)** — unless the user explicitly +requests a different language. + +This rule overrides naïve "closest match by task" model selection. Concretely: + +1. 
**Reject models trained on non-Latin character sets by default.** For + example, `license-plate-recognition-barrier-0007` from Intel OMZ is trained + on **Chinese** plates (its dictionary contains 31 Chinese province name tags + like ``, `` in addition to Latin letters, and the model + architecturally expects a Chinese province prefix in the output). It will + produce nonsense (hallucinated Chinese tags) on European, US, or other + Latin-alphabet plates. Do NOT use it as the default LPR model just because + it is the only Intel-curated LPR model — pick a Latin-alphabet alternative + instead. + +2. **Inspect every candidate OCR model's character dictionary before selecting + it.** The dictionary file (whether shipped with the model, embedded in + `model-proc`, or documented on the model's source page) is the authoritative + source of truth about which languages/scripts the model can produce. If the + dictionary contains non-Latin characters (CJK, Cyrillic, Arabic, Devanagari, + etc.) and there is no way to restrict output to Latin only, pick a different + model. + +3. **Preferred sources for English/Latin OCR models** — apply the + [*Model Sourcing Strategy*](#model-sourcing-strategy) priority list above + (HF Hub via `optimum-cli export openvino`, then Ultralytics / DL Streamer + sample models). For OCR additionally prefer Intel OMZ Latin-only models + (e.g. `text-recognition-0012`, `text-recognition-0014`, + `text-recognition-resnet-fc` — verify the dictionary excludes CJK) and + PaddleOCR English recognizers (via `paddle2onnx` → `ovc`) over + general-purpose hubs when available. + +4. **When the user explicitly requests a different language** (e.g. "use a + Chinese LPR model", "this is for Polish plates with diacritics"), document + the choice in the README's model substitution table together with the + dictionary contents and the explicit user request that justified the + deviation from the default. 
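The dictionary inspection from rule 2 could be roughly automated as follows. This is a minimal sketch under assumptions: `audit_dictionary` is a hypothetical helper, the dictionary is assumed to be a plain list with one entry per line, and "Latin" is decided by Unicode character names rather than by any DL Streamer or OpenVINO API. Multi-character entries are only reported for manual review, since tag-style tokens written in ASCII would otherwise pass the script check unnoticed.

```python
import unicodedata


def audit_dictionary(entries):
    """Split dictionary entries into non-Latin ones and multi-character ones.

    Returns (non_latin, multi_char). An entry lands in non_latin if any of its
    alphabetic characters has a Unicode name that does not start with "LATIN";
    otherwise, entries longer than one character are flagged for manual review.
    """
    non_latin, multi_char = [], []
    for raw in entries:
        token = raw.strip()
        if not token:
            continue  # skip empty lines
        has_foreign_letter = any(
            ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN")
            for ch in token
        )
        if has_foreign_letter:
            non_latin.append(token)
        elif len(token) > 1:
            multi_char.append(token)
    return non_latin, multi_char
```

A non-empty first list is a reason to reject the model as the default; a non-empty second list means the entries should be read by hand before the model is accepted.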
+ +README documentation of the OCR model's character set / language is +mandatory and specified in +[`documentation-spec.md`](./documentation-spec.md) §3.1 +(`Character set / language` column). + +## Inference Device Mapping + +When mapping inference device from the source app: + +| Source | DL Streamer / OpenVINO | +|-------------------------|------------------------| +| `cuda` / `gpu` (NVIDIA) | `GPU` (Intel) | +| `cpu` | `CPU` | +| — | `NPU` (if available) | + +For DeepStream-specific element mapping and conversion examples, see the +[Converting DeepStream to DL Streamer Guide](../../../../docs/user-guide/dev_guide/converting_deepstream_to_dlstreamer.md). +For mixed NVIDIA + Intel hardware deployments, see +[DL Streamer and DeepStream Coexistence](../../../../docs/user-guide/dev_guide/dlstreamer-deepstream-coexistence.md). + +## Blockers requiring user action + +Stop and ask before proceeding when: + +- **Encrypted model files** (e.g. NVIDIA TAO `.etlt`, TensorRT `.engine` from + unknown source) — first attempt to find a functional equivalent (see + *Model Sourcing Strategy*). If no equivalent exists in any curated source, + request the unencrypted ONNX export or original framework weights from the + user. +- **Custom CUDA kernels or proprietary closed-source plugins without an + OpenVINO equivalent** — document the gap and ask the user how to proceed + (re-implement in OpenVINO / drop the feature / keep as a TODO). diff --git a/.github/skills/dlstreamer-coding-agent/convert-app/pipeline-implementation.md b/.github/skills/dlstreamer-coding-agent/convert-app/pipeline-implementation.md new file mode 100644 index 000000000..42ff3e34c --- /dev/null +++ b/.github/skills/dlstreamer-coding-agent/convert-app/pipeline-implementation.md @@ -0,0 +1,449 @@ +# Pipeline Implementation — Verification, Patterns, Pitfalls + +Used by **step 5 (Implement)** of [`convert-app.prompt.md`](../../../prompts/convert-app.prompt.md). 
+ +This reference collects all in-code checks and patterns the agent must apply +while writing the converted C++ application. + +## 1. Pipeline element availability verification (after EVERY modification) + +Every time the pipeline is modified — whether during initial implementation, +debugging, fixing negotiation errors, adding/removing sink paths, or any other +change — re-verify that **every** GStreamer element referenced in the pipeline +actually exists in the current runtime. Applies to **both** styles: + +- **String-based** (`gst_parse_launch`): extract every element name from the + pipeline string. +- **Programmatic** (`gst_element_factory_make` / `gst_element_factory_create`): + check every factory name passed to `make`/`create` calls. + +```bash +# For every element in the pipeline, verify it exists: +for elem in vapostproc gvawatermark videoconvert gvafpscounter autovideosink vah264enc; do + gst-inspect-1.0 "$elem" >/dev/null 2>&1 && echo "OK: $elem" || echo "MISSING: $elem" +done +``` + +For programmatic pipelines, additionally grep the C++ source: + +```bash +grep -oP 'gst_element_factory_make\s*\(\s*"([^"]+)"' *.cpp | \ + sed 's/.*"\(.*\)"/\1/' | sort -u | \ + while read elem; do + gst-inspect-1.0 "$elem" >/dev/null 2>&1 && echo "OK: $elem" || echo "MISSING: $elem" + done +``` + +A pipeline that compiles in C++ but references a non-existent element fails +only at runtime with an opaque `Failed to create pipeline: no element "..."` +or a NULL return from `gst_element_factory_make`. Utility elements like +`vapostproc`, `vah264enc`, `x264enc`, encoders, and format converters are +common sources. If anything is missing, find an alternative +(`gst-inspect-1.0 --list` or `gst-inspect-1.0 | grep `) and update +the pipeline before proceeding. **Never assume an element exists based on +documentation alone — always confirm against the live registry.** + +## 2. 
Metadata flow verification between pipeline stages + +Every time the pipeline is modified, verify that GVA inference metadata +propagates correctly from the first inference element to the visualization / +output element. Metadata can silently stop flowing when: + +- A `videoconvert` or `capsfilter` is inserted at a position that triggers a + buffer copy without metadata preservation, or an element between inference + and watermark fails to forward `GstVideoRegionOfInterestMeta`. +- An element that operates on ROI metadata (e.g. `gvainference` with + `object-class=license-plate`) receives ROIs with wrong labels due to a + labels file mismatch. +- A probe is attached at a point in the pipeline where upstream inference + metadata has not yet been attached to the buffer. + +### Verification procedure (after every pipeline change) + +1. **Stage-by-stage detection count audit** — run with `--sink fake` and verify + each stage of the cascade reports the expected number of detections. For + an N-stage cascade, all N counts must be non-zero. The wrapper / probe + code MUST print one log line per stage (named after the stage's role) + that can be grep'd from stdout, e.g.: + + ```bash + # <stage-1>, <stage-2>, <stage-3> are placeholders for the per-stage log prefixes + ./run.sh --sink fake 2>&1 | grep -E "<stage-1>|<stage-2>|<stage-3>" + ``` + + If any downstream stage shows zero detections while the upstream shows + non-zero, metadata is not flowing between those stages. Common causes: + wrong `object-class` filter, labels file mismatch, missing ROI metadata. + +2. **Visual output spot-check** — run with `--sink file`, extract a frame + known to contain detections, and verify that **every expected overlay type** + is present (bounding boxes, labels, classification text, tracking IDs). If + bboxes appear but classification/OCR text is missing, the most likely cause + is `videoconvert` dropping `GstVideoRegionOfInterestMeta.params` — see §6 + (two-probe architecture). If no overlays appear at all, `gvawatermark` is + likely in transparent mode — see §5. + +3.
**Semantic sanity check** — verify detection labels match expected + semantics by comparing each detection's bounding-box dimensions against + the size of the object class it is supposed to represent. If the pipeline + emits an ROI whose bbox dimensions clearly do not fit the semantic class + (e.g. an ROI labelled as a small object but covering most of the frame), + the labels file indexing is wrong (§4). Run the model on a single frame + via OpenVINO Python API to confirm `class_id` values align with the + labels file. + +4. **Emitted-label spot dump (MANDATORY before relying on `object-class=` or + `roi.label()` string compares)** — OpenVINO IR models can ship with an + embedded `` + block. `gvadetect` may use that block as the source of class names and + **the external `labels-file` may be ignored or merged** depending on the + model_type and the DL Streamer version. The effective ROI label is + therefore not predictable from the `labels-file` alone. + + Before wiring `object-class=` on a downstream element OR comparing + `roi.label()` in a probe, the agent MUST dump the **actual emitted label** + for the first frames containing a detection: + + ```bash + gst-launch-1.0 -q filesrc location= ! decodebin ! videoconvert ! \ + gvadetect model= labels-file= threshold=0.1 ! \ + gvametaconvert format=json ! gvametapublish file-format=json-lines \ + file-path=/tmp/lab.jsonl ! fakesink + python3 -c "import json,collections; c=collections.Counter(); \ + [c.update([o['detection'].get('label','?')]) for l in open('/tmp/lab.jsonl') \ + if l.startswith('{') for o in json.loads(l).get('objects',[])]; \ + print(c.most_common())" + ``` + + The label strings that appear in the printed `Counter` are exactly the + strings that downstream `gvadetect object-class=` / `gvaclassify + object-class=` filters compare against, and the strings the probe will see + in `roi.label()`. 
If they do not match the expected vocabulary, adjust the + `labels-file`, change `object-class=` (and probe filters) to the + actually-emitted strings, or insert a labels-rewriting probe. Never assume + the `labels-file` remap took effect without this dump. + +This verification is especially critical for cascade architectures — a single +misalignment can cause the entire downstream chain to produce zero results +while the pipeline reports no errors. + +## 3. Model–inference-mode compatibility validation + +When mapping a secondary detector that runs on ROIs in the source pipeline +(e.g. DeepStream SGIE with `operate-on-gie-id`), do NOT blindly translate it +to `inference-region=roi-list` in DL Streamer. Many models are trained on +full-frame images and suffer catastrophic accuracy loss when applied to +cropped/letterboxed ROIs produced by `roi-list` — in practice this can range +from a small recall drop to a near-total collapse (i.e. the same model and +clip yielding orders of magnitude more detections in full-frame mode than in +roi-list mode). + +Validate for **every** `gvadetect` / `gvainference` element that is a +candidate for `inference-region=roi-list`: + +1. **Run a quick A/B test** with `gst-launch-1.0`. Bare `fakesink` does not + serialise GVA metadata to stdout, so always pipe through + `gvametaconvert ! gvametapublish` (writing JSON-Lines to disk) and count + detections from the output file — never `grep -c` on stderr/stdout from + `fakesink`, which always reports 0. + - **A** — `inference-region=full-frame` with `threshold=0.3` + - **B** — `inference-region=roi-list` with `threshold=0.01` (intentionally + low to catch even weak detections) + + ```bash + # Variant A (full-frame). Re-run with `inference-region=roi-list object-class=` + # and a separate output path (e.g. /tmp/ab_roi.jsonl) for variant B. + gst-launch-1.0 -q ... ! gvadetect model= inference-region=full-frame threshold=0.3 ! \ + gvametaconvert format=json ! 
\ + gvametapublish file-format=json-lines file-path=/tmp/ab_full.jsonl ! fakesink + # Count detections in either output file (one JSON object per frame; sum object array lengths) + python3 -c "import json,sys; print(sum(len(json.loads(l).get('objects',[])) for l in open(sys.argv[1]) if l.startswith('{')))" /tmp/ab_full.jsonl + ``` + +2. **Decision rule**: + - If `roi-list` count is **≥ 50%** of `full-frame` count AND max confidence + is **≥ 0.3** → use `roi-list` (preserves cascade semantics, avoids false + positives on background). + - Otherwise → use `full-frame` and document the decision with A/B numbers + in the README under `Conversion Notes → Inference mode selection`. + +3. **Metadata impact** — switching from `roi-list` to `full-frame` changes the + metadata topology: detections become top-level `ODMtd` entries (no + `parent_id` / no `CONTAIN` relation to an upstream detector's ROI). The + probe callback MUST be adjusted accordingly — iterate all ODs and match by + the label string the model actually emits (see §2, step 4 of the + verification procedure, for how to discover it) instead of walking + `CONTAIN` relations from a parent object. + +4. **Document** the A/B test results and chosen mode in the README. Include + detection counts, max confidence, and rationale. + +## 4. Label file indexing for SSD DetectionOutput models + +SSD-style models with a `DetectionOutput` layer use `class_id=0` for the +background class. The labels file consumed by `gvadetect` maps line number N +(0-indexed) to `class_id=N`. If the labels file starts with the first real +object class instead of a placeholder for background at index 0, **all class +IDs shift by one** — every detection ends up under the label one row above +its true class, and the last class disappears entirely. The pipeline still +compiles, runs and reports detections, but every label is wrong (semantic +nonsense — e.g. ROIs whose bbox dimensions clearly do not match the assigned +class). 
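The off-by-one shift described above can be reproduced with plain lists. The following is a minimal sketch for reasoning about the problem, not DL Streamer code: the helper names `resolve_label` and `has_background_slot` and the sample label vocabulary are illustrative assumptions.

```python
def resolve_label(labels, class_id):
    """Map a class_id to a label the way a 0-indexed labels file does."""
    if 0 <= class_id < len(labels):
        return labels[class_id]
    return None  # class_id falls off the end of the file


def has_background_slot(labels, placeholders=("background", "")):
    """Check that line 0 is reserved for the background class."""
    return bool(labels) and labels[0].strip().lower() in placeholders


# Labels file WITH the background placeholder at index 0.
aligned = ["background", "vehicle", "license-plate"]
# Labels file that starts directly with the first real class.
shifted = ["vehicle", "license-plate"]
```

With `aligned`, `class_id=1` resolves to `vehicle`. With `shifted`, the same `class_id=1` resolves to `license-plate` and `class_id=2` resolves to nothing, which is the silent mislabeling the section warns about.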
+ +Verify label-to-class-id alignment for every detection model: + +1. Run the model on a single representative frame using OpenVINO Python API + (`ov.Core().compile_model`) and inspect the raw `class_id` values in the + `DetectionOutput` tensor. +2. Confirm the labels file entry at that index matches the expected semantic + class. +3. If the model uses `class_id=0` for background (standard for SSD, + SSD-MobileNet, and similar), the labels file MUST have `background` (or an + empty line) as its first entry. +4. Record the verified mapping in the README's model substitution table + (e.g. "class_id 0=background, 1=vehicle, 2=license-plate → labels line + 0='background', line 1='vehicle', line 2='license-plate'"). + +## 5. `gvawatermark` rendering — transparent path pitfall + +`gvawatermark` negotiates `ANY` memory type when upstream buffers are not +explicitly constrained to system memory. This causes it to select the +transparent rendering path (path=3), which bypasses all drawing. The pipeline +runs without errors and processes all frames, but the video is visually +identical to the input. + +Debug log shows: `"Transparent path linked (identity bypassed)"`. + +**Fix**: insert `capsfilter caps=video/x-raw,format=BGRx` immediately before +`gvawatermark` to force system-memory caps negotiation. The capsfilter MUST +have `name=pre_watermark` so the label probe (§6) can attach to its src pad: + +``` +videoconvert n-threads=4 ! capsfilter caps=video/x-raw,format=BGRx name=pre_watermark ! gvawatermark ! videoconvert n-threads=4 +``` + +This adds a CPU copy step; expect a measurable FPS reduction (e.g. 170 → 55 +FPS) when GPU inference is used. Verify that `gvawatermark` actually renders +overlays by comparing a frame from the output video against the original input +(pixel diff > 0 in bbox regions). + +## 6. 
Probe callbacks must write decoded results to ROI metadata + +> **Before writing any manual decoder in a probe**: verify that `gvaclassify` +> does not already post-process this model. DL Streamer's classification +> element ships built-in post-processors for several model families +> (e.g. CTC text-recognition heads, ImageNet-style softmax classifiers, +> regression heads, custom JSON-described outputs). When the embedded +> `rt_info.model_info.model_type` (or a matching `model-proc` file) selects +> such a post-processor, the decoded text/label is **already** present on the +> classification tensor's `label` field and the raw `tensor.data()` +> blob may be **empty** — the raw output is consumed and discarded by the +> post-processor. A probe that ignores `tensor.label()` and tries to run +> its own decoder on the empty blob silently produces zero results. +> +> Verification step before writing a manual decoder: +> +> ```cpp +> // dump first 3 classification tensors raw +> for (auto &t : roi.tensors()) { +> if (t.is_detection()) continue; +> const GstStructure *s = t.gst_structure(); +> g_print("[T] %s\n", gst_structure_to_string(s)); +> } +> ``` +> +> If the dump already contains a populated `label=(string)…` field, **do not +> write a manual decoder** — read `tensor.label()` directly. Only when the +> dump shows `label=""` AND a non-empty `data` blob should the probe perform +> the decode itself. + +If a pad probe decodes inference results (e.g. OCR text from an LPR model, +classification labels), printing to console via `g_print()` is **not +sufficient** for visual output. `gvawatermark` renders only metadata attached +to GVA `RegionOfInterest` objects — specifically, it reads `tensor.label()` +from every non-detection tensor in each ROI. If the probe does not write the +decoded result back to the ROI, `gvawatermark` will draw the bounding box but +display no text. 
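The rendering rule described above can be modelled in a few lines. The sketch below is a plain-Python approximation for reasoning about the rule, not the `gvawatermark` implementation: each tensor is represented as a dict with hypothetical `name` and `label` keys, and `renderable_labels` is an illustrative name.

```python
def renderable_labels(tensors):
    """Return the text a watermark-style renderer would draw for one ROI.

    Assumed rule (taken from the paragraph above): only non-detection
    tensors that carry a non-empty label contribute text.
    """
    return [
        t["label"]
        for t in tensors
        if t.get("name") != "detection" and t.get("label")
    ]


# ROI as produced by the detector alone: a bounding box, but no text to draw.
roi_tensors = [{"name": "detection", "label": "license-plate"}]

# The same ROI after a probe has written its decoded result back.
roi_with_text = roi_tensors + [{"name": "classification_result", "label": "ABC123"}]
```

In this model the first ROI yields no text even if the probe printed the decoded string to the console, while the second yields the decoded string because it was attached as a separate, non-detection tensor.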
+ +### CRITICAL: `videoconvert` drops `GstVideoRegionOfInterestMeta.params` + +In GStreamer 1.24+, `videoconvert` copies `GstVideoRegionOfInterestMeta` to the +output buffer (so bounding box coordinates survive), but it does **NOT** copy +the `params` field (a `GList` of `GstStructure`) which stores all tensor data +(detection results, classification labels, custom tensors added via +`roi.add_tensor()`). `GstAnalyticsRelationMeta` (the new analytics API) **does** +survive `videoconvert` fully. Net effect: bounding boxes drawn by +`gvawatermark` appear (bbox comes from analytics meta), but **every** text +label — custom decoded text and even detection-confidence labels — is missing +because `tensor.label()` returns empty. + +### Two-probe architecture (required when `videoconvert` sits between inference and `gvawatermark`) + +1. **Decode probe** — attached to the inference element's **src pad** (before + `videoconvert`). At this point, `params` still contain raw tensor data. + Decode the inference output (e.g. LPR int32 indices → plate text string) + and store in a **global thread-safe store** keyed by `roi.region_id()` + (stable across the copy — comes from `GstAnalyticsODMtd.id`). + +2. **Label probe** — attached to the **capsfilter src pad** (after + `videoconvert`, immediately before `gvawatermark`). Iterate all ROIs on the + buffer, look up stored text by `region_id()`, and re-attach the decoded + result as a fresh `GstStructure` via `roi.add_tensor()`. Since this runs + just before `gvawatermark`, the tensor is present when `preparePrimsForRoi` + iterates `roi.tensors()`. + +```cpp +/* --- Example pattern: a 2-stage detect + decode-in-probe cascade. + * Identifiers below (`lp_text_*`, `decode_output`, `"license-plate"`, + * `int32_t` indices) are illustrative for an OCR-style text-decoding probe + * — substitute the data type and the target ROI label with whatever the + * concrete pipeline actually emits.
+ */ +#include <map> +#include <mutex> +#include <string> + +/* Global store: region_id → decoded text (populated by decode probe, consumed by label probe) */ +static std::mutex lp_text_mutex; +static std::map<int, std::string> lp_text_store; + +/* Probe 1: on inference element src pad (BEFORE videoconvert) */ +static GstPadProbeReturn decode_probe(GstPad *pad, GstPadProbeInfo *info, gpointer) { + GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER(info); + GstCaps *caps = gst_pad_get_current_caps(pad); + GVA::VideoFrame vf(buf, caps); + gst_caps_unref(caps); + for (auto &roi : vf.regions()) { + if (roi.label() == "license-plate") { // illustrative; use the label the pipeline actually emits + for (auto tensor : roi.tensors()) { + if (tensor.is_detection()) continue; // skip the bbox tensor, decode only inference outputs + /* decode_output() is the application's own decoder (illustrative name) */ + std::string decoded = decode_output(tensor.data<int32_t>()); + { + std::lock_guard<std::mutex> lock(lp_text_mutex); + lp_text_store[roi.region_id()] = decoded; + } + } + } + } + return GST_PAD_PROBE_OK; +} + +/* Probe 2: on capsfilter src pad (AFTER videoconvert, BEFORE gvawatermark) */ +static GstPadProbeReturn label_probe(GstPad *pad, GstPadProbeInfo *info, gpointer) { + GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER(info); + GstCaps *caps = gst_pad_get_current_caps(pad); + GVA::VideoFrame vf(buf, caps); + gst_caps_unref(caps); + std::lock_guard<std::mutex> lock(lp_text_mutex); + for (auto &roi : vf.regions()) { + auto it = lp_text_store.find(roi.region_id()); + if (it != lp_text_store.end()) { + GstStructure *s = gst_structure_new( + "classification_result", + "label", G_TYPE_STRING, it->second.c_str(), + "confidence", G_TYPE_DOUBLE, 1.0, + nullptr); + roi.add_tensor(GVA::Tensor(s)); + } + } + return GST_PAD_PROBE_OK; +} +``` + +### GstStructure naming rules for `gvawatermark` text rendering + +- The structure name MUST NOT be `"detection"` — + `gvawatermarkimpl::preparePrimsForRoi` skips tensors where + `tensor.is_detection()` returns true (name equals `"detection"`). Use + `"classification_result"` or any other name. +- The structure MUST have a `"label"` field of type `G_TYPE_STRING` — this is + what `tensor.label()` reads.
+- Optionally include `"confidence"` (`G_TYPE_DOUBLE`). + +The capsfilter before `gvawatermark` MUST have a `name=` property (e.g. +`name=pre_watermark`) so the C++ code can find it via `gst_bin_get_by_name()` +and attach the label probe to its src pad. + +## 7. Model path resolution — absolute canonical paths required + +DL Streamer inference elements (`gvadetect`, `gvainference`, `gvaclassify`) +reject relative paths and may silently fail on symlinks when resolving the +model file. The converted C++ code MUST resolve every model path to an +absolute canonical path via POSIX `realpath()` **before** embedding it in the +pipeline string or passing it to `g_object_set()`: + +```cpp +auto resolve_path = [](const gchar *path) -> std::string { + char *abs = realpath(path, nullptr); + if (!abs) { + g_printerr("ERROR: Cannot resolve path: %s (%s)\n", path, strerror(errno)); + return ""; + } + std::string result(abs); + free(abs); + return result; +}; +std::string model_abs = resolve_path(user_supplied_model_path); +``` + +Apply this to **all** file paths passed to inference elements: `model=`, +`model-proc=`, `labels-file=`. Failure to do so causes opaque +`Failed to set pipeline to PLAYING` or `Model file not found` errors that are +hard to diagnose because the working directory at runtime may differ from the +build directory. + +## 8. Runtime hardware encoder detection + +The converted application MUST NOT hardcode a specific GStreamer encoder +element name (e.g. `vah264enc`, `x264enc`). Encoder availability varies across +DL Streamer installations, driver versions, and hardware: + +- `vah264enc` requires the VA-API plugin (often missing). +- `qsvh264enc` requires Intel Media SDK/QSV. +- Software fallbacks (`openh264enc`, `x264enc`) may not be installed. 
+ +The C++ code MUST probe the GStreamer element registry at runtime and select +the first available encoder from a priority-ordered list: + +```cpp +static std::string find_hw_encoder() { + const char *candidates[] = {"vah264enc", "vah264lpenc", "qsvh264enc", nullptr}; + for (int i = 0; candidates[i]; i++) { + GstElementFactory *f = gst_element_factory_find(candidates[i]); + if (f) { gst_object_unref(f); return candidates[i]; } + } + return "openh264enc"; // software fallback +} +``` + +The chosen encoder MUST be logged at startup so the user can see which path +was taken. Document the encoder priority list and fallback logic in the README +under `Conversion Notes`. + +## 9. Source traceability comments (mandatory) + +Every logically distinct section of the converted C++ code MUST include a +comment that traces it back to the corresponding code in the reference +application. Format: + +``` +/* --- Ref: <source-file>:<function>() <line-range> — <description> --- */ +``` + +Examples: + +- `/* --- Ref: deepstream_lpr_app.c:create_pipeline() L120-L145 — build primary detector (PGIE) --- */` +- `/* --- Ref: deepstream_lpr_app.c:osd_sink_pad_buffer_probe() L55-L90 — iterate batch meta, extract plate text --- */` +- `/* --- Ref: N/A — DL Streamer-specific: vapostproc for GPU→CPU transfer --- */` (for code with no reference counterpart) + +This applies to: pipeline construction, inference element setup, +probe/callback functions, CLI option parsing, bus message handling, and any +helper functions. The goal is that a reader can open the converted code and +the reference code side-by-side and immediately see which parts correspond to +each other. + +## 10. Pipeline construction — preserve all stages (no merging) + +The 1-to-1 element-mapping rule is defined in +[`convert-app.prompt.md`](../../../prompts/convert-app.prompt.md) Step 4 and is +non-negotiable: every source-app inference stage maps to a separate DL +Streamer inference element.
Element availability and property names are +verified per §1; deny-list checks are driven from +[`deprecation-discovery.md`](./deprecation-discovery.md). diff --git a/.github/skills/dlstreamer-coding-agent/convert-app/runsh-pitfalls.md b/.github/skills/dlstreamer-coding-agent/convert-app/runsh-pitfalls.md new file mode 100644 index 000000000..c04c9d23a --- /dev/null +++ b/.github/skills/dlstreamer-coding-agent/convert-app/runsh-pitfalls.md @@ -0,0 +1,56 @@ +# `run.sh` Pitfalls — Auto-Fix Checklist + +Used during **validation step 3** in [`validation-protocol.md`](./validation-protocol.md). + +Wire these patterns into `run.sh` from the start — do not wait for them to fail. + +If a clean-shell invocation fails: +1. Read the actual error message from stderr (do **not** guess). +2. Map it to the correct row below. +3. Edit `run.sh` (or the `.cpp` file if the bug is in the binary). +4. Re-run the clean-shell test. +5. Repeat until **all** invocations pass. Fixes for one bug frequently expose + the next — never declare done after a single fix without re-running every + test. + +## Known `run.sh` pitfalls + +| Symptom in stderr | Root cause | Fix in `run.sh` | +|---|---|---| +| `setup_dls_env.sh: line N: : unbound variable` | `set -u` is incompatible with the upstream env script which references unset vars (e.g. `GST_PLUGIN_FEATURE_RANK`) | Wrap the `source` in `set +u` … `source …setup_dls_env.sh` … `set -u`. Same applies to any other 3rd-party script you source. | +| `ERROR: Model not found: /home/…/models/…` with `MODELS_PATH` pointing to the **wrong** directory (e.g. `/home/user/models` instead of the app's local `./models`) | `setup_dls_env.sh` unconditionally exports `MODELS_PATH` (typically `$HOME/models`). If `run.sh` sets `MODELS_PATH` **before** sourcing `setup_dls_env.sh`, the upstream script overwrites it. If set **after** but the user also has `MODELS_PATH` in their env, the app's default is ignored. 
| **Never use `MODELS_PATH` as the app's own variable** — `setup_dls_env.sh` owns that name. Instead: (1) use a unique, app-specific env var (e.g. `LPR_MODELS_PATH`, `_MODELS_PATH`) for user overrides; (2) default to a path relative to the script directory (`SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"`, then `"${LPR_MODELS_PATH:-$SCRIPT_DIR/models}"`); (3) set the app's model path variable **after** sourcing `setup_dls_env.sh` so it is never clobbered. Pattern: `set +u; source …/setup_dls_env.sh; set -u; APP_MODELS="${LPR_MODELS_PATH:-$SCRIPT_DIR/models}"`. The agent MUST grep `setup_dls_env.sh` for any env var it exports (`grep '^export'`) and avoid reusing those names in `run.sh`. | +| `No such element or plugin 'gvadetect'` even after sourcing `setup_dls_env.sh` | On some builds `libgstvideoanalytics.so` lives under `/opt/intel/dlstreamer/Release/lib/` which `setup_dls_env.sh` does NOT add to `GST_PLUGIN_PATH` | Always (unconditionally) `export GST_PLUGIN_PATH="/opt/intel/dlstreamer/Release/lib:${GST_PLUGIN_PATH:-}"` and the same for `LD_LIBRARY_PATH` (incl. `opencv/lib`) after the source. | +| `setup_dls_env.sh: No such file or directory` | DL Streamer not installed at `/opt/intel/dlstreamer` | Detect the absence and print a clear actionable error pointing to the install guide; exit non-zero. Do not silently continue. | +| `command not found: …` from `set -e` halting on a pipeline | `set -e` + un-guarded optional command | Either guard with `\|\| true` for genuinely optional commands, or fix the missing dependency. | +| `Permission denied` when invoking `./run.sh` | Created without exec bit | `chmod +x run.sh export_models.sh` immediately after creating them. | +| `bash: ./build/: No such file or directory` | Build step skipped or failed silently | `run.sh` must check `[[ -x "$BIN" ]]` and exit with a clear "build first: `cmake -S . -B build && cmake --build build`" message. 
| +| Pipeline negotiates then aborts with `gst_element_set_state` returning `FAILURE` for an ASYNC pipeline | Some pipelines legitimately return `FAILURE` from `set_state(PLAYING)` for ASYNC transitions; aborting is wrong | In the C++ code, treat `GST_STATE_CHANGE_FAILURE` from `set_state(PLAYING)` as a warning and let the bus deliver the real error; only abort on a `GST_MESSAGE_ERROR` from the bus. | +| `videos/sample.mp4: No such file or directory` (broken symlink) | Test asset symlink points to a path that does not exist on this host | `run.sh` must validate `INPUT` exists with `[[ -f "$INPUT" \|\| "$INPUT" == *"://"* ]]` and print a clear error; verify the symlink resolves before running. | +| `Could not connect to display` from `autovideosink` on a headless host | No `$DISPLAY` / `$WAYLAND_DISPLAY` | `run.sh` must auto-fallback to `--sink=file` when both are unset. | +| `GStreamer error: negotiation problem` / `failed to configure video mode` from `kmssink` inside `autovideosink` | `autovideosink` auto-selects `kmssink` which cannot configure video mode on remote/SSH/multi-GPU setups | Deprioritize `kmssink` by exporting `GST_PLUGIN_FEATURE_RANK="kmssink:NONE,${GST_PLUGIN_FEATURE_RANK:-}"` in `run.sh` before launching the pipeline. This forces `autovideosink` to pick `xvimagesink` or `ximagesink` instead. | +| `Failed to set pipeline to PLAYING` with no further error detail | A default resource path (video, model, config) points to a file that does not exist on the host (missing file, broken symlink, wrong `MODELS_PATH`) | For **every** resource the script references, validate existence **before** launching the pipeline. Print a specific error naming the exact missing path and the variable/default it came from. Example: `[[ -f "$DETECTION_MODEL" ]] \|\| { echo "ERROR: model not found: $DETECTION_MODEL (MODELS_PATH=$MODELS_PATH)" >&2; exit 1; }`. Also verify that symlinks resolve: `readlink -e` or `[[ -e ... ]]`. 
| +| `Internal data stream error` / `streaming stopped, reason not-linked (-1)` from `qtdemux` | Container (mp4/mkv) has an audio track; `urisourcebin ! decodebin3` or bare `uridecodebin3` exposes an internal audio pad with no downstream consumer, causing `qtdemux` to error on the unlinked stream | Use **`uridecodebin3`** with **`caps="video/x-raw(ANY)"`** — the caps property restricts stream selection to video only. Example: `uridecodebin3 uri=file:///path/to/file.mp4 caps="video/x-raw(ANY)" ! queue ! gvadetect ...`. Do **NOT** use bare `uridecodebin3` without caps — it still fails on files with audio. Test with an input that contains an audio track. | +| `not-linked (-1)` from `qtdemux` or negotiation failure when using `--sink display` or `--sink file` with GPU pipeline (`va-surface-sharing`) | Pipeline outputs buffers in `video/x-raw(memory:VAMemory)` (GPU zero-copy), but downstream elements (`gvawatermark`, `videoconvert`, `autovideosink`, `vah264enc`) require system-memory `video/x-raw`. Without an explicit memory transfer element, caps negotiation fails. | Insert a **GPU→system-memory transfer element** immediately before `gvawatermark` (or before the first system-RAM element) in `display` and `file` sink paths when `device=GPU`. Run `gst-inspect-1.0 vapostproc` — if available, use `vapostproc`. Otherwise use `videoconvert` (handles the transfer implicitly when caps force system memory). Pattern with `vapostproc`: `... ! queue ! vapostproc ! gvawatermark ! videoconvert ! autovideosink`. Pattern without: `... ! queue ! videoconvert n-threads=4 ! capsfilter caps=video/x-raw,format=BGRx ! gvawatermark ! videoconvert n-threads=4 ! autovideosink`. The `fake` sink path does NOT need a transfer element. 
**Verify the chosen element exists via `gst-inspect-1.0` before using it.** | +| `gvawatermark` is in the pipeline and `transform_ip` is called (visible in `GST_DEBUG=gvawatermarkimpl:5`) but **no bounding boxes or labels appear**; debug log shows `"Transparent path linked (identity bypassed)"` | `gvawatermark` negotiates `ANY` memory type when upstream buffers are not explicitly constrained to system memory. This causes it to select the transparent rendering path (path=3), which bypasses all drawing. | Insert `capsfilter caps=video/x-raw,format=BGRx` immediately before `gvawatermark` to force system-memory caps negotiation. The capsfilter MUST have `name=pre_watermark` so the label probe can attach to its src pad. Pattern: `videoconvert n-threads=4 ! capsfilter caps=video/x-raw,format=BGRx name=pre_watermark ! gvawatermark ! videoconvert n-threads=4`. Adds a CPU copy step; expect FPS reduction (e.g. 170 → 55 FPS) with GPU inference. Verify `gvawatermark` actually renders overlays by comparing output vs. input frames. See also the two-probe architecture for text labels. | +| Bounding boxes appear on the output video but **no text labels** are rendered (no label text next to any bbox); `gvawatermark` CPU render path is active | `videoconvert` (GStreamer 1.24+) copies `GstVideoRegionOfInterestMeta` but does NOT copy its `params` field (`GList`). All tensor data is lost after `videoconvert`. `GstAnalyticsRelationMeta` (bbox + tracking IDs) survives, so `gvawatermark` draws boxes but has no tensors to extract text from. | Use the **two-probe architecture** (see `pipeline-implementation.md` §6): (1) decode probe on inference element src pad (before `videoconvert`) stores text in a global map keyed by `region_id()`; (2) label probe on capsfilter src pad (after `videoconvert`, before `gvawatermark`) re-attaches text as a fresh `GstStructure` via `roi.add_tensor()`. The `region_id()` is stable across `videoconvert` copies. 
The GstStructure name must NOT be `"detection"` (use `"classification_result"`). | +| `gst_parse_launch` returns `syntax error` when the pipeline string contains inline caps like `"video/x-raw,format=BGRx"` | In C++ string literals passed to `gst_parse_launch`, escaped quotes around caps filters are interpreted as part of the pipeline syntax tokens, causing parse failures. | Use the **`capsfilter` element** instead of inline caps notation: `! capsfilter caps=video/x-raw,format=BGRx !` instead of `! "video/x-raw,format=BGRx" !`. The `capsfilter` element is always safe in `gst_parse_launch` strings and avoids all quoting issues. For programmatic pipelines, use `gst_caps_from_string("video/x-raw,format=BGRx")` with `g_object_set(capsfilter, "caps", caps, NULL)`. | +| `No such element or plugin 'vah264enc'` (or `x264enc`, or any hardcoded encoder) when running the file-output sink path | The pipeline hardcodes a specific encoder element name that does not exist in the current GStreamer registry. Encoder availability varies. | The C++ code MUST detect the available encoder at runtime by probing the GStreamer element registry (see `pipeline-implementation.md` §8). Use a priority list: `vah264enc` → `vah264lpenc` → `qsvh264enc` → `openh264enc`. Log the chosen encoder at startup. The `run.sh` wrapper does NOT need to handle this — encoder selection is the binary's responsibility. | + +## Pre-flight resource audit (mandatory before first run) + +Before running `run.sh` for the first time, verify that every resource the +script and binary depend on actually exists on disk: + +- Default input video path (e.g. `videos/ParkingVideo.mp4`) — file present? + Symlink resolves? +- Model files (`$MODELS_PATH/…/*.xml` and `*.bin`) — do they exist? Is + `MODELS_PATH` set correctly? +- Label files, dictionaries, config files — anything the pipeline reads at + runtime. +- The compiled binary (`build/`) — does it exist and is it + executable? 
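The pre-flight audit above can be sketched as a small helper. This is a minimal sketch under stated assumptions, not part of `run.sh`: the function name `audit_resources`, the resource names, and the idea of passing the resources as a name-to-path mapping are all illustrative.

```python
import os


def audit_resources(resources, executables=()):
    """Return one human-readable problem per resource that is unusable.

    resources   -- mapping of a descriptive name to a filesystem path
    executables -- names from `resources` that must also be executable
    """
    problems = []
    for name, path in resources.items():
        if os.path.islink(path) and not os.path.exists(path):
            # os.path.exists follows symlinks, so this detects a dangling link.
            problems.append(f"{name}: broken symlink: {path}")
        elif not os.path.exists(path):
            problems.append(f"{name}: missing: {path}")
        elif name in executables and not os.access(path, os.X_OK):
            problems.append(f"{name}: not executable: {path}")
    return problems
```

Calling it with the default video, model, labels and binary paths before the first run, and stopping when the returned list is non-empty, names the exact missing path instead of leaving the failure to surface later as an opaque pipeline error.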
+ +If any resource is missing, fix the issue (download models, create correct +symlinks, adjust defaults) **before** running the pipeline. This prevents +opaque `Failed to set pipeline to PLAYING` errors that give no clue about the +root cause. diff --git a/.github/skills/dlstreamer-coding-agent/convert-app/validation-protocol.md b/.github/skills/dlstreamer-coding-agent/convert-app/validation-protocol.md new file mode 100644 index 000000000..e0162cbbb --- /dev/null +++ b/.github/skills/dlstreamer-coding-agent/convert-app/validation-protocol.md @@ -0,0 +1,153 @@ +# Validation Protocol + +Used by **step 6 (Verify correctness)** of [`convert-app.prompt.md`](../../../prompts/convert-app.prompt.md). + +## 1. Capture baseline outputs + +First reproduce the original application's environment (Docker image, +virtualenv, CUDA/driver requirements, command-line invocation) and document +those steps in the README. Then execute the original application with a +representative set of inputs and record its outputs (annotated frames, JSON, +logs, metrics). + +If the original cannot be run (missing hardware, encrypted models, unavailable +SDK), explicitly state this in the README and rely on step 3's correctness +checks as the primary validation signal. + +## 2. Build the converted application — MANDATORY, NON-NEGOTIABLE + +Build the binary: + +```bash +cmake -S . -B build && cmake --build build -j$(nproc) +``` + +Resolve every compilation error before proceeding. Do not hand off a binary +that failed to build, or that has never been built in this run. Direct binary +invocation for correctness/output checks is covered by step 3 (which exercises +the wrapper); do not duplicate runs here. + +## 3. Verify the `run.sh` wrapper — MANDATORY, in a CLEAN SHELL, with auto-fix loop + +Before reporting the conversion as finished, execute `./run.sh` and confirm it +succeeds end-to-end. The check has the following non-negotiable requirements. + +### 3a. 
Run in a pristine shell + +The agent's own terminal session almost always already has DL Streamer env +vars exported (from previous `source setup_dls_env.sh`, prior pipelines, etc.). +This hides bugs in `run.sh` that only manifest for the end user. Invoke +`run.sh` in an environment that mimics a fresh login shell: + +```bash +env -i HOME="$HOME" PATH="/usr/local/bin:/usr/bin:/bin" TERM="${TERM:-xterm}" \ + bash -lc 'cd && ./run.sh [args]' +``` + +Run this clean-shell test for **every** documented invocation: + +- `./run.sh --help` +- `./run.sh` (defaults) +- `./run.sh --sink fake` +- `./run.sh --sink display` (skip / mark as "requires display" only when no + `$DISPLAY` is available — but still run it in headless mode to confirm the + auto-fallback to `file` works) +- One negative case: `./run.sh --sink ` (must exit non-zero with a + clear error). + +### 3b. Pass criteria + +For each clean-shell invocation: + +- Exit status `0` (or `2` for the negative `--sink` case). +- The documented output artifact exists at the documented location (e.g. + `lpr_output.mp4` for `--sink=file`). +- The pipeline reports a non-zero FPS via `gvafpscounter` (proves it actually + processed frames, did not crash on first buffer). +- **Every detection / classification stage of the cascade reports a non-zero + count over the full input clip.** Non-zero FPS and a non-zero aggregate + end-of-stream count are necessary but **not sufficient** — in a cascade, + the aggregate may be inflated by an earlier stage (e.g. the primary + detector emitting an auxiliary class that shares the umbrella label with + the final stage) while a downstream stage never ran. The wrapper MUST emit + one per-stage counter (one log line per stage, prefixed with the stage + name and showing the per-frame ROI / decoded-result count), and the + validation MUST assert that **each** per-stage counter is non-zero on at + least one frame (e.g. for a 3-stage detect → detect → classify cascade, + `grep -c "" run.log` ≥ 1). 
+- No `unbound variable`, `command not found`, `No such element`, + `cannot open shared object file`, `Permission denied`, or unhandled `ERROR:` + lines in stderr. + +### 3c. Auto-fix loop + +If any clean-shell invocation fails: + +1. Read the actual error message from stderr (do **not** guess). +2. Map it to the correct fix from [`runsh-pitfalls.md`](./runsh-pitfalls.md). +3. Edit `run.sh` (or, if the bug is in the C++ code, the `.cpp` file) to + apply the fix. +4. Re-run the clean-shell test from 3a. +5. Repeat until **all** invocations from 3a pass. Do not declare the + conversion complete after a single fix without re-running every test — + fixes for one bug frequently expose the next. + +### 3d. Pitfall checklist + +See [`runsh-pitfalls.md`](./runsh-pitfalls.md) for the full table of known +symptoms, root causes, and fixes. Wire these patterns into `run.sh` from the +start — do not wait for them to fail. + +### 3e. Pre-flight resource audit (mandatory before first run) + +Before running `run.sh` for the first time, verify that every resource the +script and binary depend on actually exists on disk. See +[`runsh-pitfalls.md`](./runsh-pitfalls.md) → *Pre-flight resource audit* for +the full list. + +### 3f. Reporting + +Record clean-shell test results in the README's `Observed Output` section per +[`documentation-spec.md`](./documentation-spec.md) §4 (one row per invocation +from 3a: command line, exit code, FPS snippet, pitfall triggered). +## 4. Compare against the baseline if available + +When the original application's outputs were captured in step 1, compare them +to the outputs collected by the wrapper runs in step 3 and document +differences (account for acceptable numerical differences from FP32 → FP16/INT8 +model conversion). If the original could not be run — see §1 — step 3's +correctness checks (meaningful output + non-zero FPS + non-zero per-stage +counters) are the primary validation signal. + +## 5. 
Benchmark performance + +Use the FPS numbers already captured by the step 3 wrapper runs (sink=file +and sink=fake at minimum). If the original was also run, record any +improvements or regressions side-by-side. Feed findings into the +documentation step. + +## 6. Deprecated-API scan — MANDATORY + +Before declaring the conversion complete, verify that no construct from the +per-run deprecation table (see +[`deprecation-discovery.md`](./deprecation-discovery.md)) appears in any file +generated under `/`. + +Build the regex dynamically from that table (one alternative per discovered +symbol/element/file/API), then run: + +```bash +grep -rnE "" / +``` + +Any non-empty result is a **conversion-blocking defect**. Replace the +offending construct with the documented replacement, re-run all build + +clean-shell `run.sh` tests from steps 2 and 3, and re-scan. Only after the +scan returns empty (or each remaining hit is documented under +`Conversion Notes → Deprecated API usage justification`) may the conversion +be reported as finished. + +## 7. Final requirements compliance audit + +See [`final-audit-checklist.md`](./final-audit-checklist.md) — the agent MUST +walk through that checklist as the last step before reporting completion. diff --git a/docs/user-guide/dev_guide/coding_agent.md b/docs/user-guide/dev_guide/coding_agent.md index 0708931c2..2742408c2 100644 --- a/docs/user-guide/dev_guide/coding_agent.md +++ b/docs/user-guide/dev_guide/coding_agent.md @@ -112,3 +112,56 @@ See the [example prompts in the DL Streamer Coding Agent repository](https://git - **License Plate Recognition** — Detection + OCR pipeline with JSON and annotated video output - **Event-Based Smart NVR** — Person detection with triggered video recording - **Multi-Stream Compose** — Multiple RTSP cameras with combined WebRTC output + +### 5. 
Convert Existing Apps with convert-app Prompt (Preview) + +The Coding Agent can convert an existing video analytics application into a +native Intel DL Streamer and OpenVINO C++ project using the `convert-app` +prompt workflow. + +This is a good fit when you want to: + +- migrate a DeepStream application to DL Streamer, +- port a legacy GStreamer AI pipeline, +- preserve the functional stages of an existing app, +- generate reproducible build, run, and documentation deliverables. + +Expected output: + +- C++ source files and CMake project files. +- `run.sh` wrapper for reproducible execution. +- Model export or download scripts when required. +- `README.md` with conversion notes and validation-oriented run results. + +How to invoke: + +- Start your prompt with a reference to the DL Streamer Coding Agent. +- Ask to convert an existing application and preserve functional pipeline + stages. +- Request complete project deliverables (build files, wrapper script, and + documentation). + +The agent reads the source application directly, but you must make sure the +repository is accessible to it, either as a local path or through a URL that +can be fetched with the required permissions. + +Example invocations: + +```text +/convert-app +``` + +```text +/convert-app /workspace/my_ds_app --output-name my_ds_app_dls --device GPU +``` + +In free-form chat, you can also use: + +```text +Use DL Streamer Coding Agent from https://github.com/open-edge-platform/dlstreamer + +Convert application at to a native DL Streamer and OpenVINO C++ project. +Preserve all inference stages, generate CMakeLists.txt, run.sh, README.md, and model export scripts if needed. +``` + +For implementation details, see [convert_app.md](./convert_app.md). 
diff --git a/docs/user-guide/dev_guide/convert_app.md b/docs/user-guide/dev_guide/convert_app.md new file mode 100644 index 000000000..8a4a99c5f --- /dev/null +++ b/docs/user-guide/dev_guide/convert_app.md @@ -0,0 +1,128 @@ +# Application Conversion Prompt (`convert-app`) + + +## Overview + +The [`convert-app`](../../../.github/prompts/convert-app.prompt.md) prompt is an automated workflow for converting existing applications into native Intel® DL Streamer / OpenVINO™ C++ applications. It guides users through all required steps, from project scaffolding to final compliance audit, ensuring best practices and reproducibility. + +This workflow is a good fit when you want to migrate an NVIDIA DeepStream application, port a legacy GStreamer AI pipeline, preserve the functional stages of an existing app, or generate reproducible build, run, and documentation deliverables. + +## How It Works + +The prompt orchestrates the conversion process using a sequence of well-defined steps, each referencing a dedicated instruction file. The workflow ensures reproducibility, best practices, and traceability for every conversion. The orchestrating prompt lives in `.github/prompts/convert-app.prompt.md`, and the per-step reference files live in the `.github/skills/dlstreamer-coding-agent/convert-app/` directory. + + +### Workflow Steps + +| Stage | Description | +|---------------------------------------|-----------------------------------------------------------------------------------------------| +| Scaffold project and environment | Create a new project structure, output directory, and set up the build/runtime environment. | +| Build deprecation deny-list | Scan for deprecated APIs and build a deny-list to avoid unsupported or obsolete constructs. | +| Inventory source app functional blocks| Analyze the source application and list all functional blocks and pipeline stages. | +| Plan model and pipeline mapping | Plan model substitutions and map each functional block to DL Streamer/OpenVINO pipeline elements.
| +| Implement pipeline and logic | Implement the pipeline, probes, encoders, and ensure all elements are available and correct. | +| Validate correctness and outputs | Run validation (clean-shell runs, auto-fix loop) to ensure the converted app works as intended. | +| Document the conversion | Generate a detailed README and documentation for the converted application. | +| Final compliance audit | Perform a final audit to ensure all requirements and deliverables are met. | + +## What Shapes the Converted Application + +The form and structure of the converted application are not fixed — they emerge +from the interplay of several factors that the workflow analyses and resolves +automatically. Understanding these factors helps set realistic expectations +before starting a conversion. + +### AI model and its context + +The choice of AI model is one of the most influential factors: + +- **Model architecture** (SSD, YOLO, ResNet, …) determines which DL Streamer + inference elements are used (`gvadetect`, `gvaclassify`) and how the + pipeline stages are chained. +- **Inference mode** (`full-frame` vs `roi-list`) affects whether a + single-pass or cascaded detection approach is used — this choice is made + per-model based on object scale and density. +- **Precision** (FP16 / FP32 / INT8) influences runtime performance and + the target inference device selection. +- **Training domain** of the model (e.g. barrier/toll-booth vs. open-road + surveillance) determines whether a direct model reuse is valid or a + domain-matching substitute must be found. +- **Character set / language** for OCR and text-recognition models constrains + which model is selected (e.g. Latin-alphabet vs. CJK character sets). + +### Architecture of the source application + +The source app's pipeline structure is preserved 1-to-1 in the conversion: + +- **Number and order of inference stages** (PGIE, SGIE, secondary + classifiers) maps directly to the number of DL Streamer elements. 
+- **Presence of object tracking** determines whether `gvatrack` is included + and which tracking mode is selected. +- **Visualization and metadata output** requirements determine the rendering + and publishing elements (`gvawatermark`, `gvametaconvert`, + `gvametapublish`). + +### Target execution environment + +Runtime characteristics of the deployment machine affect the output: + +- **Available inference device** (Intel iGPU / dGPU / CPU / NPU) determines + the `device=` parameter and whether hardware video encoding is available. +- **Display availability** (headless vs. GUI) determines whether + `autovideosink`, `filesink`, or `fakesink` is used as the output sink. +- **Operating system and driver stack** may require environment workarounds + (e.g. GStreamer registry cache rebuild, Python plugin compatibility). + +### Input and output format requirements + +- **Input source type** (local file, USB camera, RTSP stream) selects the + appropriate GStreamer source element. +- **Required output format** (annotated video, JSON metadata, CSV log) + determines the metadata publishing and sink configuration. + +> **In short:** the converted application is a direct function of the source +> app's pipeline, the models available for the target platform, and the +> execution environment. The workflow resolves all these factors automatically +> and documents every decision in the generated README under +> **Conversion Notes**. + +## Example Usage + +The prompt is primarily designed for use with an agent or automation. A typical conversion is started with the following chat command (a prompt invocation, not a shell command): + +```text +# Example: Run the convert-app prompt to convert an application +/convert-app +``` + +Replace `` with the repository you wish to convert. The agent will guide you through each step, generate the output project, and produce a detailed README for the converted application.
+ +Typical invocations: + +```text +/convert-app /workspace/deepstream_lpr_app +``` + +```text +/convert-app https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/tree/master/legacy_apps/deepstream-app +``` + +```text +/convert-app /workspace/my_ds_app --output-name my_ds_app_dls --device GPU +``` + +In free-form chat, you can also use: + +```text +Use DL Streamer Coding Agent from https://github.com/open-edge-platform/dlstreamer + +Convert application at /workspace/deepstream_lpr_app to a native DL Streamer and OpenVINO C++ project. +Preserve all inference stages, generate CMakeLists.txt, run.sh, README.md, and model export scripts if needed. +``` + +**Note:** +The `/convert-app` command is a shorthand for invoking the application conversion workflow. It is equivalent to running the full prompt with the source repository as an argument. Make sure your environment supports this command or use the full prompt invocation if needed. + +For full details, see [convert-app.prompt.md](../../../.github/prompts/convert-app.prompt.md). + +If you are converting a DeepStream-based application, see also [Converting NVIDIA DeepStream Pipelines to Deep Learning Streamer Pipeline Framework](./converting_deepstream_to_dlstreamer.md) for framework-level mapping guidance and migration context. diff --git a/docs/user-guide/dev_guide/dev_guide_index.md b/docs/user-guide/dev_guide/dev_guide_index.md index e7a59f7b8..146417021 100644 --- a/docs/user-guide/dev_guide/dev_guide_index.md +++ b/docs/user-guide/dev_guide/dev_guide_index.md @@ -68,6 +68,8 @@ - [Pre-processing description (`input_preproc`)](./model_proc_file.md#pre-processing-description) - [Post-processing description (`output_postproc`)](./model_proc_file.md#post-processing-description-output_postproc) - [Pipeline Optimizer](./optimizer.md) +- [Application Conversion Prompt](./convert_app.md) + \ No newline at end of file