v1.1 chunks 2-4: real-video string-resolution + audio-alignment diagnosis by pgil256 · Pull Request #18 · pgil256/tab_vision

pgil256 · 2026-06-18T17:44:38Z

v1.1 chunks 2–4 — real-video string-resolution chain + audio-alignment diagnosis

Lands the historical v1.1 chunk 2–4 work (chunk-1 already merged via #17) plus the
private-corpus removal. Conflict-free vs main (merge-base 42dc85e; main's extra
commits are docs-only).

What's in it

- Chunk 2: real MediaPipe CV chain validated. The trained YOLO-OBB neck detector generalizes to the eval rig, and the real CV chain lifts aggregate Tab F1 0.42 → 0.54 (oracle 1.00) on 24 clips / 527 notes.
- Chunk 3: real-video robustness gate. Adds auto-orientation, multi-frame onset voting, a fusion weight scaled by homography_confidence, and collapse-to-audio when video is sparse or weak. The gold-pitch real-video eval shows no regression and lifts aggregate Tab F1 0.4243 → 0.5453.
- Chunk 4: audio transcription/alignment diagnosis. Cached UT-Austin alignment probe + Guitar-TECHS second-corpus run. Conclusion: keep highres. The residual Tab F1 ceiling is the audio-only string-resolution limit, so the video chain is the lever, not an audio-model switch.
- Removes the private video corpus and stale tabvision-server/ experiment artifacts (~71k-line deletion, mostly test-data/ plus tool outputs and benchmark fixtures).
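The chunk-3 fusion rule (video weight scaled by homography_confidence, collapse to audio when video is sparse or weak) might look roughly like the sketch below. It is not the code in app/fusion_engine.py: VideoEvidence, video_fusion_weight, fuse_position_scores and the thresholds base_weight, min_confidence and min_coverage are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class VideoEvidence:
    """Hypothetical per-note video summary; field names are illustrative only."""

    homography_confidence: float  # 0..1, quality of the fretboard homography
    frames_with_hands: int  # frames in the onset window with a fretting hand found
    frames_total: int  # frames in the onset window


def video_fusion_weight(
    ev: VideoEvidence,
    base_weight: float = 0.5,  # hypothetical maximum video weight
    min_confidence: float = 0.3,  # hypothetical "weak homography" cutoff
    min_coverage: float = 0.5,  # hypothetical "sparse video" cutoff
) -> float:
    """Scale the video weight by homography confidence; 0.0 means audio only."""
    if ev.frames_total <= 0:
        return 0.0  # no frames at all: collapse to audio
    coverage = ev.frames_with_hands / ev.frames_total
    if ev.homography_confidence < min_confidence or coverage < min_coverage:
        return 0.0  # weak or sparse video: collapse to audio
    conf = min(max(ev.homography_confidence, 0.0), 1.0)
    return base_weight * conf


def fuse_position_scores(
    audio: dict[tuple[int, int], float],
    video: dict[tuple[int, int], float],
    w: float,
) -> dict[tuple[int, int], float]:
    """Blend per-(string, fret) scores linearly; w == 0.0 returns the audio scores."""
    keys = set(audio) | set(video)
    return {k: (1.0 - w) * audio.get(k, 0.0) + w * video.get(k, 0.0) for k in keys}
```

A linear blend makes the weak-video case easy to reason about: when the weight collapses to 0.0, the fused scores are exactly the audio scores, which is one way a fuser could keep real-video runs from regressing below audio-only.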

Blast-radius note (production safety)

- Production entrypoints tabvision-server/modal_app.py and app/v1_adapter.py are untouched.
- The only live production edits are app/fusion_engine.py (4 lines, corpus path ref) and requirements.txt (−4 lines, corpus-related deps). Eyeball requirements.txt if the Modal build pins from it.

Tests

pytest: 410 passed, 12 skipped (local, Windows venv).

Not in this PR

The GAPS recon spike from this session (GAPS confirmed viable as the v1.1 acceptance
corpus) is documented separately.

🤖 Generated with Claude Code

…2->0.54

Wire and validate the MediaPipe CV chain on the Kaggle UT-Austin rig:
PNG frame -> YOLO-OBB neck -> homography -> MediaPipe hands -> fingertip_to_fret -> FrameFingering.
The string lever now holds on REAL video, not just oracle.

- MediaPipe detection works; hand selection inverts on this non-mirrored rig (v0 picks the strumming hand) -> fixed geometrically (project tips onto the canonical board) in the real-chain probe.
- Trained the YOLO-OBB fretboard detector (Option 4): yolo11n-obb, CPU, epoch-4 best.pt (val mAP50 0.87). Generalizes to the rig: neck + localized homography 100% (vs the geometric detector's full-frame mis-fire).
- Homography orientation is inverted on this rig (nut/body + string axes); a known preflight item. With orientation corrected, the real chain lifts aggregate Tab F1 0.42 -> 0.54 (+0.12, up to +0.58/clip) vs oracle 1.00 (24 clips / 527 notes).

Probes: scripts/eval/v1_1_{mediapipe_sanity,yolo_rig_probe,real_chain_probe}.py
Train runner: scripts/train/_cpu_train_guitar_obb.py
Report: docs/EVAL_REPORTS/v1_1_chunk2_cv_chain_2026-06-10.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
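The geometric hand-selection fix (project fingertips onto the canonical board instead of trusting MediaPipe's handedness label) can be sketched as follows. This assumes the homography maps image pixels into a board normalized to [0, 1] x [0, 1]; project_points, pick_fretting_hand and the margin parameter are hypothetical names, and the inputs are plain NumPy arrays rather than MediaPipe result objects.

```python
import numpy as np


def project_points(H: np.ndarray, pts) -> np.ndarray:
    """Apply a 3x3 homography to an (N, 2) array of pixel points."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # (N, 3) homogeneous coords
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]  # divide out the projective scale


def pick_fretting_hand(H: np.ndarray, hands, margin: float = 0.05):
    """Return the index of the hand whose fingertips land on the board most often.

    Sketch only: `hands` is a list of (N, 2) fingertip pixel arrays, one per
    detected hand, and H is assumed to map the image onto a [0, 1] x [0, 1]
    canonical board. Returns None if no fingertip lands on the board.
    """
    best_idx, best_hits = None, 0
    for idx, tips in enumerate(hands):
        uv = project_points(H, tips)
        inside = (uv >= -margin) & (uv <= 1.0 + margin)  # per-coordinate bounds check
        hits = int(np.all(inside, axis=1).sum())  # tips inside on both axes
        if hits > best_hits:
            best_idx, best_hits = idx, hits
    return best_idx
```

Because the decision depends only on where the tips fall on the board, it does not care whether the rig is mirrored, which is the failure mode the commit describes for v0.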

Complete the v1.1 chunk-4 audio transcription/alignment work by landing the highres second-corpus diagnostic that the design doc left open.

- Add scripts/eval/v1_1_second_corpus_probe.py: a cached, resumable wrapper over tabvision.eval.composite.run_composite_eval that caches each clip's predicted TabEvent list to disk (keyed by media path + mtime + backend/prior/video settings), so a long highres run survives the 30-min interactive budget and resumes from cache on restart. Adds 6 unit tests for the cache contract.
- Score the full 12-clip Guitar-TECHS direct-input chord set (highres, prior none, no calibration): onset F1 0.7321, pitch F1 0.6787, Tab F1 0.0700 across 1292 notes; loss is dominated by wrong_position_same_pitch (43.4%) and extra_detection (34.9%). Reports under docs/EVAL_REPORTS/.
- Cross-corpus diagnosis (written into the design doc + DECISIONS.md): highres is not globally broken (uncalibrated 0.73/0.68 on a second corpus vs UT-Austin raw ~0), the UT-Austin collapse is corpus-specific alignment, and the residual Tab F1 ceiling is the audio-only string-resolution limit (the v1.1 video chain's job). So keep highres; do not switch audio models. Honesty bounds recorded: 0.68 pitch F1 still fails the 0.90 audio gate; Guitar-TECHS is an electric/out-of-domain/n=12 diagnostic, not an acceptance baseline.
- Also check in the pre-existing chunk-4 smoke artifacts (smoke manifest + reports + test) that seeded this run.

Additive only: no production audio/fusion/video/pipeline code changed, so the accepted GuitarSet v1 audio evidence cannot regress. Full unit suite green (391 passed, 4 skipped); ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
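The cache contract in the first bullet could be sketched like this: hash the media path, its mtime, and the backend/prior/video settings into a key, then store each clip's predictions under that key so a restarted run skips finished clips. clip_cache_key and load_or_compute are hypothetical names; the real v1_1_second_corpus_probe.py may build its key and serialize TabEvent lists differently.

```python
import hashlib
import json
import os
from pathlib import Path


def clip_cache_key(media_path, backend: str, prior: str, video_settings: dict) -> str:
    """Hypothetical key: media path + mtime + eval settings, hashed with SHA-256."""
    p = Path(media_path)
    payload = {
        "path": str(p.resolve()),
        "mtime_ns": p.stat().st_mtime_ns,  # editing the media invalidates the entry
        "backend": backend,
        "prior": prior,
        "video": video_settings,
    }
    # sort_keys makes the JSON (and thus the hash) independent of dict order
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def load_or_compute(cache_dir, key: str, compute):
    """Return the cached JSON result for `key`, or run compute() and cache it."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    result = compute()
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(result), encoding="utf-8")
    os.replace(tmp, path)  # rename into place so a killed run never leaves a partial entry
    return result
```

Writing to a temporary file and renaming it means a run interrupted mid-clip simply recomputes that clip on restart instead of reading a truncated cache file.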

vercel · 2026-06-18T17:44:44Z


Project	Deployment	Actions	Updated (UTC)
tab_vision	Ready	Preview, Comment	Jun 18, 2026 5:48pm


pgil256 and others added 7 commits June 11, 2026 06:31

7c8b9fa chore(v1.1): checkpoint chunk-2 artifacts
d060be5 feat(v1.1): chunk-3 real-video robustness and highres eval
1d12f2d data: remove private video corpus
6e093b5 eval: add utaustin replacement manifest
b25dfa9 eval: add utaustin audio alignment probe
77fa961 style: ruff format test_eval_manifest.py to pass CI format gate


vercel bot deployed to Preview on June 18, 2026 17:48

pgil256 merged commit 7b2ad75 into main Jun 18, 2026
4 checks passed

pgil256 merged 8 commits into main from v1.1/oracle-string-resolution
