v1.1 chunks 2-4: real-video string-resolution + audio-alignment diagnosis #18

Merged
pgil256 merged 8 commits into main from v1.1/oracle-string-resolution
Jun 18, 2026

@pgil256 pgil256 commented Jun 18, 2026

Owner

v1.1 chunks 2–4 — real-video string-resolution chain + audio-alignment diagnosis

Lands the historical v1.1 chunk 2–4 work (chunk-1 already merged via #17) plus the
private-corpus removal. Conflict-free vs main (merge-base 42dc85e; main's extra
commits are docs-only).

What's in it

  • Chunk 2 — real MediaPipe CV chain validated. Trained YOLO-OBB neck detector
    generalizes to the eval rig; the real CV chain lifts aggregate Tab F1
    0.42 → 0.54 (oracle 1.00) on 24 clips / 527 notes.
  • Chunk 3 — real-video robustness gate. Auto-orientation, multi-frame onset
    voting, homography_confidence-scaled fusion weight, and collapse-to-audio when
    video is sparse/weak. The gold-pitch real-video eval shows no regression and
    lifts aggregate Tab F1 0.4243 → 0.5453.
  • Chunk 4 — audio transcription/alignment diagnosis. Cached UT-Austin alignment
    probe + Guitar-TECHS second-corpus run. Conclusion: keep highres; the residual
    Tab F1 ceiling is the audio-only string-resolution limit, so the video chain is the
    lever (not an audio-model switch).
  • Removes the private video corpus and stale tabvision-server/ experiment
    artifacts (~71k-line deletion, mostly test-data/ + tool outputs / benchmark
    fixtures).
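
As a rough illustration of the Chunk 3 gate listed above (multi-frame onset voting, a fusion weight scaled by homography_confidence, and collapse-to-audio on sparse or weak video), here is a minimal sketch. Every name in it (VideoEvidence, video_weight, fuse) and every threshold is hypothetical and chosen for illustration; none of it is taken from app/fusion_engine.py, and the real gate may combine evidence differently.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class VideoEvidence:
    """Hypothetical per-note video evidence (field names are assumptions)."""
    homography_confidence: float   # 0.0 .. 1.0 quality of the fretboard fit
    frames_with_hand: int          # frames near the onset where a hand was found
    string_votes: Dict[int, int]   # string number -> frames voting for it


def vote_string(ev: VideoEvidence) -> Optional[int]:
    """Multi-frame onset voting: the string most frames agree on, if any."""
    if not ev.string_votes:
        return None
    return max(ev.string_votes, key=lambda s: ev.string_votes[s])


def video_weight(ev: VideoEvidence, base_weight: float = 0.6,
                 min_frames: int = 3, min_confidence: float = 0.3) -> float:
    """Scale the video weight by homography confidence; 0.0 means audio only."""
    if ev.frames_with_hand < min_frames:
        return 0.0  # video too sparse
    if ev.homography_confidence < min_confidence:
        return 0.0  # video too weak
    return base_weight * ev.homography_confidence


def fuse(audio_scores: Dict[int, float], ev: VideoEvidence) -> int:
    """Pick a string from audio scores blended with normalized video votes."""
    w = video_weight(ev)
    if w == 0.0 or vote_string(ev) is None:
        # Collapse to audio: ignore video entirely.
        return max(audio_scores, key=lambda s: audio_scores[s])
    total_votes = sum(ev.string_votes.values())
    combined = {
        s: (1.0 - w) * score + w * ev.string_votes.get(s, 0) / total_votes
        for s, score in audio_scores.items()
    }
    return max(combined, key=lambda s: combined[s])
```

With audio scores that slightly prefer string 1, confident video evidence voting for string 2 flips the decision, while the same votes with low homography confidence or too few frames leave the audio choice in place.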

Blast-radius note (production safety)

  • Production entrypoints tabvision-server/modal_app.py and app/v1_adapter.py
    are untouched.
  • Only live production edits: app/fusion_engine.py (4 lines, corpus path ref) and
    requirements.txt (−4 lines, corpus-related deps). Eyeball requirements.txt if
    the Modal build pins from it.

Tests

pytest: 410 passed, 12 skipped (local, Windows venv).

Not in this PR

The GAPS recon spike from this session (GAPS confirmed viable as the v1.1 acceptance
corpus) is documented separately.

🤖 Generated with Claude Code

pgil256 and others added 7 commits June 11, 2026 06:31
…2->0.54

Wire and validate the MediaPipe CV chain on the Kaggle UT-Austin rig:
PNG frame -> YOLO-OBB neck -> homography -> MediaPipe hands -> fingertip_to_fret
-> FrameFingering. The string lever now holds on REAL video, not just oracle.

- MediaPipe detection works; hand-selection inverts on this non-mirrored rig
  (v0 picks the strumming hand) -> fixed geometrically (project tips onto the
  canonical board) in the real-chain probe.
- Trained the YOLO-OBB fretboard detector (Option 4): yolo11n-obb, CPU, epoch-4
  best.pt (val mAP50 0.87). Generalizes to the rig: neck + localized homography
  100% (vs the geometric detector's full-frame mis-fire).
- Homography orientation is inverted on this rig (nut/body + string axes); a
  known preflight item. Corrected, the real chain lifts aggregate Tab F1
  0.42 -> 0.54 (+0.12, up to +0.58/clip) vs oracle 1.00 (24 clips / 527 notes).
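
The geometric hand-selection fix and the orientation correction described above can be sketched as follows. This is a simplified illustration, not the code in the real-chain probe: it assumes the canonical board is the unit square (u along the neck, v across the strings), that the homography is a plain 3x3 matrix mapping image pixels to that square, and that each hand is just a list of fingertip pixel coordinates. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def project(h: Matrix3, p: Point) -> Point:
    """Apply a 3x3 homography to an image point (with homogeneous divide)."""
    x, y = p
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)


def tips_on_board(h: Matrix3, tips: List[Point]) -> int:
    """Count fingertips that project inside the canonical unit-square board."""
    count = 0
    for tip in tips:
        u, v = project(h, tip)
        if 0.0 <= u <= 1.0 and 0.0 <= v <= 1.0:
            count += 1
    return count


def pick_fretting_hand(h: Matrix3, hands: List[List[Point]]) -> int:
    """Index of the hand with the most fingertips on the board.

    This replaces a left/right handedness label with a geometric test, so it
    does not depend on whether the camera image is mirrored.
    """
    return max(range(len(hands)), key=lambda i: tips_on_board(h, hands[i]))


def flip_orientation(p: Point, flip_u: bool, flip_v: bool) -> Point:
    """Correct an inverted canonical frame (nut/body axis and/or string axis)."""
    u, v = p
    return (1.0 - u if flip_u else u, 1.0 - v if flip_v else v)
```

Whether a given rig needs flip_u, flip_v, or both would have to come from a preflight check; the sketch only shows where such a correction would sit relative to the projection.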

Probes: scripts/eval/v1_1_{mediapipe_sanity,yolo_rig_probe,real_chain_probe}.py
Train runner: scripts/train/_cpu_train_guitar_obb.py
Report: docs/EVAL_REPORTS/v1_1_chunk2_cv_chain_2026-06-10.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Complete the v1.1 chunk-4 audio transcription/alignment work by landing the
highres second-corpus diagnostic that the design doc left open.

- Add scripts/eval/v1_1_second_corpus_probe.py: a cached/resumable wrapper over
  tabvision.eval.composite.run_composite_eval that caches each clip's predicted
  TabEvent list to disk (keyed by media path + mtime + backend/prior/video
  settings), so a long highres run survives the 30-min interactive budget and
  resumes from cache on restart. + 6 unit tests for the cache contract.
- Score the full 12-clip Guitar-TECHS direct-input chord set (highres, prior
  none, no calibration): onset F1 0.7321, pitch F1 0.6787, Tab F1 0.0700,
  across 1292 notes; loss dominated by wrong_position_same_pitch (43.4%) and
  extra_detection (34.9%). Reports under docs/EVAL_REPORTS/.
- Cross-corpus diagnosis (written into the design doc + DECISIONS.md): highres
  is not globally broken (uncalibrated 0.73/0.68 on a second corpus vs UT-Austin
  raw ~0), the UT-Austin collapse is corpus-specific alignment, and the residual
  Tab F1 ceiling is the audio-only string-resolution limit (the v1.1 video
  chain's job) -- so keep highres; do not switch audio models. Honesty bounds
  recorded: 0.68 pitch still fails the 0.90 audio gate; Guitar-TECHS is an
  electric/out-of-domain/n=12 diagnostic, not an acceptance baseline.
- Also check in the pre-existing chunk-4 smoke artifacts (smoke manifest +
  reports + test) that seeded this run.
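
For readers unfamiliar with the cache contract in the first bullet above (keyed by media path + mtime + backend/prior/video settings), a minimal sketch of one way to build such a key and resume from disk is shown here. It is an assumption-laden illustration, not the contents of v1_1_second_corpus_probe.py: the function names, the JSON on-disk format, and the shape of the settings dict are all hypothetical, and events are represented as plain dicts rather than TabEvent objects.

```python
import hashlib
import json
import os
from pathlib import Path
from typing import Any, Callable, Dict, List


def cache_key(media_path: Path, settings: Dict[str, Any]) -> str:
    """Hash of media path + mtime + settings; any change yields a new key."""
    payload = {
        "path": str(media_path.resolve()),
        "mtime_ns": media_path.stat().st_mtime_ns,
        "settings": settings,  # e.g. backend / prior / video flags
    }
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def load_or_compute(
    cache_dir: Path,
    media_path: Path,
    settings: Dict[str, Any],
    compute: Callable[[Path], List[Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Return cached events for a clip, computing and storing them on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry = cache_dir / f"{cache_key(media_path, settings)}.json"
    if entry.exists():
        return json.loads(entry.read_text(encoding="utf-8"))
    events = compute(media_path)
    # Write to a temp file and rename, so an interrupted run never leaves
    # a half-written cache entry behind.
    tmp = entry.with_suffix(".tmp")
    tmp.write_text(json.dumps(events), encoding="utf-8")
    os.replace(tmp, entry)
    return events
```

Because each clip is stored under its own key, a run that is killed partway through only recomputes the clips that never finished; editing the media file or changing a setting invalidates just the affected entries.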

Additive only: no production audio/fusion/video/pipeline code changed, so the
accepted GuitarSet v1 audio evidence cannot regress. Full unit suite green
(391 passed, 4 skipped); ruff + mypy clean.
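
The audio-only string-resolution limit named in the diagnosis above follows from the instrument itself: on a guitar, one pitch can usually be played at several string/fret positions, so a correct pitch does not determine the tab position. The sketch below enumerates those candidates. It assumes standard tuning (MIDI 40, 45, 50, 55, 59, 64 for strings 6 to 1) and a 20-fret neck; the function name is hypothetical and not part of the tabvision package.

```python
from typing import List, Tuple

# Open-string MIDI pitches in standard tuning, keyed by string number
# (6 = low E2, 1 = high E4).
STANDARD_TUNING = {6: 40, 5: 45, 4: 50, 3: 55, 2: 59, 1: 64}


def candidate_positions(midi_pitch: int, max_fret: int = 20) -> List[Tuple[int, int]]:
    """All (string, fret) pairs that sound the given pitch."""
    positions = []
    for string, open_pitch in STANDARD_TUNING.items():
        fret = midi_pitch - open_pitch
        if 0 <= fret <= max_fret:
            positions.append((string, fret))
    return positions


# E4 (MIDI 64) can be played in five places under these assumptions:
print(candidate_positions(64))  # [(5, 19), (4, 14), (3, 9), (2, 5), (1, 0)]
# Low E2 (MIDI 40) has exactly one position:
print(candidate_positions(40))  # [(6, 0)]
```

An audio model that gets the pitch right but picks the wrong candidate produces exactly the wrong_position_same_pitch error class reported above, which is the gap the video chain is meant to close.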

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
vercel Bot commented Jun 18, 2026

The latest updates on your projects.

Project     Deployment  Actions           Updated (UTC)
tab_vision  Ready       Preview, Comment  Jun 18, 2026 5:48pm

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@pgil256 pgil256 merged commit 7b2ad75 into main Jun 18, 2026
4 checks passed