v1.1 chunks 2-4: real-video string-resolution + audio-alignment diagnosis #18
Merged
…2->0.54
Wire and validate the MediaPipe CV chain on the Kaggle UT-Austin rig:
PNG frame -> YOLO-OBB neck -> homography -> MediaPipe hands -> fingertip_to_fret
-> FrameFingering. The string lever now holds on REAL video, not just oracle.
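A minimal sketch of the last hop of that chain (canonical-board point to string/fret). It assumes a canonical frame where x is the distance from the nut in scale-length units and y runs across the strings from low E to high E. The frame, the name `fingertip_to_fret_sketch`, and the 20-fret / 6-equal-lane layout are illustrative assumptions, not the real `fingertip_to_fret`. Only the equal-temperament fret spacing d_n = 1 - 2^(-n/12) is standard.

```python
"""Hypothetical fingertip -> (string, fret) lookup on an assumed canonical board.

Assumed frame (not taken from the tabvision code):
  x: distance from the nut as a fraction of the scale length (nut = 0.0)
  y: 0.0 at the low-E edge of the board, 1.0 at the high-E edge
"""
from typing import Optional, Tuple

NUM_FRETS = 20    # assumed fret count
NUM_STRINGS = 6


def fret_position(n: int) -> float:
    """Equal-temperament distance of fret wire n from the nut, in scale lengths."""
    return 1.0 - 2.0 ** (-n / 12.0)


def fingertip_to_fret_sketch(x: float, y: float) -> Optional[Tuple[int, int]]:
    """Map a canonical-board point to (string, fret), or None if off the board.

    Strings are numbered guitar-style (1 = high E ... 6 = low E). A tip between
    fret wires n-1 and n is treated as pressing fret n.
    """
    if not (0.0 < x <= fret_position(NUM_FRETS)) or not (0.0 <= y <= 1.0):
        return None
    fret = next(n for n in range(1, NUM_FRETS + 1) if x <= fret_position(n))
    # Split the board width into equal string lanes, low E at y = 0.
    lane = min(int(y * NUM_STRINGS), NUM_STRINGS - 1)
    return NUM_STRINGS - lane, fret
```

For example, a tip halfway along the scale length (x = 0.5) sits exactly on the 12th fret wire and reads as fret 12.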
- MediaPipe detection works; hand-selection inverts on this non-mirrored rig
(v0 picks the strumming hand) -> fixed geometrically (project tips onto the
canonical board) in the real-chain probe.
- Trained the YOLO-OBB fretboard detector (Option 4): yolo11n-obb, CPU, epoch-4
best.pt (val mAP50 0.87). Generalizes to the rig: neck + localized homography
100% (vs the geometric detector's full-frame mis-fire).
- Homography orientation is inverted on this rig (nut/body + string axes); a
known preflight item. Corrected, the real chain lifts aggregate Tab F1
0.42 -> 0.54 (+0.12, up to +0.58/clip) vs oracle 1.00 (24 clips / 527 notes).
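The geometric hand-selection fix and the orientation correction can be sketched together. The sketch assumes a homography `h` from image pixels to a unit-square canonical board (x: nut to body, y: across the strings), with the inversion modeled as reflections of that square. `orientation_fix`, `pick_fretting_hand` and the unit-square convention are hypothetical names and conventions for illustration, not the probe's code.

```python
"""Hypothetical sketch: pick the fretting hand by projecting tips onto the board.

Assumed convention (not from the tabvision code): h maps pixels to a unit-square
board, and the rig's orientation inversion is a reflection of that square.
"""
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Point = Tuple[float, float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """3x3 product a @ b (apply b first, then a)."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def orientation_fix(flip_nut_body: bool, flip_strings: bool) -> Matrix:
    """Reflection x -> 1 - x and/or y -> 1 - y in the unit-square board frame."""
    sx, tx = (-1.0, 1.0) if flip_nut_body else (1.0, 0.0)
    sy, ty = (-1.0, 1.0) if flip_strings else (1.0, 0.0)
    return [[sx, 0.0, tx], [0.0, sy, ty], [0.0, 0.0, 1.0]]


def project(h: Matrix, p: Point) -> Point:
    """Apply a homography to one pixel point, including the perspective divide."""
    x, y = p
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return u / w, v / w


def on_board_count(h: Matrix, tips: Sequence[Point]) -> int:
    """Number of fingertips that land inside the unit-square board."""
    return sum(1 for t in tips if all(0.0 <= c <= 1.0 for c in project(h, t)))


def pick_fretting_hand(h: Matrix, hands: Dict[str, Sequence[Point]]) -> str:
    """Return the hand label whose tips fall on the board most often.

    Uses geometry only, not a handedness label (MediaPipe's handedness
    assumes a mirrored, selfie-style image).
    """
    return max(hands, key=lambda label: on_board_count(h, hands[label]))
```

Usage: `h = matmul(orientation_fix(True, True), h_raw)` before reading out fret/string. The reflection maps the unit square onto itself, so the fretting-hand choice is unchanged by it. Only the later fret/string readout depends on the orientation fix.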
Probes: scripts/eval/v1_1_{mediapipe_sanity,yolo_rig_probe,real_chain_probe}.py
Train runner: scripts/train/_cpu_train_guitar_obb.py
Report: docs/EVAL_REPORTS/v1_1_chunk2_cv_chain_2026-06-10.md
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
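A rough sketch of note-level onset / pitch / Tab F1 and of the loss buckets named in the chunk-4 commit (wrong_position_same_pitch, extra_detection). It assumes greedy one-to-one matching within a 50 ms onset window, and "tab correct" meaning the same pitch on the same string and fret. The tolerance, the greedy matcher and the extra `wrong_pitch` bucket are assumptions. `tabvision.eval` may use a different matcher and different buckets.

```python
"""Hypothetical note-level F1 sketch (greedy matching, assumed 50 ms tolerance)."""
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class Note:
    onset: float  # seconds
    pitch: int    # MIDI number
    string: int   # 1 = high E ... 6 = low E
    fret: int


Rule = Callable[[Note, Note], bool]

RULES: Dict[str, Rule] = {
    "onset": lambda p, r: True,
    "pitch": lambda p, r: p.pitch == r.pitch,
    "tab": lambda p, r: (p.pitch, p.string, p.fret) == (r.pitch, r.string, r.fret),
}


def f1(tp: int, n_pred: int, n_ref: int) -> float:
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_ref
    return 2 * precision * recall / (precision + recall)


def match(pred: List[Note], ref: List[Note], same: Rule, tol: float = 0.05) -> int:
    """Greedy one-to-one matches within the onset tolerance that satisfy `same`."""
    used: Set[int] = set()
    tp = 0
    for p in sorted(pred, key=lambda n: n.onset):
        for i, r in enumerate(ref):
            if i not in used and abs(p.onset - r.onset) <= tol and same(p, r):
                used.add(i)
                tp += 1
                break
    return tp


def score(pred: List[Note], ref: List[Note]) -> Dict[str, float]:
    return {name: f1(match(pred, ref, rule), len(pred), len(ref)) for name, rule in RULES.items()}


def classify_misses(pred: List[Note], ref: List[Note], tol: float = 0.05) -> Dict[str, int]:
    """Bucket predictions that are not tab-correct (simplified, hypothetical buckets)."""
    counts = {"wrong_position_same_pitch": 0, "wrong_pitch": 0, "extra_detection": 0}
    for p in pred:
        near = [r for r in ref if abs(p.onset - r.onset) <= tol]
        if any(RULES["tab"](p, r) for r in near):
            continue  # correct pitch, string and fret
        if any(r.pitch == p.pitch for r in near):
            counts["wrong_position_same_pitch"] += 1  # right note, wrong string/fret
        elif near:
            counts["wrong_pitch"] += 1
        else:
            counts["extra_detection"] += 1  # no reference note nearby at all
    return counts
```

A pitch-correct note on the wrong string counts for onset and pitch F1 but not for Tab F1. That is how pitch F1 can sit near 0.68 while Tab F1 stays near 0.07.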
Complete the v1.1 chunk-4 audio transcription/alignment work by landing the highres second-corpus diagnostic that the design doc left open.
- Add scripts/eval/v1_1_second_corpus_probe.py: a cached/resumable wrapper over tabvision.eval.composite.run_composite_eval that caches each clip's predicted TabEvent list to disk (keyed by media path + mtime + backend/prior/video settings), so a long highres run survives the 30-min interactive budget and resumes from cache on restart. Adds 6 unit tests for the cache contract.
- Score the full 12-clip Guitar-TECHS direct-input chord set (highres, prior none, no calibration): onset F1 0.7321, pitch F1 0.6787, Tab F1 0.0700 across 1292 notes; loss is dominated by wrong_position_same_pitch (43.4%) and extra_detection (34.9%). Reports under docs/EVAL_REPORTS/.
- Cross-corpus diagnosis (written into the design doc + DECISIONS.md): highres is not globally broken (uncalibrated onset/pitch F1 0.73/0.68 on a second corpus vs UT-Austin raw ~0), the UT-Austin collapse is corpus-specific alignment, and the residual Tab F1 ceiling is the audio-only string-resolution limit (the v1.1 video chain's job). So keep highres; do not switch audio models. Honesty bounds recorded: 0.68 pitch F1 still fails the 0.90 audio gate; Guitar-TECHS is an electric/out-of-domain/n=12 diagnostic, not an acceptance baseline.
- Also check in the pre-existing chunk-4 smoke artifacts (smoke manifest + reports + test) that seeded this run.
Additive only: no production audio/fusion/video/pipeline code changed, so the accepted GuitarSet v1 audio evidence cannot regress. Full unit suite green (391 passed, 4 skipped); ruff + mypy clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
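The cache contract described above can be sketched as follows, assuming predicted events serialize to JSON-friendly dicts. `cache_key`, `cached_predict` and the one-JSON-file-per-key layout are hypothetical, not the actual `v1_1_second_corpus_probe.py` API.

```python
"""Hypothetical resumable per-clip prediction cache keyed by path + mtime + settings."""
import hashlib
import json
import os
from pathlib import Path
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]  # stand-in for a serialized TabEvent (assumed JSON-friendly)


def cache_key(media: Path, settings: Dict[str, Any]) -> str:
    """Hash of media path + mtime + run settings; any change forces a re-run."""
    payload = {
        "media": str(media.resolve()),
        "mtime_ns": media.stat().st_mtime_ns,
        "settings": settings,  # e.g. backend / prior / video flags
    }
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def cached_predict(
    media: Path,
    settings: Dict[str, Any],
    cache_dir: Path,
    predict: Callable[[Path], List[Event]],
) -> List[Event]:
    """Return cached events for this clip + settings, else predict and persist them."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry = cache_dir / f"{cache_key(media, settings)}.json"
    if entry.exists():
        return json.loads(entry.read_text(encoding="utf-8"))
    events = predict(media)
    # Write to a temp file, then rename, so an interrupted run leaves no partial
    # entry (os.replace is atomic on POSIX within one filesystem).
    tmp = entry.with_suffix(".tmp")
    tmp.write_text(json.dumps(events), encoding="utf-8")
    os.replace(tmp, entry)
    return events
```

Keying on mtime rather than hashing file contents keeps key computation cheap for long media. The trade-off is that a touched-but-unchanged file gets re-run.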
v1.1 chunks 2–4 — real-video string-resolution chain + audio-alignment diagnosis
Lands the historical v1.1 chunk 2–4 work (chunk 1 already merged via #17) plus the
private-corpus removal. Conflict-free vs
main (merge-base 42dc85e; main's extra commits are docs-only).
What's in it
generalizes to the eval rig; the real CV chain lifts aggregate Tab F1
0.42 → 0.54 (oracle 1.00) on 24 clips / 527 notes.
voting,
homography_confidence-scaled fusion weight, and collapse-to-audio when video is sparse/weak. Gold-pitch real-video eval shows no regression and lifts
aggregate Tab F1 0.4243 → 0.5453.
probe + Guitar-TECHS second-corpus run. Conclusion: keep
highres; the residual Tab F1 ceiling is the audio-only string-resolution limit, so the video chain is the
lever (not an audio-model switch).
tabvision-server/ experiment artifacts (~71k-line deletion, mostly
test-data/ + tool outputs / benchmark fixtures).
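The fusion rules listed above (confidence-weighted voting, a homography_confidence-scaled video weight, and collapse to audio when video is sparse/weak) can be sketched per note. The thresholds, the audio score format and `fuse_position` are illustrative assumptions, not the values or API in the fusion code.

```python
"""Hypothetical per-note audio/video position fusion with collapse-to-audio."""
from collections import defaultdict
from typing import Dict, List, Tuple

Position = Tuple[int, int]  # (string, fret)

# Illustrative thresholds (assumed, not the shipped values).
MIN_VIDEO_FRAMES = 3        # fewer usable observations => video is "sparse"
MIN_HOMOGRAPHY_CONF = 0.3   # lower board-fit confidence => video is "weak"
BASE_VIDEO_WEIGHT = 1.0


def fuse_position(
    audio_scores: Dict[Position, float],
    video_obs: List[Tuple[Position, float]],
    homography_confidence: float,
) -> Position:
    """Choose one note's (string, fret) from audio scores plus video votes.

    audio_scores: candidate positions that can produce the note's pitch.
    video_obs: per-frame (position, confidence) observations in the note window.
    """
    votes: Dict[Position, float] = defaultdict(float)
    n_usable = 0
    for pos, conf in video_obs:
        if pos in audio_scores:  # ignore positions that cannot sound this pitch
            votes[pos] += conf    # confidence-weighted vote
            n_usable += 1
    total = sum(votes.values())
    if n_usable < MIN_VIDEO_FRAMES or homography_confidence < MIN_HOMOGRAPHY_CONF or total <= 0.0:
        # Collapse to audio: video is too sparse or too weak to trust.
        return max(audio_scores, key=audio_scores.__getitem__)
    weight = BASE_VIDEO_WEIGHT * homography_confidence  # scale video by board-fit confidence
    fused = {pos: s + weight * votes[pos] / total for pos, s in audio_scores.items()}
    return max(fused, key=fused.__getitem__)
```

Falling back to the audio argmax, rather than blending in a near-zero video term, means a bad board fit can never make a note's position worse than the audio-only answer. This matches the no-regression framing above.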
Blast-radius note (production safety)
tabvision-server/modal_app.py and app/v1_adapter.py are untouched.
app/fusion_engine.py (4 lines, corpus path ref) and requirements.txt (−4 lines, corpus-related deps). Eyeball requirements.txt if the Modal build pins from it.
Tests
pytest: 410 passed, 12 skipped (local, Windows venv).
Not in this PR
The GAPS recon spike from this session (GAPS confirmed viable as the v1.1 acceptance
corpus) is documented separately.
🤖 Generated with Claude Code