Add vision evaluation CI for VLM recipes with PyTorch comparison and perf benchmarks #473
Open
apsonawane wants to merge 28 commits into
Conversation
Copilot stopped reviewing on behalf of
apsonawane due to an error
June 4, 2026 19:33
hanbitmyths
reviewed
Jun 5, 2026
lm-eval
mobius-ai
olive-ai[gpu]
onnxruntime==1.26.0
Contributor
Should we pin the versions of onnxruntime and genai? If we want to check for regressions, shouldn't these be minimum versions? We can still pin the transformers or torch version, though.
Contributor
Author
I pinned this version because 1.25.1 is the minimum version needed for Qwen3.5, but I kept it at the latest. Also, the CI pipeline runs on 3.10 and 1.26.0 does not support it.
After 1.27.0 is released, we need to pin to it because Gemma would need that as the minimum.
We can definitely pin the transformers and torch versions.
…cipes into asonawane/e2e
2. Per-recipe CI configs
All recipes test INT4 quantized models only:
| Recipe | PyTorch model |
| --- | --- |
| google-gemma-4-E2B-it | google/gemma-4-E2B-it |
| Qwen-Qwen3-VL-2B-Instruct | Qwen/Qwen3-VL-2B-Instruct |
| Qwen-Qwen2.5-VL-3B-Instruct | Qwen/Qwen2.5-VL-3B-Instruct |
| Qwen-Qwen3.5-4B | Qwen/Qwen3.5-4B |

Example olive_ci.json
    [
      {
        "name": "gemma4-e2b-int4-cpu-vision-eval",
        "os": "ubuntu",
        "device": "cpu",
        "requirements_file": "requirements.txt",
        "command": "python ../../.github/scripts/run_vision_eval.py --config cpu/int4/config.json --pytorch-model google/gemma-4-E2B-it --benchmarks textvqa --limit 100 --device cpu --perf --max-delta 0.05"
      },
      {
        "name": "gemma4-e2b-int4-cuda-vision-eval",
        "os": "ubuntu",
        "device": "cuda",
        "requirements_file": "requirements.txt",
        "command": "python ../../.github/scripts/run_vision_eval.py --config cuda/int4/config.json --pytorch-model google/gemma-4-E2B-it --benchmarks textvqa --limit 200 --device gpu --perf --max-delta 0.05"
      }
    ]

3. Smarter CI filtering
generate_matrix.py now accepts --changed-files to only run recipes whose files were modified:

- Qwen-Qwen3-VL-2B-Instruct/
- google-gemma-4-E2B-it/
- .github/scripts/run_vision_eval.py
- .github/workflows/main.yml
- workflow_dispatch (manual trigger)

4. Updated workflow triggers
main.yml now triggers on:

- **/olive_ci.json (existing)
- **/config.json (new: recipe config changes)
- .github/scripts/run_vision_eval.py (new: eval script changes)

Example CI output
How to add vision eval to a new VLM recipe
olive_ci.json:

    [
      {
        "name": "my-vlm-int4-cpu-vision-eval",
        "os": "ubuntu",
        "device": "cpu",
        "requirements_file": "requirements.txt",
        "command": "python ../../.github/scripts/run_vision_eval.py --config cpu/int4/config.json --pytorch-model org/model-name --benchmarks textvqa --limit 100 --device cpu --perf --max-delta 0.05"
      },
      {
        "name": "my-vlm-int4-cuda-vision-eval",
        "os": "ubuntu",
        "device": "cuda",
        "requirements_file": "requirements.txt",
        "command": "python ../../.github/scripts/run_vision_eval.py --config cuda/int4/config.json --pytorch-model org/model-name --benchmarks textvqa --limit 200 --device gpu --perf --max-delta 0.05"
      }
    ]

Dependencies
- olive-ai with vision evaluation support (PRs #2476 and #2488, both merged)
- onnxruntime-genai for ONNX inference
- mobius-ai for model export (Gemma 4)
- pillow for image handling

Testing checklist
- workflow_dispatch still runs all recipes
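
For reviewers who want a mental model of the --changed-files filtering from section 3, here is a minimal sketch. It is not the actual generate_matrix.py code: the function name `select_recipes`, the `GLOBAL_FILES` set, and the rule that a change to the eval script or workflow file runs every recipe are assumptions made for illustration. A recipe is matched when any component of a changed path equals its directory name, which is also an assumption about the repository layout.

```python
from pathlib import PurePosixPath

# Assumption: edits to these shared files should re-run every recipe.
GLOBAL_FILES = {
    ".github/scripts/run_vision_eval.py",
    ".github/workflows/main.yml",
}


def select_recipes(all_recipes, changed_files=None):
    """Return the sorted recipe directory names that CI should run.

    all_recipes:   iterable of recipe directory names.
    changed_files: list of repo-relative paths, or None when no file
                   list is available (e.g. a manual workflow_dispatch run).
    """
    recipes = set(all_recipes)

    # No file list: fall back to running every recipe.
    if changed_files is None:
        return sorted(recipes)

    # A shared script or workflow changed: run every recipe.
    if any(path in GLOBAL_FILES for path in changed_files):
        return sorted(recipes)

    # Otherwise keep only recipes whose directory appears in a changed path.
    selected = set()
    for path in changed_files:
        selected.update(recipes.intersection(PurePosixPath(path).parts))
    return sorted(selected)
```

With this sketch, a change under Qwen-Qwen3-VL-2B-Instruct/ selects only that recipe, while a manual trigger (no file list) selects all of them, matching the checklist item above.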