[feat][sgl-atom] add qwen3 sglang dense model support by zhangxinyuanliuhengyu · Pull Request #1416 · ROCm/ATOM

zhangxinyuanliuhengyu · 2026-06-30T09:42:37Z

Summary

Register Qwen3ForCausalLM in the SGLang+ATOM model adapter registry so Qwen3-32B-FP8 uses the ATOM plugin model path instead of falling back to the built-in SGLang model.
Set --page-size 16 for the Qwen3-32B-FP8 benchmark entries to avoid the dense attention KV layout issue seen with page_size=1.
Keep the existing dense attention gating behavior unchanged.
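
The registration step described above can be pictured with a minimal sketch. This is not the ATOM adapter code: the registry dict, `register_adapter`, `resolve_adapter`, `Qwen3AtomAdapter`, and `select_model_path` are all hypothetical names introduced for illustration. Only the architecture string `Qwen3ForCausalLM` and the fallback behavior come from the summary.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry: architecture name -> adapter class.
# The real SGLang+ATOM registry may be structured differently.
_ADAPTERS: Dict[str, type] = {}


def register_adapter(arch: str) -> Callable[[type], type]:
    """Class decorator that records an adapter under an architecture name."""
    def decorator(cls: type) -> type:
        _ADAPTERS[arch] = cls
        return cls
    return decorator


def resolve_adapter(arch: str) -> Optional[type]:
    """Return the registered adapter class, or None if there is none."""
    return _ADAPTERS.get(arch)


@register_adapter("Qwen3ForCausalLM")
class Qwen3AtomAdapter:
    """Placeholder standing in for the ATOM plugin model wrapper."""


def select_model_path(arch: str) -> str:
    """Pick the plugin path when an adapter exists, else fall back."""
    if resolve_adapter(arch) is not None:
        return "atom-plugin"
    return "sglang-builtin"


print(select_model_path("Qwen3ForCausalLM"))
print(select_model_path("UnregisteredForCausalLM"))
```

Under this sketch, an architecture missing from the registry silently takes the built-in path, which is the fallback the summary says Qwen3-32B-FP8 hit before this change.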

Accuracy Validation

Validated Qwen3-32B-FP8 with the SGLang+ATOM service using the same --page-size 16 runtime configuration.

The initial GSM8K CI-style run looked low because lm_eval local-completions defaults to max_gen_toks=256, which truncates Qwen3 math reasoning outputs. A direct A/B confirmed this was an evaluation configuration issue, not a model accuracy regression:

GSM8K 3-shot, 100-sample smoke, default max_gen_toks=256: flexible-extract=0.56, strict-match=0.56
GSM8K 3-shot, 100-sample smoke, max_gen_toks=2048, max_length=8192: flexible-extract=0.95, strict-match=0.97
GSM8K 3-shot, full set, max_gen_toks=2048, max_length=8192: flexible-extract=0.9037, strict-match=0.9166

These full-set results are within the expected range for Qwen3-32B on GSM8K and indicate that the minimal Qwen3-32B support changes do not introduce an accuracy regression.
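
To illustrate why a short generation budget can depress the score without any change in the model, here is a toy sketch. It assumes whitespace-separated words as a stand-in for tokens and uses a simplified answer-marker regex rather than the actual lm_eval filters. The completion text is made up.

```python
import re

# Simplified stand-in for an answer-marker filter: the final answer
# must follow a literal "#### " marker.
ANSWER_RE = re.compile(r"#### (-?[0-9.,]+)")


def extract_answer(text):
    """Return the marked answer as a string, or None if no marker is present."""
    match = ANSWER_RE.search(text)
    return match.group(1).replace(",", "") if match else None


def truncate(tokens, max_gen_toks):
    """Keep at most max_gen_toks tokens, mimicking a generation cap."""
    return tokens[:max_gen_toks]


# Made-up completion: 300 reasoning "tokens", then the final answer.
tokens = ["step"] * 300 + ["####", "42"]

short_output = " ".join(truncate(tokens, 256))
long_output = " ".join(truncate(tokens, 2048))

print(extract_answer(short_output))  # None
print(extract_answer(long_output))   # 42
```

With the 256-token cap the answer marker is never generated, so the sample is scored as wrong even though the underlying reasoning would have reached the right number.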

Test Plan

Launched Qwen3-32B-FP8 with SGLang+ATOM, --page-size 16, and Qwen3 reasoning parser.
Verified the service starts successfully without the previous RoPE/CUDA graph crash.
Ran GSM8K A/B evaluation to isolate the low-score cause to generation length.
Ran full GSM8K 3-shot evaluation with max_gen_toks=2048,max_length=8192.
Confirmed SGLang service cleanup and GPU memory release after validation.
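
The launch and evaluation steps above might look roughly like the commands assembled below. This is a sketch under assumptions. The model id `Qwen/Qwen3-32B-FP8`, the host, and the port are illustrative. Whatever is needed to enable the ATOM plugin is not stated in this PR description and is deliberately omitted. Flag spellings can differ between SGLang and lm_eval versions.

```python
import shlex

MODEL = "Qwen/Qwen3-32B-FP8"     # assumed Hugging Face model id
HOST, PORT = "127.0.0.1", 30000  # assumed local endpoint

# Server launch with the page size and reasoning parser from the test plan.
# ATOM-specific setup is not shown because the PR text does not describe it.
launch_cmd = [
    "python", "-m", "sglang.launch_server",
    "--model-path", MODEL,
    "--page-size", "16",
    "--reasoning-parser", "qwen3",
    "--host", HOST,
    "--port", str(PORT),
]

# Raise the generation budget above the local-completions default of 256.
model_args = ",".join([
    f"model={MODEL}",
    f"base_url=http://{HOST}:{PORT}/v1/completions",
    "max_gen_toks=2048",
    "max_length=8192",
])

eval_cmd = [
    "lm_eval",
    "--model", "local-completions",
    "--tasks", "gsm8k",
    "--num_fewshot", "3",
    "--model_args", model_args,
]

print(shlex.join(launch_cmd))
print(shlex.join(eval_cmd))
```

Appending `--limit 100` to the evaluation command would approximate the 100-sample smoke run, while omitting it evaluates the full set.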

whx-sjtu and others added 2 commits June 30, 2026 16:17

fix Qwen3 dense startup with ATOM backend

a5839a7

Register Qwen3 dense for the SGLang ATOM wrapper and route page-size 1 dense decode through the existing native AITER path to avoid invalid KV layout reshapes during CUDA graph capture. Co-authored-by: Cursor <cursoragent@cursor.com>

Limit Qwen3 dense startup fix scope

1933384

Restore the existing dense attention routing and use page-size 16 only for Qwen3-32B MI308 SGLang benchmark entries so the model stays on the ATOM attention path. Co-authored-by: Cursor <cursoragent@cursor.com>
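
For readers unfamiliar with paged KV caches, the following generic sketch shows how the page size changes the shape a kernel sees. It is not the ATOM or AITER layout, and the head count and head dimension are illustrative values. The function names are hypothetical.

```python
def slot_to_page(slot, page_size):
    """Map a flat KV slot index to (page index, offset within the page)."""
    return divmod(slot, page_size)


def paged_shape(num_slots, page_size, num_kv_heads, head_dim):
    """Shape of a flat KV buffer viewed as pages of page_size slots each."""
    if num_slots % page_size != 0:
        raise ValueError("num_slots must be a multiple of page_size")
    return (num_slots // page_size, page_size, num_kv_heads, head_dim)


# Illustrative sizes only.
print(paged_shape(4096, 16, 8, 128))  # (256, 16, 8, 128)
print(paged_shape(4096, 1, 8, 128))   # (4096, 1, 8, 128)
print(slot_to_page(37, 16))           # (2, 5)
```

With a page size of 1 every slot becomes its own page, so any code that assumes multi-slot pages when reshaping the buffer has to be handled separately. The commits above avoid that case for the Qwen3-32B benchmark entries by using a page size of 16.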

zhuyuhua-v changed the title ~~Fix/qwen3 sglang dense startup~~ [feat][sgl-atom] add qwen3 sglang dense model support Jun 30, 2026

zhuyuhua-v self-requested a review June 30, 2026 09:45

zhuyuhua-v approved these changes Jun 30, 2026

zhangxinyuanliuhengyu merged commit c8aa1a7 into main Jun 30, 2026
34 checks passed

zhangxinyuanliuhengyu deleted the fix/qwen3-sglang-dense-startup branch June 30, 2026 14:40

zhangxinyuanliuhengyu mentioned this pull request Jul 1, 2026

[sgl-atom] support Qwen3-32B in SGLang accuracy CI on MI308 #1430

Open
