[GLM5.1/5.2] Fix acc drop for long prompt by zejunchen-zejun · Pull Request #1379 · ROCm/ATOM

zejunchen-zejun · 2026-06-27T14:12:05Z

Before fix:

Model	Shot	Strict	Flexible
GLM5.1 FP8	5	`0.945413`	`0.939348`
GLM5.1 FP8	20	`0.780000`	`0.780000`
GLM5.2 FP8	5	`0.943139`	`0.943897`
GLM5.2 FP8	20	`0.001516`	`0.012130`

After fix:

Model	Shot	Strict	Flexible
GLM5.1 FP8	5	`0.935557240333586`	`0.9317664897649734`
GLM5.1 FP8	20	`0.9446550416982562`	`0.9454131918119788`
GLM5.2 FP8	5	`0.9416224412433661`	`0.9416224412433661`
GLM5.2 FP8	20	`0.9454131918119788`	`0.9454131918119788`
DeepSeek-V3.2	5	`0.9514783927217589`	`0.9514783927217589`
DeepSeek-V3.2	20	`0.9529946929492039`	`0.9537528430629265`

How to fix:

The persistent MLA metadata was not passed into `mla_decode_fwd` during sparse prefill. In sparse prefill, the prefill requests are split into multiple virtual decode requests with q_len=1, but the existing code did not pass the corresponding metadata to mla_decode.
Chunked prefill broke causality.
The indexer should compute the QK score after RoPE, but the existing code computed the q/k score without RoPE.
GLM5.2 uses a shared indexer. For the shared layers, the indexer was wrongly set to None, and the subsequent layer logic uses `indexer is None` to decide whether the MLA is sparse or not, so the wrong MLA path was chosen.
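
The last item can be illustrated with a minimal sketch. The `Layer` class and both helper functions below are hypothetical stand-ins, not ATOM code; they only show why checking `indexer is None` per layer misclassifies a shared layer, and why a model-level flag avoids that.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Layer:
    """Hypothetical stand-in for one attention layer."""
    name: str
    indexer: Optional[object]  # None on a layer that reuses another layer's indexer


def is_sparse_per_layer(layer: Layer) -> bool:
    # Per-layer check: a shared layer owns no indexer,
    # so it is wrongly treated as dense MLA.
    return layer.indexer is not None


def is_sparse_model_level(layer: Layer, model_is_sparse: bool) -> bool:
    # Model-level check: sparsity is decided once for the whole model,
    # independent of whether this layer owns an indexer.
    return model_is_sparse


layers = [Layer("owner", indexer=object()), Layer("shared", indexer=None)]
per_layer = [is_sparse_per_layer(layer) for layer in layers]
model_level = [is_sparse_model_level(layer, model_is_sparse=True) for layer in layers]
print(per_layer, model_level)
```

With the per-layer check the shared layer falls back to the dense path; with the model-level flag both layers take the sparse path.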

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

Copilot

Pull request overview

This PR addresses long-prompt accuracy drops for GLM-5.1/5.2 (and confirms no regression for DeepSeek-V3.2) by fixing multiple issues in sparse MLA prefill/indexing, including causality handling, RoPE application in the indexer path, and correct sparse-mode selection for GLM-5.2 IndexShare “shared” layers.

Changes:

Fix indexer scoring to apply RoPE to q/k before computing QK scores, and make RoPE style configurable via rope_interleave.
Fix sparse prefill metadata construction to preserve causality across chunked prefill (per-token virtual decode layout) and generate the required sparse-prefill MLA metadata.
Ensure GLM-5.2 IndexShare “shared” layers still run sparse MLA by deriving sparsity at the model level (not per-layer indexer is None).
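
To see why the indexer score must be computed after RoPE, here is a small self-contained sketch. The `apply_rope` helper, its `interleave` argument, and the frequency base of 10000 are assumptions chosen for illustration; they mirror the common RoPE formulation and are not taken from the ATOM source.

```python
import math


def apply_rope(x, pos, interleave=True, base=10000.0):
    """Rotate pairs of features of x by position-dependent angles.

    interleave=True pairs (0, 1), (2, 3), ...
    interleave=False pairs (i, i + d/2).
    Assumes len(x) is even.
    """
    d = len(x)
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        theta = pos * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = (2 * i, 2 * i + 1) if interleave else (i, i + half)
        out[a] = x[a] * c - x[b] * s
        out[b] = x[a] * s + x[b] * c
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


q = [1.0, 0.0, 0.0, 0.0]
k = [1.0, 0.0, 0.0, 0.0]

# Score without RoPE: blind to token positions.
raw_score = dot(q, k)
# Score with RoPE: depends on the relative position (5 - 3 = 2).
rope_score = dot(apply_rope(q, 5), apply_rope(k, 3))
print(raw_score, rope_score)
```

Without the rotation the score ignores where the tokens sit in the sequence, so a top-k selection based on it can rank keys differently from the attention that actually runs with RoPE applied.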

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
atom/models/deepseek_v2.py	Applies RoPE correctly in indexer q/k scoring and propagates model-level sparse settings for IndexShare layers.
atom/model_ops/attentions/aiter_mla.py	Builds sparse-prefill per-token causality metadata and allocates/publishes sparse-prefill MLA work buffers.
atom/model_ops/attention_mla.py	Adds model-level sparse flag/top-k plumbing and forwards sparse-prefill work metadata into `mla_decode_fwd`.


+            ) = get_mla_metadata_info_v1(
+                self.max_num_batched_tokens,
+                1,  # sparse prefill treats each query token as q_len=1
+                self.padded_num_attention_heads,
+                self.dtype_q,
+                self.dtype_kv,
+                is_sparse=True,
+                fast_mode=True,
+            )
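
The "each query token as q_len=1" layout in the snippet above can be sketched as follows. This is an illustrative model only: `virtual_decode_layout` and its argument names are hypothetical, and the real metadata produced by `get_mla_metadata_info_v1` is not reproduced here. The sketch assumes each prefill chunk is described by the number of tokens already cached (`context_len`) and the number of new tokens (`chunk_len`).

```python
def virtual_decode_layout(chunks):
    """Expand prefill chunks into per-token virtual decode requests.

    chunks: list of (context_len, chunk_len) tuples, one per request.
    Returns (q_indptr, kv_lens):
      q_indptr: query offsets, one token per virtual request.
      kv_lens:  number of KV positions each virtual request may attend to.
    """
    kv_lens = []
    for context_len, chunk_len in chunks:
        for i in range(chunk_len):
            # Token i of the chunk sees the cached context plus the chunk
            # tokens up to and including itself, which keeps causality
            # intact across chunk boundaries.
            kv_lens.append(context_len + i + 1)
    q_indptr = list(range(len(kv_lens) + 1))
    return q_indptr, kv_lens


# One request with 4 cached tokens and a 3-token chunk,
# and one fresh request with a 2-token chunk.
q_indptr, kv_lens = virtual_decode_layout([(4, 3), (0, 2)])
print(q_indptr, kv_lens)
```

Giving every token in a chunk the same KV length (for example the length at the end of the chunk) would let earlier tokens attend to later ones, which is the kind of causality break the PR description mentions.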


[GLM5.1/5.2] Fix acc issue for long prompt

b041cb7

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

zufayu requested a review from jiayyu June 29, 2026 02:54

zejunchen-zejun added 2 commits June 29, 2026 12:11

fix GLM5.2 acc drop

5adba15

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

add potential fix for glm5.2

54873b5

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

zejunchen-zejun marked this pull request as ready for review June 30, 2026 02:25

Copilot AI review requested due to automatic review settings June 30, 2026 02:25

Copilot started reviewing on behalf of zejunchen-zejun June 30, 2026 02:26

Copilot AI reviewed Jun 30, 2026

zejunchen-zejun wants to merge 3 commits into main from zejun/fix_glm_20_shot_acc_issue_0627

2 participants
