[Minimax M3] Enable AR + RMSNorm + quant fusion#1415

Open
Phi-C wants to merge 3 commits into
ROCm:mainfrom
Phi-C:norm_quant_fusion

@Phi-C Phi-C commented Jun 30, 2026

Contributor

Motivation

Support fused allreduce + RMSNorm + quantization in MiniMax M3.
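
For readers unfamiliar with the pattern, the sketch below spells out the three steps that such a fusion combines, written as an unfused pure-Python reference. It is an illustration only, not the kernel from this PR: the helper names (`allreduce_sum`, `rmsnorm`, `per_token_quant`, `unfused_reference`) are hypothetical, the per-token scaling scheme and the FP8 E4M3 maximum of 448 are assumptions, rounding to the actual FP8 grid is omitted, and any residual add the real kernel may perform is not modelled.

```python
import math

# Assumption: OCP FP8 E4M3 range. Hardware using the e4m3fnuz variant has a
# smaller maximum; the value here is illustrative only.
FP8_E4M3_MAX = 448.0


def allreduce_sum(shards):
    """Emulate a tensor-parallel all-reduce (sum).

    shards: one [tokens][hidden] nested list per rank.
    Returns the elementwise sum that every rank would hold afterwards.
    """
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*shards)]


def rmsnorm(x, weight, eps=1e-6):
    """Apply RMSNorm over the hidden dimension of each token row."""
    out = []
    for row in x:
        inv_rms = 1.0 / math.sqrt(sum(v * v for v in row) / len(row) + eps)
        out.append([v * inv_rms * w for v, w in zip(row, weight)])
    return out


def per_token_quant(x, qmax=FP8_E4M3_MAX):
    """Dynamic per-token scaling: one scale per row, values clipped to +/-qmax.

    Rounding to representable FP8 values is deliberately left out.
    """
    q, scales = [], []
    for row in x:
        amax = max(abs(v) for v in row)
        scale = max(amax, 1e-12) / qmax  # guard against all-zero rows
        q.append([min(max(v / scale, -qmax), qmax) for v in row])
        scales.append(scale)
    return q, scales


def unfused_reference(shards, weight):
    """Run the three steps back to back, as three separate passes over the data."""
    return per_token_quant(rmsnorm(allreduce_sum(shards), weight))
```

Calling `unfused_reference([[[1.0, -2.0, 3.0, 4.0]], [[0.5, 0.5, -1.0, 2.0]]], [1.0] * 4)` emulates two ranks holding one token each. A fused kernel would aim to produce the same result while avoiding the intermediate round trips through memory between the steps.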

Technical Details

Test Plan

model_path=/shared/data/amd_int/models/MiniMax-M3-MXFP8/
export AITER_QUICK_REDUCE_QUANTIZATION=INT4
export AITER_QUICK_REDUCE_CAST_BF16_TO_FP16=0
export ATOM_FORCE_ATTN_TRITON=1
 
python -m atom.entrypoints.openai_server \
  --model "$model_path" \
  --tensor-parallel-size 4 \
  --server-port 8013 \
  --trust-remote-code \
  --gpu-memory-utilization 0.85 \
  --block-size 128 \
  --max-model-len 32768 \
  --no-enable_prefix_caching \
  --max-num-seqs 256 \
  --kv_cache_dtype fp8 \
  --online_quant_config '{"global_quant_config": "ptpc_fp8", "exclude_layer": ["lm_head", "model.embed_tokens", "vision_tower", "multi_modal_projector", "patch_merge_mlp", "*block_sparse_moe"]}' \
  --max-num-batched-tokens 32768 2>&1 | tee m3-mxfp8-server-tp4.log
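
Once the server launched above is up, a quick smoke test can confirm that it answers requests. The sketch below uses only the Python standard library. It assumes the server exposes an OpenAI-style `/v1/completions` route on localhost and accepts the model path as the `model` field; neither is confirmed by this PR, and the helper names `build_completion_request` and `send` are hypothetical.

```python
import json
import urllib.request


def build_completion_request(prompt, model, port=8013, max_tokens=32):
    """Build a POST request for a completions endpoint.

    Assumption: the route is the OpenAI-style /v1/completions. Adjust the
    path if the server exposes something different.
    """
    url = f"http://localhost:{port}/v1/completions"
    body = json.dumps(
        {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send(build_completion_request("Hello", "/shared/data/amd_int/models/MiniMax-M3-MXFP8/"))` would issue one short completion against port 8013, matching `--server-port` in the launch command. Running the same prompt with and without fusion is one way to check that outputs stay comparable.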

Test Result

Without fusion

image

With fusion

image

Submission Checklist

Signed-off-by: Phi-C <chenxjhit@163.com>
@zufayu requested a review from ZhangLirong-amd on July 1, 2026 01:43
Signed-off-by: Phi-C <chenxjhit@163.com>
@Phi-C changed the title from "[Minimax M3] Enable AR RMSNorm quant fusion" to "[Minimax M3] Enable AR + RMSNorm + quant fusion" on Jul 1, 2026
@Phi-C requested a review from zhuyuhua-v on July 1, 2026 02:32