2 changes: 1 addition & 1 deletion README.md
@@ -13,7 +13,7 @@

## 📢 News

- **[2026/06]** Experimental **Navi 4 (RDNA4 / gfx1201)** support — AMD Radeon RX 9070 / RX 9070 XT and Radeon AI PRO R9700. See the [Qwen3-8B-FP8](recipes/Qwen3-8B-FP8.md) and [Ministral-3-8B](recipes/Ministral-3-8B.md) recipes.
- **[2026/06]** Experimental **Navi 4 (RDNA4 / gfx1200, gfx1201)** support — AMD Radeon RX 9060 / RX 9060 XT (Navi 44 / gfx1200) and RX 9070 / RX 9070 XT and Radeon AI PRO R9700 (Navi 48 / gfx1201). Both chips share the same Triton fallback path; build aiter for the matching arch (`GPU_ARCHS=gfx1200` or `gfx1201`). See the [Qwen3-8B-FP8](recipes/Qwen3-8B-FP8.md) and [Ministral-3-8B](recipes/Ministral-3-8B.md) recipes.
- **[2026/06]** ATOM now supports **GLM-5.2** (`glm_moe_dsa`) in FP8, including the new **IndexShare** DSA schedule (shared layers reuse the preceding full layer's indexer). See [GLM-5.2 recipe](recipes/GLM-5.md#glm-52-indexshare).
- **[2026/05]** ATOM now supports **Qwen3.5 multimodal image+text inference** on the native engine and OpenAI-compatible chat API. See [Qwen3.5 multimodal recipe](recipes/Qwen3.5_multimodel.md).
- **[2026/05]** ATOM now supports **online quantization** — re-quantize unquantized or FP8-block source checkpoints to PTPC-FP8 / MXFP4 mixed precision at load time via `--online_quant_config`, no offline re-packing required. See [online quantization guide](docs/online_quantization_guide.md).
9 changes: 6 additions & 3 deletions recipes/Ministral-3-8B.md
@@ -5,10 +5,13 @@ RDNA4 GPU. ATOM runs attention and GEMM through Triton
(`ATOM_USE_UNIFIED_ATTN=1`, `ATOM_USE_TRITON_GEMM=1`); the KV-cache write,
RoPE and norms use native aiter HIP kernels.

> **Navi (gfx1201) prerequisite:** aiter must be built for the arch — see
> **Navi (gfx1200 / gfx1201) prerequisite:** aiter must be built for the arch — see
> [ROCm/aiter#3846](https://github.com/ROCm/aiter/issues/3846). Short-term
> fix: build aiter from source with `GPU_ARCHS=gfx1201` (a native build on
> the card does this automatically).
> fix: build aiter from source with `GPU_ARCHS=gfx1201` (Navi 48: RX 9070 /
> RX 9070 XT / AI PRO R9700) or `GPU_ARCHS=gfx1200` (Navi 44: RX 9060 /
> RX 9060 XT). A native build on the card does this automatically. Both are
> RDNA4 and use the same Triton path below; the benchmarks here were
> measured on gfx1201.
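
The card-to-arch mapping in the note above can be written down as a small lookup. This is only a sketch: `GPU_ARCH_BY_CARD`, `gpu_archs_for` and `build_env` are hypothetical helper names, not part of aiter or ATOM, and the actual aiter build command is deliberately left out because the recipe does not state it.

```python
# Card-to-arch mapping copied from the prerequisite note above.
GPU_ARCH_BY_CARD = {
    "RX 9060": "gfx1200",       # Navi 44
    "RX 9060 XT": "gfx1200",    # Navi 44
    "RX 9070": "gfx1201",       # Navi 48
    "RX 9070 XT": "gfx1201",    # Navi 48
    "AI PRO R9700": "gfx1201",  # Navi 48
}


def gpu_archs_for(card: str) -> str:
    """Return the GPU_ARCHS value for a card name (hypothetical helper)."""
    # Drop the vendor/brand prefix and collapse whitespace before the lookup.
    key = " ".join(card.replace("AMD", "").replace("Radeon", "").split())
    try:
        return GPU_ARCH_BY_CARD[key]
    except KeyError:
        raise ValueError(f"no RDNA4 arch known for {card!r}") from None


def build_env(card: str, base_env=None) -> dict:
    """Copy base_env and add GPU_ARCHS for the given card."""
    env = dict(base_env or {})
    env["GPU_ARCHS"] = gpu_archs_for(card)
    return env
```

The resulting dictionary can be passed as the environment of whatever command is used to build aiter from source; an unknown card raises `ValueError` rather than silently building for the wrong arch.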

## Model

9 changes: 6 additions & 3 deletions recipes/Qwen3-8B-FP8.md
@@ -1,9 +1,12 @@
# Qwen3-8B-FP8 (block-128) on RX 9070 XT (gfx1201) via ROCm/ATOM

Verified path on RX 9070 XT (gfx1201). Attention and GEMM run through
Triton; same backend setup and the **build-aiter-for-gfx1201** prerequisite
Triton; same backend setup and the **build-aiter-for-the-arch** prerequisite
([ROCm/aiter#3846](https://github.com/ROCm/aiter/issues/3846)) as the
[Ministral-3-8B recipe](./Ministral-3-8B.md).
[Ministral-3-8B recipe](./Ministral-3-8B.md) — build aiter with
`GPU_ARCHS=gfx1201` (Navi 48) or `GPU_ARCHS=gfx1200` (Navi 44: RX 9060 /
RX 9060 XT). Both RDNA4 chips share this Triton path; the numbers below are
from gfx1201.

## Model

@@ -34,7 +37,7 @@ export ATOM_ENABLE_ALLREDUCE_RMSNORM_FUSION=0
`ATOM_LLAMA_ENABLE_AITER_TRITON_FUSED_RMSNORM_QUANT=1` and
`ATOM_LLAMA_ENABLE_AITER_TRITON_FUSED_SILU_MUL_QUANT=1` to fuse
normalization/activation with FP8 quantization. Requires HIP
`rmsnorm_quant` to JIT-compile on gfx1201 — test before enabling.
`rmsnorm_quant` to JIT-compile on gfx1200 / gfx1201 — test before enabling.
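
One way to follow the "test before enabling" advice is to gate the two flags on a probe. The sketch below assumes the caller supplies a `probe` callable that returns `True` only when HIP `rmsnorm_quant` JIT-compiles on the local card; how to run that check is not specified in this recipe, so `probe` and `fused_quant_env` are hypothetical names.

```python
from typing import Callable, Dict

# Flag names as given in the recipe text above.
FUSED_QUANT_FLAGS = (
    "ATOM_LLAMA_ENABLE_AITER_TRITON_FUSED_RMSNORM_QUANT",
    "ATOM_LLAMA_ENABLE_AITER_TRITON_FUSED_SILU_MUL_QUANT",
)


def fused_quant_env(probe: Callable[[], bool]) -> Dict[str, str]:
    """Return the fused-quant flags set to "1" only if the probe succeeds.

    `probe` is a caller-supplied (hypothetical) check; a failing or
    raising probe yields an empty dict, leaving both flags unset.
    """
    try:
        ok = bool(probe())
    except Exception:
        ok = False
    return {flag: "1" for flag in FUSED_QUANT_FLAGS} if ok else {}
```

Merging the result into the launch environment, for example `env.update(fused_quant_env(my_probe))`, leaves the flags untouched whenever the probe fails.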

## Required CLI flags
