ROCm · 0xDELUXA · Jun 28, 2026
diff --git a/README.md b/README.md
@@ -13,7 +13,7 @@
 
 ## 📢 News
 
-- **[2026/06]** Experimental **Navi 4 (RDNA4 / gfx1201)** support — AMD Radeon RX 9070 / RX 9070 XT and Radeon AI PRO R9700. See the [Qwen3-8B-FP8](recipes/Qwen3-8B-FP8.md) and [Ministral-3-8B](recipes/Ministral-3-8B.md) recipes.
+- **[2026/06]** Experimental **Navi 4 (RDNA4 / gfx1200, gfx1201)** support — AMD Radeon RX 9060 / RX 9060 XT (Navi 44 / gfx1200), plus RX 9070 / RX 9070 XT and Radeon AI PRO R9700 (Navi 48 / gfx1201). Both chips share the same Triton fallback path; build aiter for the matching arch (`GPU_ARCHS=gfx1200` or `GPU_ARCHS=gfx1201`). See the [Qwen3-8B-FP8](recipes/Qwen3-8B-FP8.md) and [Ministral-3-8B](recipes/Ministral-3-8B.md) recipes.
 - **[2026/06]** ATOM now supports **GLM-5.2** (`glm_moe_dsa`) in FP8, including the new **IndexShare** DSA schedule (shared layers reuse the preceding full layer's indexer). See [GLM-5.2 recipe](recipes/GLM-5.md#glm-52-indexshare).
 - **[2026/05]** ATOM now supports **Qwen3.5 multimodal image+text inference** on the native engine and OpenAI-compatible chat API. See [Qwen3.5 multimodal recipe](recipes/Qwen3.5_multimodel.md).
 - **[2026/05]** ATOM now supports **online quantization** — re-quantize unquantized or FP8-block source checkpoints to PTPC-FP8 / MXFP4 mixed precision at load time via `--online_quant_config`, no offline re-packing required. See [online quantization guide](docs/online_quantization_guide.md).

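The product-to-arch mapping in the news entry above is easy to get backwards, so here it is as a small lookup. This is only a sketch: `gpu_archs_for` is a hypothetical helper, not part of ATOM or aiter, and it covers just the RDNA4 cards named in the entry.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ATOM or aiter): map an RDNA4 product name
# from the news entry to the GPU_ARCHS value aiter should be built with.
gpu_archs_for() {
  case "$1" in
    # Navi 44
    "RX 9060" | "RX 9060 XT") echo "gfx1200" ;;
    # Navi 48
    "RX 9070" | "RX 9070 XT" | "Radeon AI PRO R9700") echo "gfx1201" ;;
    # Anything else is outside the RDNA4 parts listed in the entry
    *) echo "unknown product: $1" >&2; return 1 ;;
  esac
}

# Example: print the build setting for an RX 9060 XT
echo "GPU_ARCHS=$(gpu_archs_for "RX 9060 XT")"
```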
diff --git a/recipes/Ministral-3-8B.md b/recipes/Ministral-3-8B.md
@@ -5,10 +5,13 @@ RDNA4 GPU. ATOM runs attention and GEMM through Triton
 (`ATOM_USE_UNIFIED_ATTN=1`, `ATOM_USE_TRITON_GEMM=1`); the KV-cache write,
 RoPE and norms use native aiter HIP kernels.
 
-> **Navi (gfx1201) prerequisite:** aiter must be built for the arch — see
+> **Navi (gfx1200 / gfx1201) prerequisite:** aiter must be built for the arch — see
 > [ROCm/aiter#3846](https://github.com/ROCm/aiter/issues/3846). Short-term
-> fix: build aiter from source with `GPU_ARCHS=gfx1201` (a native build on
-> the card does this automatically).
+> fix: build aiter from source with `GPU_ARCHS=gfx1201` (Navi 48: RX 9070 /
+> RX 9070 XT / Radeon AI PRO R9700) or `GPU_ARCHS=gfx1200` (Navi 44: RX 9060 /
+> RX 9060 XT). A native build on the card selects the right arch
+> automatically. Both are RDNA4 and use the same Triton path below; the
+> benchmarks here were measured on gfx1201.
 
 ## Model
 

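To pin `GPU_ARCHS` explicitly instead of relying on the native-build detection mentioned in the Ministral prerequisite, one option is to read the target from `rocminfo` on the card. A minimal sketch, assuming `rocminfo` reports the GPU agent's name as a `gfx` token (as in lines like `Name: gfx1201`); `detect_rdna4_arch` is a hypothetical helper that only recognizes gfx1200 and gfx1201 and, on a multi-GPU host, takes the first match. The aiter build command itself is deliberately left out: follow aiter's build-from-source instructions in the same shell.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first RDNA4 target (gfx1200 or gfx1201) found
# in rocminfo-style text on stdin. Assumes the GPU agent name appears as a
# "gfx120x" token; prints nothing if no such target is present.
detect_rdna4_arch() {
  grep -oE 'gfx120[01]' | head -n 1
}

# Query the local card (rocminfo ships with ROCm).
ARCH="$(rocminfo 2>/dev/null | detect_rdna4_arch)"
if [ -z "$ARCH" ]; then
  echo "no gfx1200/gfx1201 agent found" >&2
else
  # Export for the aiter source build run later in this same shell
  # (see ROCm/aiter#3846); the build command is not reproduced here.
  export GPU_ARCHS="$ARCH"
  echo "GPU_ARCHS=$GPU_ARCHS"
fi
```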
diff --git a/recipes/Qwen3-8B-FP8.md b/recipes/Qwen3-8B-FP8.md
@@ -1,9 +1,12 @@
 # Qwen3-8B-FP8 (block-128) on RX 9070 XT (gfx1201) via ROCm/ATOM
 
 Verified path on RX 9070 XT (gfx1201). Attention and GEMM run through
-Triton; same backend setup and the **build-aiter-for-gfx1201** prerequisite
+Triton; same backend setup and the **build-aiter-for-the-arch** prerequisite
 ([ROCm/aiter#3846](https://github.com/ROCm/aiter/issues/3846)) as the
-[Ministral-3-8B recipe](./Ministral-3-8B.md).
+[Ministral-3-8B recipe](./Ministral-3-8B.md) — build aiter with
+`GPU_ARCHS=gfx1201` (Navi 48) or `GPU_ARCHS=gfx1200` (Navi 44: RX 9060 /
+RX 9060 XT). Both RDNA4 chips share this Triton path; the numbers below are
+from gfx1201.
 
 ## Model
 
@@ -34,7 +37,7 @@ export ATOM_ENABLE_ALLREDUCE_RMSNORM_FUSION=0
 `ATOM_LLAMA_ENABLE_AITER_TRITON_FUSED_RMSNORM_QUANT=1` and
 `ATOM_LLAMA_ENABLE_AITER_TRITON_FUSED_SILU_MUL_QUANT=1` to fuse
 normalization/activation with FP8 quantization. Requires HIP
-`rmsnorm_quant` to JIT-compile on gfx1201 — test before enabling.
+`rmsnorm_quant` to JIT-compile on gfx1200 / gfx1201 — test before enabling.
 
 ## Required CLI flags