NVIDIA · pkisfaludi-nv · Jun 5, 2026
@@ -35,6 +35,9 @@ TensorRT-LLM **VisualGen** provides a unified inference stack for diffusion mode
 | `Lightricks/LTX-2` | Text-to-Video (with Audio), Image-to-Video (with Audio) |
 | `Qwen/Qwen-Image` | Text-to-Image |
 | `Qwen/Qwen-Image-2512` | Text-to-Image |
+| `Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers` | Text-to-Image |
+| `Tencent-Hunyuan/HunyuanDiT-v1.1-Diffusers` | Text-to-Image |
+| `Tencent-Hunyuan/HunyuanDiT-v1.0-Diffusers` | Text-to-Image |
 
 Models are auto-detected from the checkpoint directory. Diffusers-format models are detected via `model_index.json`; LTX-2 monolithic safetensors checkpoints are detected via embedded metadata. The `AutoPipeline` registry selects the appropriate pipeline class automatically.
 
@@ -48,10 +51,13 @@ Models are auto-detected from the checkpoint directory. Diffusers-format models
 | **Wan 2.2** | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
 | **LTX-2** | Yes | Yes | No | Yes | Yes | No | No | Yes | Yes | Yes | Yes | No |
 | **Qwen-Image** [^2] | Yes | Yes | No | No | Yes | No | Yes | Yes | Yes | Yes | Yes | No |
+| **HunyuanDiT** [^3] | No | No | No | No | Yes | No | No | No | Yes | Yes | No | No |
 
 [^1]: FLUX models use embedded guidance and do not have a separate negative prompt path, so CFG parallelism is not applicable.
 
-[^2]: Qwen-Image ships a native BF16 implementation with per-module numerical parity vs `diffusers.QwenImagePipeline` (cosine >= 0.999 on the full 20B transformer) and `trtllm-serve` / `/v1/images/generations` support. FP8 blockwise and NVFP4 use VisualGen dynamic quantization from BF16 checkpoints; no pre-quantized checkpoint is required. 
+[^2]: Qwen-Image ships a native BF16 implementation with per-module numerical parity vs `diffusers.QwenImagePipeline` (cosine >= 0.999 on the full 20B transformer) and `trtllm-serve` / `/v1/images/generations` support. FP8 blockwise and NVFP4 use VisualGen dynamic quantization from BF16 checkpoints; no pre-quantized checkpoint is required.
+
+[^3]: HunyuanDiT uses bilingual (Chinese/English) text conditioning from two encoders: a `BertModel`-based CLIP text encoder and an `MT5EncoderModel`. Ulysses sequence parallelism is supported: after patch embedding, the latent sequence is sharded across ranks, and a custom attention processor wraps self-attention in all-to-all collectives, while text cross-attention remains standard SDPA because the text tokens are replicated. Set `ulysses_size` to the desired number of sequence-parallel ranks; it must divide `num_attention_heads=16`. Quantization and ring attention are planned for future releases.
 
 ## Quick Start
 

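Review note on footnote [^3]: the divisibility rule for `ulysses_size` can be illustrated with a minimal sketch. It assumes the usual Ulysses layout, where each rank holds a contiguous slice of the latent sequence outside attention and a slice of the heads inside it. `UlyssesPlan` and `plan_ulysses_shards` are hypothetical names, not part of VisualGen, and the requirement that the sequence length also divide evenly is an assumption of this sketch (a real implementation may pad instead).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UlyssesPlan:
    """Per-rank shapes for one self-attention call (illustrative only)."""

    heads_per_rank: int   # heads each rank attends over after the all-to-all
    tokens_per_rank: int  # latent tokens each rank holds outside attention


def plan_ulysses_shards(
    seq_len: int, ulysses_size: int, num_attention_heads: int = 16
) -> UlyssesPlan:
    """Validate ulysses_size and return the per-rank split (hypothetical helper)."""
    if ulysses_size < 1:
        raise ValueError("ulysses_size must be a positive integer")
    if num_attention_heads % ulysses_size != 0:
        raise ValueError(
            f"ulysses_size={ulysses_size} must divide "
            f"num_attention_heads={num_attention_heads}"
        )
    # Assumption of this sketch: the sequence is split evenly, with no padding.
    if seq_len % ulysses_size != 0:
        raise ValueError(
            f"seq_len={seq_len} is not divisible by ulysses_size={ulysses_size}"
        )
    return UlyssesPlan(
        heads_per_rank=num_attention_heads // ulysses_size,
        tokens_per_rank=seq_len // ulysses_size,
    )


if __name__ == "__main__":
    # 4096 tokens is an illustrative sequence length, not a value from this PR.
    for size in (1, 2, 4, 8, 16):
        print(size, plan_ulysses_shards(4096, size))
```

With 16 heads, the accepted values are 1, 2, 4, 8, and 16; a value such as 3 is rejected before any collective is launched.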
@@ -0,0 +1,5 @@
+attention_config:
+  backend: VANILLA
+parallel_config:
+  cfg_size: 1
+  ulysses_size: 1
@@ -35,6 +35,7 @@
 from ..pipeline_registry import AutoPipeline, register_pipeline
 from .cosmos3 import Cosmos3OmniMoTPipeline
 from .flux import Flux2Pipeline, FluxPipeline
+from .hunyuandit import HunyuanDiTPipeline
 from .ltx2 import LTX2Pipeline  # noqa: F401
 from .qwen_image import QwenImagePipeline
 from .wan import WanImageToVideoPipeline, WanPipeline
@@ -44,6 +45,7 @@
     "BasePipeline",
     "FluxPipeline",
     "Flux2Pipeline",
+    "HunyuanDiTPipeline",
     "QwenImagePipeline",
     "WanPipeline",
     "WanImageToVideoPipeline",

@@ -0,0 +1,12 @@
+# SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+"""HunyuanDiT text-to-image pipeline exports."""
+
+from .pipeline_hunyuandit import HunyuanDiTPipeline
+from .transformer_hunyuandit import HunyuanDiT2DModelWrapper
+
+__all__ = [
+    "HunyuanDiTPipeline",
+    "HunyuanDiT2DModelWrapper",
+]
@@ -0,0 +1,36 @@
+# SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+"""HunyuanDiT default generation parameters and extra-param schema."""
+
+from tensorrt_llm._torch.visual_gen.pipeline import ExtraParamSchema
+
+_HUNYUANDIT_DEFAULT_PARAMS = {
+    "height": 1024,
+    "width": 1024,
+    "num_inference_steps": 50,
+    "guidance_scale": 7.5,
+    "max_sequence_length": 77,
+}
+
+
+def get_hunyuandit_default_params() -> dict:
+    return dict(_HUNYUANDIT_DEFAULT_PARAMS)  # copy, so callers cannot mutate the defaults
+
+
+def get_hunyuandit_extra_param_specs() -> dict:
+    return {
+        "negative_prompt": ExtraParamSchema(
+            type="str",
+            default="",
+            description="Negative text prompt for classifier-free guidance.",
+        ),
+        "use_resolution_binning": ExtraParamSchema(
+            type="bool",
+            default=True,
+            description=(
+                "Snap resolution to the nearest HunyuanDiT training bucket "
+                "(recommended for best quality)."
+            ),
+        ),
+    }
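
Review note: the `use_resolution_binning` description can be made concrete with a short sketch of nearest-bucket snapping layered on the defaults above. The bucket list, `snap_to_bucket`, and `resolve_params` are hypothetical and chosen for illustration; the real training buckets and the pipeline's actual selection rule are not shown in this diff. The sketch picks the bucket with the closest aspect ratio and breaks ties by closest pixel area.

```python
from typing import Dict, List, Tuple

# Illustrative (height, width) buckets; NOT the actual HunyuanDiT training buckets.
EXAMPLE_BUCKETS: List[Tuple[int, int]] = [
    (1024, 1024),
    (1280, 1280),
    (1024, 768),
    (768, 1024),
    (1280, 768),
    (768, 1280),
]

# Mirrors _HUNYUANDIT_DEFAULT_PARAMS from the diff above.
DEFAULTS: Dict[str, object] = {
    "height": 1024,
    "width": 1024,
    "num_inference_steps": 50,
    "guidance_scale": 7.5,
    "max_sequence_length": 77,
}


def snap_to_bucket(
    height: int, width: int, buckets: List[Tuple[int, int]] = EXAMPLE_BUCKETS
) -> Tuple[int, int]:
    """Return the bucket closest in aspect ratio, then in area (assumed rule)."""
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    target_ratio = height / width
    target_area = height * width
    return min(
        buckets,
        key=lambda hw: (
            abs(hw[0] / hw[1] - target_ratio),
            abs(hw[0] * hw[1] - target_area),
        ),
    )


def resolve_params(request: dict, use_resolution_binning: bool = True) -> dict:
    """Overlay request values on a copy of the defaults, then optionally snap."""
    params = {**DEFAULTS, **request}
    if use_resolution_binning:
        params["height"], params["width"] = snap_to_bucket(
            params["height"], params["width"]
        )
    return params


if __name__ == "__main__":
    print(resolve_params({"height": 1000, "width": 760}))
```

Building the result from a fresh dictionary keeps the module-level defaults untouched, which matches the copy returned by `get_hunyuandit_default_params()`.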