GitHub - chen-hao-chao/mdm-prime-v2: MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Scaling of Diffusion Language Models

News

✏️ [May 22, 2026] Released a corrected paper. Check out mdm-prime for perplexity evaluation on OWT.
📓 [May 1, 2026] Released an errata note. The current NLL evaluation has bugs (old preprint).

What’s Inside

This repository contains the code implementation of the experiments presented in the paper MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Scaling of Diffusion Language Models.

🐳 Docker environments for easy installation
🤗 Pretrained weights for inference and evaluation
📉 Weights and Biases logs for enhanced reproducibility
🔬 Code for all experiments in our paper:
- Scaling Analysis
- Larger-scale Pretraining

Overview

Scaling Analysis

Folder: mdm-prime-v2/megatron
Dataset: allenai/c4
Weights & Biases Logs: lance_chao/megatron-all-runs
Best for: (1) Studying the loss behavior; (2) Pretraining with advanced parallelism

Larger-scale Pretraining

Folder: mdm-prime-v2/lit_gpt
Dataset: cerebras/SlimPajama-627B (or gmongaras/SlimPajama-627B_Reupload)
Best for: (1) Pretraining 1.1B models; (2) Running inference and downstream applications

Demo

Pull our Docker image and launch gradio_demo.py:

# Pull and launch the docker image
docker pull chenhaochao/mdm-prime-v2-litgpt:latest
docker run -v $(pwd):/workspace --rm -it --gpus all --ipc=host -p 3000:3000 chenhaochao/mdm-prime-v2-litgpt:latest

# Install gradio and run gradio_demo.py
uv pip install gradio
/venv/mdm-prime-v2-litgpt/bin/python gradio_demo.py

Loading the model's weights takes a few minutes. After running the commands, the demo website will be available at http://localhost:3000/.
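
Because the weights take a few minutes to load, it can be handy to wait for the demo to come up before opening the browser. The snippet below is a minimal sketch of such a check, not part of this repository: the function name `wait_for_demo` and its timeout values are hypothetical, and it assumes the demo answers plain HTTP GET requests on the port mapped by `-p 3000:3000`.

```python
import time
import urllib.error
import urllib.request


def wait_for_demo(url="http://localhost:3000/", timeout_s=600.0, interval_s=5.0):
    """Poll `url` until it answers with HTTP 200 or `timeout_s` elapses.

    Hypothetical helper (not part of the repository). Returns True once the
    server responds with 200, and False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Nothing is listening yet; the weights may still be loading.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Example usage (run on the host after starting gradio_demo.py in the container):
#   if wait_for_demo():
#       print("Demo is reachable at http://localhost:3000/")
```

The default ten-minute timeout is only a guess at "a few minutes"; adjust it to match your hardware.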

License

This code implementation is developed based on the following repositories.

ML-GSAI/SMDM (at commit 1df2e12), licensed under the Apache-2.0 license.
jzhang38/TinyLlama (at commit bf12224), licensed under the Apache-2.0 license.
NVIDIA/Megatron-LM (at commit 636179d), licensed under the Apache-2.0 license.
wmn-231314/diffusion-data-constraint (at commit 61002b2), licensed under the Apache-2.0 license.

Further changes based on the code in this folder are licensed under the Apache-2.0 license.

Citation

If you find this code implementation useful, please consider citing our papers.

@article{chao2026mdmprimev2,
      title = {{MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Scaling of Diffusion Language Models}}, 
      author = {Chen-Hao Chao and Wei-Fang Sun and Junwei Quan and Chun-Yi Lee and Rahul G. Krishnan},
      year = {2026},
}
@article{chao2026dependency,
      title   = {{Dependency Breaks Validity of Loss Functions in Masked Diffusion Models}},
      author  = {Chao, Chen-Hao and Xu, Minkai and Geffner, Tomas and Vahdat, Arash and Krishnan, Rahul G.},
      journal = {chen-hao-chao.github.io},
      year    = {2026}
}
@inproceedings{chao2025mdmprime,
      title = {{Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking}}, 
      author = {Chen-Hao Chao and Wei-Fang Sun and Hanwen Liang and Chun-Yi Lee and Rahul G. Krishnan},
      booktitle = {Proceedings of the Conference on Neural Information Processing Systems (NeurIPS)},
      year = {2025},
}

