chen-hao-chao/mdm-prime-v2

MDM-Prime Paper on arXiv | MDM-Prime-v2 on Hugging Face | MDM-Prime-v2 on Docker | MDM-Prime-v2 on X

News

  • ✏️ [May 22, 2026] Released a corrected paper. Check out mdm-prime for perplexity evaluation on OWT.
  • 📓 [May 1, 2026] Released an errata note. The current NLL evaluation has bugs. (old preprint)

What’s Inside

This repository contains the code implementation of the experiments presented in the paper MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Scaling of Diffusion Language Models.

  • 🐳 Docker environments for easy installation
  • 🤗 Pretrained weights for inference and evaluation
  • 📉 Weights and Biases logs for enhanced reproducibility
  • 🔬 Code for all experiments in our paper:
    • Scaling Analysis
    • Larger-scale Pretraining

Overview

Scaling Analysis

Larger-scale Pretraining

Demo

  • Download our Docker image and launch gradio_demo.py:
# Pull and launch the docker image
docker pull chenhaochao/mdm-prime-v2-litgpt:latest
docker run -v $(pwd):/workspace --rm -it --gpus all --ipc=host -p 3000:3000 chenhaochao/mdm-prime-v2-litgpt:latest

# Install gradio and run gradio_demo.py
uv pip install gradio
/venv/mdm-prime-v2-litgpt/bin/python gradio_demo.py
  • Loading the model's weights takes a few minutes. After running the commands, the demo website will be available at http://localhost:3000/.
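
Because the weights take a few minutes to load, it can help to poll the demo URL instead of refreshing the browser by hand. The following is a minimal sketch, not part of this repository: the helper name `wait_for_demo` and the default timeout and polling interval are assumptions chosen for illustration, and only the URL http://localhost:3000/ comes from the instructions above. It uses only the Python standard library.

```python
import time
import urllib.error
import urllib.request


def wait_for_demo(url="http://localhost:3000/", timeout=600.0, interval=5.0):
    """Poll `url` until it answers over HTTP or `timeout` seconds elapse.

    Hypothetical helper (not part of the repository). Returns True once any
    HTTP response is received and False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=10):
                return True
        except urllib.error.HTTPError:
            # The server answered with an error status, so it is reachable.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: the demo is probably still loading.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_demo()` from a second terminal on the host returns `True` as soon as the port published by `-p 3000:3000` starts responding, and `False` if nothing answers within the timeout.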

License

This code implementation is developed based on the following repositories.

Further changes based on the code in this folder are licensed under the Apache-2.0 license.

Citation

If you find this code implementation useful, please consider citing our papers.

@article{chao2026mdmprimev2,
      title = {{MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Scaling of Diffusion Language Models}}, 
      author = {Chen-Hao Chao and Wei-Fang Sun and Junwei Quan and Chun-Yi Lee and Rahul G. Krishnan},
      year = {2026},
}
@article{chao2026dependency,
      title   = {{Dependency Breaks Validity of Loss Functions in Masked Diffusion Models}},
      author  = {Chao, Chen-Hao and Xu, Minkai and Geffner, Tomas and Vahdat, Arash and Krishnan, Rahul G.},
      journal = {chen-hao-chao.github.io},
      year    = {2026}
}
@inproceedings{chao2025mdmprime,
      title = {{Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking}}, 
      author = {Chen-Hao Chao and Wei-Fang Sun and Hanwen Liang and Chun-Yi Lee and Rahul G. Krishnan},
      booktitle = {Proceedings of the Conference on Neural Information Processing Systems (NeurIPS)},
      year = {2025},
}
