Implement a reasoning LLM in PyTorch from scratch, step by step
Updated Jun 1, 2026 - Jupyter Notebook
(ArXiv25) Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning
Solving Inequality Proofs with Large Language Models.
An official implementation of "SPARK: Synergistic Policy And Reward Co-Evolving Framework"
ArmLLM 2025 solutions covering ViT from scratch, SigLIP–Qwen LaTeX OCR, GRPO reasoning post-training, inference-time reasoning strategies, and adversarial vision attacks.
AI Benchmark Knowledge Base: a comprehensive collection of the benchmark question sets that major AI companies use to test model performance
A minimal JEPA-based language model demonstrating latent-space reasoning on GSM8K using a single decoder-only Transformer.
Data cleaning and structuring pipeline for math reasoning tasks using Qwen3-0.6B for LLM post-training.
STaR × S1 math pipeline on Qwen2.5-1.5B with LoRA and a strict "Final:" answer format; ~20–30% accuracy (OpenR1-Math split).
A controlled LoRA finetuning study on process supervision for mathematical reasoning with Qwen2.5-Math-7B-Instruct.
NLP course final project (2026), Nanjing Normal University, supervised by 孔力: GSM8K math QA with Seq2Seq, Transformer and LLMs.
GRPO (Group Relative Policy Optimization) implemented from scratch in PyTorch. 10 ablation experiments.
Comprehensive framework for mathematical reasoning research with dual research capabilities
Transforming weak prompts into reasoning machines using Textual Gradients and AdalFlow. Runs on Colab.
NDA-safe excerpts of math & economics modeling tasks for LLM reasoning evaluation and numerical verification.
Small-scale Implementation and Extension of “The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning” (NeurIPS '25)
An end-to-end pipeline for training and deploying a lightweight math reasoning language model (Qwen2.5-0.5B). Features CPU-compatible Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and an interactive web interface built with Flask and Streamlit.
GRPO reinforcement learning with verifiable rewards for sub-2B models
Tool-Integrated Reasoning for competition math — weighted voting, difficulty-aware allocation