Add AIME26 task#1254

Open
Vedant-Agarwal wants to merge 1 commit into
huggingface:main from
Vedant-Agarwal:add-aime26-task

Closes #1167

Adds aime26, aime26_avg, and aime26_gpassk task configs to src/lighteval/tasks/tasks/aime.py. They mirror the aime25 trio exactly (same prompt function, record_to_sample, and metric choices); only the dataset repo, splits, and version differ.

Dataset: math-ai/aime26 (confirmed in the issue comments), config default. One adaptation from the aime25 pattern: this dataset ships a single test split (30 problems) rather than train, so hf_avail_splits and evaluation_splits are both ["test"]. The columns (problem, answer) match what aime_prompt expects; I verified this by loading the live dataset and running the prompt function on real rows.
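
As a rough illustration of the change described above, the sketch below derives three aime26 entries from aime25-style entries by overriding only the dataset fields. TaskSketch and derive_aime26 are hypothetical stand-ins, not lighteval's config class, and the aime25 repo and subset values are placeholders. The version change is left out because the description does not state the value.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TaskSketch:
    """Hypothetical stand-in for a task config; NOT lighteval's real class."""

    name: str
    hf_repo: str
    hf_subset: str
    hf_avail_splits: tuple[str, ...]
    evaluation_splits: tuple[str, ...]


def derive_aime26(base: TaskSketch) -> TaskSketch:
    """Copy an aime25-style entry, changing only the dataset-related fields."""
    return replace(
        base,
        name=base.name.replace("aime25", "aime26"),
        hf_repo="math-ai/aime26",  # dataset named in the PR description
        hf_subset="default",
        hf_avail_splits=("test",),  # aime26 ships only a test split
        evaluation_splits=("test",),
    )


# Placeholder aime25 entries: the repo and subset values here are assumptions.
AIME25_TRIO = [
    TaskSketch(n, "<aime25-repo>", "<aime25-subset>", ("train",), ("train",))
    for n in ("aime25", "aime25_avg", "aime25_gpassk")
]
AIME26_TRIO = [derive_aime26(t) for t in AIME25_TRIO]

print([t.name for t in AIME26_TRIO])
```

Everything not overridden (in the real file: prompt function, metrics, and so on) would carry over unchanged, which is the sense in which the new configs mirror the old ones.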

Unlike the earlier attempt in #1218 (which targeted the multilingual LumiOpen/mAIME* datasets and bundled unrelated CI/dependency changes), this is a minimal single-file change doing exactly what the issue asks.

Verification:

  • Registry resolves all three tasks with correct metrics
  • pytest tests/unit/tasks/test_registry.py → 8 passed
  • ruff check / ruff format --check clean
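
The dataset check mentioned in the description (30 rows in the test split, with problem and answer columns) could be reproduced along these lines. This is a minimal sketch, not the script actually used: check_rows and load_live_rows are hypothetical helper names, and the demonstration runs on made-up rows so it works offline.

```python
def check_rows(rows, expected_count=30, required=("problem", "answer")):
    """Return a list of issues found; an empty list means the rows look fine."""
    issues = []
    rows = list(rows)
    if len(rows) != expected_count:
        issues.append(f"expected {expected_count} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        missing = [c for c in required if c not in row]
        if missing:
            issues.append(f"row {i} is missing columns: {missing}")
    return issues


def load_live_rows():
    """Fetch the real split; needs network access and the `datasets` package."""
    from datasets import load_dataset  # imported lazily so the check runs offline

    return load_dataset("math-ai/aime26", "default", split="test")


# Offline demonstration with made-up rows (not real AIME problems).
fake_rows = [{"problem": f"placeholder {i}", "answer": str(i)} for i in range(30)]
print(check_rows(fake_rows))
```

With network access, check_rows(load_live_rows()) would run the same check against the live dataset.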

Add aime26, aime26_avg, and aime26_gpassk task configs mirroring the
existing AIME25 definitions, backed by the math-ai/aime26 dataset
(default subset, test split, 30 problems).

Closes huggingface#1167

Vedant-Agarwal commented Jun 18, 2026

Author

Friendly ping @NathanHB: this adds the AIME26 task (closes #1167) and mirrors the existing aime25 configs exactly, so it's a small, isolated addition. It's mergeable, and I'm happy to adjust if you'd like anything changed. Thanks for taking a look.

Development

Successfully merging this pull request may close these issues.

[EVAL] Request support for AIME26