[None][feat] Weight trtllm-bench AR/AL averages by output length#14998
[None][feat] Weight trtllm-bench AR/AL averages by output length#14998zhaoyangwang-nvidia wants to merge 1 commit into
Conversation
|
Hi @NVIDIA/trtllm-bench-reviewers please help to review this PR, just very small fix, thanks~ |
📝 WalkthroughWalkthroughThe PR modifies ChangesOutput-length Weighted Average Calculation
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tensorrt_llm/bench/dataclasses/reporting.py (1)
220-227: ⚡ Quick winAdd
strict=Truetozip()calls for safer iteration.Since this codebase targets Python 3.10+ and the
output_tokens,draft_acceptance_rate, andacceptance_lengthlists are populated in the same loop, they should always have the same length. Addingstrict=Trueto thezip()calls will catch any future bugs where the lengths diverge.✨ Proposed fix
if draft_acceptance_rate_percentiles is not None: draft_acceptance_rate_percentiles.average = sum( w * v - for w, v in zip(output_tokens, draft_acceptance_rate + for w, v in zip(output_tokens, draft_acceptance_rate, strict=True )) / total_output_tokens if acceptance_length_percentiles is not None: acceptance_length_percentiles.average = sum( - w * v for w, v in zip(output_tokens, acceptance_length) + w * v for w, v in zip(output_tokens, acceptance_length, strict=True) ) / total_output_tokens🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tensorrt_llm/bench/dataclasses/reporting.py` around lines 220 - 227, Update the two zip() usages that compute weighted averages so they include strict=True to guard against mismatched lengths: add strict=True to the zip(...) in the block that sets draft_acceptance_rate_percentiles.average (zipping output_tokens and draft_acceptance_rate) and to the zip(...) in the block that sets acceptance_length_percentiles.average (zipping output_tokens and acceptance_length); this makes zip raise if list lengths diverge (Python 3.10+), preserving the existing weighted-average logic and dividing by total_output_tokens as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tensorrt_llm/bench/dataclasses/reporting.py`:
- Around line 220-227: Update the two zip() usages that compute weighted
averages so they include strict=True to guard against mismatched lengths: add
strict=True to the zip(...) in the block that sets
draft_acceptance_rate_percentiles.average (zipping output_tokens and
draft_acceptance_rate) and to the zip(...) in the block that sets
acceptance_length_percentiles.average (zipping output_tokens and
acceptance_length); this makes zip raise if list lengths diverge (Python 3.10+),
preserving the existing weighted-average logic and dividing by
total_output_tokens as before.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 404f3cfa-a004-4b31-add9-a28cff4c94c8
📒 Files selected for processing (1)
tensorrt_llm/bench/dataclasses/reporting.py
When requests produce different output lengths, an equally-weighted mean over per-request acceptance rate (AR) and acceptance length (AL) biases the reported average toward short requests, which run fewer decoding iterations. Detect inconsistent output lengths and, in that case, report an output-length weighted average for the AR/AL .average statistic so longer requests contribute proportionally. When all output lengths match, behavior is unchanged. Signed-off-by: ZhaoyangWang <zhaoyangw@nvidia.com>
1eac86d to
d33b609
Compare
|
/bot run |
|
PR_Github #52312 [ run ] triggered by Bot. Commit: |
|
PR_Github #52312 [ run ] completed with state
|
|
/bot run |
|
PR_Github #52325 [ run ] triggered by Bot. Commit: |
|
PR_Github #52325 [ run ] completed with state
|
When requests produce different output lengths, an equally-weighted mean over per-request acceptance rate (AR) and acceptance length (AL) biases the reported average toward short requests, which run fewer decoding iterations. Detect inconsistent output lengths and, in that case, report an output-length weighted average for the AR/AL .average statistic so longer requests contribute proportionally. When all output lengths match, behavior is unchanged.
Summary by CodeRabbit
Summary
In
trtllm-bench, the reported average acceptance rate (AR) and acceptancelength (AL) for speculative decoding were computed as an equally-weighted
mean over per-request values. When requests produce different output
lengths, this biases the average toward short requests (which run fewer
decoding iterations).
This PR detects whether all requests share the same output length. When
they do not, the AR/AL
.averagestatistic is reported as anoutput-length weighted mean —
Σ(output_i · value_i) / Σ(output_i)— solonger requests contribute proportionally. When all output lengths match,
behavior is unchanged.
Changes
tensorrt_llm/bench/dataclasses/reporting.py: computeoutput_lengths_consistentand override the AR/AL average with anoutput-length weighted mean when lengths differ.
Test Coverage
N/A (statistics-only change; min/max/percentiles unchanged).
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either
api-compatibleorapi-breaking. Forapi-breaking, includeBREAKINGin the PR title.Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.