🔬 Training-Free Metric Picks Best LLM Reasoning Data
TL;DR
Researchers propose High-Entropy Sum (HES), a training-free metric that ranks reasoning training data by summing the entropy of only the top 0.5% highest-entropy tokens in each sample. In supervised fine-tuning (SFT), training on the top 20% of HES-ranked data matched the performance of training on the full dataset.
Key Points
HES scores a sample using only its top ~0.5% highest-entropy tokens, with no extra training
In SFT, top-20% HES data matched full-dataset performance while lowest-HES data degraded it
Validated across supervised fine-tuning, rejection fine-tuning, and reinforcement learning
Submitted to arXiv on May 21, 2026 by a team including Qwen researcher Dayiheng Liu
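The scoring and selection steps described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation. It assumes per-token entropies have already been computed from some language model's next-token distributions, and the function names, the rounding rule (round up, keep at least one token), and the use of nats are all assumptions made for the example.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def high_entropy_sum(token_entropies: Sequence[float], top_frac: float = 0.005) -> float:
    """Sum the entropies of the top `top_frac` highest-entropy tokens in a sample.

    Assumption: the token count is rounded up and at least one token is kept.
    """
    n = len(token_entropies)
    if n == 0:
        return 0.0
    # Small tolerance guards against floating-point error in top_frac * n.
    k = max(1, math.ceil(top_frac * n - 1e-9))
    return sum(sorted(token_entropies, reverse=True)[:k])


def select_top_by_hes(
    samples: Sequence[Sequence[float]],
    keep_frac: float = 0.2,
    top_frac: float = 0.005,
) -> List[int]:
    """Return indices of the highest-HES samples, best first.

    `samples` holds one list of per-token entropies per training sample.
    """
    if not samples:
        return []
    scores = [high_entropy_sum(s, top_frac) for s in samples]
    order = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    n_keep = max(1, math.ceil(keep_frac * len(samples) - 1e-9))
    return order[:n_keep]
```

With `keep_frac=0.2`, `select_top_by_hes` returns the indices of the top fifth of samples by score, mirroring the top-20% subset described in the key points. Which model supplies the entropies is left open here.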
Why It Matters
A cheap, training-free metric that can cut a reasoning dataset to a fifth of its size while matching full-dataset SFT performance lowers the cost of building strong reasoning models.