daily-hour-news

🔬 Training-Free Metric Picks Best LLM Reasoning Data

TL;DR

Researchers propose High-Entropy Sum (HES), a training-free metric that ranks reasoning training data by summing the entropy of only the top 0.5% highest-entropy tokens in each sample. In supervised fine-tuning (SFT), training on the top 20% of HES-ranked data matched full-dataset performance.
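As a rough illustration of the scoring rule described above, the Python sketch below computes a per-sample score from next-token probability distributions. It is not the authors' code: the function names, the use of natural-log Shannon entropy, and the rule that keeps at least one token are assumptions made for illustration, and the summary does not say which model supplies the distributions.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def high_entropy_sum(token_distributions, top_fraction=0.005):
    """Sum the entropies of the highest-entropy tokens in one sample.

    token_distributions: one probability distribution per token position.
    top_fraction: share of tokens to keep (0.005 corresponds to 0.5%).
    """
    entropies = sorted(
        (token_entropy(p) for p in token_distributions), reverse=True
    )
    if not entropies:
        return 0.0
    # Assumption: always keep at least one token so short samples get a score.
    k = max(1, math.ceil(top_fraction * len(entropies)))
    return sum(entropies[:k])


# Toy sample: 199 fully confident tokens and one uniform guess over 4 options.
sample = [[1.0]] * 199 + [[0.25, 0.25, 0.25, 0.25]]
score = high_entropy_sum(sample)
```

In this toy sample only the single uncertain token contributes, so the score equals the entropy of a uniform distribution over four options, ln 4.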

Key Points

1. HES scores a sample using only its top ~0.5% highest-entropy tokens, with no extra training
2. In SFT, top-20% HES data matched full-dataset performance while lowest-HES data degraded it
3. Validated across supervised fine-tuning, rejection fine-tuning, and reinforcement learning
4. Submitted to arXiv on May 21, 2026 by a team including Qwen researcher Dayiheng Liu
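The selection step behind the top-20% result can be sketched in a few lines of Python. This is a hypothetical helper, not the paper's pipeline: the function name, the `keep_fraction` parameter, and the choice to round up and keep at least one sample are assumptions, and the scores are taken as already computed.

```python
import math


def select_top_by_score(samples, scores, keep_fraction=0.20):
    """Return the highest-scoring share of samples, best first."""
    if len(samples) != len(scores):
        raise ValueError("samples and scores must have the same length")
    if not samples:
        return []
    # Assumption: round up and keep at least one sample.
    k = max(1, math.ceil(keep_fraction * len(samples)))
    ranked = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    return [samples[i] for i in ranked[:k]]


# Ten toy samples with made-up scores; the two highest-scoring ones are kept.
toy_samples = list("abcdefghij")
toy_scores = [0.1, 0.9, 0.3, 0.4, 0.2, 0.8, 0.5, 0.6, 0.7, 0.05]
kept = select_top_by_score(toy_samples, toy_scores)
```

Here `kept` holds the samples scored 0.9 and 0.8. Training on the lowest-scoring slice instead would correspond to the setting the article reports as degrading performance.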

Why It Matters

A cheap, training-free metric that cuts reasoning datasets to a fifth without losing quality lowers the cost of building strong reasoning models.

Quick Facts

LLM reasoning, data selection, fine-tuning, reinforcement learning, entropy, training efficiency
