S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models

Muzhi Dai, Chenxu Yang, Qingyi Si · May 12, 2025

Summary

S-GRPO uses reinforcement learning to enable early exit in reasoning models, targeting both efficiency and effectiveness. Across benchmarks, it reduces sequence length by 35.4% to 61.1% while improving accuracy by 0.72% to 6.08%, and it is compatible with advanced models.

Advanced features