S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models
Muzhi Dai, Chenxu Yang, Qingyi Si · May 12, 2025
Summary
S-GRPO uses reinforcement learning to teach reasoning models to exit early, targeting both efficiency and effectiveness. Across benchmarks, it reduces sequence length by 35.4% to 61.1% while improving accuracy by 0.72% to 6.08%. It is compatible with advanced models.
Advanced features