Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs
Chris Yuhao Liu, Liang Zeng, Jiacai Liu, Rui Yan, Jujie He, Chaojie Wang, Shuicheng Yan, Yang Liu, Yahui Zhou · October 24, 2024
Summary
The Skywork-Reward technical report introduces methods for improving reward modeling in Large Language Models (LLMs). It focuses on data-centric techniques, in particular data selection and filtering strategies for curating high-quality open-source preference data, which yield the Skywork-Reward collection of roughly 80K preference pairs. Using this collection, the report trains Skywork-Reward-Gemma-2-27B and Skywork-Reward-Llama-3.1-8B, which reach top positions on the RewardBench leaderboard.

For data selection and filtering, the report emphasizes prioritizing data synthesized by stronger models to improve training efficacy. The Magpie datasets, which make up 93% of the preference pairs, are filtered by their ArmoRM scores: the top 30% of pairs are kept in the Math and Code & Debugging categories, and the top 10% in all other categories. This procedure keeps the representation of task categories balanced while favoring higher-quality data.

The report also describes the dataset mixture behind the final Skywork Reward Preference 80K dataset, which comprises 81,973 preference pairs drawn from both human-annotated and LLM-annotated sources, reflecting the growing use of LLM-generated data in preference modeling. HelpSteer2 contributes human annotations in which responses are rated on five attributes. The OffsetBias dataset, with over 8,500 pairs, targets bias mitigation in preference data. The WildGuardMix dataset, with about 87,000 prompts including adversarial ones, supports safety moderation. The Magpie series provides fully synthetic, LLM-generated datasets.
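The category-wise filtering step can be illustrated with a short sketch. Assuming each candidate preference pair carries a task category label and a precomputed ArmoRM score (the column names and the pandas-based implementation below are illustrative assumptions, not the authors' released pipeline), the selection keeps the top 30% of pairs in Math and Code & Debugging and the top 10% in every other category:

```python
import pandas as pd

# Illustrative sketch only: the column names ("category", "armorm_score") and
# the pandas-based approach are assumptions, not the authors' released code.

# Fraction of highest-scoring pairs to keep per task category.
KEEP_FRACTION = {
    "math": 0.30,
    "code & debugging": 0.30,
}
DEFAULT_FRACTION = 0.10  # all other categories


def filter_by_armorm_score(pairs: pd.DataFrame) -> pd.DataFrame:
    """Keep the top-scoring fraction of preference pairs within each category."""
    kept = []
    for category, group in pairs.groupby("category"):
        frac = KEEP_FRACTION.get(category.lower(), DEFAULT_FRACTION)
        n_keep = max(1, int(len(group) * frac))
        # Rank pairs by their ArmoRM score and keep only the best ones.
        kept.append(group.nlargest(n_keep, "armorm_score"))
    return pd.concat(kept, ignore_index=True)


if __name__ == "__main__":
    # Tiny synthetic example to show the selection behavior.
    df = pd.DataFrame({
        "category": ["math"] * 10 + ["creative writing"] * 10,
        "armorm_score": [i / 10 for i in range(10)] * 2,
    })
    filtered = filter_by_armorm_score(df)
    print(filtered.groupby("category").size())
    # math: 3 pairs kept (top 30%); creative writing: 1 pair kept (top 10%)
```

Applying the percentile cutoff within each category, rather than globally, is what keeps the mixture balanced across task types while still discarding lower-scoring pairs.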
Advanced features