Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking

Athanasios Glentis, Jiaxiang Li, Qiulin Shang, Andi Han, Ioannis Tsaknakis, Quan Wei, Mingyi Hong
May 28, 2025

Summary

This study introduces weight refactorization and momentum reset for parameter- and memory-efficient pretraining of large language models, reducing memory usage by 25%. It benchmarks recent strategies and emphasizes the role of full-rankness in pretraining. Techniques such as LoRA (low-rank adapters) and Stable-SPAM are evaluated for their impact on training and inference. Three related papers presented at ICML and EMNLP (the Conference on Empirical Methods in Natural Language Processing) likewise aim to improve model efficiency; "BAdam" and "Scalable efficient training" propose memory-efficient and scalable methods, respectively. The study also discusses structural pruning, memory-efficient training, and compute-efficient optimization.

Introduction
Background
Overview of large language models and their challenges
Importance of efficient pretraining methods
Objective
To introduce and evaluate weight refactorization and momentum reset techniques for reducing memory usage in pretraining large language models
Method
Data Collection
Techniques for collecting data for pretraining
Data Preprocessing
Methods for preprocessing data to enhance model training
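The outline does not spell out a preprocessing recipe; a common choice in LLM pretraining is to tokenize the corpus and pack the token stream into fixed-length training sequences. A minimal sketch under that assumption (the sequence length, EOS id, and helper name are illustrative, not taken from the paper):

```python
# Minimal sketch: pack tokenized documents into fixed-length pretraining sequences.
# The sequence length and EOS id below are illustrative assumptions.
from typing import Iterable, List


def pack_sequences(docs: Iterable[List[int]], seq_len: int = 1024,
                   eos_id: int = 0) -> List[List[int]]:
    """Concatenate tokenized documents (separated by EOS) and split the stream
    into fixed-length chunks, the usual input format for causal-LM pretraining."""
    buffer: List[int] = []
    chunks: List[List[int]] = []
    for doc in docs:
        buffer.extend(doc + [eos_id])
        while len(buffer) >= seq_len:
            chunks.append(buffer[:seq_len])
            buffer = buffer[seq_len:]
    return chunks  # leftover tokens in `buffer` are simply dropped here


# Example usage with toy "tokenized" documents.
if __name__ == "__main__":
    toy_docs = [[1, 2, 3, 4], [5, 6, 7], [8, 9, 10, 11, 12]]
    print(pack_sequences(toy_docs, seq_len=5))
```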
Weight Refactorization
Explanation of weight refactorization
Benefits and implementation details
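The summary names weight refactorization but gives no algorithmic detail. A hedged sketch of one plausible reading, in PyTorch: a full weight is kept frozen while a low-rank update B @ A is trained, and the product is periodically folded back into the full weight before the factors are re-initialized. The class name, schedule, and initialization below are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of one plausible reading of "weight refactorization": train a
# low-rank update B @ A on top of a frozen full weight, and periodically fold
# the product back into the full weight before re-initializing the factors.
import torch
import torch.nn as nn


class RefactoredLinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, rank: int = 8):
        super().__init__()
        # Full weight is kept but not trained directly.
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.02, requires_grad=False)
        # Trainable low-rank factors; B starts at zero so the update starts at zero.
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.02)
        self.B = nn.Parameter(torch.zeros(d_out, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ (self.weight + self.B @ self.A).T

    @torch.no_grad()
    def refactor(self) -> None:
        """Merge the learned low-rank update into the full weight and restart the
        factors, so successive low-rank phases can explore new subspaces and the
        accumulated update is not confined to a rank-r subspace."""
        self.weight += self.B @ self.A
        self.A.normal_(std=0.02)
        self.B.zero_()


# Example: refactor every `refactor_every` optimizer steps (schedule is illustrative).
layer = RefactoredLinear(64, 64, rank=4)
opt = torch.optim.AdamW([layer.A, layer.B], lr=1e-3)
refactor_every = 200
for step in range(1, 401):
    loss = layer(torch.randn(8, 64)).pow(2).mean()
    loss.backward()
    opt.step()
    opt.zero_grad()
    if step % refactor_every == 0:
        layer.refactor()
```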
Momentum Reset
Explanation of momentum reset
Benefits and implementation details
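Momentum reset is likewise only named here. A plausible reading is that the optimizer's moment estimates are cleared whenever the parameterization changes (for example, right after a refactorization), because the stale statistics refer to the previous factors. The helper below is an illustrative assumption, not the paper's code.

```python
# Hedged sketch of "momentum reset": clear the optimizer's per-parameter moment
# buffers (Adam's exp_avg / exp_avg_sq, SGD's momentum_buffer) after a change of
# parameterization such as a weight refactorization.
import torch


def reset_momentum(optimizer: torch.optim.Optimizer) -> None:
    """Zero the optimizer's moment buffers; step counters are left untouched here."""
    for group in optimizer.param_groups:
        for p in group["params"]:
            state = optimizer.state.get(p, {})
            for key in ("exp_avg", "exp_avg_sq", "momentum_buffer"):
                if key in state:
                    state[key].zero_()


# Example usage together with the refactorization step sketched above:
#   if step % refactor_every == 0:
#       layer.refactor()
#       reset_momentum(opt)
```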
Recent Strategies Emphasizing Full-rankness
Overview of strategies focusing on full-rankness in pre-training
Evaluation of LoRA (low-rank adapters) and Stable-SPAM (see the rank illustration below)
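A quick numerical illustration of the full-rankness point (toy dimensions, not from the paper): a single low-rank update has rank at most r, whereas a sum of several independent low-rank updates, as accumulated across refactorization phases, can reach full rank.

```python
# A single low-rank update B @ A has rank at most r; the sum of several
# independent low-rank updates can reach full rank. Dimensions are toy values.
import torch

d, r, phases = 64, 4, 20
single = torch.randn(d, r) @ torch.randn(r, d)
accumulated = sum(torch.randn(d, r) @ torch.randn(r, d) for _ in range(phases))

print(torch.linalg.matrix_rank(single).item())       # at most r (= 4 here)
print(torch.linalg.matrix_rank(accumulated).item())  # up to min(d, phases * r) = 64
```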
Papers at ICML and the Conference on Empirical Methods in NLP
Summary of three papers aiming to improve model efficiency
"Badam" and "Scalable efficient training" methods
Structural Pruning
Explanation of structural pruning techniques
Benefits and application in model optimization
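As a concrete, generic illustration of structural pruning, the sketch below removes whole output channels of a linear layer by L2 norm; the keep ratio and the norm criterion are illustrative choices, not the paper's.

```python
# Hedged sketch of structural pruning: drop entire output channels of a linear
# layer by L2 norm, producing a genuinely smaller layer (unlike unstructured
# zeroing of individual weights).
import torch
import torch.nn as nn


@torch.no_grad()
def prune_output_channels(layer: nn.Linear, keep_ratio: float = 0.5) -> nn.Linear:
    norms = layer.weight.norm(dim=1)                 # one L2 norm per output channel
    k = max(1, int(keep_ratio * layer.out_features))
    keep = norms.topk(k).indices.sort().values       # indices of channels to keep
    pruned = nn.Linear(layer.in_features, k, bias=layer.bias is not None)
    pruned.weight.copy_(layer.weight[keep])
    if layer.bias is not None:
        pruned.bias.copy_(layer.bias[keep])
    return pruned


# Example: shrink a 256 -> 512 projection to 256 -> 256.
dense = nn.Linear(256, 512)
smaller = prune_output_channels(dense, keep_ratio=0.5)
print(smaller.weight.shape)  # torch.Size([256, 256])
```

In a full model, any layer that consumes these activations must also have its input dimension reduced to match the kept channels.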
Memory-Efficient Training
Overview of memory-efficient training methods
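One widely used memory-efficient training building block is activation (gradient) checkpointing, which recomputes intermediate activations during the backward pass instead of storing them, trading extra compute for memory. The example below is generic PyTorch usage and is not tied to the specific methods benchmarked here.

```python
# Generic illustration of activation (gradient) checkpointing in PyTorch:
# blocks are recomputed in the backward pass rather than caching activations.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedMLP(nn.Module):
    def __init__(self, d: int = 512, depth: int = 8):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
            for _ in range(depth)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            # Do not store this block's activations; recompute them in backward.
            x = checkpoint(block, x, use_reentrant=False)
        return x


model = CheckpointedMLP()
loss = model(torch.randn(4, 512, requires_grad=True)).mean()
loss.backward()
```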
Compute-Efficient Optimization
Explanation of compute-efficient optimization techniques
Importance in enhancing model training speed and efficiency
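A standard compute-efficiency lever in LLM pretraining is mixed-precision training via autocast, which speeds up matrix multiplies and shrinks activation memory on supported hardware. The snippet shows generic PyTorch usage (bfloat16 on CPU so it runs anywhere; on GPU one would typically pass device_type="cuda"); the paper's actual training recipe may differ.

```python
# Generic mixed-precision example: the forward pass runs in bfloat16 under
# autocast while parameters, gradients, and the optimizer step stay in fp32.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

x = torch.randn(8, 512)
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = model(x).pow(2).mean()  # matmuls execute in low precision
loss.backward()                    # parameters stay fp32, so gradients are fp32
opt.step()
opt.zero_grad()
```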
Evaluation
Performance Metrics
Metrics used to evaluate the effectiveness of the proposed methods
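For pretraining, the usual quality metric is validation perplexity, the exponential of the mean per-token cross-entropy, typically reported alongside peak memory. The helpers below are an illustrative sketch, not the paper's evaluation harness.

```python
# Illustrative metric helpers: validation perplexity and peak GPU memory.
import math
import torch
import torch.nn.functional as F


def perplexity(logits: torch.Tensor, targets: torch.Tensor) -> float:
    """logits: (batch, seq, vocab); targets: (batch, seq) of token ids."""
    ce = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
    return math.exp(ce.item())


def peak_memory_gb() -> float:
    """Peak GPU memory allocated so far, in GiB (requires a CUDA device)."""
    return torch.cuda.max_memory_allocated() / 1024**3


print(perplexity(torch.randn(2, 16, 100), torch.randint(0, 100, (2, 16))))
```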
Results
Detailed results of the evaluation, including memory usage reduction by 25%
Comparison with Existing Methods
Comparison of the proposed methods with existing techniques in terms of efficiency and performance
Conclusion
Summary of Findings
Recap of the key findings and improvements achieved
Future Work
Suggestions for further research and potential improvements
Implications
Discussion on the broader implications for the field of large language model pretraining
Insights
How do weight refactorization and momentum reset contribute to reducing memory usage during the pretraining of large language models?
What memory-efficient and scalable methods are proposed by 'BAdam' and 'Scalable efficient training', respectively, and how do they improve model efficiency?
Besides weight refactorization, what other techniques like structural pruning and compute-efficient optimization are discussed in the context of efficient pretraining?
What recent strategies, such as LoRA and Stable-SPAM, are evaluated for enhancing training and inference in large language models, and what is their emphasis on full-rankness?