VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections
Roy Miles, Pradyumna Reddy, Ismail Elezi, Jiankang Deng · May 28, 2024
Summary
VeLoRA is a memory-efficient training method for large language models that compresses intermediate activations without compromising performance. It divides each token into sub-tokens, projects them onto a fixed rank-1 subspace during the forward pass, and coarsely reconstructs them during backpropagation. VeLoRA outperforms QLoRA when fine-tuning LLaMA and achieves competitive results on the C4 dataset, making it complementary to state-of-the-art parameter-efficient fine-tuning methods. The method is computationally cheap, avoiding expensive operations such as SVD, and is compatible with first-order optimizers. By shrinking the activation memory footprint, it enables larger models to be trained on devices with limited memory, and it can be combined with quantization for further savings. Experiments across a range of models and tasks show that VeLoRA improves memory efficiency and accuracy compared to existing methods such as GaLore, LoRA, and full fine-tuning.
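To make the core idea concrete, below is a minimal sketch (not the authors' code) of rank-1 sub-token projection for a single linear layer: the forward pass splits each token's input into fixed-size sub-tokens, projects each onto one fixed unit-norm direction `v`, and stores only the resulting scalars; the backward pass coarsely reconstructs the input from those scalars to form the weight gradient. The names `VeLoRALinearFn` and `sub_token_size`, and the way `v` is chosen here, are illustrative assumptions rather than the paper's exact API.

```python
import torch


class VeLoRALinearFn(torch.autograd.Function):
    """Linear layer that stores a rank-1 compressed copy of its input activations."""

    @staticmethod
    def forward(ctx, x, weight, v, sub_token_size):
        # x: (batch, tokens, d_in); weight: (d_out, d_in); v: (sub_token_size,), unit norm
        out = x @ weight.t()

        # Split each token into sub-tokens of length `sub_token_size`.
        b, t, d = x.shape
        subs = x.reshape(b, t, d // sub_token_size, sub_token_size)

        # Rank-1 projection: keep only one scalar coefficient per sub-token.
        coeffs = subs @ v  # (b, t, d // sub_token_size)

        ctx.save_for_backward(coeffs, weight, v)
        ctx.x_shape = (b, t, d)
        return out

    @staticmethod
    def backward(ctx, grad_out):
        coeffs, weight, v = ctx.saved_tensors
        b, t, d = ctx.x_shape

        # Coarse reconstruction of the input from the stored scalars.
        x_hat = (coeffs.unsqueeze(-1) * v).reshape(b, t, d)

        # Gradient w.r.t. the input is exact (it needs only the weight),
        # while the weight gradient uses the reconstructed activations.
        grad_x = grad_out @ weight
        grad_w = grad_out.reshape(-1, grad_out.shape[-1]).t() @ x_hat.reshape(-1, d)
        return grad_x, grad_w, None, None


if __name__ == "__main__":
    # Hypothetical shapes; `v` is a fixed unit-norm direction shared by all sub-tokens.
    d_in, d_out, g = 64, 32, 16
    x = torch.randn(2, 8, d_in, requires_grad=True)
    w = torch.randn(d_out, d_in, requires_grad=True)
    v = torch.nn.functional.normalize(torch.randn(g), dim=0)
    y = VeLoRALinearFn.apply(x, w, v, g)
    y.sum().backward()
    print(w.grad.shape)  # torch.Size([32, 64])
```

In this sketch the saved state per token shrinks from `d_in` floats to `d_in / sub_token_size` scalars, which is where the memory saving comes from; no SVD or other expensive decomposition is involved.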