KD-LoRA: A Hybrid Approach to Efficient Fine-Tuning with LoRA and Knowledge Distillation
Rambod Azimi, Rishav Rishav, Marek Teichmann, Samira Ebrahimi Kahou · October 28, 2024
Summary
KD-LoRA combines LoRA with knowledge distillation for efficient fine-tuning, achieving performance comparable to full fine-tuning and to standard LoRA while significantly reducing resource requirements. It retains 98% of LoRA's performance on the GLUE benchmark while being 40% more compact, reduces GPU memory usage by 30% compared to LoRA, and cuts inference time by 30% relative to both full fine-tuning and LoRA. Evaluated across BERT, RoBERTa, and DeBERTaV3, KD-LoRA offers a novel approach to managing the computational demands of large language models.
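The core recipe (a LoRA-adapted student trained under a frozen teacher's supervision) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the checkpoints, target modules, LoRA rank, alpha, and temperature are assumed values, and in practice the teacher would already be fine-tuned on the downstream task.

```python
# Sketch of a KD-LoRA-style loss: a LoRA-adapted student distilled from a
# frozen teacher. Checkpoints and hyperparameters below are illustrative
# assumptions, not the paper's settings.
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

# Frozen teacher (in practice, already fine-tuned on the downstream task)
teacher = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-base", num_labels=2).eval()
# Smaller student that will receive LoRA adapters
student = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-small", num_labels=2)

# Attach LoRA adapters; only the low-rank matrices (and the classifier head)
# are trained. Target module names depend on the backbone architecture.
lora_cfg = LoraConfig(r=8, lora_alpha=16,
                      target_modules=["query_proj", "value_proj"],
                      lora_dropout=0.1, task_type="SEQ_CLS")
student = get_peft_model(student, lora_cfg)

def kd_lora_loss(batch, labels, alpha=0.5, temperature=2.0):
    """Combine hard-label cross-entropy with soft-label KL distillation."""
    with torch.no_grad():
        t_logits = teacher(**batch).logits          # teacher soft targets
    s_logits = student(**batch).logits              # student predictions
    ce = F.cross_entropy(s_logits, labels)          # supervised loss
    kl = F.kl_div(F.log_softmax(s_logits / temperature, dim=-1),
                  F.softmax(t_logits / temperature, dim=-1),
                  reduction="batchmean") * temperature ** 2
    return alpha * ce + (1 - alpha) * kl
```

Because only the student's LoRA parameters are updated and the student backbone is smaller than the teacher, both fine-tuning memory and inference cost drop relative to applying LoRA to the full-size model.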