INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation
May 1, 2023
Yuji Chai*
John Gkountouras*
Glenn G. Ko
David Brooks
Gu-Yeon Wei
Abstract
We developed an Extremely Memory-Efficient Finetuning (EMEF) framework that combines low-rank adaptation with quantization, reducing memory requirements by 5.6x and enabling fine-tuning of LLMs on lower-resource devices. We proposed a quantization-agnostic error correction framework, Low-Rank Error Correction (LREC), which exploits the additional floating-point parameters inserted for fine-tuning to mitigate the downstream performance loss caused by quantization, outperforming quantization baselines. Additionally, we introduced the first fully functional INT2 Large Language Model capable of generating coherent, human-level text, outperforming models compressed with prior techniques.
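The core idea of pairing a frozen low-bit quantized weight matrix with trainable floating-point low-rank factors can be illustrated with a minimal sketch. This is an illustrative toy example, not the paper's implementation: the quantizer, rank, and class/function names (`quantize_int2`, `LoRAQuantLinear`) are assumptions chosen for clarity.

```python
import numpy as np

def quantize_int2(w):
    """Toy symmetric 2-bit quantization: round weights to the
    four levels {-2, -1, 0, 1} scaled by a per-tensor scale.
    (Illustrative only; real INT2 schemes are more involved.)"""
    scale = np.abs(w).max() / 2.0
    q = np.clip(np.round(w / scale), -2, 1).astype(np.int8)
    return q, scale

class LoRAQuantLinear:
    """Hypothetical layer: frozen 2-bit base weight plus a trainable
    floating-point low-rank correction, y = x (W_q + B A)^T.
    Only A and B would receive gradients during fine-tuning."""
    def __init__(self, w, rank=4, seed=0):
        self.q, self.scale = quantize_int2(w)  # frozen, stored in 2 bits
        rng = np.random.default_rng(seed)
        d_out, d_in = w.shape
        # Standard LoRA-style init: A small random, B zero, so the
        # correction starts at zero and the layer matches W_q exactly.
        self.A = rng.normal(0.0, 0.01, (rank, d_in)).astype(np.float32)
        self.B = np.zeros((d_out, rank), dtype=np.float32)

    def forward(self, x):
        w_hat = self.q.astype(np.float32) * self.scale  # dequantize
        return x @ w_hat.T + (x @ self.A.T) @ self.B.T

# Usage: a 64x128 layer whose base weights cost 2 bits each, while
# the low-rank factors stay in full precision for fine-tuning.
w = np.random.default_rng(1).normal(size=(64, 128)).astype(np.float32)
layer = LoRAQuantLinear(w, rank=4)
y = layer.forward(np.ones((2, 128), dtype=np.float32))
```

In this framing, the same low-rank parameters serve two purposes: they are the trainable adapters for fine-tuning, and (as in LREC) they can be optimized to absorb part of the quantization error of the frozen base weights.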
Type
This work is currently a pre-print.
*Equal contribution