INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation

May 1, 2023
Yuji Chai*
,
John Gkountouras*
,
Glenn G. Ko
,
David Brooks
,
Gu-Yeon Wei
Abstract
We developed an Extremely Memory-Efficient Finetuning (EMEF) framework that combines low-rank adaptation with quantization, reducing memory requirements by 5.6x and enabling fine-tuning of LLMs on lower-resource devices. We proposed a quantization-agnostic error correction framework, Low-Rank Error Correction (LREC), which exploits the additional floating-point parameters inserted for fine-tuning to mitigate the downstream performance loss caused by quantization, outperforming quantization baselines. Additionally, we introduced the first fully functional INT2 Large Language Model capable of generating coherent, human-level text, outperforming models compressed using prior techniques.
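The core idea described above can be illustrated with a minimal sketch: a weight matrix is frozen in a low-bit quantized form, and small trainable low-rank factors are optimized to absorb the quantization error. This is a hypothetical toy illustration, not the paper's implementation; the quantizer, rank, and training loop here are all assumptions for demonstration.

```python
import torch

def quantize_int2(w):
    # Toy symmetric 2-bit quantizer (an assumption, not the paper's scheme):
    # maps weights onto the four levels {-1.5, -0.5, 0.5, 1.5} * scale.
    scale = w.abs().max() / 1.5
    q = torch.clamp(torch.round(w / scale - 0.5) + 0.5, -1.5, 1.5)
    return q * scale

torch.manual_seed(0)
W = torch.randn(64, 64)   # stand-in for a pretrained full-precision weight
Wq = quantize_int2(W)     # frozen low-bit approximation

# Low-rank correction factors in floating point (LoRA-style init:
# A starts at zero so the correction begins as a no-op).
r = 8
A = torch.zeros(64, r, requires_grad=True)
B = (torch.randn(r, 64) * 0.01).requires_grad_()

opt = torch.optim.Adam([A, B], lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    # Train A, B so that Wq + A @ B reconstructs the original weight.
    err = ((Wq + A @ B) - W).pow(2).mean()
    err.backward()
    opt.step()

base_err = (Wq - W).pow(2).mean()
print(float(base_err), float(err))  # corrected error should be lower
```

In the full method the same low-rank parameters also serve as the fine-tuning adapters, which is why the error correction comes at no extra parameter cost.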
This work is currently a pre-print.


*Equal contribution