GPTQ Model VRAM Usage


From now on, my GPTQ Leaderboard Report will include the VRAM usage of each model. VRAM usage is measured with two model loaders: ExLLaMA_HF is used when the model is 4-bit, and AutoGPTQ is used when the model is 8-bit. This post includes VRAM usage for a few models, with more to come, integrated into the next GPTQ score report.
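As a rough illustration of how such a measurement can be taken (not necessarily the exact method behind the numbers below), the total VRAM occupied after loading a model can be read through NVIDIA's management library:

```python
# A minimal sketch of reading total VRAM usage on GPU 0 after a model has
# been loaded, via NVIDIA's management library (pip install nvidia-ml-py).
# This is an illustration, not necessarily how the report's numbers were taken.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
info = pynvml.nvmlDeviceGetMemoryInfo(handle)  # .total / .used / .free, in bytes
print(f"VRAM used: {info.used / 1024**3:.1f} GiB of {info.total / 1024**3:.1f} GiB")
pynvml.nvmlShutdown()
```

Reading device-level memory like this captures everything the process has allocated (weights, KV cache, loader buffers), which is what matters when deciding whether a model fits on a given card.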

Generally, the 4-bit 7B model uses up to ~5 GB of VRAM, while the 8-bit version uses up to ~8.6 GB.

The 4-bit 13B model uses up to ~8.5 GB, with the 8-bit branch using up to ~16.5 GB of VRAM.

The 4-bit 30B model uses up to ~19.1 GB of VRAM.
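These figures line up roughly with a back-of-the-envelope estimate: quantized weights take about parameters × bits / 8 bytes, plus overhead for the KV cache, activations, and loader buffers. The sketch below uses an assumed flat overhead term purely for illustration:

```python
# Back-of-the-envelope VRAM estimate: quantized weights take roughly
# parameters * bits / 8 bytes. The flat overhead_gb term (KV cache,
# activations, loader buffers) is an assumed value for illustration only.
def rough_vram_gb(params_billion: float, bits: int, overhead_gb: float = 1.5) -> float:
    weights_gb = params_billion * 1e9 * bits / 8 / 1024**3
    return weights_gb + overhead_gb

for params, bits in [(7, 4), (7, 8), (13, 4), (13, 8), (30, 4)]:
    print(f"{params}B @ {bits}-bit: ~{rough_vram_gb(params, bits):.1f} GB")
```

The estimate undershoots the measured peaks for the larger models, mostly because the KV cache grows with model size and context length rather than staying flat.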

VRAM Usage