GPTQ Model VRAM Usage


From now on, my GPTQ Leaderboard Report will include the VRAM usage of each model. VRAM usage is measured with two model loaders: ExLLaMA_HF is used when the model is 4-bit, and AutoGPTQ is used when the model is 8-bit. This post includes VRAM usage for a few models, with more to come, integrated into the next GPTQ score report.
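As a rough illustration of how such a measurement can be taken (not necessarily the exact method behind the numbers below), the total VRAM occupied after loading a model can be read through NVIDIA's management library:

```python
# A minimal sketch of reading total VRAM usage on GPU 0 after a model has
# been loaded, via NVIDIA's management library (pip install nvidia-ml-py).
# This is an illustration, not necessarily how the report's numbers were taken.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
info = pynvml.nvmlDeviceGetMemoryInfo(handle)  # .total / .used / .free, in bytes
print(f"VRAM used: {info.used / 1024**3:.1f} GiB of {info.total / 1024**3:.1f} GiB")
pynvml.nvmlShutdown()
```

Reading device-level memory like this captures everything the process has allocated (weights, KV cache, loader buffers), which is what matters when deciding whether a model fits on a given card.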

Generally, the 4-bit 7B model uses up to ~5 GB of VRAM, while the 8-bit version uses up to ~8.6 GB.

The 4-bit 13B model uses up to ~8.5 GB, with the 8-bit branch using up to ~16.5 GB of VRAM.

The 4-bit 30B model uses up to ~19.1 GB of VRAM.
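These figures line up roughly with a back-of-the-envelope estimate: quantized weights take about parameters × bits / 8 bytes, plus overhead for the KV cache, activations, and loader buffers. The sketch below uses an assumed flat overhead term purely for illustration:

```python
# Back-of-the-envelope VRAM estimate: quantized weights take roughly
# parameters * bits / 8 bytes. The flat overhead_gb term (KV cache,
# activations, loader buffers) is an assumed value for illustration only.
def rough_vram_gb(params_billion: float, bits: int, overhead_gb: float = 1.5) -> float:
    weights_gb = params_billion * 1e9 * bits / 8 / 1024**3
    return weights_gb + overhead_gb

for params, bits in [(7, 4), (7, 8), (13, 4), (13, 8), (30, 4)]:
    print(f"{params}B @ {bits}-bit: ~{rough_vram_gb(params, bits):.1f} GB")
```

The estimate undershoots the measured peaks for the larger models, mostly because the KV cache grows with model size and context length rather than staying flat.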

VRAM Usage