GPTQ Model VRAM Usage
From now on, my GPTQ Leaderboard Report will include the VRAM usage of each model. VRAM usage is measured with two model loaders: ExLLaMA_HF for 4-bit models and AutoGPTQ for 8-bit models. This post covers VRAM usage for a handful of models, with more to come in the next GPTQ score report.
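For reference, here is a minimal sketch of one way to capture peak VRAM during generation using PyTorch's CUDA memory statistics. This is an illustration, not the exact harness used for the numbers below: the model path, prompt, and generation settings are placeholders, and the sketch uses AutoGPTQ's Python API since ExLLaMA_HF is normally driven through a webui loader rather than a standalone library call.

```python
# Hedged sketch: measure peak VRAM of a GPTQ model during generation.
# Assumes auto-gptq, transformers, and a CUDA-capable GPU are available.
import torch
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_path = "TheBloke/Llama-2-7B-GPTQ"  # hypothetical example model

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoGPTQForCausalLM.from_quantized(model_path, device="cuda:0")

# Clear the peak-memory counter so it only reflects the run below.
torch.cuda.reset_peak_memory_stats()

inputs = tokenizer("Hello, world!", return_tensors="pt").to("cuda:0")
model.generate(**inputs, max_new_tokens=128)

# Peak memory allocated by PyTorch tensors, in GB. Note this understates
# what nvidia-smi reports, which also includes the CUDA context overhead.
print(f"Peak VRAM: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
```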
Generally, the 4-bit 7B model uses up to ~5 GB of VRAM, with the 8-bit version using up to ~8.6 GB.
The 4-bit 13B model uses up to ~8.5 GB, with the 8-bit branch using up to ~16.5 GB of GPU VRAM.
The 4-bit 30B model uses up to ~19.1 GB of VRAM.
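These figures roughly track a back-of-the-envelope estimate of parameter count times bits per weight, plus overhead. The sketch below is my own rough approximation, not part of the measurement methodology; the fixed overhead term is a guess, and the real peaks run higher for larger models because KV cache and activation memory grow with model size, so treat the weight term as a floor.

```python
# Rough lower-bound estimate: quantized weight memory plus a guessed
# fixed overhead for CUDA context, KV cache, and activations.
def approx_vram_gb(params_billions: float, bits: int, overhead_gb: float = 1.5) -> float:
    weights_gb = params_billions * 1e9 * bits / 8 / 1024**3
    return weights_gb + overhead_gb

for params, bits in [(7, 4), (7, 8), (13, 4), (13, 8), (30, 4)]:
    print(f"{params}B @ {bits}-bit: ~{approx_vram_gb(params, bits):.1f} GB")
```

For example, a 7B model at 4 bits works out to about 3.3 GB of weights, which lands near the measured ~5 GB once overhead is added; the gap widens for the 13B and 30B models as context-dependent memory grows.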