GPTQ LLM Leaderboard Report #6

Image generated with Stable Diffusion using a custom-trained LoRA, in the style of CS2's Ancient

A lot of GPTQ models from TheBloke have come up recently, and I'm going to evaluate them gradually over the coming weeks and months. With this report, TheBloke/upstage-llama-30b-instruct-2048-GPTQ (gptq-4bit-128g-actorder_True) takes the crown as the highest-rated model. From my testing, it's the best instruction-following model that I can run. It's also the model I use in my fork of generative-agents, which simulates human behavior in the style of an RPG.
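If you want to try one of these models yourself, here is a minimal sketch of loading a specific GPTQ branch with Hugging Face transformers. The loading code isn't part of this report, so the package setup (transformers with optimum and auto-gptq installed) and the prompt format are assumptions on my part:

```python
# Minimal sketch: load a GPTQ model from a specific quantization branch.
# Assumes transformers, optimum, and auto-gptq are installed; the prompt
# template below is illustrative, not the model's documented format.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/upstage-llama-30b-instruct-2048-GPTQ"
branch = "gptq-4bit-128g-actorder_True"  # each branch is a different quant config

tokenizer = AutoTokenizer.from_pretrained(model_id, revision=branch)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    revision=branch,    # pick the quantization variant by its branch name
    device_map="auto",  # place layers on the available GPU(s)
)

prompt = "### Instruction:\nSummarize what GPTQ quantization does.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```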

Models Tested

  • TheBloke/Dolphin-Llama2-7B-GPTQ (main)
  • TheBloke/Dolphin-Llama2-7B-GPTQ (gptq-4bit-32g-actorder_True)
  • TheBloke/Dolphin-Llama2-7B-GPTQ (gptq-8bit-128g-actorder_True)
  • TheBloke/Chronolima-Airo-Grad-L2-13B-GPTQ (main)
  • TheBloke/Chronolima-Airo-Grad-L2-13B-GPTQ (gptq-4bit-32g-actorder_True)
  • TheBloke/Chronolima-Airo-Grad-L2-13B-GPTQ (gptq-8bit-64g-actorder_True)
  • TheBloke/Chronohermes-Grad-L2-13B-GPTQ (main)
  • TheBloke/Chronohermes-Grad-L2-13B-GPTQ (gptq-4bit-32g-actorder_True)
  • TheBloke/Chronohermes-Grad-L2-13B-GPTQ (gptq-8bit-64g-actorder_True)
  • TheBloke/Airolima-Chronos-Grad-L2-13B-GPTQ (main)
  • TheBloke/Airolima-Chronos-Grad-L2-13B-GPTQ (gptq-4bit-32g-actorder_True)
  • TheBloke/Airolima-Chronos-Grad-L2-13B-GPTQ (gptq-8bit-64g-actorder_True)
  • Austism/chronos-hermes-13b-v2-GPTQ (main)
  • TheBloke/qCammel-13-GPTQ (main)
  • TheBloke/qCammel-13-GPTQ (gptq-4bit-32g-actorder_True)
  • TheBloke/qCammel-13-GPTQ (gptq-8bit-64g-actorder_True)
  • TheBloke/airoboros-33B-GPT4-m2.0-GPTQ (main)
  • TheBloke/airoboros-33B-GPT4-m2.0-GPTQ (gptq-4bit-128g-actorder_True)
  • TheBloke/upstage-llama-30b-instruct-2048-GPTQ (gptq-4bit-128g-actorder_True)

Leave a comment to let me know which GPTQ model I should evaluate next :)

Results

You can see all the results here.

The most VRAM-hungry model in this report is TheBloke/upstage-llama-30b-instruct-2048-GPTQ (gptq-4bit-128g-actorder_True), which uses up to 20172 MB of VRAM and also achieves the highest score.
The least VRAM-hungry model in this report is TheBloke/Dolphin-Llama2-7B-GPTQ (main), which uses up to 4925 MB of VRAM. It also sits at the bottom as the lowest-scoring model of all the models tested so far.
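For reference, here's a minimal sketch of how peak-VRAM numbers like these can be measured with PyTorch. The report doesn't spell out its exact measurement method, so treat this as an assumption rather than the actual methodology; note also that nvidia-smi readings include CUDA context overhead and will typically be a bit higher than the allocator stats below.

```python
# Minimal sketch: track peak GPU memory allocated by PyTorch during a run.
import torch

torch.cuda.reset_peak_memory_stats()  # start the peak counter from zero

# ... load the model and run the evaluation prompts here ...

peak_bytes = torch.cuda.max_memory_allocated()
print(f"Peak VRAM (PyTorch allocations): {peak_bytes / (1024 ** 2):.0f} MB")
```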

Full Results

Dolphin-Llama2-7B

qCammel-13

Airolima-Chronos-Grad-L2-13B

Chronohermes-Grad-L2-13B

Chronolima-Airo-Grad-L2-13B

30B Models

I've also included older models to highlight the differences between main (gptq_model-4bit--1g) and gptq_model-4bit-128g. The main branch being gptq_model-4bit--1g seems to apply only to the 30B-parameter models; in that naming, -1g means a group size of -1, i.e. no grouping, whereas 128g quantizes weights in groups of 128.
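If you want to confirm which settings a given branch actually uses, each repo ships a quantize_config.json that records them. Here's a hypothetical check using huggingface_hub; the repo and file layout is an assumption based on how TheBloke's GPTQ repos are usually organized:

```python
# Minimal sketch: compare quantization configs across two branches.
import json
from huggingface_hub import hf_hub_download

for revision in ("main", "gptq-4bit-128g-actorder_True"):
    path = hf_hub_download(
        "TheBloke/airoboros-33B-GPT4-m2.0-GPTQ",
        "quantize_config.json",
        revision=revision,
    )
    with open(path) as f:
        cfg = json.load(f)
    # group_size == -1 means no grouping, i.e. the gptq_model-4bit--1g naming
    print(revision, "->", cfg.get("bits"), "bits, group_size", cfg.get("group_size"))
```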