GPTQ LLM Leaderboard Report #2

Tested more GPTQ models including newly released and chat models, along with various model branches.

Image generated by Stable Diffusion

I have tested many more GPTQ models since the previous blog post, including newly released and chat models. I also tried different model branches to compare 4-bit and 8-bit quantization. You can find all the details in my Excel charts; the tools used are described in my previous report, blog #1.

Models tested

  • TheBloke/Pygmalion-13B-SuperHOT-8K-GPTQ
  • TehVenom/Pygmalion-13b-8bit-GPTQ
  • TehVenom/Metharme-13b-4bit-GPTQ
  • TehVenom/Metharme-13b-8bit-GPTQ
  • TheBloke/manticore-13b-chat-pyg-GPTQ
  • digitous/13B-Chimera (4bit-128g)
  • TheBloke/UltraLM-13B-GPTQ (main)
  • TheBloke/UltraLM-13B-GPTQ (gptq-8bit-128g-actorder_True)
  • TheBloke/Vicuna-13B-1-3-SuperHOT-8K-GPTQ
  • TheBloke/vicuna-13b-v1.3.0-GPTQ (main)
  • TheBloke/vicuna-13b-v1.3.0-GPTQ (gptq_model-4bit-32g-actorder_True)
  • TheBloke/vicuna-13b-v1.3.0-GPTQ (gptq_model-8bit-128g-actorder_False)
  • TheBloke/orca_mini_v2_13b-GPTQ
  • TheBloke/OpenOrca-Preview1-13B-GPTQ (main)
  • TheBloke/OpenOrca-Preview1-13B-GPTQ (gptq-8bit-128g-actorder_True)
  • TheBloke/OpenOrca-Preview1-13B-GPTQ (gptq-8bit-128g-actorder_False)
  • TheBloke/WizardLM-13B-V1-1-SuperHOT-8K-GPTQ
  • TheBloke/WizardLM-13B-V1.1-GPTQ (main)
  • TheBloke/WizardLM-13B-V1.1-GPTQ (gptq-8bit--1g-actorder_True)
  • TheBloke/WizardLM-13B-V1.1-GPTQ (gptq-8bit-128g-actorder_False)
  • TheBloke/starcoderplus-GPTQ
  • TheBloke/GodziLLa-30B-GPTQ
  • TheBloke/SuperPlatty-30B-GPTQ (gptq-4bit-128g-actorder_True)
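The branch suffixes in the list above follow TheBloke's naming convention, which encodes the quantization settings directly in the branch name. As a minimal sketch (the helper function name is my own, not part of any library), the bits, group size, and act-order flag can be decoded like this:

```python
import re

def parse_gptq_branch(name: str):
    """Decode quantization settings from a GPTQ branch name,
    e.g. 'gptq-8bit-128g-actorder_True'.
    A group size of -1 means no grouping (per-row quantization)."""
    m = re.search(r"(\d+)bit-(-?\d+)g-actorder_(True|False)", name)
    if m is None:
        return None  # e.g. the 'main' branch carries no suffix
    return {
        "bits": int(m.group(1)),
        "group_size": int(m.group(2)),
        "act_order": m.group(3) == "True",
    }

print(parse_gptq_branch("gptq-8bit-128g-actorder_True"))
print(parse_gptq_branch("gptq-8bit--1g-actorder_True"))
```

When downloading one of these variants from the Hugging Face Hub, the branch name is what you pass as the `revision` argument (to `from_pretrained` or `snapshot_download`) to get that specific quantization instead of `main`.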

Results

Please correct me in the comments if I have placed a model in the wrong category (e.g., if Chimera is not actually a chat model).

Full Result

Chat Models

  • TheBloke/Pygmalion-13B-SuperHOT-8K-GPTQ
  • TehVenom/Pygmalion-13b-8bit-GPTQ
  • TehVenom/Metharme-13b-4bit-GPTQ
  • TehVenom/Metharme-13b-8bit-GPTQ
  • TheBloke/manticore-13b-chat-pyg-GPTQ
  • digitous/13B-Chimera (4bit-128g)

UltraLM-13B

  • TheBloke/UltraLM-13B-GPTQ (main)
  • TheBloke/UltraLM-13B-GPTQ (gptq-8bit-128g-actorder_True)

Vicuna-13B v1.3.0

  • TheBloke/Vicuna-13B-1-3-SuperHOT-8K-GPTQ
  • TheBloke/vicuna-13b-v1.3.0-GPTQ (main)
  • TheBloke/vicuna-13b-v1.3.0-GPTQ (gptq_model-4bit-32g-actorder_True)
  • TheBloke/vicuna-13b-v1.3.0-GPTQ (gptq_model-8bit-128g-actorder_False)

Orca Models

The OpenOrca model is still in its preview stage, trained on only 6% of the intended dataset.

  • TheBloke/orca_mini_v2_13b-GPTQ
  • TheBloke/OpenOrca-Preview1-13B-GPTQ (main)
  • TheBloke/OpenOrca-Preview1-13B-GPTQ (gptq-8bit-128g-actorder_True)
  • TheBloke/OpenOrca-Preview1-13B-GPTQ (gptq-8bit-128g-actorder_False)

WizardLM-13B v1.1

  • TheBloke/WizardLM-13B-V1-1-SuperHOT-8K-GPTQ
  • TheBloke/WizardLM-13B-V1.1-GPTQ (main)
  • TheBloke/WizardLM-13B-V1.1-GPTQ (gptq-8bit--1g-actorder_True)
  • TheBloke/WizardLM-13B-V1.1-GPTQ (gptq-8bit-128g-actorder_False)

Other Models

I added GodziLLa-30B-GPTQ and starcoderplus-GPTQ out of curiosity.

SuperPlatty-30B with 4-bit, 128g, and actorder_True achieved the highest score of all the models tested, though it beat the main SuperPlatty branch by only 0.1. It showed small performance gains across the board, except on BoolQ and OpenBookQA.