GPTQ LLM Leaderboard Report #2
Tested more GPTQ models including newly released and chat models, along with various model branches.
I have tested many more GPTQ models since the previous post, including newly released and chat models. I also tried different branches of the same model to compare 4-bit and 8-bit quantization. All the details are in my Excel charts, and the tools used are described in my previous report, blog #1.
Models tested
- TheBloke/Pygmalion-13B-SuperHOT-8K-GPTQ
- TehVenom/Pygmalion-13b-8bit-GPTQ
- TehVenom/Metharme-13b-4bit-GPTQ
- TehVenom/Metharme-13b-8bit-GPTQ
- TheBloke/manticore-13b-chat-pyg-GPTQ
- digitous/13B-Chimera (4bit-128g)
- TheBloke/UltraLM-13B-GPTQ (main)
- TheBloke/UltraLM-13B-GPTQ (gptq-8bit-128g-actorder_True)
- TheBloke/Vicuna-13B-1-3-SuperHOT-8K-GPTQ
- TheBloke/vicuna-13b-v1.3.0-GPTQ (main)
- TheBloke/vicuna-13b-v1.3.0-GPTQ (gptq_model-4bit-32g-actorder_True)
- TheBloke/vicuna-13b-v1.3.0-GPTQ (gptq_model-8bit-128g-actorder_False)
- TheBloke/orca_mini_v2_13b-GPTQ
- TheBloke/OpenOrca-Preview1-13B-GPTQ (main)
- TheBloke/OpenOrca-Preview1-13B-GPTQ (gptq-8bit-128g-actorder_True)
- TheBloke/OpenOrca-Preview1-13B-GPTQ (gptq-8bit-128g-actorder_False)
- TheBloke/WizardLM-13B-V1-1-SuperHOT-8K-GPTQ
- TheBloke/WizardLM-13B-V1.1-GPTQ (main)
- TheBloke/WizardLM-13B-V1.1-GPTQ (gptq-8bit--1g-actorder_True)
- TheBloke/WizardLM-13B-V1.1-GPTQ (gptq-8bit-128g-actorder_False)
- TheBloke/starcoderplus-GPTQ
- TheBloke/GodziLLa-30B-GPTQ
- TheBloke/SuperPlatty-30B-GPTQ (gptq-4bit-128g-actorder_True)
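The branch names in parentheses above encode the quantization settings: bit width (4-bit or 8-bit), group size (128g, 32g, or -1g, where -1 means no grouping), and whether activation-order quantization (actorder) was used. A minimal sketch of a parser for these names (`parse_gptq_branch` is my own hypothetical helper, not one of the tools from the post):

```python
import re

def parse_gptq_branch(name):
    """Extract quantization settings from a GPTQ branch name
    like 'gptq-8bit-128g-actorder_True' (hypothetical helper)."""
    bits = re.search(r"(\d+)bit", name)
    # Lookbehind requires a '-' separator, so '--1g' parses as group size -1
    group = re.search(r"(?<=-)(-?\d+)g(?=-|$)", name)
    actorder = re.search(r"actorder_(True|False)", name)
    return {
        "bits": int(bits.group(1)) if bits else None,
        "group_size": int(group.group(1)) if group else None,  # -1 = no grouping
        "actorder": actorder.group(1) == "True" if actorder else None,
    }

print(parse_gptq_branch("gptq-8bit-128g-actorder_True"))
# {'bits': 8, 'group_size': 128, 'actorder': True}
```

Note that `gptq-8bit--1g-actorder_True` (WizardLM above) has the double hyphen because the group size itself is -1.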
Results
Please correct me in the comments if a model is placed in the wrong segment (e.g., if Chimera is not actually a chat model).
Full Result
Chat Models
- TheBloke/Pygmalion-13B-SuperHOT-8K-GPTQ
- TehVenom/Pygmalion-13b-8bit-GPTQ
- TehVenom/Metharme-13b-4bit-GPTQ
- TehVenom/Metharme-13b-8bit-GPTQ
- TheBloke/manticore-13b-chat-pyg-GPTQ
- digitous/13B-Chimera (4bit-128g)
UltraLM-13B
- TheBloke/UltraLM-13B-GPTQ (main)
- TheBloke/UltraLM-13B-GPTQ (gptq-8bit-128g-actorder_True)
Vicuna-13B v1.3.0
- TheBloke/Vicuna-13B-1-3-SuperHOT-8K-GPTQ
- TheBloke/vicuna-13b-v1.3.0-GPTQ (main)
- TheBloke/vicuna-13b-v1.3.0-GPTQ (gptq_model-4bit-32g-actorder_True)
- TheBloke/vicuna-13b-v1.3.0-GPTQ (gptq_model-8bit-128g-actorder_False)
Orca Models
The OpenOrca model is still in its preview stage, trained on only 6% of the intended dataset.
- TheBloke/orca_mini_v2_13b-GPTQ
- TheBloke/OpenOrca-Preview1-13B-GPTQ (main)
- TheBloke/OpenOrca-Preview1-13B-GPTQ (gptq-8bit-128g-actorder_True)
- TheBloke/OpenOrca-Preview1-13B-GPTQ (gptq-8bit-128g-actorder_False)
WizardLM-13B v1.1
- TheBloke/WizardLM-13B-V1-1-SuperHOT-8K-GPTQ
- TheBloke/WizardLM-13B-V1.1-GPTQ (main)
- TheBloke/WizardLM-13B-V1.1-GPTQ (gptq-8bit--1g-actorder_True)
- TheBloke/WizardLM-13B-V1.1-GPTQ (gptq-8bit-128g-actorder_False)
Other Models
I added GodziLLa-30B-GPTQ and starcoderplus-GPTQ out of curiosity.
SuperPlatty-30B (4-bit, 128g, actorder_True) achieved the highest score of all the models tested, although it beat the main SuperPlatty branch by only 0.1. It showed small gains across the board, except on BoolQ and OpenBookQA.