GPTQ LLM Leaderboard Report #1
With tons of open-source large language models coming out every single day, it's hard to keep track.
In this blog series, I'll be sharing performance results for a bunch of LLMs evaluated with EleutherAI's Language Model Evaluation Harness. The models were mainly sourced from Hugging Face. I also record all of the results in an Excel spreadsheet. Let me know which GPTQ or other 4-bit models I should test for the next post.
Tasks
I'm using the GPT4All standard set of tasks (zero-shot), which includes:
- BoolQ
- PIQA
- HellaSwag
- WinoGrande
- ARC-e and ARC-c (AI2 Reasoning Challenge, Easy and Challenge sets)
- OBQA (OpenBookQA)
Arguments
python main.py \
--model hf-causal-experimental \
--model_args pretrained=<model_path>,quantized=<model_name>,gptq_use_triton=True,trust_remote_code=True \
--tasks boolq,piqa,winogrande,arc_easy,arc_challenge,openbookqa,hellaswag \
--device cuda:0 --no_cache
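If you want to run the same command over several models, a small helper that assembles the arguments can save some typing. Below is a minimal sketch, not the script I actually used: `build_eval_command` and `TASKS` are names made up for illustration, and `path/to/model` and `model.safetensors` are hypothetical stand-ins for `<model_path>` and `<model_name>`. The flags themselves are copied from the command above.

```python
import shlex

# Task names exactly as passed to --tasks in the command above.
TASKS = [
    "boolq", "piqa", "winogrande", "arc_easy",
    "arc_challenge", "openbookqa", "hellaswag",
]


def build_eval_command(model_path, quantized_file, device="cuda:0"):
    """Assemble the harness invocation as an argument list (hypothetical helper)."""
    model_args = ",".join([
        f"pretrained={model_path}",
        f"quantized={quantized_file}",
        "gptq_use_triton=True",
        "trust_remote_code=True",
    ])
    return [
        "python", "main.py",
        "--model", "hf-causal-experimental",
        "--model_args", model_args,
        "--tasks", ",".join(TASKS),
        "--device", device,
        "--no_cache",
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own model directory and weights file.
    cmd = build_eval_command("path/to/model", "model.safetensors")
    # Print a shell-safe version of the command instead of running it.
    print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run(cmd, check=True)` from inside the harness checkout if you would rather launch the runs from Python.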
Models tested
- TheBloke/Manticore-13B-GPTQ
- TheBloke/Nous-Hermes-13B-GPTQ
- TheBloke/tulu-13B-GPTQ
- TheBloke/guanaco-13B-GPTQ
- digitous/13B-HyperMantis_GPTQ_4bit-128g
- TheBloke/guanaco-33B-GPTQ
- MetaIX/GPT4-X-Alpaca-30B-4bit
- TheBloke/tulu-30B-GPTQ
- TheBloke/SuperPlatty-30B-GPTQ
Results
Model | Average | BoolQ | PIQA | HellaSwag | WinoGrande | ARC-e | ARC-c | OBQA |
---|---|---|---|---|---|---|---|---|
TheBloke/Manticore-13B-GPTQ | 64.1 | 80 | 79.8 | 59.6 | 72.4 | 77.9 | 45.7 | 33 |
TheBloke/Nous-Hermes-13B-GPTQ | 64.8 | 79.9 | 78.9 | 60.9 | 71.2 | 77.8 | 48.4 | 36.6 |
TheBloke/tulu-13B-GPTQ | 64.9 | 84.04 | 79 | 60.27 | 72.77 | 77.1 | 47.35 | 34 |
TheBloke/guanaco-13B-GPTQ | 65 | 80.2 | 79.1 | 61.8 | 72.6 | 76.3 | 46.9 | 37.8 |
digitous/13B-HyperMantis_GPTQ_4bit-128g | 66 | 82.6 | 80 | 61.5 | 73.3 | 79.1 | 49.1 | 36.2 |
TheBloke/guanaco-33B-GPTQ | 66.9 | 81.7 | 80.6 | 63.3 | 74.2 | 80 | 51.3 | 37 |
MetaIX/GPT4-X-Alpaca-30B-4bit | 67.2 | 83.24 | 81.12 | 63.49 | 74.35 | 80.3 | 52.47 | 35.4 |
TheBloke/tulu-30B-GPTQ | 67.8 | 85.9 | 80.41 | 62.78 | 75.3 | 80.98 | 52.22 | 37.2 |
TheBloke/SuperPlatty-30B-GPTQ | 68.8 | 87 | 80.96 | 62.56 | 74.11 | 83.29 | 56.31 | 37.6 |
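
The Average column is consistent with a plain unweighted mean of the seven task scores, rounded to one decimal place. Here is a short sketch that recomputes it for three rows of the table. The `average` helper and the `scores` dictionary are illustrative names, and the numbers are copied from the table in column order (BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA).

```python
# Per-task scores copied from the results table, in column order:
# BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA
scores = {
    "TheBloke/Manticore-13B-GPTQ": [80, 79.8, 59.6, 72.4, 77.9, 45.7, 33],
    "TheBloke/guanaco-13B-GPTQ": [80.2, 79.1, 61.8, 72.6, 76.3, 46.9, 37.8],
    "TheBloke/SuperPlatty-30B-GPTQ": [87, 80.96, 62.56, 74.11, 83.29, 56.31, 37.6],
}


def average(task_scores):
    """Unweighted mean of the task scores, rounded to one decimal place."""
    return round(sum(task_scores) / len(task_scores), 1)


if __name__ == "__main__":
    for model, task_scores in scores.items():
        print(f"{model}: {average(task_scores)}")
```

Because every task counts equally, a one-point gain on OBQA moves the average exactly as much as a one-point gain on BoolQ.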