GPTQ LLM Leaderboard Report #6
A lot of new GPTQ models from TheBloke have come out recently, and I'm going to evaluate them all gradually over the coming weeks/months. With this report, TheBloke/upstage-llama-30b-instruct-2048-GPTQ (gptq-4bit-128g-actorder_True) takes the crown as the highest-rated model. From my testing, it's the best instruction-following model that I can run. It's also the model I use on my fork of generative-agents, which simulates human behavior in the style of an RPG game.
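If you want to try the winner yourself, here's a minimal loading sketch. I'm assuming auto-gptq and transformers are installed and that the branch name in parentheses gets passed as the Hugging Face revision; the Orca-style prompt format is my assumption, so verify it against the model card.

```python
# Minimal sketch: load a specific GPTQ branch via the HF `revision` arg.
# Assumes: pip install auto-gptq transformers
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

repo = "TheBloke/upstage-llama-30b-instruct-2048-GPTQ"
branch = "gptq-4bit-128g-actorder_True"  # the branch named in parentheses

tokenizer = AutoTokenizer.from_pretrained(repo, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    repo,
    revision=branch,
    device="cuda:0",
    use_safetensors=True,
)

# Orca-style template (my assumption -- double-check the model card)
prompt = "### User:\nName three uses for a paperclip.\n\n### Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, do_sample=True, temperature=0.7, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```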
Models Tested
- TheBloke/Dolphin-Llama2-7B-GPTQ (main)
- TheBloke/Dolphin-Llama2-7B-GPTQ (gptq-4bit-32g-actorder_True)
- TheBloke/Dolphin-Llama2-7B-GPTQ (gptq-8bit-128g-actorder_True)
- TheBloke/Chronolima-Airo-Grad-L2-13B-GPTQ (main)
- TheBloke/Chronolima-Airo-Grad-L2-13B-GPTQ (gptq-4bit-32g-actorder_True)
- TheBloke/Chronolima-Airo-Grad-L2-13B-GPTQ (gptq-8bit-64g-actorder_True)
- TheBloke/Chronohermes-Grad-L2-13B-GPTQ (main)
- TheBloke/Chronohermes-Grad-L2-13B-GPTQ (gptq-4bit-32g-actorder_True)
- TheBloke/Chronohermes-Grad-L2-13B-GPTQ (gptq-8bit-64g-actorder_True)
- TheBloke/Airolima-Chronos-Grad-L2-13B-GPTQ (main)
- TheBloke/Airolima-Chronos-Grad-L2-13B-GPTQ (gptq-4bit-32g-actorder_True)
- TheBloke/Airolima-Chronos-Grad-L2-13B-GPTQ (gptq-8bit-64g-actorder_True)
- Austism/chronos-hermes-13b-v2-GPTQ (main)
- TheBloke/qCammel-13-GPTQ (main)
- TheBloke/qCammel-13-GPTQ (gptq-4bit-32g-actorder_True)
- TheBloke/qCammel-13-GPTQ (gptq-8bit-64g-actorder_True)
- TheBloke/airoboros-33B-GPT4-m2.0-GPTQ (main)
- TheBloke/airoboros-33B-GPT4-m2.0-GPTQ (gptq-4bit-128g-actorder_True)
- TheBloke/upstage-llama-30b-instruct-2048-GPTQ (gptq-4bit-128g-actorder_True)
Leave a comment to let me know which GPTQ model I should evaluate next :)
Results
You can see all the results here.
The most VRAM-hungry model of this report is TheBloke/upstage-llama-30b-instruct-2048-GPTQ (gptq-4bit-128g-actorder_True), which uses up to 20172 MB of VRAM and also achieves the highest score.
The least VRAM-hungry model of this report is TheBloke/Dolphin-Llama2-7B-GPTQ (main), which uses up to 4925 MB of VRAM. It is also the lowest-scoring model of all the models tested so far.
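I should note I'm not spelling out here exactly how the peak VRAM figures are collected; one simple way to get a comparable number is PyTorch's allocator stats, as in this sketch (nvidia-smi will report a few hundred MB more because of the CUDA context):

```python
import torch

torch.cuda.reset_peak_memory_stats()

# ... load the model and run the evaluation prompts here ...

peak_mb = torch.cuda.max_memory_allocated() / (1024 ** 2)
print(f"Peak VRAM allocated by PyTorch: {peak_mb:.0f} MB")
```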
Full Results
Dolphin-Llama2-7B
qCammel-13
Airolima-Chronos-Grad-L2-13B
Chronohermes-Grad-L2-13B
Chronolima-Airo-Grad-L2-13B
30B Models
I've also included older models to highlight the differences between main (gptq_model-4bit--1g) and gptq_model-4bit-128g. The main branch being gptq_model-4bit--1g seems to apply only to the 30B-parameter models.
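For context, those file suffixes map onto GPTQ quantization parameters: -1g means no weight grouping (one set of quantization scales across the whole row of weights), while 128g recomputes the scales every 128 weights, which costs a bit more VRAM but quantizes more accurately. Here's a hedged sketch of the two configs using auto-gptq's BaseQuantizeConfig; the desc_act (act-order) values vary per branch, so treat them as placeholders:

```python
from auto_gptq import BaseQuantizeConfig

# main / gptq_model-4bit--1g: 4-bit, no grouping (single scale per row)
cfg_main = BaseQuantizeConfig(bits=4, group_size=-1, desc_act=True)

# gptq_model-4bit-128g: 4-bit, scales recomputed every 128 weights
cfg_128g = BaseQuantizeConfig(bits=4, group_size=128, desc_act=True)
```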