Detailed testing of all the LLMs you probably use, and then some

December 5, 2024

This is fantastic, useful work. Good benchmarking is hard! Thank you to @WolframRvnwlf for putting so many human- and GPU-hours into this. https://t.co/J7gfUVNaRs

Wolfram Ravenwolf@WolframRvnwlf

It's done - finally finished and published the detailed report of my latest LLM Comparison/Test on the HF Blog: 25 SOTA LLMs (including QwQ) through 59 MMLU-Pro CS benchmark runs.

Check out my findings - some of the results might surprise you just as much as they surprised me...