December 5, 2024
Detailed testing of all the LLMs you probably use, and then some.
This is fantastic, useful work. Good benchmarking is hard! Thank you to @WolframRvnwlf for putting so many human- and GPU-hours into this. https://t.co/J7gfUVNaRs
It's done - finally finished and published the detailed report of my latest LLM Comparison/Test on the HF Blog: 25 SOTA LLMs (including QwQ) through 59 MMLU-Pro CS benchmark runs.
Check out my findings - some of the results might surprise you just as much as they surprised me...