January 16, 2025
We are using next-token prediction in the most insane ways these days. (I do not mean that as a criticism. I am in the “we” set.) https://t.co/vi8tV4hNHa
When I use an LLM as a judge, I ask a True/False question and compute (p(True)+p(true)) - (p(False)+p(false)). This works very well for me, but I don't think I've seen anyone else scoring answers this way.
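A rough sketch of that score, assuming you can already get the log-probabilities of the candidate first answer tokens from whatever model you use. The function name `true_false_score`, the dict-of-logprobs input, and the numbers in the example are all made up for illustration; real tokenizers may also emit variants like " True" with a leading space, which this sketch does not handle.

```python
import math

def true_false_score(top_logprobs):
    """Return (p(True) + p(true)) - (p(False) + p(false)).

    top_logprobs: hypothetical dict mapping a candidate first answer
    token to its log-probability. Tokens missing from the dict are
    treated as having probability 0.
    """
    def p(token):
        lp = top_logprobs.get(token)
        return math.exp(lp) if lp is not None else 0.0

    return (p("True") + p("true")) - (p("False") + p("false"))

# Illustrative, invented probabilities for one judged answer.
example = {
    "True": math.log(0.70),
    "true": math.log(0.10),
    "False": math.log(0.15),
    "false": math.log(0.02),
}
score = true_false_score(example)
print(round(score, 2))
```

The result lands in [-1, 1]: positive means the judge leans True, negative means it leans False, and values near 0 mean it is unsure or put its probability mass on other tokens.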