No lie, this is how I write most of my evals now

January 7, 2026

No lie, this is how I write most of my evals now. And since I'm made of money, I don't just use Claude as the judge, I use the *Claude Agent SDK* as the judge. https://t.co/PeB4IBOLVr

dex (@dexhorthy)

having Claude write new evals and then fix the prompt until they all pass is kinda dope

I know this is just poor man’s bad-slow-expensive gepa but so far it’s working
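
For anyone who wants the shape of that loop, here is a rough sketch, not the actual setup from the tweet. Everything in it is hypothetical: `EvalCase`, `tune_prompt`, and the `run`, `judge`, and `revise` callables are placeholder names, and the callables stand in for whatever model calls you would wire up (for example an LLM judge). It only shows the control flow: run the evals, hand the failures back for a prompt revision, repeat until everything passes or a round budget runs out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class EvalCase:
    """One eval: an input plus a plain-language pass criterion for the judge."""
    task_input: str
    criterion: str


# Hypothetical callable shapes; in practice each would wrap a model call.
Run = Callable[[str, str], str]               # (prompt, task_input) -> output
Judge = Callable[[EvalCase, str], bool]       # (case, output) -> passed?
Revise = Callable[[str, List[EvalCase]], str]  # (prompt, failures) -> new prompt


def failing_cases(prompt: str, cases: List[EvalCase], run: Run, judge: Judge) -> List[EvalCase]:
    """Run every eval against the prompt and return the ones the judge rejects."""
    return [c for c in cases if not judge(c, run(prompt, c.task_input))]


def tune_prompt(
    prompt: str,
    cases: List[EvalCase],
    run: Run,
    judge: Judge,
    revise: Revise,
    max_rounds: int = 5,
) -> Tuple[str, bool]:
    """Revise the prompt until all evals pass or max_rounds revisions are used.

    Returns the last prompt and whether it passed every eval.
    """
    for _ in range(max_rounds):
        failures = failing_cases(prompt, cases, run, judge)
        if not failures:
            return prompt, True
        prompt = revise(prompt, failures)
    # Check the final revision once more before reporting the result.
    return prompt, not failing_cases(prompt, cases, run, judge)
```

The `max_rounds` cap is there because nothing guarantees the loop converges, and every round costs judge calls. It is also worth holding back a few evals the reviser never sees, since a prompt edited until a fixed set passes can easily overfit to that set.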