January 7, 2026
No lie, this is how I write most of my evals now. And since I'm made of money, I don't just use Claude as the judge, I use the *Claude Agent SDK* as the judge. https://t.co/PeB4IBOLVr
having Claude write new evals and then fix the prompt until they all pass is kinda dope
I know this is just poor man’s bad-slow-expensive GEPA but so far it’s working
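
For anyone curious what the "fix the prompt until they all pass" loop looks like, here's a rough sketch, not my actual setup. Everything in it is a made-up name: `judge` and `revise` are hypothetical callables standing in for the model calls, and the toy versions below exist only so the thing runs offline. It is not GEPA and it is not the Claude Agent SDK API, just the plain retry loop.

```python
from typing import Callable, List, Tuple


def fix_prompt_until_pass(
    prompt: str,
    evals: List[str],
    judge: Callable[[str, str], bool],        # hypothetical: True if the prompt passes this eval
    revise: Callable[[str, List[str]], str],  # hypothetical: new prompt given the failing evals
    max_rounds: int = 5,
) -> Tuple[str, List[str]]:
    """Revise `prompt` until every eval passes or the round budget runs out.

    Returns the final prompt and the evals that still fail (empty on success).
    """
    failures = [e for e in evals if not judge(prompt, e)]
    for _ in range(max_rounds):
        if not failures:
            break
        prompt = revise(prompt, failures)
        # Re-run the whole suite so a fix for one eval can't silently break another.
        failures = [e for e in evals if not judge(prompt, e)]
    return prompt, failures


# Toy stand-ins (assumptions for illustration): a real judge and reviser would call a model.
def toy_judge(prompt: str, eval_case: str) -> bool:
    return eval_case in prompt


def toy_revise(prompt: str, failures: List[str]) -> str:
    return prompt + " " + failures[0]


final_prompt, remaining = fix_prompt_until_pass(
    "be concise",
    ["cite sources", "use metric units"],
    toy_judge,
    toy_revise,
)
```

The `max_rounds` cap is the part that matters when the judge is expensive: without it, an eval the reviser can't satisfy keeps burning calls forever.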