How do you all manage regressions with each new model update? A large end-to-end test set of problem solving to see how the models compare?
I use a self-documenting recursive workflow: https://github.com/doubleuuser/rlm-workflow
A mix of evals and vibes.