Hacker News

aizk today at 5:59 PM

How do you guys manage regressions as a whole with every new model update? A massive test set of end-to-end problem-solving tasks to see how the models compare?


Replies

try-working today at 8:44 PM

I use a self-documenting recursive workflow: https://github.com/doubleuuser/rlm-workflow

bcherny today at 6:03 PM

A mix of evals and vibes.
