Hacker News

andai, yesterday at 7:06 PM (1 reply)

Fantastic. Could you share more details about what it was like post-training a model?


Replies

dk189, yesterday at 7:51 PM

The RL is easy to describe but hard to do. The nice thing about pen testing is that the reward isn't a vibe, the way it is when you train for code quality: the exploit either lands or it doesn't. The day-to-day is not glamorous at all. It's mostly fighting for stable GPU access and watching a cluster sit half-idle with nodes you somehow can't book.