The RL is easy to describe and hard to do. The nice thing about pen testing is that the reward isn't a vibe, the way it is when you train for code quality: the exploit either lands or it doesn't. The day-to-day is not glamorous at all. It's mostly fighting for stable GPU access and watching a cluster sit half-idle with nodes you somehow can't book.