Hacker News

kdrag0n, last Monday at 5:34 PM

what tasks can the model do out of the box? was each of the examples a different fine-tuned model?



g413n, last Monday at 5:39 PM

it's a pretty general policy, but this is all super early. it's great at exploring websites, so fuzzing was easy. for CAD, it has good enough base rates with the few-shot prompt when we do the repetitive stuff, and we gave it checkpoints on each step. the other stuff in the mosaic is just some of our favorite clips from internal evals.