Hacker News

infecto, yesterday at 10:31 PM

The way I read their benchmark results is that they trained a model to work insanely well in their own coding workflow. It's not a general-purpose model.

One of the surprisingly hard problems to solve is getting a model to actually use the tools you give it access to.