I don't buy this.
It implies that the agents could only do this because they could regurgitate previous browsers from their training data.
Anyone who's watched a coding agent work will see why that's unlikely to be what's happening. If that's all they were doing, why did it take three days and thousands of changes and tool calls to get to a working result?
I also know that AI labs treat regurgitation of training data as a bug and invest a lot of effort into making it unlikely to happen.
I recommend avoiding the temptation to look at things like this and say "yeah, that's not impressive, it saw that in the training data already". It's not a useful mental model to hold.