Possible, though you eventually run into types of issues that you don't recall the model having before. Like trouble accessing a database, or not following the SOP you have it read each time it performs X routine task. There are also patterns that are much less ambiguous, like getting caught in loops or still failing to run a script it wrote after ten attempts.
yes but i keep wondering if that's just the game of chance doing its thing
like these models are nondeterministic, right? (even apart from the rng stuff like top-k sampling and temperature)
say with every prompt there's a 2% chance the AI gets it massively wrong. what if i just lucked out the past couple of weeks and now i'm on a streak of bad luck?
and since my expectations are based on its previous (lucky) performance, i now judge it more harshly even though it hasn't actually changed?
or is it giving you consistently worse performance, not able to get it right even after you clear the context and try again on the exact same problem, etc.?
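to put rough numbers on the luck question, here's a quick sketch. it assumes the made-up 2% failure rate from above and, importantly, that every attempt is independent of the others, which real usage may not satisfy. the function names and the prompt counts (100, 20, 5) are just picked for illustration.

```python
from math import comb

def prob_no_failures(p_fail: float, n_prompts: int) -> float:
    """P(zero failures across n independent prompts)."""
    return (1 - p_fail) ** n_prompts

def prob_at_least(k: int, n: int, p_fail: float) -> float:
    """P(at least k failures in n independent attempts), binomial tail."""
    return sum(
        comb(n, i) * p_fail**i * (1 - p_fail) ** (n - i)
        for i in range(k, n + 1)
    )

P_FAIL = 0.02  # hypothetical per-prompt failure rate, not a measured value

# How likely is a "lucky" stretch of 100 prompts with no big failure?
print(f"no failures in 100 prompts: {prob_no_failures(P_FAIL, 100):.3f}")

# How likely is a "bad streak" of 3 or more failures within 20 prompts?
print(f"3+ failures in 20 prompts:  {prob_at_least(3, 20, P_FAIL):.4f}")

# How likely is failing all 5 fresh-context retries of the same problem?
print(f"5 of 5 retries fail:        {prob_at_least(5, 5, P_FAIL):.2e}")
```

under those assumptions a clean run of 100 prompts happens about 13% of the time, so a lucky couple of weeks is plausible. but failing five out of five clean retries on the same problem would be extremely unlikely by chance alone, so that pattern would point at something other than bad luck (either a real change or a problem that was never at 2% to begin with).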