You're using last week's model; Opus 4.7 is old news. Opus 6.9 is the new hotness; it is a better product manager than GPT, and has more X productivity. It replaced our junior dev team, and tells me my hair looks good.
Your research finding LLMs ineffective is invalid because you used 6.9. The current SOTA is 6.91, and it's leaps and bounds better than yesterday's 6.9.