Hacker News

storus, yesterday at 9:33 PM

That's why you can run the latest open coding models locally; they have reportedly reached the performance of Sonnet 4.5, so almost SOTA. You can then use tricks like the one I mentioned above, directly manipulating GPU RAM to clean up the context when needed, which is not possible with cloud models unless the provider enables it.
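
A rough sketch of what such a context cleanup could look like, assuming it means evicting a span of tokens from the model's KV cache. `ToyKVCache` and `drop_span` are made-up names for illustration, and plain Python lists stand in for the GPU tensors a real runtime would hold; a real runtime would probably also have to fix up position information for the tokens that remain.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKVCache:
    """Hypothetical stand-in for a per-layer KV cache (real ones live in GPU RAM)."""
    # keys[layer][pos] and values[layer][pos] hold one entry per cached token.
    keys: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def append(self, layer_keys, layer_values):
        """Cache one new token: one key and one value per layer."""
        if not self.keys:
            self.keys = [[] for _ in layer_keys]
            self.values = [[] for _ in layer_values]
        for layer, (k, v) in enumerate(zip(layer_keys, layer_values)):
            self.keys[layer].append(k)
            self.values[layer].append(v)

    def drop_span(self, start, end):
        """Evict cached tokens in [start, end) from every layer."""
        for layer in range(len(self.keys)):
            del self.keys[layer][start:end]
            del self.values[layer][start:end]

    def __len__(self):
        return len(self.keys[0]) if self.keys else 0


cache = ToyKVCache()
for pos in range(6):
    # Two layers; the "vectors" are just labels so the result is easy to read.
    cache.append([f"k0_{pos}", f"k1_{pos}"], [f"v0_{pos}", f"v1_{pos}"])

cache.drop_span(2, 4)  # forget tokens 2 and 3, e.g. a stale tool output
print(len(cache), cache.keys[0])
```

The point is only that with a local model the cache is yours to edit, while a hosted API normally exposes nothing below the prompt text.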