Curious if this would help larger local models? Qwen 3.6 variants or deepseek4?

__mharrison__ • yesterday at 10:02 PM

Replies

Yes, it does! I haven't published those evals yet, but I'm actually running 24-35B-class models on a custom coding harness built on forge (and even 120B-class models recently).

I just need more GPU wall-clock time to get more evals done. ETA is... a few weeks? I got distracted by the coding harness.

But the results are the same: reforged models do better than bare ones, even at those sizes. As for published results, I ran forge on Anthropic models, and reforged did better than bare for them as well :)
