Yes it does! I haven't published those evals yet, but I'm actually running 24-35B class models on a custom coding harness built on forge (even 120B class recently).
I just need more GPU wall clock time to get more evals done. ETA is...a few weeks? Got distracted by the coding harness.
But the results are the same. Reforged models do better than bare, even at those sizes. As for published results, I ran forge on Anthropic models and reforged doe better than bare for them as well :)
Yes it does! I haven't published those evals yet, but I'm actually running 24-35B class models on a custom coding harness built on forge (even 120B class recently).
I just need more GPU wall clock time to get more evals done. ETA is...a few weeks? Got distracted by the coding harness.
But the results are the same. Reforged models do better than bare, even at those sizes. As for published results, I ran forge on Anthropic models and reforged doe better than bare for them as well :)