Among similarly sized models, Laguna XS.2 isn't looking very good on the slightly-less-benchmaxxed Terminal-Bench 2.0:
Laguna XS.2 (33B-A3B): 30.6
Qwen 3.6 (35B-A3B): 51.5
Devstral 2 (123B): 31.2
Quite a huge lead for Qwen (roughly 21 points)... well, at least Laguna is catching up to the other smaller Western labs; it's within a point of Devstral 2.
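For what it's worth, here's a quick sketch of the gaps, using only the three Terminal-Bench 2.0 scores quoted above. The `scores` dict and the `gap_to` helper are names I made up for illustration, not anything from a benchmark tool.

```python
# Terminal-Bench 2.0 scores exactly as quoted in the list above.
scores = {
    "Laguna XS.2": 30.6,
    "Qwen 3.6": 51.5,
    "Devstral 2": 31.2,
}


def gap_to(baseline: str, scores: dict[str, float]) -> dict[str, float]:
    """Return each other model's point gap relative to `baseline` (hypothetical helper)."""
    base = scores[baseline]
    # Round to one decimal, matching the precision of the quoted scores.
    return {
        name: round(score - base, 1)
        for name, score in scores.items()
        if name != baseline
    }


gaps = gap_to("Laguna XS.2", scores)
print(gaps)
```

Positive numbers mean the other model is ahead of Laguna XS.2 by that many points.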
Need to look at SWE-Bench Pro; it's super competitive. I suspect they'll catch up, given the longer tail on TB scores.