Opus 4.5 really is something else. I've been having a ton of fun throwing absurdly difficult problems at it recently and it keeps on surprising me.
A JavaScript interpreter written in Python? How about a WebAssembly runtime in Python? How about porting BurntSushi's absurdly great Rust optimized string search routines to C and making them faster?
And these are mostly just casual experiments, often run from my phone!
> How about porting BurntSushi's absurdly great Rust optimized string search routines to C and making them faster?
How did it do? :-)
I have tried to give it extreme problems, like creating a slime mold pathing algorithm and creating completely new shoe-lacing patterns, and it starts struggling with problems which use visual reasoning and have very little consensus on how to solve them.
I'm not super surprised that these examples worked well. They are complex and a ton of work, but the problems are relatively well defined with tons of documentation online. Sounds ideal for an LLM, no?
One of my first tests with it was "Write a Python 3 interpreter in JavaScript."
It produced tests, then wrote the interpreter, then ran the tests and worked until all of them passed. I was genuinely surprised that it worked.
Insanely difficult to you maybe because you stopped learning. What you cannot create you don't understand.
On the other hand when I tried it just yesterday, I couldn't really see a difference. As I wrote elsewhere: same crippled context window, same "I'll read 10 irrelevant lines from a file", same random changes etc.
Meanwhile, half a year to a year ago I could already point whatever model was du jour at pychromecast and tell it repeatedly "just convert the rest of functionality to Swift" and it did it. No idea about the quality of the code, but it worked, along with implementations for mDNS and SwiftUI; see the gif/video here: https://mastodon.nu/@dmitriid/114753811880082271 (doesn't include chromecast info in the video).
I think agents have become better, but the models themselves have likely almost entirely plateaued.
>A JavaScript interpreter written in Python?
I'm assuming this refers to the Python port of Bellard's MQJS [1]? It's impressive and very useful, but leaving out the "based on mqjs" part is misleading.
[1] https://github.com/simonw/micro-javascript?