Hacker News

PeterStuer today at 1:51 PM

Switched to Fable 5 this morning, and after half a day I already don't want to go back to Opus.

Decided the best way to test this was to throw it a really meaty bone: a bug in lifecycle management of Chrome processes on Windows 10. Within the code base I had developed workarounds over time with Sonnet and Opus, and while those reliably mitigated the problems, it always felt like a crutch and had some performance overhead as well as isolation requirements I would rather not have to take forward.

In comes Fable. Rather than examining the code base and testing a few fixes, Fable sets up an entire testing laboratory, including its own controllable web server, fully instrumented to observe both Python and the whole OS kernel process environment; develops a suite of error-reproduction tests; confirms the problem and the circumstances under which it reproduces; deep-dives into the sources of the project's dependencies to look for the root cause(s); identifies these and confirms those hypotheses with further experiments. It then looks for potential fixes in later releases of the project where the bug originates, confirms it is not fixed, explores the documentation of said project to find other usage patterns, expands its test suite to investigate these alternatives, confirms by cross-checking the source and running further tests that these alternatives do not fully solve the root problem, does a comparative experimental analysis of 3 different styles of using the project, checks the stated roadmap and developer activity in the commit history, and recommends a switch to a different pattern that still requires a few of the process management workarounds (I told it not to patch external components) but that significantly simplifies the code base ...

This is going to be a good 2 weeks, but what happens after? I can't afford this on a per token basis for my own projects.

P.S. And yes, midway through the final implementation stretch I got the "Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more" message.

Opus managed to finish the implementation, but they need to work on that false positive rate.


Replies

techblueberry today at 2:00 PM

> This is going to be a good 2 weeks, but what happens after? I can't afford this on a per token basis for my own projects.

It’s interesting that these companies have trained us to think that disruptive intelligence should be affordable to laypersons.

What will happen after two weeks is that people and companies with means who can afford it will get it, and folks without means won’t.