Interesting things Dirac does:
1. Uses an optimized version of Hash-Anchored edits for file editing (https://dirac.run/posts/hash-anchors-myers-diff-single-token)
2. Utilizes language's AST to decide what to fetch into context, entirely avoids large code file reads
3. Batches all operations. Does large number of reads/edits simultaneously (you can see a video demo for deepseek-v4-flash here https://www.reddit.com/r/LocalLLaMA/comments/1suhdki/tested_...)
4. Allows the model to execute code to analyze things on the fly, so the model can simply write bash/python/perl script to accomplish things where appropriate
5. A lot of context curation and opportunistic context updates, i.e. put into context anything that you are certain model would ask next
Is there a complete list of the tools somewhere? I'm interested in how you chose to expose the AST specifically. In my own harness attempts I wanted to keep the number of tools absolutely minimal and briefly experimented with including an AST lib to use via an execute_python tool (plus some examples in the system prompt). Results were mixed though, with most models preferring ripgrep.
> Utilizes language's AST to decide what to fetch into context,
Does that mean that it's only going to work with certain langauges for which it has parsers available?
Anchor based editing requires injecting new anchors to the context, and dirac does so via a diff. So how is this more efficient (token-wise) than search and replace?? Even at a single token per hash. Also, code is read more than written so these just add up. I experimented once with stable anchors, albeit longer than a single token, and found it a downgrade.
My conclusion is that the efficiency dirac sees comes mainly from showing file skeleton by default
> Batches all operations. Does large number of reads/edits simultaneously...
I wasn't sure what this meant, so I looked at the source. It seems to be referring to tool APIs being designed around taking multiple targets as a list parameter, instead of hoping the model makes appropriately parallel tool calls. (This matches my experience btw, models are reluctant to make a large number of parallel calls simultaneously, and this seems more pronounced with weaker models.)
It would be really cool to do a causality investigation to determine which one of these boosts it so much / quantify how much each matters. Who knows, they may all interact in a sum-is-greater-than-parts way that only improves the score when shipped altogether.
Did you consider incorporating ast-grep or gritql?
Congratulations, great work.
[flagged]
I always wondered why AST's were not more of a part in both editing and scoping of changes/parsing code. I thought I read an article where they said 'grep' was just as effective. It kinda made sense for the case they were talking about.