Hi there, HN! We’re Jai and Sanket from DeepSource (YC W20), and today we’re launching Autofix Bot, a hybrid static analysis + AI agent purpose-built for in-the-loop use with AI coding agents.
AI coding agents have made code generation nearly free, and they’ve shifted the bottleneck to code review. Static-only analysis with a fixed set of checkers isn’t enough. LLM-only review has several limitations: non-deterministic across runs, low recall on security issues, expensive at scale, and a tendency to get ‘distracted’.
We spent the last 6 years building a deterministic, static-analysis-only code review product. Earlier this year, we started thinking about this problem from the ground up and realized that static analysis solves key blind spots of LLM-only reviews. Over the past six months, we built a new ‘hybrid’ agent loop that uses static analysis and frontier AI agents together to outperform both static-only and LLM-only tools in finding and fixing code quality and security issues. Today, we’re opening it up publicly.
Here’s how the hybrid architecture works:
- Static pass: 5,000+ deterministic checkers (code quality, security, performance) establish a high-precision baseline. A sub-agent suppresses context-specific false positives.
- AI review: The agent reviews code with static findings as anchors. Has access to AST, data-flow graphs, control-flow, import graphs as tools, not just grep and usual shell commands.
- Remediation: Sub-agents generate fixes. Static harness validates all edits before emitting a clean git patch.
Static solves key LLM problems: non-determinism across runs, low recall on security issues (LLMs get distracted by style), and cost (static narrowing reduces prompt size and tool calls).
On the OpenSSF CVE Benchmark [1] (200+ real JS/TS vulnerabilities), we hit 81.2% accuracy and 80.0% F1; vs Cursor Bugbot (74.5% accuracy, 77.42% F1), Claude Code (71.5% accuracy, 62.99% F1), CodeRabbit (59.4% accuracy, 36.19% F1), and Semgrep CE (56.9% accuracy, 38.26% F1). On secrets detection, 92.8% F1; vs Gitleaks (75.6%), detect-secrets (64.1%), and TruffleHog (41.2%). We use our open-source classification model for this. [2]
Full methodology and how we evaluated each tool: https://autofix.bot/benchmarks
You can use Autofix Bot interactively on any repository using our TUI, as a plugin in Claude Code, or with our MCP on any compatible AI client (like OpenAI Codex).[3] We’re specifically building for AI coding agent-first workflows, so you can ask your agent to run Autofix Bot on every checkpoint autonomously.
Give us a shot today: https://autofix.bot. We’d love to hear any feedback!
---
[1] https://github.com/ossf-cve-benchmark/ossf-cve-benchmark
Congratulations!! Anchoring is important. What about other parts of the code review like coding guidelines, perf issues etc?
How does this compare to gemini-code-assist? Rn its one of the best imo
What is the difference between this and let's say Claude Code using something like semgrep as a tool?
Also I don't think this tool should be in the developer flow as in my experience it is unlikely to run it on the regular. It should be something that is done as part of the QA process before PR acceptance.
I hope this helps and good luck.
"shifted bottleneck to code review"... understatement of decade.
$8/100k tokens strikes me as potentially a TON if the idea is that we're going to be running this as part of the iterative local development cycle (or god forbid letting agents run it whenever they decide). As you mentioned, one of the issues with AI generated code is often that it writes too much and needs direction on shrinking down.
I could easily see hitting 10k+ LOC on routine tickets if this is being run on each checkpoint. I have some tickets that require moving some files around, am I being charged on LOC for those files? Deleted files? Newly created test files that have 1k+ lines?